git.ipfire.org Git - thirdparty/mdadm.git/log
10 months agomdadm: add xmalloc.h
Mariusz Tkaczyk [Wed, 25 Sep 2024 11:16:10 +0000 (13:16 +0200)] 
mdadm: add xmalloc.h

Move the memory allocation helpers out of mdadm.h. They seem to be
useful, so keep them but include them separately. Rework them so they do
not refer to Name[], which is declared internally in mdadm/mdmon.
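
A minimal sketch of what such a standalone allocation helper could look like, assuming it only needs to report failure and exit. The name demo_xmalloc, the message text and the exit code are hypothetical and are not taken from xmalloc.h.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the real xmalloc.h code): return allocated
 * memory or terminate the program, so callers need no NULL checks.
 */
void *demo_xmalloc(size_t len)
{
	void *p = malloc(len);

	if (p)
		return p;

	/* Report the failure without using a program-wide Name[] symbol. */
	fprintf(stderr, "memory allocation failure - aborting\n");
	exit(4);
}
```

Because the helper prints a fixed message, it can be compiled into both mdadm and mdmon without depending on a symbol that only one of them defines.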

This is the first step toward simplifying mdadm.h.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agoMdmonitor: Fix startup with missing directory
Anna Sztukowska [Tue, 3 Sep 2024 11:01:04 +0000 (13:01 +0200)] 
Mdmonitor: Fix startup with missing directory

Commit 0a07dea8d3b78 ("Mdmonitor: Refactor check_one_sharer() for
better error handling") introduced an issue: if the directory /run/mdadm
is missing, the monitor fails to start. Move the directory creation
earlier to ensure it is always created.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
10 months agosysfs: add function for writing to sysfs fd
Mariusz Tkaczyk [Tue, 24 Sep 2024 13:53:18 +0000 (15:53 +0200)] 
sysfs: add function for writing to sysfs fd

The proposed function sysfs_write_descriptor() unifies error handling for
write() calls on sysfs files. Its main purpose is use with MD sysfs
files, but it can be used elsewhere.
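
The idea of a single checked write can be sketched as follows. This is an illustration only: demo_write_fd and its signature are assumptions and do not reproduce the function added by this commit.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole string to fd.
 * Returns 0 on success. On failure returns -1 and, if err is not NULL,
 * stores the errno value there so the caller can report it later.
 */
int demo_write_fd(int fd, const char *value, int *err)
{
	ssize_t len = (ssize_t)strlen(value);
	ssize_t ret = write(fd, value, (size_t)len);

	if (ret == len)
		return 0;

	if (err)
		*err = (ret < 0) ? errno : EIO;	/* short write reported as EIO */
	return -1;
}
```

Callers then share one error path instead of each open-coding the comparison of the write() result with the expected length.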

No functional changes.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agoIncremental: Rename IncrementalRemove
Mariusz Tkaczyk [Mon, 23 Sep 2024 12:15:31 +0000 (14:15 +0200)] 
Incremental: Rename IncrementalRemove

Rename it to Incremental_remove for better readability.
No functional changes.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agoCI: do not install unnecessary packages
Kinga Stefaniuk [Wed, 25 Sep 2024 12:30:23 +0000 (14:30 +0200)] 
CI: do not install unnecessary packages

Updating all of the packages every time is not needed and costs a lot of
resources. Install only necessary packages and their dependencies.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
10 months agoRemove INSTALL and dev/null
Mariusz Tkaczyk [Mon, 23 Sep 2024 09:12:53 +0000 (11:12 +0200)] 
Remove INSTALL and dev/null

INSTALL is not needed because its content was added to README.md.
dev/null was created accidentally.

Remove them.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/Manage: record errno
Xiao Ni [Wed, 11 Sep 2024 08:54:32 +0000 (16:54 +0800)] 
mdadm/Manage: record errno

Sometimes mdadm reports:
mdadm: failed to stop array /dev/md0: Success
This happens because errno is reset. So record errno during the loop.
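
A small sketch of the pattern, assuming a retry loop around an operation that sets errno. The names demo_retry and demo_always_busy are hypothetical, and the explicit errno = 0 only stands in for a later library call that overwrites errno.

```c
#include <errno.h>

/* Hypothetical operation that always fails with EBUSY. */
int demo_always_busy(void)
{
	errno = EBUSY;
	return -1;
}

/*
 * Retry try_op up to attempts times. The errno of the last failure is
 * copied to *saved_errno immediately, before anything can reset it.
 */
int demo_retry(int (*try_op)(void), int attempts, int *saved_errno)
{
	int i;

	*saved_errno = 0;
	for (i = 0; i < attempts; i++) {
		if (try_op() == 0)
			return 0;
		*saved_errno = errno;	/* record right after the failure */
		errno = 0;		/* simulates a later call clobbering errno */
	}
	return -1;
}
```

Reporting strerror(*saved_errno) after the loop then prints the real cause instead of "Success".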

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/tests: remove 09imsm-assemble.broken
Xiao Ni [Wed, 11 Sep 2024 08:54:31 +0000 (16:54 +0800)] 
mdadm/tests: remove 09imsm-assemble.broken

09imsm-assemble can run successfully.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/tests: 07testreshape5 fix
Xiao Ni [Wed, 11 Sep 2024 08:54:30 +0000 (16:54 +0800)] 
mdadm/tests: 07testreshape5 fix

Init dir to avoid test failure.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/tests: Remove 07reshape5intr.broken
Xiao Ni [Wed, 11 Sep 2024 08:54:29 +0000 (16:54 +0800)] 
mdadm/tests: Remove 07reshape5intr.broken

07reshape5intr can run successfully now.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/tests: 07changelevels fix
Xiao Ni [Wed, 11 Sep 2024 08:54:28 +0000 (16:54 +0800)] 
mdadm/tests: 07changelevels fix

There are five changes to this case.

1. Remove the testdev check. It no longer works, so check directly
whether the device is a block device.

2. Level and chunk size can't be changed at the same time.

3. Sleep more than 10s before check wait.
The test devices are small, so the reshape can sometimes finish almost
as soon as it starts. mdadm then gets stuck waiting for the reshape to
start. So the sync speed is limited, and it is restored while waiting
for the reshape to finish. This is enough for the case without a backup
file.

When a backup file is specified, the systemd service
mdadm-grow-continue monitors reshape progress. If the reshape finishes
before the service starts monitoring it, the daemon gets stuck too,
because reshape_progress is 0, which means the reshape hasn't started.
So give the service more time to get the right information from kernel
space.

But before getting this information, the service needs to suspend the
array while the reshape is running. The kernel reshape daemon updates
metadata every 10s. So the sync speed needs to stay limited for more
than 10s before it is restored. Then the systemd service can suspend
the array and start monitoring reshape progress.

4. Wait until the mdadm-grow-continue service exits.
mdadm --wait doesn't wait for the systemd service. For the case that
needs a backup file, the systemd service deletes the backup file after
the reshape finishes. This test runs the next case as soon as the
reshape finishes, and that case fails because it can't create its
backup file while the old backup file still exists.

5. Don't reshape from raid5 to raid1. It doesn't work now.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/tests: wait until level changes
Xiao Ni [Wed, 11 Sep 2024 08:54:27 +0000 (16:54 +0800)] 
mdadm/tests: wait until level changes

check wait waits for the reshape to finish, but it doesn't wait for the
level change. The level change happens in a forked child process. So we
need to find the child process and monitor it.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/Grow: sleep a while after removing disk in impose_level
Xiao Ni [Wed, 11 Sep 2024 08:54:26 +0000 (16:54 +0800)] 
mdadm/Grow: sleep a while after removing disk in impose_level

Disks need to be removed when reshaping from raid456 to raid0. This sets
MD_RECOVERY_RUNNING in kernel space, and the level change then fails.
So wait some time to let the md thread clear this flag.

This was found by test case 05r6tor0.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/Grow: Can't open raid when running --grow --continue
Xiao Ni [Wed, 11 Sep 2024 08:54:25 +0000 (16:54 +0800)] 
mdadm/Grow: Can't open raid when running --grow --continue

Grow_continue passes 'array' as devname, so it fails to open the raid
device. Use mdinfo to open the raid device.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/Grow: Update reshape_progress to need_back after reshape finishes
Xiao Ni [Wed, 11 Sep 2024 08:54:24 +0000 (16:54 +0800)] 
mdadm/Grow: Update reshape_progress to need_back after reshape finishes

mdadm tries to update the data offset when kicking off a reshape. If it
can't change the data offset, it needs to use child_monitor to monitor
reshape progress and do the backup job. It also needs to update
reshape_progress to need_back when the reshape finishes. Otherwise, it
ends up in an infinite loop.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm/Grow: Update new level when starting reshape
Xiao Ni [Wed, 11 Sep 2024 08:54:23 +0000 (16:54 +0800)] 
mdadm/Grow: Update new level when starting reshape

A reshape needs a backup file when it can't update the data offset of
the member disks. In this situation, mdadm first starts the reshape and
then kicks off the mdadm-grow-continue service, which does the backup
job and monitors the reshape process. The service is a new process, so
it needs to read the superblock from the member disks to get
information.

But the first step doesn't update the new level in the superblock. So
the level can't be changed after the reshape finishes, because the new
level is not right. So record the new level in the first step.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
10 months agomdadm: Add compilation process to README.md
Anna Sztukowska [Tue, 6 Aug 2024 08:44:01 +0000 (10:44 +0200)] 
mdadm: Add compilation process to README.md

Add compilation process and dependencies to README.md.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
10 months agoDetail.c: Fix divide_by_zero issue
Anna Sztukowska [Mon, 29 Jul 2024 05:47:39 +0000 (07:47 +0200)] 
Detail.c: Fix divide_by_zero issue

Fix divide_by_zero issue reported by SAST analysis in Detail.c when
calling enough() from util.c. Also add missing spaces for better code
readability.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
11 months agoIncremental: support devnode in IncrementalRemove.
Mariusz Tkaczyk [Tue, 25 Jun 2024 10:53:46 +0000 (12:53 +0200)] 
Incremental: support devnode in IncrementalRemove.

There is no reason to keep this interface different from the others.
Allow using a devnode but keep the old way for backward compatibility.
A method is added to verify that only a devnode or a kernel name is used.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
11 months agodlink.h: Fix checkpatch warnings for function args
Anna Sztukowska [Mon, 9 Sep 2024 07:36:47 +0000 (09:36 +0200)] 
dlink.h: Fix checkpatch warnings for function args

Checkpatch issued a warning due to missing function argument names.
Add the names to resolve the warnings.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
11 months agoExamine.c: Fix memory leaks in Examine()
Anna Sztukowska [Thu, 8 Aug 2024 15:02:38 +0000 (17:02 +0200)] 
Examine.c: Fix memory leaks in Examine()

Fix memory leaks in Examine() reported by SAST analysis. Implement a
method to traverse and free all the nodes of the doubly linked list.
Replace the for loop with a while loop to improve the readability of the
code and free allocated memory correctly.
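
A generic sketch of freeing such a list with a while loop. It assumes a NULL-terminated doubly linked list with hypothetical demo_* names, which may differ from the list layout used in dlink.h.

```c
#include <stdlib.h>

struct demo_node {
	struct demo_node *next;
	struct demo_node *prev;
};

/* Prepend a new node and return the new head (old head on failure). */
struct demo_node *demo_push(struct demo_node *head)
{
	struct demo_node *n = calloc(1, sizeof(*n));

	if (!n)
		return head;
	n->next = head;
	if (head)
		head->prev = n;
	return n;
}

/* Free every node in the list and return how many were freed. */
int demo_free_list(struct demo_node *head)
{
	int freed = 0;

	while (head) {
		struct demo_node *next = head->next;	/* read before free() */

		free(head);
		head = next;
		freed++;
	}
	return freed;
}
```

The key point is that the next pointer is read before the current node is freed, which is easier to see in a while loop than in a for loop whose increment expression would touch freed memory.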

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
11 months agoimsm: save checkpoint prior to exit
Mateusz Kusiak [Mon, 2 Sep 2024 16:27:56 +0000 (12:27 -0400)] 
imsm: save checkpoint prior to exit

If a reshape (e.g. chunksize migration) is gracefully stopped via SIGTERM,
the checkpoint is not saved and the reshape cannot be resumed due to "data
being present in copy area". This is because UNIT_SRC_NORMAL isn't set
if SIGTERM occurred.

Move SIGTERM handling to the end of the loop to allow saving the checkpoint
(and state) so reshapes can be properly resumed.
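
The ordering can be sketched as below, assuming a loop that migrates one unit per iteration. All demo_* names are hypothetical stand-ins, not code from super-intel.c.

```c
#include <signal.h>

/* Set by the signal handler, polled by the loop. */
volatile sig_atomic_t demo_sigterm;

/* Hypothetical stand-in for the reshape state kept in metadata. */
struct demo_state {
	int units_done;
	int checkpoint;
};

void demo_on_sigterm(int sig)
{
	(void)sig;
	demo_sigterm = 1;
}

/* Returns 0 when finished, 1 when interrupted after a saved checkpoint. */
int demo_reshape_loop(struct demo_state *st, int total_units)
{
	while (st->units_done < total_units) {
		st->units_done++;			/* migrate one unit */
		st->checkpoint = st->units_done;	/* save the checkpoint */

		/* Look at SIGTERM only after the checkpoint is saved. */
		if (demo_sigterm)
			return 1;
	}
	return 0;
}
```

With the check at the end of the iteration, a termination request never leaves a migrated unit without a matching checkpoint.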

Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
11 months agomdadm: Increase number limit in md device name to 1024.
Shminderjit Singh [Mon, 26 Aug 2024 10:06:50 +0000 (10:06 +0000)] 
mdadm: Increase number limit in md device name to 1024.

Updated the maximum device number in md device names from 127 to 1024.
The previous limit was causing issues in the automation framework.
This change ensures backward compatibility and allows for future
scalability.

Fixes: 25aa7329141c ("mdadm: numbered names verification")
Signed-off-by: Shminderjit Singh <shminderjit.singh@oracle.com>
11 months agoimsm: add IMSM_OROM_CAPABILITIES_TPV to nvme orom
Mariusz Tkaczyk [Thu, 22 Aug 2024 10:18:06 +0000 (12:18 +0200)] 
imsm: add IMSM_OROM_CAPABILITIES_TPV to nvme orom

Add it so that it is not excluded. It has some value for users even if
it is always true for the nvme virtual orom.

Rework detail-platform printing code, move printing 3rd party nvmes
to print_imsm_capability (as it should be), but keep it meaningful
only for nvme controllers (NVME and VMD hba types). Pass whole
orom_entry instead of orom there.

Squash code responsible for printing NVME and VMD hbas.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
11 months agoimsm: Remove warning and refactor add_to_super_imsm code
Mariusz Tkaczyk [Thu, 22 Aug 2024 09:55:15 +0000 (11:55 +0200)] 
imsm: Remove warning and refactor add_to_super_imsm code

Intel x8 drives are not supported. Remove the unnecessary warning and
refactor the add_to_super_imsm code.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
11 months agomdadm: Change displaying of devices in --detail
Anna Sztukowska [Wed, 28 Aug 2024 10:04:35 +0000 (12:04 +0200)] 
mdadm: Change displaying of devices in --detail

The counts of active, working, failed and spare devices were not
printed when the number was zero.

Refactor the code to always display the counts of all device types,
regardless of their number. This way, it is more reliable for users.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
11 months agoplatform-intel: refactor path_attached_to_hba()
Mateusz Kusiak [Tue, 7 May 2024 16:05:43 +0000 (12:05 -0400)] 
platform-intel: refactor path_attached_to_hba()

dprintf() call in path_attached_to_hba() is too noisy. Remove the call
and refactor the function. Remove obsolete env variables check.

Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
11 months agoimsm: get bus from VMD driver directory
Mariusz Tkaczyk [Thu, 8 Aug 2024 11:07:50 +0000 (13:07 +0200)] 
imsm: get bus from VMD driver directory

Enumeration of VMD child devices starts early, and the kernel does not
wait for VMD enumeration to finish. As a result, the
/sys/bus/pci/drivers/vmd/{dev}/domain/device link might not be ready yet.

With PCI gen5 devices we can observe mdadm failing to start IMSM
raid arrays because of that. In that case, it needs to find the bus path
manually.

Look for the bus device in the VMD driver directory if realpath() failed
with ENOENT.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
11 months agoimsm: add read OROM from ACPI UEFI tables
Blazej Kucman [Thu, 11 Jul 2024 16:45:41 +0000 (18:45 +0200)] 
imsm: add read OROM from ACPI UEFI tables

OROM - IMSM hardware capabilities

EFI vars depend on userspace: they need to be mounted to be accessible.
Sporadic problems with their availability have been observed at an early
assemble stage. It is not possible to fully synchronize EFI vars mounts
with udev rules processing.

For this reason, reading the IMSM OROM from ACPI tables is added as a
secondary option. This method will be used for SATA and VMD family
controllers.

ACPI tables are made available in sysfs earlier in the boot process,
before the RAID assembly stage. Loading the OROM via EFI vars is
retained, and ACPI tables serve as a backup.

Both paths will be maintained because IMSM hardware capabilities are
necessary for RAID assembly during boot, so access to them must be
provided.

Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
11 months agomdadm: sysfs.c fix coverity issues
Nigel Croxon [Thu, 18 Jul 2024 17:05:57 +0000 (13:05 -0400)] 
mdadm: sysfs.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event fixed_size_dest: You might overrun the 32-character
fixed-size string "mdi->sys_name" by copying "devnm" without
checking the length

* Event fixed_size_dest: You might overrun the 50-character
fixed-size string "sra->text_version" by copying "buf + 9"
without checking the length.

* Event string_overflow: You might overrun the 32-character
destination string "dev->sys_name" by writing 256 characters
from "de->d_name".
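
One common way to address this class of finding is a bounded copy that always terminates the destination. The sketch below uses snprintf(); the demo_* names and the 32-byte size are illustrative assumptions, and the actual fix in sysfs.c may use a different call.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 32

/*
 * Copy src into a fixed-size buffer. The result is always
 * NUL-terminated. Returns 0 if src fit, -1 if it was truncated.
 */
int demo_copy_name(char dst[DEMO_NAME_LEN], const char *src)
{
	int n = snprintf(dst, DEMO_NAME_LEN, "%s", src);

	return (n < 0 || n >= DEMO_NAME_LEN) ? -1 : 0;
}
```

Unlike strcpy(), the copy cannot write past the destination, and the return value lets the caller decide whether a truncated name is acceptable.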

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
11 months agomdadm: util.c fix coverity issues
Nigel Croxon [Wed, 7 Aug 2024 15:33:23 +0000 (11:33 -0400)] 
mdadm: util.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event check_return: Calling "open" without checking return value
* Event check_return: Calling "lseek(fd, sector_size, 0)" without
checking return value.
* Event leaked_handle: Handle variable "fd" going out of scope leaks
the handle.
* Event leaked_storage: Variable "dir" going out of scope leaks the
storage it points to.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "st->devnm" by copying "_devnm" without checking the length.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "container" by copying "dev" without checking the length.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
11 months agomd.4: replace wrong word
Nicolas Roeser [Sun, 4 Aug 2024 12:34:44 +0000 (14:34 +0200)] 
md.4: replace wrong word

There is a wrong word in the md(4) man page, this commit corrects it.

Signed-off-by: Nicolas Roeser <nicolas.roeser@alumni.uni-ulm.de>
12 months agomdstat: fix list detach issues
Mariusz Tkaczyk [Tue, 6 Aug 2024 14:11:18 +0000 (16:11 +0200)] 
mdstat: fix list detach issues

Move ent = ent->next; into the while loop. It was outside the loop, so if
there are more than 2 elements and we are looking for the 3rd element, it
causes an infinite loop.

Fix el->next zeroing. It causes a segfault in mdstat_free(). These
issues were not visible in my testing because I had only 2 MD devices.
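
Both points can be illustrated with a generic detach routine for a singly linked list. The structure and function names are hypothetical and much simpler than struct mdstat_ent.

```c
#include <stddef.h>
#include <string.h>

struct demo_ent {
	const char *name;
	struct demo_ent *next;
};

/* Detach the entry called name from *list and return it, or NULL. */
struct demo_ent *demo_detach(struct demo_ent **list, const char *name)
{
	struct demo_ent **pp = list;

	while (*pp) {
		struct demo_ent *ent = *pp;

		if (strcmp(ent->name, name) == 0) {
			*pp = ent->next;	/* unlink from the list */
			ent->next = NULL;	/* zero only the detached entry's link */
			return ent;
		}
		pp = &ent->next;	/* advance inside the loop */
	}
	return NULL;
}
```

The cursor advances on every iteration, so the search terminates for lists of any length, and only the detached entry has its next pointer cleared, so the remaining list stays intact for a later free.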

Fixes: 4b3644ab4ce6 ("mdstat: Rework mdstat external arrays handling")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agoGrow_reshape: set only component_size for size grow
Kinga Stefaniuk [Fri, 26 Apr 2024 06:33:00 +0000 (08:33 +0200)] 
Grow_reshape: set only component_size for size grow

Component_size couldn't be set using ioctl when the new drive size is big
(e.g. 5TB). The command value is bigger than 32 bits and an error is
reported; this is a known ioctl limitation. Stop updating array
properties using ioctl and use sysfs instead. Sysfs was introduced in
3.10, so it is now old enough to be safely used. Array_size in sysfs
should be set on every size change for external metadata when the grow
is performed without errors.
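
The arithmetic behind the limitation can be sketched as follows, assuming the size is passed in KiB (an assumption made for illustration; the commit message only states that the value exceeds 32 bits). The demo_* names are hypothetical.

```c
#include <stdint.h>

/* Convert a size in bytes to KiB. */
uint64_t demo_bytes_to_kib(uint64_t bytes)
{
	return bytes / 1024;
}

/* Return 1 if the value can be stored in a 32-bit field, 0 otherwise. */
int demo_fits_in_32_bits(uint64_t value)
{
	return value <= UINT32_MAX;
}
```

For a 5 TB component, 5000000000000 / 1024 = 4882812500 KiB, which is larger than the 32-bit maximum of 4294967295, while a 1 TB component (976562500 KiB) still fits.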

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agoCI: add new gcc 14
Kinga Stefaniuk [Wed, 31 Jul 2024 14:24:37 +0000 (16:24 +0200)] 
CI: add new gcc 14

Add the newly released gcc to the compilation test in the GH action.
Change the runner to Ubuntu 24.04, which supports gcc versions up to 14.
Previously ubuntu-latest (22.04) was used, which didn't support gcc 13
and 14. Add verification that the correct gcc was installed during the test.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agosuper-intel: add define for migr_state
Kinga Stefaniuk [Wed, 31 Jul 2024 13:06:42 +0000 (15:06 +0200)] 
super-intel: add define for migr_state

Represent migr_state with the define, which helps in code readability.
Add new values for Normal and Migration states.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agosuper-intel: fix compilation error
Kinga Stefaniuk [Tue, 6 Aug 2024 09:14:02 +0000 (11:14 +0200)] 
super-intel: fix compilation error

Fix compilation error:

super-intel.c: In function ‘end_migration’:
super-intel.c:4360:29: error: writing 2 bytes into a region
of size 0 [-Werror=stringop-overflow=]
 4360 |         dev->vol.migr_state = 0;
      |         ~~~~~~~~~~~~~~~~~~~~^~~
cc1: note: destination object is likely at address zero
cc1: all warnings being treated as errors
make: *** [Makefile:232: super-intel.o] Error 1

reported when GCC 14 is used. Return when dev is NULL to avoid it.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agosuper-gpt.c: Fix check_return issue in load_gpt()
Anna Sztukowska [Wed, 24 Jul 2024 09:46:57 +0000 (11:46 +0200)] 
super-gpt.c: Fix check_return issue in load_gpt()

Fix check_return issue in load_gpt() reported by SAST analysis in
super-gpt.c.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
12 months agopolicy.c: Fix check_return issue in Write_rules()
Anna Sztukowska [Thu, 11 Jul 2024 12:31:57 +0000 (14:31 +0200)] 
policy.c: Fix check_return issue in Write_rules()

Refactor Write_rules() in policy.c to eliminate check_return issue found
by SAST analysis. Create udev rules file directly using rule_name
instead of creating temporary file and renaming it.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
12 months agomdadm/super1: fix coverity issue RESOURCE_LEAK
Xiao Ni [Fri, 26 Jul 2024 07:14:16 +0000 (15:14 +0800)] 
mdadm/super1: fix coverity issue RESOURCE_LEAK

Fix resource leak problems in super1.c

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/super1: fix coverity issue EVALUATION_ORDER
Xiao Ni [Fri, 26 Jul 2024 07:14:15 +0000 (15:14 +0800)] 
mdadm/super1: fix coverity issue EVALUATION_ORDER

Fix evaluation order problems in super1.c

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/super1: fix coverity issue DEADCODE
Xiao Ni [Fri, 26 Jul 2024 07:14:14 +0000 (15:14 +0800)] 
mdadm/super1: fix coverity issue DEADCODE

optimal_space is at most 2046. So space can't be larger than UINT16_MAX.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/super1: fix coverity issue CHECKED_RETURN
Xiao Ni [Fri, 26 Jul 2024 07:14:13 +0000 (15:14 +0800)] 
mdadm/super1: fix coverity issue CHECKED_RETURN

Check the return value of functions that return one.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/super0: fix coverity issue CHECKED_RETURN and EVALUATION_ORDER
Xiao Ni [Fri, 26 Jul 2024 07:14:12 +0000 (15:14 +0800)] 
mdadm/super0: fix coverity issue CHECKED_RETURN and EVALUATION_ORDER

Fix coverity problems in super0.c. Check the return value of functions
that return one, and fix EVALUATION_ORDER problems.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/mdstat: fix coverity issue CHECKED_RETURN
Xiao Ni [Fri, 26 Jul 2024 07:14:11 +0000 (15:14 +0800)] 
mdadm/mdstat: fix coverity issue CHECKED_RETURN

Check the return value of functions that return one.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/mdopen: fix coverity issue STRING_OVERFLOW
Xiao Ni [Fri, 26 Jul 2024 07:14:10 +0000 (15:14 +0800)] 
mdadm/mdopen: fix coverity issue STRING_OVERFLOW

Fix string overflow problems in mdopen.c

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/mdopen: fix coverity issue CHECKED_RETURN
Xiao Ni [Fri, 26 Jul 2024 07:14:09 +0000 (15:14 +0800)] 
mdadm/mdopen: fix coverity issue CHECKED_RETURN

Check the return value of functions that return one.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/mdmon: fix coverity issue RESOURCE_LEAK
Xiao Ni [Fri, 26 Jul 2024 07:14:08 +0000 (15:14 +0800)] 
mdadm/mdmon: fix coverity issue RESOURCE_LEAK

Fix resource leak problem in mdmon.c

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/mdmon: fix coverity issue CHECKED_RETURN
Xiao Ni [Fri, 26 Jul 2024 07:14:07 +0000 (15:14 +0800)] 
mdadm/mdmon: fix coverity issue CHECKED_RETURN

Check the return value of functions that return one.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/Incremental: fix coverity issues.
Xiao Ni [Fri, 26 Jul 2024 07:14:06 +0000 (15:14 +0800)] 
mdadm/Incremental: fix coverity issues.

There are two issues PW.PARAMETER_HIDDEN (declaration hides
parameter 'devname') and INTEGER_OVERFLOW.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/Grow: fix coverity issue STRING_OVERFLOW
Xiao Ni [Fri, 26 Jul 2024 07:14:05 +0000 (15:14 +0800)] 
mdadm/Grow: fix coverity issue STRING_OVERFLOW

Fix string overflow problems in Grow.c

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/Grow: fix coverity issue RESOURCE_LEAK
Xiao Ni [Fri, 26 Jul 2024 07:14:04 +0000 (15:14 +0800)] 
mdadm/Grow: fix coverity issue RESOURCE_LEAK

Fix some resource leak problems.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm/Grow: fix coverity issue CHECKED_RETURN
Xiao Ni [Fri, 26 Jul 2024 07:14:03 +0000 (15:14 +0800)] 
mdadm/Grow: fix coverity issue CHECKED_RETURN

Check the return value of functions that return one.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agoimsm: refactor chunk size print
Blazej Kucman [Wed, 24 Jul 2024 20:17:42 +0000 (22:17 +0200)] 
imsm: refactor chunk size print

- add imsm_chunk_ops struct for better code readability,
- move chunk size mapping to string into array,
- add function to print supported chunk sizes by IMSM controller.

Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
12 months agomdadm: msg.c fix coverity issues
Nigel Croxon [Wed, 24 Jul 2024 13:20:28 +0000 (09:20 -0400)] 
mdadm: msg.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event check_return: Calling "fcntl(sfd, 4, fl)" without
checking return value. This library function may fail and
return an error code.
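
A minimal sketch of a checked fcntl() sequence, assuming the goal is to set O_NONBLOCK on a descriptor (on Linux, command 4 in the report above is F_SETFL). The helper name demo_set_nonblock is hypothetical and the real code in msg.c may set different flags.

```c
#include <fcntl.h>

/* Set O_NONBLOCK on fd. Returns 0 on success, -1 if either call fails. */
int demo_set_nonblock(int fd)
{
	int fl = fcntl(fd, F_GETFL, 0);

	if (fl < 0)
		return -1;
	if (fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
		return -1;
	return 0;
}
```

Checking both calls means a bad descriptor is reported to the caller instead of silently leaving the descriptor in blocking mode.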

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
12 months agomdadm: managemon.c fix coverity issues
Nigel Croxon [Wed, 24 Jul 2024 13:04:08 +0000 (09:04 -0400)] 
mdadm: managemon.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event check_return: Calling "fcntl(fd, 4, fl)" without checking
return value. This library function may fail and return an error code.

* Event check_after_deref: Null-checking "new" suggests that it may
be null, but it has already been dereferenced on all paths leading
to the check.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
12 months agomdstat: Rework mdstat external arrays handling
Mariusz Tkaczyk [Fri, 5 Jul 2024 08:49:27 +0000 (10:49 +0200)] 
mdstat: Rework mdstat external arrays handling

To avoid repeating mdstat_read() in IncrementalRemove(), a new function
mdstat_find_by_member_name() is proposed. With that,
IncrementalRemove() handles its own copy of the mdstat content and there
is no need to repeat the read for an external stop.

Additionally, a few helpers are proposed to avoid repeating
mdstat_ent->metadata_version checks across the code.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agoreview.yml: fix typo in DEBIAN compiler flag
Kinga Stefaniuk [Thu, 4 Jul 2024 16:41:01 +0000 (18:41 +0200)] 
review.yml: fix typo in DEBIAN compiler flag

Fix typo in -DEBIAN flag in review.yml file.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agoMakefile: add more compiler flags
Kinga Stefaniuk [Thu, 4 Jul 2024 13:01:06 +0000 (15:01 +0200)] 
Makefile: add more compiler flags

It is essential to avoid vulnerabilities in code as much
as possible using safe compilation flags. It is easier if
they are added to the Makefile and applied during compilation.
Add new gcc flags and make them configurable, because they
may not be supported by some compilers.
Set FORTIFY_SOURCE to the highest value supported on the platform.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agosuper0: use define for char array in examine_super0
Kinga Stefaniuk [Thu, 4 Jul 2024 12:53:35 +0000 (14:53 +0200)] 
super0: use define for char array in examine_super0

Using nb with a length of 11 may cause format-truncation errors,
because snprintf could be given input of length 12 to write
into an output of length 11. Add a new define and use it
to avoid this error.
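
The sizes involved can be illustrated with a decimal int, assuming a 32-bit int: the longest value, "-2147483648", has 11 characters and needs a 12-byte buffer including the terminator. The define and function names below are hypothetical, not the ones added to super0.c.

```c
#include <limits.h>
#include <stdio.h>

/* 11 characters for "-2147483648" plus the terminating NUL. */
#define DEMO_INT_STR_LEN 12

/* Format value into out. Returns the length written, or -1 if truncated. */
int demo_format_int(char out[DEMO_INT_STR_LEN], int value)
{
	int n = snprintf(out, DEMO_INT_STR_LEN, "%d", value);

	return (n >= 0 && n < DEMO_INT_STR_LEN) ? n : -1;
}
```

Using one define for both the array declaration and the snprintf() size keeps the two from drifting apart.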

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agodrive_encryption: Fix ata passthrough12 verify
Blazej Kucman [Tue, 23 Jul 2024 10:45:10 +0000 (12:45 +0200)] 
drive_encryption: Fix ata passthrough12 verify

According to SCSI Primary Commands - 4 (SPC-4), only the first 7 bits
of the first byte of sense data are used to store the response code. The
current verification uses all 8 bits when comparing the response code.

Incorrect verification may make it impossible to use SATA disks with IMSM,
because IMSM requires verification of the encryption state before use.

There was an issue in kernel libata [1]. That issue hid the bug in mdadm
because the last bit was not set.

Example output with affected mdadm:

          Port3 : /dev/sde (BTPR212503EK120LGN)
mdadm: Failed ata passthrough12 ioctl. Device: /dev/sde.
mdadm: Failed to get drive encryption information

The fix is to use the first 7 bits of Byte 0 when comparing with the
expected values.
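
The masking can be sketched as below. The macro and function names are hypothetical and are not taken from drive_encryption.c; the value 0x72 is used only as an example response code.

```c
#include <stdint.h>

/* Bits 0-6 of sense data byte 0 hold the response code. */
#define DEMO_RESPONSE_CODE_MASK 0x7f

/* Extract the response code, ignoring the top bit of byte 0. */
int demo_sense_response_code(const uint8_t *sense)
{
	return sense[0] & DEMO_RESPONSE_CODE_MASK;
}
```

With the mask applied, a first byte of 0xf2 and a first byte of 0x72 both yield response code 0x72, so the comparison no longer depends on whether the top bit is set.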

[1] https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?id=38dab832c3f4

Fixes: df38df3052c3 ("Add reading SATA encryption information")
Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
12 months agoDetail: fix --detail --export for uuid_zero
Kinga Stefaniuk [Tue, 23 Jul 2024 13:38:41 +0000 (15:38 +0200)] 
Detail: fix --detail --export for uuid_zero

The mentioned commit (see Fixes) caused devices with a UUID
equal to uuid_zero not to be recognized properly. Among several such
devices, the first one was always taken, and the same information was
printed. This caused a regression: when several containers were created,
symlinks were generated only for the first one.

Check whether the uuid is uuid_zero and, if so, use devname to
differentiate devices.
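
A sketch of the comparison logic, assuming UUIDs are stored as four ints. The demo_* names and the function signature are hypothetical and do not mirror Detail.c.

```c
#include <string.h>

const int demo_uuid_zero[4] = {0, 0, 0, 0};

/*
 * Return 1 if both descriptions refer to the same array, else 0.
 * A zero UUID identifies nothing, so fall back to the device name.
 */
int demo_same_array(const int a_uuid[4], const char *a_name,
		    const int b_uuid[4], const char *b_name)
{
	if (memcmp(a_uuid, b_uuid, sizeof(demo_uuid_zero)) != 0)
		return 0;
	if (memcmp(a_uuid, demo_uuid_zero, sizeof(demo_uuid_zero)) == 0)
		return strcmp(a_name, b_name) == 0;
	return 1;
}
```

Two containers that both report a zero UUID are then told apart by name instead of the first one always matching.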

Fixes: 60c19530dd7c ("Detail: remove duplicated code")
Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agoCI: fetch all of the changes in repository
Kinga Stefaniuk [Tue, 23 Jul 2024 13:53:29 +0000 (15:53 +0200)] 
CI: fetch all of the changes in repository

The GH action uses the checkout plugin, which takes fetch-depth
as a parameter specifying the number of commits to fetch. Set it
to 0 to fetch the full history of all branches and tags.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
12 months agomdadm: do not allow leading dot in MD device name
Mariusz Tkaczyk [Mon, 15 Jul 2024 10:29:24 +0000 (12:29 +0200)] 
mdadm: do not allow leading dot in MD device name

Do not allow '.' as the first character of a named MD device.
A leading dot might be confusing, since an MD device cannot be hidden.
This also removes the possibility of creating an md device named '.'.
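
The rule itself is small. A sketch, using a hypothetical helper name rather than the validation code in mdadm:

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if name starts with '.', which is rejected for MD names. */
bool demo_has_leading_dot(const char *name)
{
	return name != NULL && name[0] == '.';
}
```

This rejects both ".md0" and the bare "." while still accepting names such as "md.0" that only contain a dot later on.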

Additionally, code optimizations are done.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agomdadm: lib.c fix coverity issues
Nigel Croxon [Tue, 16 Jul 2024 11:20:10 +0000 (07:20 -0400)] 
mdadm: lib.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying "cp + 1" without checking the length.

* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying "cp" without checking the length.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
12 months agomdadm: Query.c fix coverity issues
Nigel Croxon [Tue, 16 Jul 2024 11:19:34 +0000 (07:19 -0400)] 
mdadm: Query.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event leaked_storage: Variable "sra" going out of scope leaks the
storage it points to.

* Event uninit_use_in_call: Using uninitialized value "larray_size" when
calling "human_size_brief".

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
12 months agomdadm: Monitor.c fix coverity issues
Nigel Croxon [Mon, 15 Jul 2024 14:13:46 +0000 (10:13 -0400)] 
mdadm: Monitor.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event check_return: Calling "fcntl(fd, 2, 1)" without checking
return value. This library function may fail and return an error code.

* Dereferencing "sl", which is known to be "NULL".

* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying "tmp" without checking the length.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
12 months agoimsm: add indent for encryption details
Mariusz Tkaczyk [Mon, 15 Jul 2024 10:21:19 +0000 (12:21 +0200)] 
imsm: add indent for encryption details

Improve readability of the output.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
12 months agoManage: fix is_remove_safe()
Mariusz Tkaczyk [Tue, 16 Jul 2024 13:37:34 +0000 (15:37 +0200)] 
Manage: fix is_remove_safe()

Fix to make --set-faulty work.

Fixes: 1b4b73fd535a ("mdadm: Manage.c fix coverity issues")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agomdadm: Manage.c fix coverity issues
Nigel Croxon [Wed, 10 Jul 2024 12:55:08 +0000 (08:55 -0400)] 
mdadm: Manage.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event parameter_hidden: declaration hides parameter "dv".
* Event leaked_storage: Variable "mdi" going out of scope leaks the storage
it points to.
* Event overwrite_var: Overwriting "mdi" in "mdi = mdi->devs" leaks the
storage that "mdi" points to.
* Event leaked_handle: Handle variable "lfd" going out of scope leaks
the handle.
* Event leaked_handle: Returning without closing handle "fd" leaks it.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying the return value of "fd2devnm" without
checking the length.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "nm" by copying "nmp" without checking the length.
* Event fixed_size_dest: You might overrun the 32-character fixed-size
string "devnm" by copying the return value of "fd2devnm" without
checking the length.
* Event assigned_value: Assigning value "-1" to "tfd" here, but that
stored value is overwritten before it can be used.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
13 months agomapfile.c: Fix STRING_OVERFLOW issue
Anna Sztukowska [Wed, 3 Jul 2024 12:11:58 +0000 (14:11 +0200)] 
mapfile.c: Fix STRING_OVERFLOW issue

Fix STRING_OVERFLOW issue found by SAST analysis in map_add() and
map_update() in mapfile.c.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
13 months agomdadm/clustermd_tests: adjust test cases to support md module changes
Heming Zhao [Tue, 9 Jul 2024 12:04:52 +0000 (20:04 +0800)] 
mdadm/clustermd_tests: adjust test cases to support md module changes

Since kernel commit db5e653d7c9f ("md: delay choosing sync action to
md_start_sync()") delays the start of the sync action, clustermd
array sync/resync jobs can happen on any leg of the array. This
commit adjusts the test cases to follow the new kernel layer behavior.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agomdadm/clustermd_tests: add some APIs in func.sh to support running the tests without...
Heming Zhao [Tue, 9 Jul 2024 12:04:51 +0000 (20:04 +0800)] 
mdadm/clustermd_tests: add some APIs in func.sh to support running the tests without errors

clustermd_tests/func.sh lacks some APIs needed to run. This patch adds
them so that clustermd_tests can be run from the test suite.

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agomdadm: super-ddf.c fix coverity issues
Nigel Croxon [Tue, 2 Jul 2024 14:11:26 +0000 (10:11 -0400)] 
mdadm: super-ddf.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Calling "lseek64" without checking return value. This library function may
fail and return an error code.
* Overrunning array "anchor->pad2" of 3 bytes by passing it to a function
which accesses it at byte offset 398 using argument "399UL".
* Event leaked_storage: Variable "sra" going out of scope leaks the storage
it points to.
* Event leaked_storage: Variable "super" going out of scope leaks the storage
it points to.
* Event leaked_handle: Handle variable "dfd" going out of scope leaks the
handle.
* Event leaked_storage: Variable "dl1" going out of scope leaks the storage
it points to
* Event leaked_handle: Handle variable "cfd" going out of scope leaks the
handle.
* Variable "avail" going out of scope leaks the storage it points to.
* Passing unterminated string "super->anchor.revision" to "fprintf", which
expects a null-terminated string.
* You might overrun the 32-character fixed-size string "st->container_devnm"
by copying the return value of "fd2devnm" without checking the length.
* Event fixed_size_dest: You might overrun the 33-character fixed-size string
"dev->name" by copying "(*d).devname" without checking the length.
* Event uninit_use_in_call: Using uninitialized value "info.array.raid_disks"
when calling "getinfo_super_ddf"

V2: clean up validate_geometry_ddf() routine with Mariusz Tkaczyk recommendations.
V3: clean up spaces with Blazej Kucman recommendations.
V4: clean up recommended by Mariusz Tkaczyk.
V5: clean up recommended by Mariusz Tkaczyk.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
13 months agomdadm: Create.c fix coverity issues
Nigel Croxon [Fri, 5 Jul 2024 12:45:32 +0000 (08:45 -0400)] 
mdadm: Create.c fix coverity issues

* Event negative_returns: "fd", which is initially set to -1, is passed to a
parameter that cannot be negative.

* Event open_fn: Returning handle opened by "open_dev_excl".
* Event var_assign: Assigning: "container_fd" = handle returned from
"open_dev_excl(st->container_devnm)"
* Event leaked_handle: Handle variable "container_fd" going out of scope leaks the handle

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
13 months agomdadm: Build.c fix coverity issues
Nigel Croxon [Tue, 2 Jul 2024 13:49:13 +0000 (09:49 -0400)] 
mdadm: Build.c fix coverity issues

Event leaked_handle: Handle variable "bitmap_fd" going out of
scope leaks the handle.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
13 months agoconfig.c: Fix memory leak in load_containers()
Anna Sztukowska [Fri, 28 Jun 2024 10:32:16 +0000 (12:32 +0200)] 
config.c: Fix memory leak in load_containers()

Fix memory leak in load_containers() in config.c reported by SAST
analysis.

Signed-off-by: Anna Sztukowska <anna.sztukowska@intel.com>
13 months agoCI: use prepared checkpatch.conf file only for GH actions
Kinga Stefaniuk [Mon, 1 Jul 2024 14:31:32 +0000 (16:31 +0200)] 
CI: use prepared checkpatch.conf file only for GH actions

Configuration file .checkpatch.conf is working properly only with
GH actions, because flags from GH plugin are used there. This file
shall not be placed in main repo directory, because it causes errors
while using checkpatch from Linux. Add step to review.yml to copy
this file before checkpatch action is started.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
13 months agomdadm: Fix socket connection failure when mdmon runs in foreground mode.
Shminderjit Singh [Mon, 24 Jun 2024 08:58:51 +0000 (08:58 +0000)] 
mdadm: Fix socket connection failure when mdmon runs in foreground mode.

While creating an IMSM RAID, mdadm will wait for the mdmon main process
to finish if mdmon runs in forking mode. This is because with
"Type=forking" in the mdmon service unit file, "systemctl start service"
will block until the main process of mdmon exits. At that moment, mdmon
has already created the socket, so the subsequent socket connect from
mdadm will succeed.

However, when mdmon runs in foreground mode (without "Type=forking" in
the service unit file), "systemctl start service" will return once the
mdmon process starts. This causes mdadm and mdmon to run in parallel,
which may lead to a socket connection failure since mdmon has not yet
initialized the socket when mdadm tries to connect. If the next
instruction/command is to access this device and try to write to it, a
permission error will occur since mdmon has not yet set the array to RW
mode.

Signed-off-by: Shminderjit Singh <shminderjit.singh@oracle.com>
13 months agoCI: fix excluded files in checkpatch.conf
Kinga Stefaniuk [Tue, 25 Jun 2024 08:48:33 +0000 (10:48 +0200)] 
CI: fix excluded files in checkpatch.conf

--exclude flag in checkpatch.conf is configured to work on directories
only. When checkpatch.conf contains files, checkpatch scan is not started.
Remove file names and keep only directories which should be excluded.

Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
13 months agomdadm: Assemble.c fix coverity issues
Nigel Croxon [Tue, 25 Jun 2024 11:57:28 +0000 (07:57 -0400)] 
mdadm: Assemble.c fix coverity issues

Fixing the following coding errors the coverity tools found:

* Event dereference: Dereferencing "pre_exist", which is known to be "NULL".
* Event parameter_hidden: Declaration hides parameter "c".
* Event leaked_storage: Variable "pre_exist" going out of scope leaks the
  storage it points to.
* Event leaked_storage: Variable "avail" going out of scope leaks the
  storage it points to.

Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
13 months agoRevert "mdadm: Fix socket connection failure when mdmon runs in foreground mode."
Mariusz Tkaczyk [Thu, 20 Jun 2024 13:22:50 +0000 (15:22 +0200)] 
Revert "mdadm: Fix socket connection failure when mdmon runs in foreground mode."

This reverts commit 66a54b266f6c579e5f37b6253820903a55c3346c.

connect_monitor() is called from ping_monitor() but this function is often
used as advice, without verification that mdmon is really working. This
produces hangs in many scenarios.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agomdadm/tests: judge foreign array in test cases
Xiao Ni [Fri, 14 Jun 2024 02:45:01 +0000 (10:45 +0800)] 
mdadm/tests: judge foreign array in test cases

The array name is needed to judge whether an array is foreign or not,
so call is_raid_foreign in the test cases that need it.

Fixes: 41706a915684 ('mdadm/tests: names_template enhance')
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agoMakefile: Do not call gcc directly
Gwendal Grignou [Wed, 15 May 2024 21:30:59 +0000 (14:30 -0700)] 
Makefile: Do not call gcc directly

When mdadm is compiled with Clang, calling gcc directly will fail.
Make sure to use the $(CC) variable instead.

Note that Clang does not support --help=warnings;
--print-diagnostic-options should be used instead.
So with Clang, the compilation will go through, but the
extra warning flags will never be added.

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
13 months agomdadm: Fix socket connection failure when mdmon runs in foreground mode.
Shminderjit Singh [Tue, 4 Jun 2024 07:46:03 +0000 (07:46 +0000)] 
mdadm: Fix socket connection failure when mdmon runs in foreground mode.

While creating an IMSM RAID, mdadm will wait for the mdmon main process
to finish if mdmon runs in forking mode. This is because with
"Type=forking" in the mdmon service unit file, "systemctl start service"
will block until the main process of mdmon exits. At that moment, mdmon
has already created the socket, so the subsequent socket connect from
mdadm will succeed.

However, when mdmon runs in foreground mode (without "Type=forking" in
the service unit file), "systemctl start service" will return once the
mdmon process starts. This causes mdadm and mdmon to run in parallel,
which may lead to a socket connection failure since mdmon has not yet
initialized the socket when mdadm tries to connect. If the next
instruction/command is to access this device and try to write to it, a
permission error will occur since mdmon has not yet set the array to RW
mode.

Signed-off-by: Shminderjit Singh <shminderjit.singh@oracle.com>
13 months agotest: pass flags to services
Mateusz Kusiak [Fri, 15 Mar 2024 20:03:09 +0000 (16:03 -0400)] 
test: pass flags to services

Commit 4c12714d1ca0 ("test: run tests on system level mdadm") removed
MDADM_NO_SYSTEMCTL flag from test suite. This causes imsm tests to fail
as mdadm no longer triggers mdmon and flags exist only within the session.

Use systemd set/unset-environment to pass necessary flags.

Introduce colors to grab the user's attention to warnings and key messages.

Make test suite setup systemd environment.
Add setup/clean_systemd_env() functions.
Warn user about altering systemd environment.

Add colors to success/fail messages and warnings.

Signed-off-by: Mateusz Kusiak <mateusz.kusiak@intel.com>
13 months agomdadm: Block SIGCHLD processes before starting children
Logan Gunthorpe [Tue, 4 Jun 2024 16:38:37 +0000 (10:38 -0600)] 
mdadm: Block SIGCHLD processes before starting children

There is a small race condition noticed during code review, but
never actually hit in practice, with the write_zero feature.

If a write zeros fork finishes quickly before wait_for_zero_forks()
gets called, then the SIGCHLD will be delivered before the signalfd
is set up.

While this is only theoretical, fix this by blocking the SIGCHLD
signal before forking any children.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
13 months agomdadm: Fix hang race condition in wait_for_zero_forks()
Logan Gunthorpe [Tue, 4 Jun 2024 16:38:36 +0000 (10:38 -0600)] 
mdadm: Fix hang race condition in wait_for_zero_forks()

Running a create operation with --write-zeros can randomly hang
forever waiting for child processes. This happens roughly one in
ten runs when running with small (20MB) loop devices.

The bug is caused by the fact that signals can be coalesced into
one if they are not read by signalfd quickly enough. So if two children
finish at exactly the same time, only one SIGCHLD will be received
by the parent.

To fix this, wait on all processes with WNOHANG every time a SIGCHLD
is received and exit when all processes have been waited on.

Reported-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
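
The fix described above amounts to draining all exited children on every SIGCHLD. A minimal sketch, assuming the caller keeps its own count of outstanding children; reap_exited_children() is a hypothetical name and not the function used in mdadm.

```c
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already exited.
 * Returns the number of children reaped by this call. */
static int reap_exited_children(void)
{
	int reaped = 0;
	int status;

	/* One SIGCHLD may stand for several exits, so keep calling
	 * waitpid() until it has nothing more to collect. It returns 0
	 * when children exist but none has exited yet, and -1 when no
	 * children are left at all. */
	while (waitpid(-1, &status, WNOHANG) > 0)
		reaped++;

	return reaped;
}
```

The caller would subtract the returned value from its count of running children after each SIGCHLD and stop waiting once the count reaches zero.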
14 months agoimsm: make freesize required to volume autolayout
Kinga Stefaniuk [Tue, 11 Jun 2024 05:58:49 +0000 (07:58 +0200)] 
imsm: make freesize required to volume autolayout

Autolayout_imsm() shall be executed when IMSM_NO_PLATFORM=1 is set.
The listed commit fixed that by removing the super->orom check, but it
also removed the freesize check. Freesize is not set for operations on
a RAID volume that do not update its size, so in those cases the value
is not required and autolayout_imsm() should not always be run.
Fix it by making autolayout_imsm() dependent on freesize.

Fixes: 46f192 ("imsm: fix first volume autolayout with IMSM_NO_PLATFORM")
Signed-off-by: Kinga Stefaniuk <kinga.stefaniuk@intel.com>
14 months agoimsm: fix first volume autolayout with IMSM_NO_PLATFORM
Mariusz Tkaczyk [Thu, 23 May 2024 10:06:36 +0000 (12:06 +0200)] 
imsm: fix first volume autolayout with IMSM_NO_PLATFORM

Autolayout_imsm() is not executed if IMSM_NO_PLATFORM=1 is set.
As a result, the first volume cannot be created: disks for the new
volume are never configured.

Fix it by making autolayout_imsm() independent of super->orom because
NULL there means that IMSM_NO_PLATFORM=1 is set. There are no platform
restrictions on creating a volume, we just analyze drives. It is safe.

Fixes: 6d4d9ab295de ("imsm: use same slot across container")
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm.h: provide basename if GLIBC is not available
Mariusz Tkaczyk [Tue, 21 May 2024 14:26:33 +0000 (16:26 +0200)] 
mdadm.h: provide basename if GLIBC is not available

If GNU basename is not available, define it. It is safer to use that
rather than include libgen.h with XPG basename() definition.

Fixes:#12

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
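
For illustration, GNU basename() semantics (never modify the argument, return the part after the last '/') can be sketched as below. gnu_style_basename() is a hypothetical name; the actual definition added to mdadm.h may differ.

```c
#include <string.h>

/* Hypothetical stand-in with GNU basename() semantics: returns a pointer
 * into path just past the last '/', or path itself if there is no '/'.
 * Unlike the XPG version, it never modifies its argument. */
static const char *gnu_style_basename(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}
```

Note that a path with a trailing slash, such as "/dev/", yields an empty string under these semantics.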
14 months agomdadm/tests: remove strace test
Xiao Ni [Tue, 28 May 2024 13:51:50 +0000 (21:51 +0800)] 
mdadm/tests: remove strace test

Some tests will fail if the test environment does not have the strace
command, so remove the dependency.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: 05r1-re-add-nosuper
Xiao Ni [Tue, 28 May 2024 13:51:49 +0000 (21:51 +0800)] 
mdadm/tests: 05r1-re-add-nosuper

Since patch 50b100768a11 ('mdadm: deprecate bitmap custom file'), confirmation
is required when creating a RAID device with a bitmap file.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: 04update-uuid
Xiao Ni [Tue, 28 May 2024 13:51:48 +0000 (21:51 +0800)] 
mdadm/tests: 04update-uuid

Since patch 50b100768a11 ('mdadm: deprecate bitmap custom file'), confirmation
is required when creating a RAID device with a bitmap file.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: bitmap cases enhance
Xiao Ni [Tue, 28 May 2024 13:51:47 +0000 (21:51 +0800)] 
mdadm/tests: bitmap cases enhance

The test sometimes fails because the number of dirty bitmap bits is smaller
than 400. Comparing the dirty bits against a fixed number is unreliable:
depending on the test machine, the bitmap can be flushed before the number is
checked. So remove the related code.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/platform-intel: buffer overflow detected
Xiao Ni [Tue, 28 May 2024 08:44:39 +0000 (16:44 +0800)] 
mdadm/platform-intel: buffer overflow detected

mdadm -CR /dev/md0 -l1 -n2 /dev/nvme0n1 /dev/nvme2n1
*** buffer overflow detected ***: terminated
Aborted (core dumped)

It does not happen every time and depends on the build environment.
It can be fixed by replacing sprintf with snprintf.

Fixes: d835518b6b53 ('imsm: nvme multipath support')
Reported-by: Guang Wu <guazhang@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: disable selinux
Xiao Ni [Wed, 22 May 2024 08:50:56 +0000 (16:50 +0800)] 
mdadm/tests: disable selinux

Sometimes a systemd service fails because of SELinux. Disable SELinux
during testing for now. It can be re-enabled in the future once there
is a better method.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: 07changelevelintr
Xiao Ni [Wed, 22 May 2024 08:50:55 +0000 (16:50 +0800)] 
mdadm/tests: 07changelevelintr

A power-of-two array size must be specified when updating the array size.
Otherwise, the chunk size cannot be changed.

Also, the test sometimes reports an error that the reshape does not happen
when in fact the reshape has already finished. There is no need to wait
before checking the reshape action, because the check function waits by itself.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: 07autodetect.broken can be removed
Xiao Ni [Wed, 22 May 2024 08:50:54 +0000 (16:50 +0800)] 
mdadm/tests: 07autodetect.broken can be removed

07autodetect can run successfully without error in kernel 6.9.0-rc5.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
14 months agomdadm/tests: 07autoassemble
Xiao Ni [Wed, 22 May 2024 08:50:53 +0000 (16:50 +0800)] 
mdadm/tests: 07autoassemble

This test covers auto-assembly of stacked arrays.

There are two different cases, depending on whether the array is foreign
or not. If the array is foreign, the stacked array (md0 on top of md1 and
md2) can't be assembled with the name md0, because the udev rule runs when
md1 and md2 are assembled and mdadm -I doesn't specify homehost. So it
treats the stacked array (md0) as a foreign array and chooses md127 as
the device node name (/dev/md127).

Add the case where the stacked array is local.

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>