Theodore Ts'o [Thu, 2 May 2024 16:23:44 +0000 (12:23 -0400)]
libext2fs: fix potential divide by zero bug caused by a lxcfs bug
If sysconf(_SC_NPROCESSORS_CONF) returns zero, this can cause a divide
by zero. Make ext2fs_rw_bitmaps() more robust by defaulting to 4 threads
if _SC_NPROCESSORS_CONF returns an invalid value.
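Here is a minimal sketch of the defensive pattern. It assumes the thread count comes straight from sysconf(). The helper names and DEFAULT_NUM_THREADS are hypothetical, and this is not the actual ext2fs_rw_bitmaps() code:

```c
#include <unistd.h>

/* Hypothetical constant matching the "4 threads" fallback described above. */
#define DEFAULT_NUM_THREADS 4

/*
 * Map a raw sysconf() result to a usable thread count. Zero (as from the
 * lxcfs bug) or a negative value (sysconf() error) falls back to the
 * default, so later divisions by the thread count can never divide by 0.
 */
static int pick_num_threads(long nprocs)
{
	if (nprocs <= 0)
		return DEFAULT_NUM_THREADS;
	return (int) nprocs;
}

static int get_num_threads(void)
{
	return pick_num_threads(sysconf(_SC_NPROCESSORS_CONF));
}
```

A caller can then compute something like `group_count / get_num_threads()` without a separate zero check.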
Theodore Ts'o [Wed, 1 May 2024 21:22:55 +0000 (17:22 -0400)]
e4defrag: use snprintf to assure that there can't be a buffer overflow
The size of msg_buffer is carefully calculated so it can never
overflow, but it triggers a Coverity warning. Use snprintf instead of
sprintf to silence the Coverity warning.
Theodore Ts'o [Wed, 1 May 2024 20:58:50 +0000 (16:58 -0400)]
libsupport: use explicit type widths instead of time_t
The in-memory data structures used time_t for the grace period (which
is a delta timestamp denominated in seconds), as well as the soft
limit expiration time (which is an actual time_t). Use an explicit
__u32 for the former, and a __u64 for the latter.
This silences a Coverity warning, but more importantly, using an
explicit __u64 for the expiration time means that when e2fsck runs on a
platform with a 32-bit time_t and needs to read and then modify a
quota structure, we won't lose the high 32 bits of the quota
expiration time.
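This sketch illustrates the width issue. The struct and helper are hypothetical, and it uses `<stdint.h>` types in place of e2fsprogs' __u32/__u64. It shows what a 32-bit time_t field would silently keep from a post-2038 expiration time:

```c
#include <stdint.h>

/*
 * Hypothetical in-memory quota timing fields with explicit widths:
 * the grace period is a delta in seconds, the expiration is absolute.
 */
struct quota_times {
	uint32_t grace_period;	/* seconds; a delta, fits in 32 bits */
	uint64_t expire_time;	/* absolute seconds since the epoch */
};

/* What storing into a 32-bit time_t would keep: only the low 32 bits. */
static uint32_t truncate_to_32(uint64_t t)
{
	return (uint32_t) (t & 0xFFFFFFFFu);
}
```

For example, an expiration of 2^32 + 42 seconds survives intact in `expire_time`, but `truncate_to_32()` reduces it to 42.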
Theodore Ts'o [Wed, 1 May 2024 04:24:52 +0000 (00:24 -0400)]
e2fsck: check the error return from the forced rewrite write
If a read of a block fails, we offer the user the opportunity to force a
rewrite of that sector so that the storage device remaps the LBA to
its spare block pool. Check the return value of that write so that, if
it fails, we can let the user know.
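Here is a minimal sketch of a checked forced rewrite. The function name and signature are hypothetical: e2fsck goes through its io_channel layer rather than calling pwrite() directly. The point is only that both an error return and a short write get reported:

```c
#define _XOPEN_SOURCE 700	/* for pwrite() in strict compilation modes */
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Rewrite block "blk" with the contents of buf. Returns 0 on success or
 * an errno value on failure, so the caller can tell the user.
 */
static int force_rewrite_block(int fd, const void *buf, size_t blocksize,
			       uint64_t blk)
{
	ssize_t ret = pwrite(fd, buf, blocksize, (off_t) (blk * blocksize));

	if (ret < 0)
		return errno;
	if ((size_t) ret != blocksize)
		return EIO;	/* treat a short write as a failure too */
	return 0;
}
```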
Theodore Ts'o [Wed, 1 May 2024 04:20:10 +0000 (00:20 -0400)]
resize2fs: mark that the error return is deliberately ignored
When moving the inode table, if writing the (partially overlapping)
inode table fails, we need to write it back in its original location
before bailing out. If the write undoing the initial write fails,
there's nothing we can do, so we ignore it. Mark this to avoid a
false positive from Coverity.
Theodore Ts'o [Wed, 1 May 2024 03:54:26 +0000 (23:54 -0400)]
e2scrub: test for the presence of systemd using test -e /run/systemd/system
Debian has a package called "systemctl" which provides a systemctl
executable to "manage services without systemd". So test for whether
we have a fully functional systemd system by checking for the
existence of /run/systemd/system instead of testing for the presence of
the command named systemctl.
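The e2scrub change itself is a shell `test -e`. For readers working in C, here is a rough equivalent (the helper names are hypothetical). It follows the reasoning above: the directory exists only when systemd is actually running, whether or not some package installed a systemctl binary.

```c
#include <sys/stat.h>

/* Like "test -e path": true if stat() succeeds on the path. */
static int path_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0;
}

/* A booted systemd creates this directory; a systemctl look-alike does not. */
static int running_under_systemd(void)
{
	return path_exists("/run/systemd/system");
}
```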
The problem with explicitly setting _FILE_OFFSET_BITS is that
it's not necessarily a no-op on a 64-bit platform with a 64-bit off_t.
Apparently on glibc's mips64el, setting it ends up selecting a different
structure definition for struct stat, and this causes a compatibility
problem with libarchive. It's not needed on mips64el, since off_t is
64 bits, but it actually causes problems.
So remove it, since we now use autoconf's AC_SYS_LARGEFILE, which
will set _FILE_OFFSET_BITS when it is necessary (such as on a 32-bit
i386 Linux platform), and will skip it when it is unnecessary.
The libarchive functionality in "mke2fs -d foo.tar" is breaking the
regression test[1]. Since this is working everywhere _except_
mips64el, as a short-term workaround disable libarchive support on
this platform until it can be fixed.
The e2scrub scripts rely on systemd, which isn't present on non-Linux
systems, so they aren't built there. So we need to skip running
dh_installsystemd, since it will fail on the Hurd build because the
requisite files aren't being built.
Teach configure the --without-libarchive option, which forcibly
disables use of the libarchive library.
The option --with-libarchive=direct will disable the use of dlopen,
and will link mke2fs with -larchive directly. This doesn't work when
building mke2fs.static, since -larchive has a large number of
dependencies, and even "pkgconf --libs --static libarchive" doesn't
provide all of the appropriate library dependencies. :-(
debian: add a note in debian/changelog regarding features being re-enabled
The metadata_csum_seed and orphan_file features were disabled before
Debian Bookworm was released; now that it has been released, we are
re-enabling those features for Debian testing and the next version of
Debian stable (trixie).
Manually count the number of free clusters in the last block group, since
it might not be a multiple of 8, and using ext2fs_bitcount() might not
work if the bitmap isn't properly padded out.
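This sketch counts free bits one at a time instead of using a byte-wise popcount, so padding bits past the end of a partial last byte are never examined. The function is hypothetical and assumes ext2's little-endian bit order within each byte:

```c
#include <stdint.h>

/*
 * Count zero (free) bits among the first nbits bits of a bitmap.
 * Bit i lives in byte i / 8 at position i % 8. Bits at positions
 * >= nbits in the final byte are ignored, whatever their value.
 */
static unsigned int count_free_bits(const uint8_t *bitmap, unsigned int nbits)
{
	unsigned int i, free_count = 0;

	for (i = 0; i < nbits; i++)
		if (!(bitmap[i / 8] & (1u << (i % 8))))
			free_count++;
	return free_count;
}
```

For the bitmap bytes {0x0F, 0xF0} and 10 valid bits, this returns 6. Counting whole bytes would return 8, because it would also include the two free-looking padding bits in the partial last byte.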
In addition, when setting up the block bitmap for the resized file
system, resize2fs was setting up the "real end" of the bitmap in units
of blocks instead of clusters.
We didn't notice this problem earlier because of a test failure which
caused the test to be skipped.
Prevent i_dtime from being mistaken for an inode number post-2038 wraparound
We explicitly decided not to reserve space for a 64-bit dtime, since
it's never displayed or exposed to userspace. The dtime field is used
as a linked list for the orphan list, and for forensic purposes when
trying to determine when an inode was deleted. So right after the
2038 epoch, a deleted inode might end up with a dtime which is zero or
smaller than the number of inodes, which will result in e2fsck
reporting potential problems. So when we set the dtime, make sure
that the dtime won't be mistaken for an inode number.
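Here is a sketch of the clamping idea. It assumes the rule is simply "never smaller than the inode count," as described above. safe_dtime() is a hypothetical helper, not the e2fsprogs function:

```c
#include <stdint.h>

/*
 * Produce a 32-bit i_dtime from a 64-bit "now". After the 32-bit
 * wraparound, the low 32 bits restart near zero, so raise any value
 * smaller than the inode count up to the inode count.
 */
static uint32_t safe_dtime(uint64_t now, uint32_t inodes_count)
{
	uint32_t t = (uint32_t) (now & 0xFFFFFFFFu);

	if (t < inodes_count)
		t = inodes_count;
	return t;
}
```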
libext2fs: handle short reads/writes while creating the qcow file
This issue was flagged by Coverity, although its analysis was
incorrect. This isn't actually a memory overrun / security issue, but
rather a functional correctness issue: POSIX allows reads and
writes to be partially completed, and in those cases qcow2_copy_data()
could produce a corrupted qcow2 file.
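Here is a minimal sketch of handling short writes with a loop. write_all() is a hypothetical helper, not the qcow2_copy_data() code. A read loop has the same shape but must also treat a zero return as end-of-file.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all len bytes, continuing after partial writes and retrying on
 * EINTR. Returns 0 on success, or -1 with errno set on failure.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {		/* no progress: avoid spinning forever */
			errno = EIO;
			return -1;
		}
		p += n;
		len -= (size_t) n;
	}
	return 0;
}
```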
configure: Use FORTIFY_SOURCE=3 when hardening is enabled
FORTIFY_SOURCE=3 provides much more robust checks for buffer overruns
and other memory bugs[1]. It requires gcc 12 and glibc 2.34, which
should be available on most modern distributions (which are the ones
that use --enable-hardening).
mke2fs: implement timestamp clamping if SOURCE_DATE_EPOCH is set
When copying files to the newly created file system using "mke2fs -d",
and there are timestamps greater than what is specified by
SOURCE_DATE_EPOCH, clamp the timestamp to the SOURCE_DATE_EPOCH
timestamp.
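Here is a sketch of the clamping rule. It assumes SOURCE_DATE_EPOCH holds a non-negative decimal count of seconds, per the reproducible-builds convention, and that an unset or unparsable value disables clamping. The helper names are hypothetical, and mke2fs's real parsing and validation may differ.

```c
#define _POSIX_C_SOURCE 200809L	/* POSIX declarations, e.g. setenv() in the usage example */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 and stores the epoch if SOURCE_DATE_EPOCH is set and valid. */
static int get_source_date_epoch(int64_t *epoch)
{
	const char *s = getenv("SOURCE_DATE_EPOCH");
	char *end;
	long long v;

	if (!s || !*s)
		return 0;
	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || *end != '\0' || v < 0)
		return 0;
	*epoch = v;
	return 1;
}

/* Clamp a file timestamp so it is never later than SOURCE_DATE_EPOCH. */
static int64_t clamp_timestamp(int64_t t)
{
	int64_t epoch;

	if (get_source_date_epoch(&epoch) && t > epoch)
		return epoch;
	return t;
}
```

For example, after `setenv("SOURCE_DATE_EPOCH", "1000", 1)`, `clamp_timestamp(2000)` returns 1000 and `clamp_timestamp(500)` returns 500.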
libext2fs: use a safe_getenv() function everywhere
Hoist safe_getenv() from test_io.c and unix_io.c to a globally
exported ext2fs_safe_getenv() and use it instead of getenv() in
libext2fs. This provides a bit more safety if e2fsprogs programs are
used in setuid contexts.
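Here is a simplified sketch of the idea behind a setuid-safe getenv(). The name is hypothetical, and the e2fsprogs helper may perform additional checks (for example, using secure_getenv() where available) that are omitted here. When the real and effective IDs differ, the environment came from a less-privileged caller, so it is ignored.

```c
#define _POSIX_C_SOURCE 200809L	/* POSIX declarations, e.g. setenv() in the usage example */
#include <stdlib.h>
#include <unistd.h>

static char *sketch_safe_getenv(const char *name)
{
	/* In a setuid/setgid context, don't trust the caller's environment. */
	if (getuid() != geteuid() || getgid() != getegid())
		return NULL;
	return getenv(name);
}
```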
Fix Coverity false positives introduced by the post-2038 changes
Commit ca8bc9240a00 ("Add post-2038 timestamp support...") did things
like casting a 64-bit unsigned integer into a signed 32-bit integer
deliberately; but Coverity thinks this is a bug. So mask off the bits
to make it clear this was deliberate.
e2fsck: make sure get_backup_sb() works when ctx is NULL
The print_e2fsck_message() function can call get_backup_sb() with the
ctx variable set to NULL. In that case, we can't dereference
ctx->filesystem_name; instead, we can get the size of the file system
from ext2fs_block_count(fs->super).
Align function prototypes for libss's request handler function
Clang 17's Undefined Behaviour Sanitizer will throw run-time warnings
if a function is called through a function pointer whose type does not
match the function's own signature, even if the only difference is a
missing const qualifier. To fix regression test failures, change
declarations of argv to use ss_argv_t instead of an inconsistently
open-coded type.
The mkgnutar.pl file only works if the developer has a specific
username and uid. In addition, if it is used, the round-trip from tar
to an ext4 file system and back to tar isn't properly tested. So only
use mkgnutar.pl if the system doesn't have GNU tar.
In addition, make sure all of the temp files created by the test are
deleted when the test is completed.
FreeBSD 14 has changed the definition of qsort_r to align it with
POSIX, but it did this with a #define. So when sort_r.h provides a
function prototype, surround the function name with parentheses so it
doesn't get expanded by FreeBSD's #define.
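The trick works because a function-like macro is only expanded when its name is immediately followed by `(`. Here is a self-contained illustration. The macro `twice` is made up and merely stands in for FreeBSD's qsort_r definition.

```c
/* A function-like macro that shares its name with a real function. */
#define twice(x) ((x) * 2 + 1000)	/* deliberately different behavior */

/*
 * In "(twice)(...)" the name is followed by ")", not "(", so the
 * preprocessor leaves it alone and these lines declare and define
 * the real function.
 */
int (twice)(int x);

int (twice)(int x)
{
	return x * 2;
}
```

With this, `twice(3)` still goes through the macro (1006), while `(twice)(3)` calls the function (6).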
Debugfs's stat command called ext2fs_inode_xtime_get() with a struct
inode * instead of a struct large_inode *. As a result, printing
inode timestamps will be incorrect if the time value is larger than
2**32.
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
misc: update mke2fs's man page regarding the default inode size
Since a23b50cd ("mke2fs: warn about missing y2038 support when
formatting fresh ext4 fs"), the default inode size is 256 bytes
for all filesystems, including the small and floppy usage types, except for the
Hurd since it currently only supports 128-byte inodes.
Timestamps are encoded differently in inodes and in superblocks.
Unfortunately, commit ca8bc9240a00, which added post-2038 timestamps,
was (a) overwriting adjacent superblock fields and/or attempting
unaligned writes to an 8-bit field through a 32-bit pointer, and (b) using
the incorrect encoding for timestamps stored in inodes. Fix both of
these issues, which were found thanks to UBSAN.
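Here is a sketch of the two encodings, based on the kernel's ext4 on-disk format as we understand it. The helper names are hypothetical; these are not the ext2fs_super_tstamp_*() or ext2fs_inode_xtime_*() macros.

- Superblock timestamps combine an unsigned 32-bit low field with a separate 8-bit _hi field, so the encoder must write exactly one byte for the high part.
- Inode timestamps sign-extend the 32-bit base and add a 2-bit epoch from the _extra field. The upper 30 bits of _extra carry nanoseconds.

```c
#include <stdint.h>

/* Superblock: unsigned low 32 bits plus an 8-bit _hi field (bits 32..39). */
static int64_t sb_time_decode(uint32_t lo, uint8_t hi)
{
	return (int64_t) (((uint64_t) hi << 32) | lo);
}

static void sb_time_encode(int64_t t, uint32_t *lo, uint8_t *hi)
{
	*lo = (uint32_t) ((uint64_t) t & 0xFFFFFFFFu);
	*hi = (uint8_t) (((uint64_t) t >> 32) & 0xFF);	/* one byte only */
}

/* Inode: signed 32-bit base; low 2 bits of _extra add multiples of 2^32. */
#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

static int64_t inode_time_decode(uint32_t base, uint32_t extra)
{
	int64_t t = (int32_t) base;	/* sign-extend the low 32 bits */

	t += (int64_t) (extra & EPOCH_MASK) << 32;
	return t;
}

static void inode_time_encode(int64_t t, uint32_t nsec,
			      uint32_t *base, uint32_t *extra)
{
	int32_t low = (int32_t) (uint32_t) ((uint64_t) t & 0xFFFFFFFFu);

	*base = (uint32_t) low;
	*extra = (uint32_t) (((t - low) >> 32) & EPOCH_MASK) |
		 (nsec << EPOCH_BITS);
}
```

The difference matters right at the 2038 boundary. Consider t = 2^31 stored in an inode with a zero epoch: a raw base of 0x80000000 decodes as -2^31, so the encoder has to set the epoch bits to 1 to get back +2^31.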
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
mke2fs: the -d option can now handle tarball input
If archive.h is available during compilation, enable mke2fs to read a
tarball as input. Since libarchive.so.13 is opened with dlopen,
libarchive is not a hard library dependency of the resulting binary.
In comparison with feeding a directory tree to mke2fs via -d, this has
the following advantages:
- no superuser privileges, nor fakeroot, nor unshared user namespaces
are needed to create filesystems with arbitrary ownership information
and special files like device nodes which otherwise require being root
- by reading a tarball from standard input, no temporary files need to
be written out first as mke2fs can be used as part of a shell pipeline
which reduces disk usage and makes the conversion independent of the
underlying file system
A round-trip from tarball to ext4 to tarball yields bit-by-bit identical
results.
Signed-off-by: Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
Commit ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
was never built or tested on a 32-bit platform. It introduced some build
problems when time_t is a 32-bit integer, and it exposed some test
bugs. Fix them.
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck: don't try backup superblocks beyond the size of the device
Commit f7ef5f3e356d ("e2fsck: check all sparse_super backups") tries
to limit the number of block groups to search for backup superblocks
based on ctx->num_blocks. Unfortunately, get_backup_sb() gets called
before ctx->num_blocks is set, so we try all block groups up to 2**32
- 1. Not only does this waste time trying to read from blocks that
don't exist, it triggers the UBSAN checker when multiplying a very
large number by the block size.
Fix this by using ext2fs_get_device_size(), and if that isn't
available, arbitrarily cap things so that we search block groups up to
128.
Sam James [Tue, 7 Nov 2023 23:31:20 +0000 (23:31 +0000)]
ext2fs: fix -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which gives:
```
lib/ext2fs/hashmap.c:37:36: warning: allocation of insufficient size ‘1’ for type ‘struct ext2fs_hashmap’ with size ‘20’ [-Walloc-size]
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 struct of size `sizeof(...)`. GCC then sees we're not
doing anything wrong.
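Here is a minimal illustration of the argument order. The struct is a hypothetical stand-in for struct ext2fs_hashmap. Both argument orders allocate the same number of bytes, but only `calloc(1, sizeof(...))` matches the `nmemb, size` prototype that -Walloc-size checks against.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct ext2fs_hashmap. */
struct example_hashmap {
	unsigned int size;
	void **entries;
};

static struct example_hashmap *hashmap_alloc(void)
{
	/*
	 * One element of sizeof(struct example_hashmap) bytes. The swapped
	 * form calloc(sizeof(...), 1) passes 1 as the element size, which
	 * is what GCC 14 flags as "insufficient size" for the type.
	 */
	return calloc(1, sizeof(struct example_hashmap));
}
```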
Wenchao Hao [Fri, 17 Nov 2023 10:23:15 +0000 (18:23 +0800)]
debugfs: fix infinite loop while dumping the journal
There are 2 scenarios which would trigger an infinite loop:
1. No log is recorded, and logdump is run with "-n", for example:
debugfs -R "logdump -O -n 10" /dev/xxx
while /dev/xxx has no valid log recorded.
2. The log area is full and writing has wrapped around, and logdump is run with:
debugfs -R "logdump -aOS" /dev/xxx
This patch adds a new flag "wrapped_flag" to mark that logdump has
reached the tail of the log area, as set in the macro WRAP().
If wrapped_flag is true and we come to first_transaction_blocknr
again, just break out of the logdump loop.
[ Renamed reverse_flag to wrapped_flag to make it clearer what it is. -- TYT ]
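To show why the flag ends the loop, here is a simplified model of walking a circular log. It assumes the log occupies blocks [first, last) and that WRAP() resets to first. This is not debugfs's logdump code. Without the wrapped_flag check, a log in which every block looks valid would loop forever.

```c
#include <stdbool.h>

/*
 * Walk blocks starting at "start", wrapping from last back to first.
 * Stop at the first invalid block or, once wrapped, when we return to
 * the starting block. Returns the number of blocks visited.
 */
static unsigned long walk_log(unsigned long first, unsigned long last,
			      unsigned long start,
			      bool (*valid)(unsigned long blk))
{
	unsigned long blk = start, visited = 0;
	bool wrapped_flag = false;

	while (valid(blk)) {
		visited++;
		blk++;
		if (blk >= last) {		/* WRAP() */
			blk = first;
			wrapped_flag = true;
		}
		if (wrapped_flag && blk == start)
			break;			/* full circle: stop */
	}
	return visited;
}

/* Worst case for the unpatched loop: every block looks valid. */
static bool all_valid(unsigned long blk)
{
	(void) blk;
	return true;
}
```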
Anssi Hannula [Tue, 7 Nov 2023 09:46:53 +0000 (11:46 +0200)]
resize2fs: avoid constantly flushing while moving blocks
resize2fs block_mover() flushes data after each extent and, curiously,
also every inode_blocks_per_group blocks, but only if the progress
indicator is enabled.
This significantly affects performance, e.g. on a tested large
filesystem on top of MD-RAID6+LVM+dm-crypt these flush calls reduce the
operation rate from approx. 500MB/s to 5MB/s, causing extremely long
shrinking times for large size deltas (70TB in my case).
Since this step performs just plain data copying and does not e.g. save
any progress/checkpoint information or similar metadata, it seems like
this flushing is of very limited usefulness, especially when considering
the (in some cases) 100x performance impact.
Remove the mid-operation flushes and only flush after all blocks have
been moved.
tests: new test to check quota after a bad inode deallocation
This new test validates e2fsck by verifying that quota is updated after a bad
inode is deallocated. It mimics fstest ext4/019 by including a filesystem image
where a symbolic link was created to an existing file, using a long symlink
name. This symbolic link was then wiped with:
tests: new test to check quota after directory optimization
This new test validates e2fsck by verifying that quota data is updated after a
directory optimization is performed. This issue was initially found by fstest
ext4/014, and this test was based on it. It includes a filesystem image where
the lost+found directory is unlinked after a new link to it is created:
e2fsck: update quota when deallocating a bad inode
If a bad inode is found it will be deallocated. However, if the filesystem has
quota enabled, the quota information isn't being updated accordingly. This
issue was detected by running fstest ext4/019.
This patch fixes the issue by decrementing the inode count in the
quota and, if blocks are also being released, subtracting them as well.
While there, and as suggested by Andreas Dilger, the deallocate_inode()
function documentation is also updated by this patch to make it clear what
that function really does.
e2fsck: update quota accounting after directory optimization
In "Pass 3A: Optimizing directories", a directory may have its size reduced.
If that happens and quota is enabled in the filesystem, the quota information
will be incorrect because it doesn't take the rehash into account. This issue
was detected by running fstest ext4/014.
This patch simply updates the quota data accordingly, after the directory is
written and its size has been updated.
According to the mke2fs man page, the supported cluster-size values
for an ext4 filesystem are 2048 to 256M bytes. However, this is not
the case.
When mkfs is run to create a filesystem with the following specifications:
* 1k blocksize and cluster-size greater than 32M
* 2k blocksize and cluster-size greater than 64M
* 4k blocksize and cluster-size greater than 128M
mkfs fails with "Invalid argument passed to ext2 library while trying
to create journal" error. In general, when the cluster-size to blocksize
ratio is greater than 32k, mkfs fails with this error.
Going through the code shows that the function
`ext2fs_new_range()` is the source of this error: when
the cluster-size to blocksize ratio exceeds 32k, the length argument
passed to `ext2fs_new_range()` ends up as 0. Hence, the error.
This patch corrects the valid cluster-size values.
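The corrected constraint fits in a small validity check. This is a sketch under stated assumptions: cluster sizes are powers of two no smaller than the block size, and the ratio limit is 32k (2^15) as described above. The helper name is hypothetical and this is not mke2fs's option parsing.

```c
#include <stdbool.h>

#define MAX_CLUSTER_RATIO 32768UL	/* 32k clusters-to-blocks ratio */

static bool cluster_size_ok(unsigned long block_size,
			    unsigned long cluster_size)
{
	if (cluster_size < block_size)
		return false;
	if (cluster_size & (cluster_size - 1))	/* not a power of two */
		return false;
	return cluster_size / block_size <= MAX_CLUSTER_RATIO;
}
```

This reproduces the limits listed above: 32M is accepted with 1k blocks but 64M is not, and 128M is accepted with 4k blocks but 256M is not.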
Li Dongyang [Mon, 25 Sep 2023 06:08:01 +0000 (16:08 +1000)]
mke2fs: do not set BLOCK_UNINIT on groups that have GDT blocks
This patch prepares for the expansion of GDT blocks beyond a
single group, by making mke2fs not set BLOCK_UNINIT on
groups with GDT blocks, block/inode bitmaps, or inode table
blocks allocated.
Otherwise, we still rely on the kernel side to initialize the
block bitmap if a group has BLOCK_UNINIT set, and the
kernel doesn't know a group could have GDT blocks allocated,
so it would create a bad block bitmap.
As a result, the expected output of several tests needs to be changed,
especially if the test uses dumpe2fs to print the group summary.
Li Dongyang [Mon, 25 Sep 2023 06:08:00 +0000 (16:08 +1000)]
mke2fs: set free blocks accurately for groups that have GDT blocks
This patch is part of the preparation required to allow
GDT blocks to expand beyond a single group;
it introduces 2 new interfaces:
- ext2fs_count_used_blocks(), to return the blocks used
in the bitmap range.
- ext2fs_reserve_super_and_bgd2() to return blocks used by
superblock/GDT blocks for every group, by looking up blocks used.
Andreas Dilger [Mon, 4 Sep 2023 04:57:42 +0000 (14:57 +1000)]
e2fsck: check all sparse_super backups
Teach e2fsck to look for backup superblocks in the "sparse_super"
groups, by checking group #1 first and then the powers 3^n, 5^n,
and 7^n, up to the limit of available block groups.
Export ext2fs_list_backups() function to efficiently iterate groups
for backup sb/GDT instead of checking every group. Ensure that the
group counters do not try to overflow the 2^32-1 group limit, and
try to limit scanning to the size of the block device (if available).
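Here is a sketch of iterating over the sparse_super backup groups in increasing order. It keeps one running power each for 3, 5, and 7 and always emits the smallest. It is modeled on the description of ext2fs_list_backups() above, not copied from it, and the names are hypothetical. The 64-bit counters keep the multiplications from wrapping near the 2^32-1 group limit.

```c
#include <stdint.h>

struct backup_iter {
	uint64_t three, five, seven;
};

static void backup_iter_init(struct backup_iter *it)
{
	it->three = 1;	/* 3^0 = 1, so group 1 comes out first */
	it->five = 5;
	it->seven = 7;
}

/* Return the next backup group, or 0 once it would be >= ngroups. */
static uint64_t backup_iter_next(struct backup_iter *it, uint64_t ngroups)
{
	uint64_t *min = &it->three;
	unsigned int mult = 3;
	uint64_t ret;

	if (it->five < *min) {
		min = &it->five;
		mult = 5;
	}
	if (it->seven < *min) {
		min = &it->seven;
		mult = 7;
	}
	ret = *min;
	if (ret >= ngroups)
		return 0;
	*min *= mult;
	return ret;
}
```

With 1000 groups, this yields 1, 3, 5, 7, 9, 25, 27, 49, 81, 125, 243, 343, 625, 729 and then stops. Group 0, which holds the primary superblock, is not part of the sequence.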
Li Dongyang [Mon, 4 Sep 2023 04:58:06 +0000 (14:58 +1000)]
mke2fs: batch zeroing inode table
For flex_bg-enabled file systems, we can merge the
inode table blocks into a contiguous range;
this improves mke2fs time on large devices
when lazy_itable_init is disabled.
On a 977TB device, unpatched mke2fs ran
for 449m10s before being terminated manually.
strace showed a huge number of fallocate calls; given the
offset passed to fallocate, it had done 41% of the inode
tables, so the estimated total time would be 1082m.
        unpatched      patched
real    449m10.954s    4m20.531s
user    0m18.217s      0m16.147s
sys     0m20.311s      0m8.944s
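Here is a sketch of the batching idea. With flex_bg, the inode tables of several groups are often laid out back to back, so adjacent extents can be merged and zeroed with one call instead of one call per group. The struct and helper are hypothetical, and this is not the mke2fs code.

```c
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t start;	/* first block */
	uint64_t len;	/* number of blocks */
};

/*
 * Merge extents that are exactly adjacent, in place. Input must be
 * sorted by start. Returns the number of extents remaining.
 */
static size_t merge_adjacent(struct extent *ext, size_t n)
{
	size_t out = 0, i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++) {
		if (ext[out].start + ext[out].len == ext[i].start)
			ext[out].len += ext[i].len;	/* contiguous: extend */
		else
			ext[++out] = ext[i];		/* gap: start new run */
	}
	return out + 1;
}
```

For example, four 32-block tables at 100, 132, 164, and 300 collapse into two ranges, so the zeroing step issues two calls instead of four.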
libext2fs: always refuse to open a file system with a zero s_desc_size
Commit 42c11edd0863 ("ext2fs_open[2](), return an error if s_desc_size
is too large") added a check for an insanely large s_desc_size to
prevent some failures triggered by fuzz testing. However, it still
allowed e2fsck to fall back to recovering the file system from the
backup superblocks, since e2fsck passes the flag
EXT2_FLAG_IGNORE_SB_ERRORS. But by allowing an s_desc_size of zero,
it's possible that e2fsck will die with a division by zero error.
With this fix, e2fsck will now print an error message and exit
instead.
https://github.com/tytso/e2fsprogs/issues/183
Fixes: 42c11edd0863 ("ext2fs_open[2](), return an error if s_desc_size is too large")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Wed, 27 Sep 2023 05:40:16 +0000 (23:40 -0600)]
Add post-2038 timestamp support to e2fsprogs
The ext4 kernel code implemented support for s_mtime_hi,
s_wtime_hi, and related timestamp fields to avoid timestamp
overflow in 2038, but similar handling is not in e2fsprogs.
Add helper macros for the superblock _hi timestamp fields
ext2fs_super_tstamp_get() and ext2fs_super_tstamp_set().
Add helper macro for inode _extra timestamp fields
ext2fs_inode_xtime_get() and ext2fs_inode_xtime_set().
Add helper macro ext2fs_actual_inode_size() to avoid open
coding the i_extra_isize check in multiple places.
Remove inode_time_to_string() since this is unused once callers
change to time_to_string(ext2fs_inode_xtime_get()) directly.
Fix inode_includes() macro to properly wrap "inode" parameter,
and rename to ext2fs_inode_includes() to avoid potential name
clashes. Use this to check inode field inclusion in debugfs
instead of bare constants for inode field offsets.
Use these interfaces to access timestamps in debugfs, e2fsck,
libext2fs, fuse2fs, tune2fs, and e2undo.
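Here is a sketch of what an inode_includes()-style check computes. It uses a hypothetical, simplified large-inode struct, not struct ext2_inode_large, and the macro is not the e2fsprogs definition. A field counts as present only if the in-use inode size (128 plus i_extra_isize) reaches the end of that field. The macro parameters are parenthesized so that expression arguments expand correctly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: extra fields follow the 128-byte base inode. */
struct demo_inode_large {
	uint8_t  base[128];	/* stands in for the 128-byte ext2 inode */
	uint16_t i_extra_isize;
	uint16_t i_checksum_hi;
	uint32_t i_ctime_extra;
	uint32_t i_mtime_extra;
};

/* True if an inode of "size" bytes contains all of "field". */
#define demo_inode_includes(size, field)				\
	((size) >= offsetof(struct demo_inode_large, field) +		\
		   sizeof(((struct demo_inode_large *) 0)->field))
```

For example, `demo_inode_includes(128 | 4, i_ctime_extra)` is correctly false. Without the parentheses around `size`, it would parse as `128 | (4 >= ...)` and wrongly evaluate to true.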
Eric Biggers [Wed, 1 Mar 2023 03:45:18 +0000 (19:45 -0800)]
libext2fs: fix ext2fs_get_device_size2() return value on Windows
Creating a file system on Windows without a pre-existing file stopped
working because the Windows version of ext2fs_get_device_size2() doesn't
return ENOENT if the file doesn't exist. Fix this.
(Note: Filesystem state == "clean" means that EXT2_VALID_FS is set in
the superblock s_state field; "not clean with errors" means that the
flag is not set.)
I bet the "journal only" preen doesn't actually reset the filesystem
state either:
# e2fsck -E journal_only -p /dev/sda
# dumpe2fs /dev/sda -h | grep state
dumpe2fs 1.47.1~WIP-2023-12-27 (27-Dec-2023)
Filesystem state: not clean with errors
Nope.
So now I know what happened -- when mounting an ext* filesystem that
doesn't have a journal, the driver clears EXT2_VALID_FS from the primary
superblock. This forces the system to run e2fsck after a crash, because
that's what you have to do for unjournalled filesystems.
The "e2fsck -E journal_only -p" call in e2scrub only replays the
journal. Since there is no journal, it exits almost immediately.
That's the intended behavior, but then it means that the "e2fsck -fy"
call immediately after sees that the superblock doesn't have
EXT2_VALID_FS set, sets it, and makes e2fsck return 1.
So that's why you're getting the e2scrub failures.
Contrast this to what you get when the filesystem has a journal:
Filesystems with journals retain their EXT4_VALID_FS state when they're
mounted.
Hmm. What e2scrub should do about unjournalled filesystems is a thorny
question. My initial thought is that it should skip them, because a
mounted unjournalled filesystem cannot by definition be kept consistent.
Therefore, teach e2scrub_all to avoid them and e2scrub to fail them at
the onset.
Restricting the scope of e2scrub sucks, but in the meantime at least it
means that your filesystem isn't massively corrupt. Thanks for the
metadump, it was very useful for root cause analysis.
Darrick J. Wong [Wed, 10 Jan 2024 05:57:24 +0000 (21:57 -0800)]
debian: don't restart e2scrub_all when upgrading package
When installing or upgrading the e2fsprogs package, only start the
e2scrub_all timer and the reaping service. Don't restart e2scrub_all
itself, because that will kill any scrubs in progress, which will
trigger the failure reporting.
Darrick J. Wong [Sun, 31 Dec 2023 20:39:03 +0000 (12:39 -0800)]
e2scrub_fail: move executable script to /usr/libexec
Per FHS 3.0, non-PATH executable binaries are supposed to live under
/usr/libexec, not /usr/lib. e2scrub_fail is an executable script, so
move it to libexec in case some distro some day tries to mount /usr/lib
as noexec or something. Also, there's no reason why these scripts need
to be put under an arch-dependent path.