Theodore Ts'o [Wed, 1 May 2024 04:24:52 +0000 (00:24 -0400)]
e2fsck: check the error return from the forced rewrite write
If read of a block fails, we offer the user the opportunity to force a
rewrite to that sector to force the storage device to remap the LBA to
its spare block pool. Check that write so if it fails, we can let the
user know.
Theodore Ts'o [Wed, 1 May 2024 04:20:10 +0000 (00:20 -0400)]
resize2fs: mark that the error return is deliberately ignored
When moving the inode table, if writing the (partially overlapping)
inode table fails, we need to write it back in its original location
before bailing out. If that write undoing the initial write fails,
there's nothing we can do, so we ignore it. Mark this to avoid a
false positive from Coverity.
Theodore Ts'o [Wed, 1 May 2024 03:54:26 +0000 (23:54 -0400)]
e2scrub: test for the presence of systemd using test -e /run/systemd/system
Debian has a package called "systemctl" which provides a systemctl
executable to "manage services without systemd". So test for whether
we have a fully functional systemd system by checking for the
existence of /run/systemd/system instead of testing for the presence of
the command named systemctl.
The problem with explicitly setting _FILE_OFFSET_BITS is that
it's not necessarily a no-op on a 64-bit platform with a 64-bit off_t.
Apparently glibc on mips64el ends up using a different structure
definition for struct stat, and this causes a compatibility problem
with libarchive. It's not needed on mips64el, since off_t is 64-bits,
but it actually causes problems.
So remove it, since we now use autoconf's AC_SYS_LARGEFILE, which
will set _FILE_OFFSET_BITS when it is necessary (such as on a 32-bit
i386 Linux platform), and will skip it when it is unnecessary.
The libarchive functionality in "mke2fs -d foo.tar" is breaking the
regression test[1]. Since this is working everywhere _except_
mips64el, as a short-term workaround disable libarchive support on
this platform until it can be fixed.
The e2scrub scripts rely on systemd, which isn't present on non-Linux
systems, so they aren't built. So we need to skip trying to run
dh_installsystemd since it will fail on the Hurd build since the
requisite files aren't being built.
Teach configure the --without-libarchive option, which forcibly
disables use of the libarchive library.
The option --with-libarchive=direct will disable the use of dlopen,
and will link mke2fs with -larchive directly. This doesn't work when
building mke2fs.static, since -larchive has a large number of
dependencies, and even "pkgconf --libs --static libarchive" doesn't
provide all of the appropriate library dependencies. :-(
debian: add a note in debian/changelog regarding features being re-enabled
The metadata_csum_seed and orphan_file features were disabled before
Debian Bookworm was released, but now that it's released, we are now
re-enabling those features for Debian testing and the next version of
Debian stable (trixie).
Manually count the number of free clusters in the last block group since
it might not be a multiple of 8, and using ext2fs_bitcount() might not
work if the bitmap isn't properly padded out.
In addition, when setting up the block bitmap for the resized file
system, resize2fs was setting up the "real end" of the bitmap in units
of blocks instead of clusters.
We didn't notice this problem earlier because of a test failure which
caused the test to be skipped.
* Re-enable metadata_csum_seed and orphan_file by default now that
Debian Bookworm is released
* New upstream version
* Add support for post-2038 timestamps on platforms with 64-bit time_t.
* Mke2fs -d can now support an input tar file if the libarchive library
is available.
* Install a udev rule to inhibit ext4 file systems from being
automounted by udisks.
* Debugfs's 'hash' command has been enhanced to use the hash seed and
algorithm from a superblock if a file system is opened and to display
the hash seed and algorithm if the -v flag is given.
* Teach mke2fs a new extended option, root_perms, which overrides
the permissions for the root directory for the new file system.
* Preserve any error indicator in the superblock when replaying the
journal so that a subsequent fsck can repair the file system afterwards.
* Fix potential mke2fs failures when creating a file system with an
orphan file when the storage device has a previous file system and
does not support discard/trim commands.
* E2fsck will clear the orphan_present feature silently in preen mode.
* Fix potential checksum failures when performing an online resize when
the mounted file system is actively modifying the superblock.
* Fix a bug where a checksum failure in an htree directory can cause
e2fsck's preen mode to abort unnecessarily.
* Fix e2fsck's handling of an invalid symlink in an inline_data
directory.
* Prevent e4crypt from issuing a spurious "success" error message when
trying to set a policy on a non-directory.
* Fix a potential infinite loop in debugfs's logdump command in some
edge cases.
* Fix e2fsck to correctly update quota usage after optimizing
directories or deleting corrupted inodes.
* Fix fuse2fs so that directories are created with the correct
permissions instead of having the other and group write permissions
masked off.
* Fix a potential e2fsck divide by zero crash caused by a maliciously
fuzzed file system.
* Fix dumpe2fs to report free block ranges correctly for bigalloc
file systems.
* Fix resize2fs where resizing a bigalloc file system can result in the
free cluster count in the last block group and the total free clusters
count to be incorrect.
* Avoid spurious e2scrub failures caused by trying to scrub file
systems that do not have the journal enabled, and by aborting scrub
runs while upgrading the e2fsprogs package.
* Teach tune2fs to detect a file system which is mounted but is not
mentioned in the mount namespace where tune2fs is run by treating a
block device which is busy as if it is mounted.
* If tune2fs can't find the mountpoint for a file system which is
apparently mounted (perhaps because it's not present in the current
mount namespace) when attempting to set the label or UUID in the
superblock, fall back to the old method of modifying block device and
silence printing any error messages.
* If both the primary superblock and first block group's backup
superblock are corrupted, e2fsck will now try additional backup
superblocks if they are available.
* Prevent mke2fs from creating an invalid file system with an insufficient
number of inodes when creating a file system which is very small
(100k), a block size of 1k, and an inode size of 256 bytes.
* Fix a potential deadlock caused by e2fsck being run in Direct I/O mode
with the threading optimization enabled.
* Update and clarify various man pages. (Closes: #1038286)
* Add support for SOURCE_DATE_EPOCH environment variable
* Improve resize2fs's performance by eliminating extra cache flushes.
* Improve mke2fs's performance when zeroing a large number of inode
table blocks (when lazy inode table initialization is not enabled) by
batching calls to ext2fs_zero_blocks.
* Use a safe_getenv function for all calls to fetch the environment
variable in libext2fs.
* Upgrade fuse2fs to use fuse v3.
* Build the binaries using FORTIFY_SOURCE=3 for better hardening
* Add Romanian translation.
* Update Malay translation.
Prevent i_dtime from being mistaken for an inode number post-2038 wraparound
We explicitly decided not to reserve space for a 64-bit dtime, since
it's never displayed or exposed to userspace. The dtime field is used
as a linked list for the orphan list, and for forensic purposes when
trying to determine when an inode was deleted. So right after the
2038 epoch, a deleted inode might end up with a dtime which is zero or
smaller than the number of inodes, which will result in e2fsck
reporting a potential problem. So when we set the dtime, make sure
that the dtime won't be mistaken for an inode number.
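The idea can be sketched as follows. This is a minimal illustration, not the e2fsprogs code: the helper name pick_dtime() and its parameters are hypothetical, and it assumes the on-disk dtime is the low 32 bits of the current time.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the e2fsprogs function): choose a 32-bit dtime
 * that cannot be confused with an inode number on the orphan list.
 */
uint32_t pick_dtime(uint64_t now, uint32_t inodes_count)
{
	/* Only the low 32 bits fit in the on-disk field. */
	uint32_t dtime = (uint32_t)(now & 0xffffffffULL);

	/*
	 * Zero, or any value no larger than the inode count, could be read
	 * as "not deleted" or as an inode number, so bump it past that range.
	 */
	if (dtime <= inodes_count && inodes_count != UINT32_MAX)
		dtime = inodes_count + 1;
	return dtime;
}
```

With 1000 inodes, a time of exactly 2**32 seconds would wrap to 0 and is bumped to 1001, while an ordinary present-day timestamp passes through unchanged.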
libext2fs: handle short reads/writes while creating the qcow file
This issue was flagged by Coverity, although its analysis was
incorrect. This isn't actually a memory overrun / security issue, but
rather a functional correctness issue since POSIX allows reads and
writes to be partially completed, and in those cases qcow2_copy_data()
could result in a corrupted qcow2 file.
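The usual way to cope with partial writes is a retry loop along these lines. This is a generic sketch, not the code in qcow2_copy_data(); write_all() is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes, or return -1 with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data moved; retry */
			return -1;
		}
		/* A short write is not an error: advance and keep going. */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

The read side needs the same treatment, with the extra case that a return value of 0 means end of file.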
configure: Use FORTIFY_SOURCE=3 when hardening is enabled
FORTIFY_SOURCE=3 provides much more robust checks for buffer overruns
and other memory bugs[1]. It requires gcc 12 and glibc 2.34 which
should be available on most modern distributions (which are the ones
that use --enable-hardening).
mke2fs: implement timestamp clamping if SOURCE_DATE_EPOCH is set
When copying files to the newly created file system using "mke2fs -d",
and there are timestamps greater than what is specified by
SOURCE_DATE_EPOCH, clamp the timestamp to the SOURCE_DATE_EPOCH
timestamp.
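A minimal sketch of the clamping rule, assuming SOURCE_DATE_EPOCH holds a decimal count of seconds. The function name is hypothetical, and how mke2fs treats a malformed value is not stated here, so this sketch simply ignores it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: clamp ts to the value of SOURCE_DATE_EPOCH.
 * epoch_str is the raw environment string (NULL when the variable is unset).
 */
int64_t clamp_to_source_date_epoch(int64_t ts, const char *epoch_str)
{
	char *end;
	long long epoch;

	if (epoch_str == NULL || *epoch_str == '\0')
		return ts;		/* variable unset: leave timestamps alone */

	errno = 0;
	epoch = strtoll(epoch_str, &end, 10);
	if (errno != 0 || *end != '\0')
		return ts;		/* unparsable value: ignored in this sketch */

	/* Only timestamps later than the epoch are changed. */
	return ts > epoch ? (int64_t)epoch : ts;
}
```

A caller would pass getenv("SOURCE_DATE_EPOCH") as the second argument and apply the result to each timestamp of every copied file.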
libext2fs: use a safe_getenv() function everywhere
Hoist safe_getenv() from test_io.c and unix_io.c to a globally
exported ext2fs_safe_getenv() and use it instead of getenv() in
libext2fs. This provides a bit more safety if e2fsprogs programs are
used in setuid contexts.
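The general shape of such a wrapper is sketched below. This is an illustration only, not the libext2fs implementation: sketch_safe_getenv() is a hypothetical name, and the real function may apply further checks.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: refuse to consult the environment when the real and
 * effective user or group IDs differ, as they do in a setuid/setgid program.
 */
char *sketch_safe_getenv(const char *name)
{
	if (getuid() != geteuid() || getgid() != getegid())
		return NULL;
	return getenv(name);
}
```

Callers treat a NULL result exactly as they would an unset variable, so an untrusted environment cannot steer a privileged process.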
Fix coverity false positives introduced by the post-2038 changes
Commit ca8bc9240a00 ("Add post-2038 timestamp support...") did things
like casting a 64-bit unsigned integer into a signed 32-bit integer
deliberately; but Coverity thinks this is a bug. So mask off the bits
to make it clear this was deliberate.
e2fsck: make sure get_backup_sb() works when ctx is NULL
The print_e2fsck_message() function can call get_backup_sb() with the
ctx variable set to NULL. In that case, we can't dereference
ctx->filesystem_name; instead, we can get the size of the file system
from the ext2fs_block_count(fs->super).
Align function prototypes for libss's request handler function
Clang 17's Undefined Behaviour Sanitizer will throw run-time warnings
if a function pointer is dereferenced with a different function
signature than one in the pointer --- even if the difference is a
missing const qualifier. To fix regression test failures, change
declarations of argv to use ss_argv_t instead of an inconsistently
open-coded type.
The mkgnutar.pl file only works if the developer had a specific
username and uid. In addition, if it is used, the round-trip from tar
to an ext4 file system and back to tar isn't properly tested. So only
use mkgnutar.pl if the system doesn't have GNU TAR.
In addition, make sure all of the temp files created by the test are
deleted when the test is completed.
FreeBSD 14 has changed the definition of qsort_r to align it with
POSIX, but it did this with a #define. So when sort_r.h tries to
provide a function prototype, surround the function name with
parentheses so it doesn't get expanded by FreeBSD's #define.
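The trick relies on the C rule that a function-like macro is only expanded when its name is immediately followed by an opening parenthesis. A small self-contained illustration, using made-up names rather than the FreeBSD or sort_r.h ones:

```c
/* Stand-in for a platform header that renames a function with a macro. */
int renamed_impl(int x)
{
	return x + 100;
}
#define my_func(x) renamed_impl(x)

/*
 * Here "my_func" is followed by ")" rather than "(", so the macro is not
 * expanded and this defines a real function called my_func.
 */
int (my_func)(int x)
{
	return x + 1;
}
```

After this, my_func(1) goes through the macro and calls renamed_impl(), while (my_func)(1) calls the function defined above.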
Debugfs's stat command called ext2fs_inode_xtime_get() with a struct
inode * instead of a struct large_inode *. As a result, printing
inode timestamps will be incorrect if the time value is larger than
2**32.
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
misc: update mke2fs's man page regarding the default inode size
Since a23b50cd ("mke2fs: warn about missing y2038 support when
formatting fresh ext4 fs"), the default inode size is 256 bytes
for all filesystems, including small and floppy, except for the
Hurd since it currently only supports 128-byte inodes.
Timestamps are encoded differently in inodes and superblocks.
Unfortunately, commit ca8bc9240a00 which added post-2038 timestamps
was (a) overwriting adjacent superblock fields and/or attempting
unaligned writes to an 8-bit field from a 32-bit pointer, and (b) using
the incorrect encoding for timestamps stored in inodes. Fix both of
these issues, which were found thanks to UBSAN.
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
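To make the difference concrete, here is a sketch of the two encodings as I understand the ext4 on-disk format: inodes keep a signed 32-bit seconds field plus an "extra" word whose low 2 bits extend the seconds and whose upper 30 bits hold nanoseconds, while the superblock keeps an unsigned 32-bit low word plus a separate 8-bit high byte. All function and macro names below are hypothetical, not the e2fsprogs ones.

```c
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1U << EPOCH_BITS) - 1)

/* Inode style: low 2 bits of extra extend seconds, upper 30 bits are nsec. */
void inode_time_encode(int64_t sec, uint32_t nsec, uint32_t *lo, uint32_t *extra)
{
	/* Truncating conversion; wraps on common compilers (implementation-defined). */
	int32_t lo_signed = (int32_t)sec;

	*lo = (uint32_t)lo_signed;
	*extra = (uint32_t)(((sec - lo_signed) >> 32) & EPOCH_MASK) |
		 (nsec << EPOCH_BITS);
}

int64_t inode_time_decode(uint32_t lo, uint32_t extra)
{
	/* The low word is sign-extended, then the epoch bits are added on top. */
	return (int64_t)(int32_t)lo + ((int64_t)(extra & EPOCH_MASK) << 32);
}

/* Superblock style: unsigned 32-bit low word plus a separate 8-bit high byte. */
void sb_time_encode(int64_t sec, uint32_t *lo, uint8_t *hi)
{
	*lo = (uint32_t)((uint64_t)sec & 0xffffffffULL);
	*hi = (uint8_t)(((uint64_t)sec >> 32) & 0xff);
}

int64_t sb_time_decode(uint32_t lo, uint8_t hi)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

Writing the superblock high byte through a 32-bit pointer would store four bytes instead of one, which is the kind of overwrite of adjacent fields described above.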
mke2fs: the -d option can now handle tarball input
If archive.h is available during compilation, enable mke2fs to read a
tarball as input. Since libarchive.so.13 is opened with dlopen,
libarchive is not a hard library dependency of the resulting binary.
In comparison with feeding a directory tree to mke2fs via -d this has
the following advantages:
- no superuser privileges, nor fakeroot, nor unshared user namespaces
are needed to create filesystems with arbitrary ownership information
and special files like device nodes which otherwise require being root
- by reading a tarball from standard input, no temporary files need to
be written out first as mke2fs can be used as part of a shell pipeline
which reduces disk usage and makes the conversion independent of the
underlying file system
A round-trip from tarball to ext4 to tarball yields bit-by-bit identical
results
Signed-off-by: Johannes Schauer Marin Rodrigues <josch@mister-muffin.de>
Commit ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
was never built or tested on a 32-bit platform. It introduced some build
problems when time_t is a 32-bit integer, and it exposed some test
bugs. Fix them.
Fixes: ca8bc9240a00 ("Add post-2038 timestamp support to e2fsprogs")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck: don't try backup superblocks beyond the size of the device
Commit f7ef5f3e356d ("e2fsck: check all sparse_super backups") tries
to limit the number of block groups to search for backup superblocks
based on ctx->num_blocks. Unfortunately, get_backup_sb() gets called
before ctx->num_blocks is set, so we try all block groups up to 2**32
- 1. Not only does this waste time trying to read from blocks that
don't exist, it triggers the UBSAN checker when multiplying a very
large number by the block size.
Fix this by using ext2fs_get_device_size(), and if that isn't
available, arbitrarily cap things so that we search block groups up to
128.
Sam James [Tue, 7 Nov 2023 23:31:20 +0000 (23:31 +0000)]
ext2fs: fix -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which gives:
```
lib/ext2fs/hashmap.c:37:36: warning: allocation of insufficient size ‘1’ for type ‘struct ext2fs_hashmap’ with size ‘20’ [-Walloc-size]
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 struct of size `sizeof(...)`. GCC then sees we're not
doing anything wrong.
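For illustration, the corrected argument order looks like this. The struct and function names are made up for the sketch; they are not the ones in hashmap.c.

```c
#include <stdlib.h>

struct table {		/* hypothetical struct, for illustration only */
	int size;
	void *entries[4];
};

struct table *table_new(void)
{
	/* nmemb first, element size second: one element of sizeof(struct table). */
	return calloc(1, sizeof(struct table));
}
```

Both argument orders allocate the same number of bytes; the swap only makes the call match the prototype so that -Walloc-size stays quiet.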
Wenchao Hao [Fri, 17 Nov 2023 10:23:15 +0000 (18:23 +0800)]
debugfs: fix infinite loop while dumping the journal
There are 2 scenarios which would trigger an infinite loop:
1. No log is recorded, then the log is dumped with "-n", for example:
debugfs -R "logdump -O -n 10" /dev/xxx
while /dev/xxx has no valid log recorded.
2. The log area is full and cycle write is triggered, then the log is dumped with
debugfs -R "logdump -aOS" /dev/xxx
This patch adds a new flag "wrapped_flag" to mark whether logdump has
reached the tail of the log area; it is set in the macro WRAP().
If wrapped_flag is true, and we come to first_transaction_blocknr
again, just break the logdump loop.
[ Renamed reverse_flag to wrapped_flag to make it clearer what it is. -- TYT ]
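The termination rule can be modelled with a toy circular log. This is a simplified sketch, not the debugfs code: walk_log() and its parameters are hypothetical, and the log is assumed to occupy blocks first up to but not including last, with start inside that range.

```c
#include <stdbool.h>

/*
 * Hypothetical model: walk a circular log starting at block start and
 * return how many blocks were visited before the walk stops.
 */
unsigned walk_log(unsigned first, unsigned last, unsigned start)
{
	unsigned blocknr = start;
	unsigned visited = 0;
	bool wrapped = false;

	for (;;) {
		/* Back at the starting block after wrapping: stop. */
		if (wrapped && blocknr == start)
			break;
		visited++;
		blocknr++;
		if (blocknr >= last) {	/* the role WRAP() plays in logdump */
			blocknr = first;
			wrapped = true;
		}
	}
	return visited;
}
```

Without the wrapped check, a walk that never meets another stopping condition would circle the log area forever; with it, each block is visited at most once.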
Anssi Hannula [Tue, 7 Nov 2023 09:46:53 +0000 (11:46 +0200)]
resize2fs: avoid constantly flushing while moving blocks
resize2fs block_mover() flushes data after each extent and, curiously,
only if progress indicator is enabled, every inode_blocks_per_group
blocks.
This significantly affects performance, e.g. on a tested large
filesystem on top of MD-RAID6+LVM+dm-crypt these flush calls reduce the
operation rate from approx. 500MB/s to 5MB/s, causing extremely long
shrinking times for large size deltas (70TB in my case).
Since this step performs just plain data copying and does not e.g. save
any progress/checkpoint information or similar metadata, it seems like
this flushing is of very limited usefulness, especially when considering
the (in some cases) 100x performance impact.
Remove the mid-operation flushes and only flush after all blocks have
been moved.
tests: new test to check quota after a bad inode deallocation
This new test validates e2fsck by verifying that quota is updated after a bad
inode is deallocated. It mimics fstest ext4/019 by including a filesystem image
where a symbolic link was created to an existing file, using a long symlink
name. This symbolic link was then wiped with:
tests: new test to check quota after directory optimization
This new test validates e2fsck by verifying that quota data is updated after a
directory optimization is performed. This issue was initially found by fstest
ext4/014, and this test was based on it. It includes a filesystem image where
the lost+found directory is unlinked after a new link to it is created:
e2fsck: update quota when deallocating a bad inode
If a bad inode is found it will be deallocated. However, if the filesystem has
quota enabled, the quota information isn't being updated accordingly. This
issue was detected by running fstest ext4/019.
This patch fixes the issue by decreasing the inode count from the
quota and, if blocks are also being released, subtracting them as well.
While there, and as suggested by Andreas Dilger, the deallocate_inode()
function documentation is also updated by this patch to make it clear what
that function really does.
e2fsck: update quota accounting after directory optimization
In "Pass 3A: Optimizing directories", a directory may have its size reduced.
If that happens and quota is enabled in the filesystem, the quota information
will be incorrect because it doesn't take the rehash into account. This issue
was detected by running fstest ext4/014.
This patch simply updates the quota data accordingly, after the directory is
written and its size has been updated.
According to the mke2fs man page, the supported cluster-size values
for an ext4 filesystem are 2048 to 256M bytes. However, this is not
the case.
When mkfs is run to create a filesystem with following specifications:
* 1k blocksize and cluster-size greater than 32M
* 2k blocksize and cluster-size greater than 64M
* 4k blocksize and cluster-size greater than 128M
mkfs fails with "Invalid argument passed to ext2 library while trying
to create journal" error. In general, when the cluster-size to blocksize
ratio is greater than 32k, mkfs fails with this error.
I went through the code and found that the function
`ext2fs_new_range()` is the source of this error. This is because when
the cluster-size to blocksize ratio exceeds 32k, the length argument
to the function `ext2fs_new_range()` results in 0. Hence, the error.
This patch corrects the valid cluster-size values.
Li Dongyang [Mon, 25 Sep 2023 06:08:01 +0000 (16:08 +1000)]
mke2fs: do not set the BLOCK_UNINIT on groups has GDT
This patch prepares for the expansion of GDT blocks beyond a
single group, by making mke2fs not set BLOCK_UNINIT on
groups with GDT blocks, block/inode bitmaps, or inode table
blocks allocated.
Otherwise, we still rely on the kernel side to initialize the
block bitmap if the group has BLOCK_UNINIT set, and the
kernel doesn't know a group could have GDT blocks allocated,
so it would make a bad block bitmap.
As a result, the expected output of several tests needs to be changed,
especially if the test uses dumpe2fs to print the group summary.
Li Dongyang [Mon, 25 Sep 2023 06:08:00 +0000 (16:08 +1000)]
mke2fs: set free blocks accurately for groups has GDT
This patch is part of the preparation required to allow
GDT blocks to expand beyond a single group.
It introduces 2 new interfaces:
- ext2fs_count_used_blocks(), to return the blocks used
in the bitmap range.
- ext2fs_reserve_super_and_bgd2() to return blocks used by
superblock/GDT blocks for every group, by looking up blocks used.
Andreas Dilger [Mon, 4 Sep 2023 04:57:42 +0000 (14:57 +1000)]
e2fsck: check all sparse_super backups
Teach e2fsck to look for backup super blocks in the "sparse_super"
groups, by checking group #1 first and then powers of 3^n, 5^n,
and 7^n, up to the limit of available block groups.
Export ext2fs_list_backups() function to efficiently iterate groups
for backup sb/GDT instead of checking every group. Ensure that the
group counters do not try to overflow the 2^32-1 group limit, and
try to limit scanning to the size of the block device (if available).
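The group selection rule can be sketched as a simple predicate. This is an illustration only and not the ext2fs_list_backups() implementation; both function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n equals base^k for some k >= 0 (so 1 counts as a power). */
static bool is_power_of(uint32_t n, uint32_t base)
{
	if (n == 0)
		return false;
	while (n % base == 0)
		n /= base;
	return n == 1;
}

/* Hypothetical predicate: does this group hold a sparse_super backup? */
bool group_has_backup_sb(uint32_t group)
{
	if (group == 0)
		return false;	/* group 0 holds the primary superblock */
	return is_power_of(group, 3) || is_power_of(group, 5) ||
	       is_power_of(group, 7);
}
```

For the first hundred groups this selects 1, 3, 5, 7, 9, 25, 27, 49 and 81. Dividing down rather than multiplying up avoids overflow near the 2^32-1 group limit, although an iterator that generates the powers directly visits far fewer candidates than testing every group.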
Li Dongyang [Mon, 4 Sep 2023 04:58:06 +0000 (14:58 +1000)]
mke2fs: batch zeroing inode table
For flex_bg enabled file systems, we can merge the
inode table blocks into a contiguous range.
This improves mke2fs time on large devices
when lazy_itable_init is disabled.
On a 977TB device, unpatched mke2fs was running
for 449m10s before getting terminated manually.
strace shows a huge number of fallocate calls; given the
offset from fallocate, it had done 41% of the inode
tables, so the estimated time needed would be 1082m.
        unpatched      patched
real    449m10.954s    4m20.531s
user    0m18.217s      0m16.147s
sys     0m20.311s      0m8.944s
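The batching idea amounts to coalescing adjacent block ranges before issuing the zeroing calls. A minimal sketch follows; the struct and function are hypothetical and not taken from mke2fs, and the input is assumed to be sorted by starting block.

```c
#include <stddef.h>
#include <stdint.h>

struct blk_range {	/* hypothetical type for this sketch */
	uint64_t start;
	uint64_t count;
};

/*
 * Merge adjacent ranges in place and return the new number of ranges.
 * Each surviving range would then need only one zeroing call.
 */
size_t merge_ranges(struct blk_range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		if (r[out].start + r[out].count == r[i].start)
			r[out].count += r[i].count;	/* contiguous: extend */
		else
			r[++out] = r[i];		/* gap: start a new range */
	}
	return out + 1;
}
```

With flex_bg, the inode tables of neighbouring groups tend to be laid out back to back, so many per-group ranges collapse into a few large ones.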