Instead of calling casefold twice and running memcmp on the results,
which requires allocating a temporary buffer for the casefolded
versions, add a strcasecmp-like method that compares each code point
during the casefold itself.
This method is exposed because it needs to be used directly by fsck.
tune2fs: allow enabling casefold feature after fs creation
The main reason we didn't allow this before was because !CASEFOLDED
directories were expected to be normalized(). Since this is no longer
the case, and as long as the encrypt feature is not enabled, it should
be safe to enable this feature.
Disabling the feature is trickier, since we need to make sure there are
no existing +F directories in the filesystem. Leave that for a future
patch.
Also, enabling strict mode requires some filesystem-wide verification,
so ignore that for now.
Fast commit replay needs to rewrite the entire extent tree for inodes
found in fast commit area. This patch makes e2fsck's rewrite extent
tree path visible.
Count the total number of blocks occupied by inode including
intermediate extent tree nodes.
extern errcode_t ext2fs_count_blocks(ext2_filsys fs, ext2_ino_t ino,
struct ext2_inode *inode,
blk64_t *ret_count);
Convert the on-disk representation of an extent to the in-memory
representation.
extern errcode_t ext2fs_decode_extent(struct ext2fs_extent *to, void *from,
int len);
Theodore Ts'o [Sat, 23 Jan 2021 05:57:18 +0000 (00:57 -0500)]
Fix clang warnings
Clang gets unhappy when passing an unsigned char to string functions.
For better or for worse we use __u8[] in the definition of the
superblock. So cast these to "char *" to prevent clang
build-time warnings.
Theodore Ts'o [Fri, 22 Jan 2021 04:27:00 +0000 (23:27 -0500)]
libext2fs: fix UBSAN warning in ext2fs_mmp_new_seq()
Left shifting the pid by 16 bits can cause a UBSAN warning if the pid
is greater than or equal to 2**16. It doesn't matter since we're just
using the pid to seed for a pseudo-random number generator, but
silence the warning by just swapping the high and low 16 bits of the
pid instead.
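As a minimal sketch of the idea (not the actual code in
ext2fs_mmp_new_seq(); the helper name swap_halves16 is hypothetical),
swapping the two 16-bit halves can be done entirely on an unsigned
type, so no signed left shift can overflow:

```c
#include <stdint.h>

/*
 * Hypothetical helper: swap the high and low 16 bits of a 32-bit value.
 * All arithmetic is done on an unsigned type, so the left shift cannot
 * overflow a signed int the way "pid << 16" can.
 */
static uint32_t swap_halves16(uint32_t v)
{
	return (v >> 16) | ((v & 0xFFFFu) << 16);
}
```

A caller would convert the pid to uint32_t first and use the result
only as seed material; the exact bit pattern does not matter.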
Hauke Mehrtens [Thu, 21 Jan 2021 23:11:30 +0000 (18:11 -0500)]
build: Add SYSLIBS to e4crypt linking
The $(SYSLIBS) was missing when linking the e4crypt application. This is
available in the e4crypt.profiled variant, so I assume this was just
missing in the normal variant and is not left out intentionally.
Theodore Ts'o [Thu, 21 Jan 2021 22:08:40 +0000 (17:08 -0500)]
tune2fs: abort clearing the dir_index when the fs needs to be fsck'ed first
We were not checking the return value of check_fsck_needed() when
checking to clear the dir_index feature. As a result, tune2fs would
print that the file system needed to be checked first, but then go
ahead and clear the dir_index flag.
This commit also adds fast_commit.h that contains the necessary
helpers needed for fast commit replay. Note that this file is also
byte by byte identical with kernel's fast_commit.h.
ext4: fix tests to account for new dumpe2fs output
The dumpe2fs tool is now capable of reporting the number of fast commit
blocks. There were slight changes in the output of dumpe2fs outside of
fast commits. This patch fixes the regression tests to expect the new
output.
libext2fs: provide APIs to configure fast commit blocks
This patch adds new libext2fs APIs that allow configuring the number of
fast commit blocks in the journal superblock. We also add a struct
ext2fs_journal_params which contains number of fast commit blocks and
number of normal journal blocks. With this patch, the preferred way
for configuring number of blocks with and without fast commits is:
In order to make recovery.c identical with kernel, we need endianness
conversion macros (such as cpu_to_be32 and friends) defined in
e2fsprogs. This patch defines these macros and also fixes recovery.c
to use these. These macros are also needed for fast commit recovery
patches later in this series.
ext2fs: move calculate_summary_stats to ext2fs lib
The function calculate_summary_stats sets the global metadata of the
file system. Tune2fs had this function defined statically in
tune2fs.c. Fast commit replay needs this function to set global
metadata at the end of the replay phase. So, move this function to
libext2fs.
Wang Shilong [Thu, 14 Jan 2021 00:27:22 +0000 (16:27 -0800)]
ext2fs: parallel bitmap loading
In our benchmarking of a PiB-size filesystem, pass5 takes
10446s to finish, and 99.5% of that time is spent reading bitmaps.
It makes sense to read the bitmaps using multiple threads;
a quick benchmark shows a drop from 10446s to 626s with 64 threads.
[ This has all of the many bug fixes for rw_bitmaps.c from the original
Lustre patch set collapsed into a single commit. In addition it has
the new ext2fs_rw_bitmaps() api proposed by Ted. ]
Theodore Ts'o [Thu, 14 Jan 2021 00:27:20 +0000 (16:27 -0800)]
libext2fs: add threading support to the I/O manager abstraction
Add initial implementation support for the unix_io manager.
Applications which want to use threading should pass in
IO_FLAG_THREADS when opening the channel. Channels which support
threading (which as of this commit is unix_io and test_io if the
backing io_manager supports threading) will set the
CHANNEL_FLAGS_THREADS bit in io->flags. Library code or applications
can test if threading is enabled by checking this flag.
Applications using libext2fs can pass in EXT2_FLAG_THREADS to
ext2fs_open() or ext2fs_open2() to request threading support.
Theodore Ts'o [Thu, 14 Jan 2021 00:27:19 +0000 (16:27 -0800)]
Add configure and build support for the pthreads library
Support for pthreads can be forcibly disabled by passing
"--without-pthread" to the configure script.
The actual changes in this commit are in configure.ac and MCONFIG.in;
the other files were generated as a result of running aclocal,
autoconf, and autoheader on a Debian testing system.
Note: the autoconf-archive package must now be installed before
rerunning aclocal, to supply the AX_PTHREAD macro.
Lukas Czerner [Mon, 2 Nov 2020 14:26:31 +0000 (15:26 +0100)]
mke2fs: Escape double quotes when parsing mke2fs.conf
Currently, when constructing the <default> configuration pseudo-file using
the profile-to-c.awk script we will just pass the double quotes as they
appear in the mke2fs.conf.
This is problematic, because the resulting default_profile.c will either
fail to compile because of syntax error, or leave the resulting
configuration invalid.
It can be reproduced by adding the following line somewhere into
mke2fs.conf configuration and forcing mke2fs to use the <default>
configuration by specifying nonexistent mke2fs.conf
Romain Naour [Mon, 2 Nov 2020 13:03:19 +0000 (14:03 +0100)]
libext2fs: add gnu.translator support
The support of setting (and reading) of passive translators from
GNU/Linux has been added to the Linux kernel by the commit [1].
The name index '10' has been reserved for GNU/Hurd.
Hurd passive translators are stored as a xattr value with name
"gnu.translator" [2].
If "gnu.translator" xattr value has been set before calling
mkfs.ext2, it will segfault since "gnu." is not present in
ea_names[].
Luis Henriques [Wed, 28 Oct 2020 15:55:50 +0000 (15:55 +0000)]
filefrag: handle invalid st_dev and blksize cases
It is possible to crash filefrag with a "Floating point exception" in
two different scenarios:
1. When fstat() returns a device ID set to 0
2. When FIGETBSZ ioctl returns a blocksize of 0
In both scenarios a divide-by-zero will occur in frag_report() because
variable blksize will be set to zero.
I've managed to trigger this crash with an old CephFS kernel client,
using xfstest generic/519. The first scenario has been fixed by kernel
commit 75c9627efb72 ("ceph: map snapid to anonymous bdev ID"). The
second scenario is also fixed with commit 8f97d1e99149 ("vfs: fix
FIGETBSZ ioctl on an overlayfs file").
However, it is desirable to handle these two scenarios gracefully by
checking these conditions explicitly.
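The kind of explicit check described above can be sketched as follows.
This is only an illustration of guarding a division against a zero
block size; the helper name blocks_needed is hypothetical and is not
filefrag's actual code, and the st_dev case is not shown:

```c
#include <stddef.h>

/*
 * Hypothetical helper: compute how many blocks of size blksize are
 * needed to hold `bytes`.  Returns 0 on success and -1 if blksize is
 * zero (or count is NULL), so the caller can report an error instead
 * of dividing by zero.
 */
static int blocks_needed(unsigned long long bytes, unsigned int blksize,
			 unsigned long long *count)
{
	if (blksize == 0 || count == NULL)
		return -1;
	/* Round up when bytes is not a multiple of blksize. */
	*count = bytes / blksize + (bytes % blksize != 0);
	return 0;
}
```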
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Antoine Tenart [Fri, 17 Jul 2020 10:08:46 +0000 (12:08 +0200)]
create_inode: set xattrs to the root directory as well
populate_fs copies the xattrs for all files and directories, but the
root directory is skipped and as a result its extended attributes aren't
set. This is an issue when using mkfs to build a full system image that
can be used with SELinux in enforcing mode without making any runtime
fix at first boot.
This patch adds logic to set the root directory's extended attributes.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Wed, 25 Nov 2020 13:47:54 +0000 (08:47 -0500)]
po: reapply local e2fsprogs changes to po/Makefile.in.in
These are the changes which are needed after running gettextize to
update to gettext 0.19.8 in the previous commit.
* Add support for maintainer mode (which doesn't do as much given
that gettext now has settings in Makevars which allows us to
suppress automatic updates of the po and gmo files)
* Add support to expand the '@' abbreviations in e2fsck/problem.c
and give an explanation of how they work for translators
* Add support for configure --enable-verbose-makecmds and default to
"kernel-style" quieter make output --- this makes it easier
to see warnings and errors by suppressing the distracting
details.
* Teach the makefile where to find the generated error table C files
in the build directory.
* Add make targets (e.g., all-static, fullcheck, coverage.txt) which
are required by the top-level Makefile.
Theodore Ts'o [Wed, 25 Nov 2020 04:00:57 +0000 (23:00 -0500)]
Update gettext files to version 0.19.8
This also removes the built-in "intl" directory as this is now
considered deprecated by the gettext package. This means that we
won't try to use an internal version of gettext if it's not installed
on the build system. We will simply disable NLS support in that case.
Theodore Ts'o [Tue, 6 Oct 2020 12:29:09 +0000 (08:29 -0400)]
debugfs: fix parse_uint for 64-bit fields
The logic for handling 64-bit structure elements was reversed, which
caused attempts to set fields like kbytes_written to fail:
% debugfs -w /tmp/foo.img
debugfs 1.45.6 (20-Mar-2020)
debugfs: set_super_value kbytes_written 1024
64-bit field kbytes_written has a second 64-bit field
defined; BUG?!?
Theodore Ts'o [Mon, 5 Oct 2020 03:05:01 +0000 (23:05 -0400)]
Define MKDIR_P in the Makefile.in files instead of in MCONFIG.in
In the case where mkdir -p is not thread-safe (for example, if the
build environment is using busybox's mkdir) the configure script will
fall back to the slow (but safe) install-sh script. In that case
MKDIR_P will be using a relative pathname; so we can't use the speed
optimization of defining configure substitutions in MCONFIG.in, since
the substitution will be different depending on depth of the
subdirectory in the Makefile.in file.
Theodore Ts'o [Fri, 2 Oct 2020 18:47:25 +0000 (14:47 -0400)]
resize2fs: prevent block group descriptors from overflowing the first bg
For 1k block file systems, resizing a file system to more than
1073610752 blocks will cause the block group descriptors to become so
large that they overlap with the backup superblock in block group #1.
This problem can be reproduced via:
Commit [382ed4a1 e2fsck: use proper types for variables][1] switched
from ino_t to ext2_ino_t for referencing inode numbers, but the type of
is_hardlink's `ino' should not have been changed. ext2_ino_t is 32 bits
wide, so if the inode number is greater than 0xFFFFFFFF, its value will
be truncated.
Adding a debug printf to show the inode value demonstrates this: when
checking for hardlinked files, is_hardlink always returns false if the
inode number is greater than 0xFFFFFFFF.
|--- a/misc/create_inode.c
|+++ b/misc/create_inode.c
|@@ -605,6 +605,7 @@ static int is_hardlink(struct hdlinks_s *hdlinks, dev_t dev, ext2_ino_t ino)
| {
| int i;
|
|+ printf("%s %d, %lX, %lX\n", __FUNCTION__, __LINE__, hdlinks->hdl[i].src_ino, ino);
| for (i = 0; i < hdlinks->count; i++) {
| if (hdlinks->hdl[i].src_dev == dev &&
| hdlinks->hdl[i].src_ino == ino)
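The truncation itself can be shown in isolation. The following is a
minimal sketch, assuming a 64-bit host inode number; the helper name
fits_in_32_bits is hypothetical and does not appear in create_inode.c:

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a 64-bit host inode number
 * survives conversion to a 32-bit type unchanged, and 0 if the upper
 * bits are lost (so a later equality comparison against the original
 * value would fail).
 */
static int fits_in_32_bits(uint64_t host_ino)
{
	uint32_t narrowed = (uint32_t)host_ino;	/* what a 32-bit parameter keeps */

	return (uint64_t)narrowed == host_ino;
}
```

For example, 0x100000001 narrows to 1, which no longer compares equal
to the stored 64-bit source inode number.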
Jan Kara [Thu, 9 Jul 2020 14:40:57 +0000 (16:40 +0200)]
mke2fs: Warn if fs block size is incompatible with DAX
If we are creating a filesystem on a DAX-capable device, warn if the
chosen block size is incompatible with DAX to give the admin a hint as
to why DAX might not be available.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e4crypt: if salt is explicitly provided to add_key, then use it
Providing -S and a path to 'add_key' previously exhibited an
unintuitive behavior: instead of using the salt explicitly provided by
the user, e4crypt would use the salt obtained via
EXT4_IOC_GET_ENCRYPTION_PWSALT on the path. This was because
set_policy() was still called with NULL as salt.
With this change we now remember the explicitly provided salt (if any)
and use it as argument for set_policy().
As a result,
e4crypt add_key -S s:my-spicy-salt /foo
will now actually use 'my-spicy-salt' and not something else as salt
for the policy set on /foo.
Andreas Dilger [Wed, 17 Jun 2020 11:40:49 +0000 (05:40 -0600)]
tune2fs: reset MMP state on error exit
If tune2fs cannot perform the requested change, ensure that the MMP
block is reset to the unused state before exiting. Otherwise, the
filesystem will be left with mmp_seq = EXT4_MMP_SEQ_FSCK set, which
prevents it from being mounted afterward:
EXT4-fs warning (device dm-9): ext4_multi_mount_protect:311:
fsck is running on the filesystem
Add a test to try some failed tune2fs operations and verify that the
MMP block is left in a clean state afterward.
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13672
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Lukas Czerner [Fri, 5 Jun 2020 08:14:40 +0000 (10:14 +0200)]
e2fsck: use size_t instead of int in string_copy()
len argument in string_copy() is int, but it is used with malloc(),
strlen(), strncpy() and some callers use sizeof() to pass value in. So
it really ought to be size_t rather than int. Fix it.
Theodore Ts'o [Wed, 26 Aug 2020 20:29:29 +0000 (16:29 -0400)]
libext2fs: fix potential buffer overrun in __get_dirent_tail()
If the file system is corrupted, there is the potential for a read-only
buffer overrun. Fortunately, we don't actually use the result of that
pointer dereference, and the overrun is at most 64k.
Google-Bug-Id: #158564737
Fixes: eb88b751745b ("libext2fs: make ext2fs_dirent_has_tail() more strict")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 18 Jun 2020 01:43:37 +0000 (21:43 -0400)]
debugfs: fix building rdebugfs (with READ_ONLY define)
Fix bitrot for building a restricted version of debugfs, which does
not require read/write access to the file system, and which only
allows access to the file system metadata.
Theodore Ts'o [Mon, 18 May 2020 03:05:11 +0000 (23:05 -0400)]
libext2fs: retry reading superblock on open when checksum is bad
When opening a file system which is mounted, it's possible that when
ext2fs_open2() is racing with the kernel modifying the orphaned inode
list, the superblock's checksum could be incorrect. So retry reading
the superblock in the hopes that the problem will self-correct.
When allocating blocks for an indirect block mapped file, accumulate
blocks to be zero'ed and then call ext2fs_zero_blocks2() to zero them
in large chunks instead of block by block.
This significantly speeds up mkfs.ext3 since we don't send a large
number of ZERO_RANGE requests to the kernel, and while the kernel does
batch write requests, it is not batching ZERO_RANGE requests. It's
more efficient to batch in userspace in any case, since it avoids
unnecessary system calls.
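The batching idea can be sketched as a small run accumulator. This is
an illustration only, not the libext2fs implementation: the struct,
the function names and the callback type are all hypothetical, and the
callback stands in for whatever call actually zeroes a block range.

```c
#include <stdint.h>

/* A pending run of contiguous blocks waiting to be zeroed. */
struct zero_run {
	uint64_t start;
	uint64_t count;		/* 0 means no run is pending */
};

/* Hypothetical callback standing in for the call that zeroes a range. */
typedef void (*zero_fn)(uint64_t start, uint64_t count, void *ctx);

/* Example callback: only counts how many range requests were issued. */
static void count_calls(uint64_t start, uint64_t count, void *ctx)
{
	(void)start;
	(void)count;
	(*(unsigned *)ctx)++;
}

/* Issue one request for the pending run, if there is one. */
static void zero_run_flush(struct zero_run *r, zero_fn fn, void *ctx)
{
	if (r->count) {
		fn(r->start, r->count, ctx);
		r->count = 0;
	}
}

/*
 * Queue one block.  A block that directly follows the pending run
 * extends it; any other block flushes the run and starts a new one.
 */
static void zero_run_add(struct zero_run *r, uint64_t blk,
			 zero_fn fn, void *ctx)
{
	if (r->count && blk == r->start + r->count) {
		r->count++;
		return;
	}
	zero_run_flush(r, fn, ctx);
	r->start = blk;
	r->count = 1;
}
```

Queuing blocks 10, 11, 12 and 20 and then flushing issues two range
requests instead of four single-block ones.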
Reported-by: Mario Schuknecht <mario.schuknecht@dresearch-fe.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Wed, 1 Apr 2020 20:32:37 +0000 (13:32 -0700)]
tune2fs: prevent stable_inodes feature from being cleared
Similar to encrypt and verity, once the stable_inodes feature has been
enabled there may be files anywhere on the filesystem that require this
feature. Therefore, in general it's unsafe to allow clearing it. Don't
allow tune2fs to do so. Like encrypt and verity, it can still be
cleared with debugfs if someone really knows what they're doing.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Wed, 1 Apr 2020 20:32:36 +0000 (13:32 -0700)]
tune2fs: prevent changing UUID of fs with stable_inodes feature
The stable_inodes feature is intended to indicate that it's safe to use
IV_INO_LBLK_64 encryption policies, where the encryption depends on the
inode numbers and thus filesystem shrinking is not allowed. However
since inode numbers are not unique across filesystems, the encryption
also depends on the filesystem UUID, and I missed that there is a
supported way to change the filesystem UUID (tune2fs -U).
So, make 'tune2fs -U' report an error if stable_inodes is set.
We could add a separate stable_uuid feature flag, but it seems unlikely
it would be useful enough on its own to warrant another flag.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Mon, 30 Mar 2020 09:09:32 +0000 (11:09 +0200)]
ext2fs: fix off-by-one in dx_grow_tree()
There is an off-by-one error in dx_grow_tree() when checking whether we
can add another level to the tree. Thus we can grow the tree too much,
leading to possible crashes in the library or a corrupted filesystem.
Fix the bug.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Mon, 30 Mar 2020 09:09:31 +0000 (11:09 +0200)]
ext2fs: fix error checking in dx_link()
dx_lookup() uses errcode_t return values. As such anything non-zero is
an error, not values less than zero. Fix the error checking to avoid
crashes on corrupted filesystems.
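The difference between the two checks can be illustrated with a
stand-in type. This sketch assumes, as the text above implies, that
error codes are non-zero and may be positive; demo_errcode_t and
DEMO_ET_FAILURE are hypothetical names with an arbitrary value, not
libext2fs definitions:

```c
/* Stand-in for a signed error-code type whose error values are positive. */
typedef long demo_errcode_t;

/* Hypothetical positive error value, chosen only for illustration. */
#define DEMO_ET_FAILURE 2133571404L

/* Wrong: treats only negative values as errors, so positive codes slip through. */
static int failed_wrong(demo_errcode_t err)
{
	return err < 0;
}

/* Right: anything non-zero is an error. */
static int failed_right(demo_errcode_t err)
{
	return err != 0;
}
```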
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
David Anderson [Fri, 14 Feb 2020 20:44:48 +0000 (12:44 -0800)]
AOSP: e2fsdroid: Don't skip unusable blocks in BaseFS.
Currently, basefs_allocator will iterate through blocks owned by an
inode until it finds a block that is free. This effectively ignores the
logical to physical block mapping, which can lead to a bigger delta in
the final image.
An example of how this can happen is if the BaseFS has a deduplicated
block (D), that is not deduplicated in the new image:
Old image: 1 2 3 D 4 5
New image: 1 2 3 ? 4 5
If the allocator sees that "D" is not usable, and skips to block "4",
we will have a non-ideal assignment.
Bad image: 1 2 3 4 5 ?
This patch refactors get_next_block() to acquire at most one block. It's
called a single time, and then only called in a loop if absolutely no
blocks can be acquired from anywhere else.
In a Virtual A/B simulation, this reduces the COW snapshot size by about
90MB.
David Anderson [Fri, 14 Feb 2020 03:20:32 +0000 (19:20 -0800)]
AOSP: e2fsdroid: Fix logical block sequencing in BaseFS.
By iterating over blocks to write BaseFS, holes in the extent tree are
skipped. This is problematic because the purpose of BaseFS is to
preserve the logical to physical block assignment between builds. By not
preserving the location of holes, the assignment can be incorrect.
For example, consider the following block list for a file:
1 2 3 0 4 5
If this is recorded as:
1 2 3 4 5
and the first block then changes to a hole, the intended mapping will
not be preserved at all:
0 1 2 0 3
This patch makes two changes to e2fsdroid to fix this. The first change
is that holes are now recorded in BaseFS, by iterating over the extent
tree rather than the block list, and inserting zeroes where appropriate.
The second change is that the block allocator now recognizes when blocks
have been skipped (due either to deduplication or to holes), and skips the
same number of logical blocks in BaseFS as well.
In a Virtual A/B simulation, this reduces the COW snapshot size by
approximately 100MB.
David Anderson [Wed, 29 Jan 2020 23:31:14 +0000 (15:31 -0800)]
AOSP: e2fsdroid: Properly free the dedup block map.
When BaseFS specifies the same block for two files, it gets added to a
separate "dedup" bitmap, and removed from the free block bitmap. If the
new build does not use every block in this bitmap, there will be an
inconsistency: the block bitmap marks blocks as in-use when they are
actually free. Although this doesn't matter for AOSP's read-only file
systems, it does cause e2fsck to complain, which breaks the build.
Fix the inconsistency by properly freeing all unused blocks within the
dedup block set.
Elliott Hughes [Thu, 23 Jan 2020 23:44:10 +0000 (15:44 -0800)]
AOSP: Add -e2fsprogs to the e2fsprogs chattr and lsattr.
We want to start shipping the toybox chattr and lsattr on the device all
the time, so the build system rightly complains that then we'd have two
modules with the same name.
I went with a suffix rather than a prefix so that tab completion works
for folks still wanting to use the e2fsprogs versions.
Kousik Kumar [Fri, 10 Jan 2020 00:15:30 +0000 (16:15 -0800)]
AOSP: Change #define to _BLKID_TYPES_H
blkid_types.h and ext_types.h having the exact same content results in
mismatches in remote RBE builds. Given blkid_types.h is actually
supposed to be different, changing this to remove the mismatch.
Test: Ran a build, and all e2fsprogs mismatches went away between
local/remote.
Theodore Ts'o [Fri, 20 Mar 2020 19:24:18 +0000 (15:24 -0400)]
libext2fs: fix the {set_get}_bitmap_range functions when bitmap->start > 7
The bitmap array's set/get bitmap_range functions were not subtracting
out bitmap->start. This doesn't matter for normal file systems, since
the bitmap->start is zero or one, and the passed-in starting range is
a multiple of eight, and the starting range is then divided by 8.
But with a non-standard/fuzzed file system, bitmap->start could be
significantly larger, and this could then lead to an array out of
bounds memory reference.
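A simplified sketch of why the offset matters is shown below. The
struct and function are hypothetical and much simpler than the real
bitmap backend (for instance, this version requires byte-aligned
ranges); the point is only that the byte index must be computed from
the block number minus the bitmap's start, and then bounds-checked:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified bitmap: bit i of `bits` describes block
 * (start + i).  This is not libext2fs's real structure.
 */
struct demo_bitmap {
	uint64_t start;		/* first block covered */
	uint64_t nbits;		/* number of blocks covered */
	unsigned char *bits;
};

/*
 * Copy `num` bits beginning at block `first` into `out`.  In this
 * simplified version `first - start` and `num` must be multiples of 8.
 * Returns 0 on success, -1 if the request is out of range.
 */
static int demo_get_range(const struct demo_bitmap *bm, uint64_t first,
			  uint64_t num, unsigned char *out)
{
	uint64_t off;

	if (first < bm->start)
		return -1;
	off = first - bm->start;	/* subtract out the bitmap's start */
	if ((off % 8) || (num % 8) || off > bm->nbits ||
	    num > bm->nbits - off)
		return -1;
	memcpy(out, bm->bits + off / 8, num / 8);
	return 0;
}
```

Without the subtraction, a bitmap whose start is 1000 would be indexed
at byte 126 for block 1008 instead of byte 1.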
Jan Kara [Thu, 13 Feb 2020 10:15:56 +0000 (11:15 +0100)]
e2fsck: clarify overflow link count error message
When a directory's link count is set to the overflow value (1) but
during pass 4 we find that the exact link count would fit, we either
silently fix this (which is not great because e2fsck then reports the fs
was modified but the output doesn't indicate why in any way), or we
report that the link count is wrong and ask whether we should fix it (in
case the -n option was specified). The second case is even more
misleading because it suggests non-trivial fs corruption which then gets
silently fixed on the next run. Similarly to how we fix up other
non-problems, just create a new error message for the case where the
directory link count no longer overflows, and always report it to
clarify what is going on.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 4ebce13292f54c96f43dcb1bd1d5b8df5dc8749d)