Jiyong Park [Tue, 5 Jan 2021 05:43:46 +0000 (14:43 +0900)]
AOSP: Add assemble_cvd to com.android.virt
assemble_cvd directly or indirectly depends on these modules. To add
assemble_cvd to the com.android.virt APEX, these modules are marked as
being available to the APEX.
Justin Yun [Wed, 11 Nov 2020 07:15:33 +0000 (16:15 +0900)]
AOSP: Add "product_available" to product available modules
"vendor_available" modules were available to product modules.
However, not all "vendor_available" modules are required to be
available to product modules. Some modules want to be available only
to product modules but not vendor modules.
To cover the requirement, we separate "product_available" from
"vendor_available".
"vendor_available" will not provide product available module.
Adds support for EXT2_HASH_SIPHASH, and reading the hash from disk in
that case. We cannot compute the siphash without the key, so we must
not modify the names of any encrypted and casefolded directories,
which limits some recovery options, and we must assume the hashes
stored in dirents are correct.
This is in preparation for upcoming kernel support for encryption and
casefolding at the same time.
Google-Bug-Id: 138322712
Test: Create fs with casefold and encryption enabled via mke2fs and
tune2fs, run fsck after creating casefolded + encrypted folder
Daniel Rosenberg [Thu, 11 Jun 2020 04:22:05 +0000 (21:22 -0700)]
AOSP: ANDROID: mke2fs: Support encrypt+casefold
In preparation for upcoming kernel changes that will make the kernel
support both encryption and casefolding at the same time, allow mke2fs
to enable both features at the same time.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Google-Bug-Id: 138322712
Test: Create fs with casefold and encryption enabled via mke2fs
Change-Id: I4e2350e43e21cffb3d972310cd74df1e662bf87e
From AOSP commit: f8fc427df385260f3424e1e9d5485c8640606920
Daniel Rosenberg [Thu, 11 Jun 2020 04:21:42 +0000 (21:21 -0700)]
AOSP: ANDROID: tune2fs: Support encrypt+casefold
In preparation for upcoming kernel changes that will make the kernel
support both encryption and casefolding at the same time, allow tune2fs
to enable both features at the same time.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Google-Bug-Id: 138322712
Test: Create fs with casefold and encryption enabled via tune2fs
Daniel Rosenberg [Fri, 12 Jun 2020 11:11:38 +0000 (04:11 -0700)]
AOSP: ANDROID: e2fsck: Do not mutate encrypted names
We can't mutate a name without the key, as this will at best cause the
name to become gibberish, and at worst may introduce invalid characters
or even fail to be unique after decoding, so drop duplicates instead.
Files lost in this way will be reconnected to lost+found
Fixes: dbff534ec685 ("e2fsck: suppress bad name checks for encrypted directories")
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Google-Bug-Id: 138322712
Test: f_dup_de_crypt
Change-Id: I8d6cc3984872868a845fafabc554abdd86351fcc
From AOSP commit: 80b85f8a0b2ba7090a927f692ff9d2097ffd8d1f
Daniel Rosenberg [Thu, 11 Jun 2020 04:18:44 +0000 (21:18 -0700)]
AOSP: ANDROID: tune2fs: Allow setting the casefold feature
This allows tune2fs to enable casefolding on an existing filesystem.
At the moment, casefolding is incompatible with encryption.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Google-Bug-Id: 138322712
Test: Create fs without casefold and enable it via tune2fs
Change-Id: Ic9ed63180ef28c36e083cee85ade432e4bfcc654
From AOSP commit: eb5b168decac07058e90ead191350be80c75aff4
Theodore Ts'o [Wed, 27 Jan 2021 21:41:05 +0000 (16:41 -0500)]
e2fsck: declare the size of bh->b_data to be 4096 in jfs_user.h
When allocating buffer_heads in e2fsck and debugfs the actual size of
the memory which is requested is based on the file system block size.
So the actual size of b_data in struct buffer_head doesn't actually
matter, except that it can trigger a UBSAN error when running the
e2fsck regression test. So change it to be 4096 to avoid this false
positive.
Theodore Ts'o [Sat, 23 Jan 2021 05:57:18 +0000 (00:57 -0500)]
Fix clang warnings
Clang gets unhappy when passing an unsigned char to string functions.
For better or for worse we use __u8[] in the definition of the
superblock. So cast these to "char *" to prevent clang
build-time warnings.
Theodore Ts'o [Fri, 22 Jan 2021 04:27:00 +0000 (23:27 -0500)]
libext2fs: fix UBSAN warning in ext2fs_mmp_new_seq()
Left shifting the pid by 16 bits can cause a UBSAN warning if the pid
is greater than or equal to 2**16. It doesn't matter since we're just
using the pid to seed a pseudo-random number generator, but
silence the warning by just swapping the high and low 16 bits of the
pid instead.
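A minimal sketch of the swap described above, assuming a 32-bit
unsigned intermediate. The helper names swap_halves16() and
mix_pid_into_seed() are hypothetical and are not taken from libext2fs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: swap the high and low 16 bits of a 32-bit value.
 * Working on an unsigned type keeps both shifts well defined. */
static uint32_t swap_halves16(uint32_t v)
{
	return (v >> 16) | (v << 16);
}

/* Hypothetical seed mixer: fold a pid into a PRNG seed without
 * left-shifting a signed value. */
static uint32_t mix_pid_into_seed(uint32_t seed, int pid)
{
	assert(pid >= 0);
	return seed ^ swap_halves16((uint32_t)pid);
}
```

A pid of 0x10000 or more still contributes all of its bits to the seed,
because the upper half is moved down rather than shifted out.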
Hauke Mehrtens [Thu, 21 Jan 2021 23:11:30 +0000 (18:11 -0500)]
build: Add SYSLIBS to e4crypt linking
The $(SYSLIBS) was missing when linking the e4crypt application. This is
available in the e4crypt.profiled variant, so I assume this was just
missing in the normal variant and is not left out intentionally.
Lukas Czerner [Mon, 2 Nov 2020 14:26:31 +0000 (15:26 +0100)]
mke2fs: Escape double quotes when parsing mke2fs.conf
Currently, when constructing the <default> configuration pseudo-file using
the profile-to-c.awk script we will just pass the double quotes as they
appear in the mke2fs.conf.
This is problematic, because the resulting default_profile.c will either
fail to compile because of syntax error, or leave the resulting
configuration invalid.
It can be reproduced by adding the following line somewhere into
mke2fs.conf configuration and forcing mke2fs to use the <default>
configuration by specifying nonexistent mke2fs.conf
Romain Naour [Mon, 2 Nov 2020 13:03:19 +0000 (14:03 +0100)]
libext2fs: add gnu.translator support
The support of setting (and reading) of passive translators from
GNU/Linux has been added to the Linux kernel by the commit [1].
The name index '10' has been reserved for GNU/Hurd.
Hurd passive translators are stored as a xattr value with name
"gnu.translator" [2].
If "gnu.translator" xattr value has been set before calling
mkfs.ext2, it will segfault since "gnu." is not present in
ea_names[].
Luis Henriques [Wed, 28 Oct 2020 15:55:50 +0000 (15:55 +0000)]
filefrag: handle invalid st_dev and blksize cases
It is possible to crash filefrag with a "Floating point exception" in
two different scenarios:
1. When fstat() returns a device ID set to 0
2. When FIGETBSZ ioctl returns a blocksize of 0
In both scenarios a divide-by-zero will occur in frag_report() because
variable blksize will be set to zero.
I've managed to trigger this crash with an old CephFS kernel client,
using xfstest generic/519. The first scenario has been fixed by kernel
commit 75c9627efb72 ("ceph: map snapid to anonymous bdev ID"). The
second scenario is also fixed with commit 8f97d1e99149 ("vfs: fix
FIGETBSZ ioctl on an overlayfs file").
However, it is desirable to handle these two scenarios gracefully by
checking these conditions explicitly.
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
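The explicit check can be sketched as follows. This is only an
illustration of guarding a division by a block size that may be zero;
blocks_for_size() is a hypothetical helper, not a function in filefrag:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of blksize-sized blocks needed to hold
 * "bytes". Returns -1 instead of dividing when blksize is zero. */
static int64_t blocks_for_size(uint64_t bytes, unsigned int blksize)
{
	assert(bytes <= (uint64_t)INT64_MAX);	/* keep the result representable */
	if (blksize == 0)
		return -1;
	/* Round up without risking overflow in bytes + blksize - 1. */
	return (int64_t)(bytes / blksize + (bytes % blksize != 0));
}
```

A caller would treat the -1 return as "block size unknown" and report
an error rather than continuing into the arithmetic.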
Antoine Tenart [Fri, 17 Jul 2020 10:08:46 +0000 (12:08 +0200)]
create_inode: set xattrs to the root directory as well
populate_fs does copy the xattrs for all files and directories, but the
root directory is skipped and as a result its extended attributes aren't
set. This is an issue when using mkfs to build a full system image that
can be used with SELinux in enforcing mode without making any runtime
fix at first boot.
This patch adds logic to set the root directory's extended attributes.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Tue, 6 Oct 2020 12:29:09 +0000 (08:29 -0400)]
debugfs: fix parse_uint for 64-bit fields
The logic for handling 64-bit structure elements was reversed, which
caused attempts to set fields like kbytes_written to fail:
% debugfs -w /tmp/foo.img
debugfs 1.45.6 (20-Mar-2020)
debugfs: set_super_value kbytes_written 1024
64-bit field kbytes_written has a second 64-bit field
defined; BUG?!?
Theodore Ts'o [Mon, 5 Oct 2020 03:05:01 +0000 (23:05 -0400)]
Define MKDIR_P in the Makefile.in files instead of in MCONFIG.in
In the case where mkdir -p is not thread-safe (for example, if the
build environment is using busybox's mkdir) the configure script will
fall back to the slow (but safe) install-sh script. In that case
MKDIR_P will be using a relative pathname, so we can't use the speed
optimization of defining configure substitutions in MCONFIG.in, since
the substitution will be different depending on depth of the
subdirectory in the Makefile.in file.
Theodore Ts'o [Fri, 2 Oct 2020 18:47:25 +0000 (14:47 -0400)]
resize2fs: prevent block group descriptors from overflowing the first bg
For 1k block file systems, resizing a file system larger than
1073610752 blocks will result in the size of the block group
descriptors being so large that they will overlap with the backup
superblock in block group #1. This problem can be reproduced via:
Since commit [382ed4a1 e2fsck: use proper types for variables][1] was
applied, ext2_ino_t is used instead of ino_t for referencing inode
numbers, but the type of is_hardlink's `ino' should not have been
changed. ext2_ino_t is 32 bits wide, so if the inode number is greater
than 0xFFFFFFFF, its value will be truncated.
Add a debug printf to show the value of the inode. When checking for
hardlinked files, is_hardlink() will always return false if the inode
number is greater than 0xFFFFFFFF:
|--- a/misc/create_inode.c
|+++ b/misc/create_inode.c
|@@ -605,6 +605,7 @@ static int is_hardlink(struct hdlinks_s *hdlinks, dev_t dev, ext2_ino_t ino)
| {
| int i;
|
|+ printf("%s %d, %lX, %lX\n", __FUNCTION__, __LINE__, hdlinks->hdl[i].src_ino, ino);
| for (i = 0; i < hdlinks->count; i++) {
| if (hdlinks->hdl[i].src_dev == dev &&
| hdlinks->hdl[i].src_ino == ino)
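The truncation can be illustrated in isolation. This is a sketch under
the assumption that the host inode number is 64 bits wide;
sketch_ext2_ino_t and both helpers are hypothetical stand-ins, not code
from misc/create_inode.c:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_ext2_ino_t;	/* stand-in for the 32-bit ext2_ino_t */

/* Returns 1 when a host inode number survives a round trip through a
 * 32-bit type, 0 when the upper bits would be lost. */
static int ino_fits_32bit(uint64_t host_ino)
{
	sketch_ext2_ino_t narrowed = (sketch_ext2_ino_t)host_ino;

	return (uint64_t)narrowed == host_ino;
}

/* Compare a stored 64-bit source inode with a candidate that has been
 * narrowed first, as happens when the parameter type is 32 bits wide. */
static int same_ino_after_narrowing(uint64_t stored, uint64_t candidate)
{
	assert(stored != 0);	/* inode number 0 is not a valid inode */
	return stored == (uint64_t)(sketch_ext2_ino_t)candidate;
}
```

With a candidate above 0xFFFFFFFF the comparison fails even when both
values started out identical, which matches the "always return false"
behaviour described above.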
Jan Kara [Thu, 9 Jul 2020 14:40:57 +0000 (16:40 +0200)]
mke2fs: Warn if fs block size is incompatible with DAX
If we are creating a filesystem on a DAX-capable device, warn if the
chosen block size is incompatible with DAX, to give the admin a hint
why DAX might not be available.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e4crypt: if salt is explicitly provided to add_key, then use it
Providing -S and a path to 'add_key' previously exhibited an
unintuitive behavior: instead of using the salt explicitly provided by
the user, e4crypt would use the salt obtained via
EXT4_IOC_GET_ENCRYPTION_PWSALT on the path. This was because
set_policy() was still called with NULL as salt.
With this change we now remember the explicitly provided salt (if any)
and use it as argument for set_policy().
As a result,
e4crypt add_key -S s:my-spicy-salt /foo
will now actually use 'my-spicy-salt' and not something else as salt
for the policy set on /foo.
Andreas Dilger [Wed, 17 Jun 2020 11:40:49 +0000 (05:40 -0600)]
tune2fs: reset MMP state on error exit
If tune2fs cannot perform the requested change, ensure that the MMP
block is reset to the unused state before exiting. Otherwise, the
filesystem will be left with mmp_seq = EXT4_MMP_SEQ_FSCK set, which
prevents it from being mounted afterward:
EXT4-fs warning (device dm-9): ext4_multi_mount_protect:311:
fsck is running on the filesystem
Add a test to try some failed tune2fs operations and verify that the
MMP block is left in a clean state afterward.
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13672
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Lukas Czerner [Fri, 5 Jun 2020 08:14:40 +0000 (10:14 +0200)]
e2fsck: use size_t instead of int in string_copy()
len argument in string_copy() is int, but it is used with malloc(),
strlen(), strncpy() and some callers use sizeof() to pass value in. So
it really ought to be size_t rather than int. Fix it.
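A sketch of a string_copy()-style helper with a size_t length. The name
copy_string() and the "len of 0 means copy the whole string" convention
are assumptions made for the illustration, not the e2fsck code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string_copy()-style helper: copy at most
 * len bytes of str (or all of it when len is 0) into a freshly
 * allocated, NUL-terminated buffer. The caller frees the result. */
static char *copy_string(const char *str, size_t len)
{
	char *ret;

	assert(str != NULL);
	if (len == 0)
		len = strlen(str);
	ret = malloc(len + 1);
	if (ret == NULL)
		return NULL;
	strncpy(ret, str, len);
	ret[len] = '\0';
	return ret;
}
```

Because len has the same type as the results of strlen() and sizeof,
callers can pass those values through without a narrowing conversion.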
Theodore Ts'o [Wed, 26 Aug 2020 20:29:29 +0000 (16:29 -0400)]
libext2fs: fix potential buffer overrun in __get_dirent_tail()
If the file system is corrupted, there is a potential of a read-only
buffer overrun. Fortunately, we don't actually use the result of that
pointer dereference, and the overrun is at most 64k.
Google-Bug-Id: #158564737
Fixes: eb88b751745b ("libext2fs: make ext2fs_dirent_has_tail() more strict")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 18 Jun 2020 01:43:37 +0000 (21:43 -0400)]
debugfs: fix building rdebugfs (with READ_ONLY define)
Fix bitrot for building a restricted version of debugfs, which does
not require read/write access to the file system, and which only
allows access to the file system metadata.
Theodore Ts'o [Mon, 18 May 2020 03:05:11 +0000 (23:05 -0400)]
libext2fs: retry reading superblock on open when checksum is bad
When opening a file system which is mounted, it's possible that when
ext2fs_open2() is racing with the kernel modifying the orphaned inode
list, the superblock's checksum could be incorrect. So retry reading
the superblock in the hopes that the problem will self-correct.
When allocating blocks for an indirect block mapped file, accumulate
blocks to be zero'ed and then call ext2fs_zero_blocks2() to zero them
in large chunks instead of block by block.
This significantly speeds up mkfs.ext3 since we don't send a large
number of ZERO_RANGE requests to the kernel, and while the kernel does
batch write requests, it is not batching ZERO_RANGE requests. It's
more efficient to batch in userspace in any case, since it avoids
unnecessary system calls.
Reported-by: Mario Schuknecht <mario.schuknecht@dresearch-fe.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
David Anderson [Fri, 14 Feb 2020 20:44:48 +0000 (12:44 -0800)]
AOSP: e2fsdroid: Don't skip unusable blocks in BaseFS.
Currently, basefs_allocator will iterate through blocks owned by an
inode until it finds a block that is free. This effectively ignores the
logical to physical block mapping, which can lead to a bigger delta in
the final image.
An example of how this can happen is if the BaseFS has a deduplicated
block (D), that is not deduplicated in the new image:
Old image: 1 2 3 D 4 5
New image: 1 2 3 ? 4 5
If the allocator sees that "D" is not usable, and skips to block "4",
we will have a non-ideal assignment.
Bad image: 1 2 3 4 5 ?
This patch refactors get_next_block() to acquire at most one block. It's
called a single time, and then only called in a loop if absolutely no
blocks can be acquired from anywhere else.
In a Virtual A/B simulation, this reduces the COW snapshot size by about
90MB.
David Anderson [Fri, 14 Feb 2020 03:20:32 +0000 (19:20 -0800)]
AOSP: e2fsdroid: Fix logical block sequencing in BaseFS.
By iterating over blocks to write BaseFS, holes in the extent tree are
skipped. This is problematic because the purpose of BaseFS is to
preserve the logical to physical block assignment between builds. By not
preserving the location of holes, the assignment can be incorrect.
For example, consider the following block list for a file:
1 2 3 0 4 5
If this is recorded as:
1 2 3 4 5
If the first block changes to a hole, the intended mapping will not be
preserved at all:
0 1 2 0 3
This patch makes two changes to e2fsdroid to fix this. The first change
is that holes are now recorded in BaseFS, by iterating over the extent
tree rather than the block list, and inserting zeroes where appropriate.
The second change is that the block allocator now recognizes when blocks
have been skipped (either to deduplication or to holes), and skips the
same number of logical blocks in BaseFS as well.
In a Virtual A/B simulation, this reduces the COW snapshot size by
approximately 100MB.
David Anderson [Wed, 29 Jan 2020 23:31:14 +0000 (15:31 -0800)]
AOSP: e2fsdroid: Properly free the dedup block map.
When BaseFS specifies the same block for two files, it gets added to a
separate "dedup" bitmap, and removed from the free block bitmap. If the
new build does not use every block in this bitmap, there will be an
inconsistency: the block bitmap marks blocks as in-use when they are
actually free. Although this doesn't matter for AOSP's read-only file
systems, it does cause e2fsck to complain, which breaks the build.
Fix the inconsistency by properly freeing all unused blocks within the
dedup block set.
Elliott Hughes [Thu, 23 Jan 2020 23:44:10 +0000 (15:44 -0800)]
AOSP: Add -e2fsprogs to the e2fsprogs chattr and lsattr.
We want to start shipping the toybox chattr and lsattr on the device all
the time, so the build system rightly complains that then we'd have two
modules with the same name.
I went with a suffix rather than a prefix so that tab completion works
for folks still wanting to use the e2fsprogs versions.
Kousik Kumar [Fri, 10 Jan 2020 00:15:30 +0000 (16:15 -0800)]
AOSP: Change #define to _BLKID_TYPES_H
blkid_types.h and ext_types.h having the exact same content results in
mismatches in remote RBE builds. Given blkid_types.h is actually
supposed to be different, changing this to remove the mismatch.
Test: Ran a build, and all e2fsprogs mismatches went away between
local/remote.
Theodore Ts'o [Fri, 20 Mar 2020 19:24:18 +0000 (15:24 -0400)]
libext2fs: fix the {set_get}_bitmap_range functions when bitmap->start > 7
The bitmap array's set/get bitmap_range functions were not subtracting
out bitmap->start. This doesn't matter for normal file systems, since
the bitmap->start is zero or one, and the passed-in starting range is
a multiple of eight, and the starting range is then divided by 8.
But with a non-standard/fuzzed file system, bitmap->start could be
significantly larger, and this could then lead to an array out of
bounds memory reference.
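A sketch of the corrected offset calculation, assuming the bit array's
bit 0 represents bitmap->start. range_byte_offset() is a hypothetical
helper and not the libext2fs function:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset into a bit array whose bit 0
 * represents "bitmap_start". The subtraction is the step the commit
 * message says was missing. */
static uint64_t range_byte_offset(uint64_t range_start, uint64_t bitmap_start)
{
	assert(range_start >= bitmap_start);
	return (range_start - bitmap_start) >> 3;
}
```

With bitmap_start at zero the subtraction changes nothing, which is
consistent with the bug going unnoticed on normal file systems.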
Jan Kara [Thu, 13 Feb 2020 10:15:56 +0000 (11:15 +0100)]
e2fsck: clarify overflow link count error message
When directory link count is set to overflow value (1) but during pass 4
we find out the exact link count would fit, we either silently fix this
(which is not great because e2fsck then reports the fs was modified but
output doesn't indicate why in any way), or we report that link count is
wrong and ask whether we should fix it (in case -n option was
specified). The second case is even more misleading because it suggests
non-trivial fs corruption which then gets silently fixed on the next
run. Similarly to how we fix up other non-problems, just create a new
error message for the case directory link count is not overflown anymore
and always report it to clarify what is going on.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
(cherry picked from commit 4ebce13292f54c96f43dcb1bd1d5b8df5dc8749d)
Theodore Ts'o [Sun, 15 Mar 2020 04:56:01 +0000 (00:56 -0400)]
debian: drop libattr1-dev from the build dependencies list
The libattr has stopped providing attr/xattr.h; we now use
sys/xattr.h. So there is no longer any reason to require that the
libattr1-dev package be present when building e2fsprogs, so drop it.
Theodore Ts'o [Sun, 15 Mar 2020 03:24:39 +0000 (23:24 -0400)]
libext2fs: make ext2fs_dirent_has_tail() more strict
Previously ext2fs_dirent_has_tail() would return true if the directory
was corrupted. If the directory is corrupted, then by definition it
doesn't have a valid checksum tail.
(This fixes a big-endian failure on the master branch.)
Lukas Czerner [Tue, 3 Mar 2020 13:53:48 +0000 (14:53 +0100)]
libext2fs: check open(O_EXCL) first in ismounted.c
Currently the ext2fs_check_mount_point() will use the open(O_EXCL) check
on linux after all the other checks are done. However it is not
necessary to check mntent if open(O_EXCL) succeeds because it means that
the device is not mounted.
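The exclusive-open probe can be sketched like this. It relies on the
Linux behaviour that open() with O_EXCL on a block device that is in
use fails with EBUSY; device_busy_by_excl_open() is a hypothetical
helper, not the code in ismounted.c:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if the exclusive open reports EBUSY, 0 if the open
 * succeeds, and -1 for any other error (for example, a missing path). */
static int device_busy_by_excl_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_EXCL);
	if (fd >= 0) {
		close(fd);
		return 0;
	}
	return errno == EBUSY ? 1 : -1;
}
```

A return of 0 means the device is not in use, so a caller could skip
the slower mount table scan in that case.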
Moreover the commit ea4d53b7 introduced a regression where a following
set of commands fails:
Theodore Ts'o [Sat, 7 Mar 2020 17:35:48 +0000 (12:35 -0500)]
mke2fs: fix permissions setting with "mke2fs -d /path/files"
Set the permissions for directories in cases where the owner permissions
are not rwx. This was reported[1] by Robert Yang but we are using a
different approach to fixing the issue.
Also set the permissions in a more portable way by making a
distinction between the host OS's permissions stats and Linux's
permissions. We still assume the low 12 bits are the historical Unix
assignments, but we don't assume S_IFMT bits are the same as Linux's.
Reported-by: Robert Yang <liezhi.yang@windriver.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 21 Feb 2020 21:40:56 +0000 (14:40 -0700)]
libext2fs: don't use O_DIRECT for files on tmpfs
If a filesystem image is on tmpfs, opening it with O_DIRECT for
reading the MMP will fail. This is unnecessary, since the image
file can't really be open on another node at this point. If the
open with O_DIRECT fails, retry without it when plausible.
Remove the special-casing of tmpfs from the mmp test cases.
Change-Id: I41f4b31657b06f62f10be8d6e524d303dd36a321
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
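A sketch of the retry, assuming EINVAL is the error that signals a
filesystem without O_DIRECT support. open_prefer_direct() is a
hypothetical helper and does not reproduce the "when plausible" checks
mentioned above:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes O_DIRECT only when this is defined */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0	/* not available here: fall back to a plain open */
#endif

/* Try O_DIRECT first; if the filesystem rejects it, retry without. */
static int open_prefer_direct(const char *path, int flags)
{
	int fd;

	assert(path != NULL);
	fd = open(path, flags | O_DIRECT);
	if (fd < 0 && errno == EINVAL && O_DIRECT != 0)
		fd = open(path, flags);
	return fd;
}
```

Other errors, such as a missing file, are returned unchanged so the
caller still sees the original failure.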
Andreas Dilger [Wed, 12 Feb 2020 01:07:21 +0000 (18:07 -0700)]
e2fsck: avoid overflow with very large dirs
alloc_size_dir() multiplies signed ints when allocating the
buffer for rehashing an htree-indexed directory. This will overflow
when the directory size is above 4GB, which is possible with largedir
directories having about 100M entries, assuming an average 3/4 leaf
fullness and 24-byte filenames, or fewer with longer filenames.
The same problem exists in get_next_block().
Similarly, the out_dir struct used a signed int for the number of
blocks in the directory, which may result in a negative size if the
directory is over 2GB (about 50M entries or fewer).
Use appropriate unsigned variables for block counts, and use larger
types for calculating the byte count for memory offsets/sizes.
Such large directories have not been seen yet, but are not too far away.
The ext2fs_get_array() function will properly calculate the needed
memory allocation, and detect overflow on 32-bit systems.
Add ext2fs_resize_array() to do the same for array resize.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
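An overflow-checked array allocation can be sketched as follows. This
is written in the spirit of the description above and is not the
libext2fs implementation; array_bytes() and alloc_array() are
hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Compute count * size, refusing when the product would not fit in a
 * size_t. Returns 0 on success and -1 on overflow. */
static int array_bytes(size_t count, size_t size, size_t *bytes)
{
	assert(bytes != NULL);
	if (count != 0 && size > SIZE_MAX / count)
		return -1;	/* count * size would overflow */
	*bytes = count * size;
	return 0;
}

/* Allocate an array, returning NULL on overflow or allocation failure. */
static void *alloc_array(size_t count, size_t size)
{
	size_t bytes;

	if (array_bytes(count, size, &bytes) != 0)
		return NULL;
	return malloc(bytes ? bytes : 1);
}
```

Checking with a division before multiplying means the test itself
cannot overflow, on 32-bit and 64-bit systems alike.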
Andreas Dilger [Fri, 7 Feb 2020 01:09:46 +0000 (18:09 -0700)]
misc: handle very large files with filefrag
Avoid overflowing the column-width calc printing files over 4B blocks.
Document the [KMG] suffixes for the "-b <blocksize>" option.
The blocksize is limited to at most 1GiB blocksize to avoid shifting
all extents down to zero GB in size. Even the use of 1GB blocksize
is unlikely, but non-ext4 filesystems may use multi-GB extents.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 7 Feb 2020 01:09:45 +0000 (18:09 -0700)]
e2fsck: consistently use ext2fs_get_mem()
Consistently use ext2fs_get_mem() and ext2fs_free_mem() instead of
calling malloc() and free() directly in e2fsck. In several places
it is possible to use ext2fs_get_memzero() instead of explicitly
calling memset() on the memory afterward.
This is just a code cleanup, and does not fix any specific bugs.
[ Fix up library dependencies in e2fsck/Makefile.in to fix "make
check" breakages. -- TYT ]
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 7 Feb 2020 01:09:44 +0000 (18:09 -0700)]
e2fsck: fix overflow if more than 4B inodes
Even though we don't have support for filesystems with over 4B inodes
in the current e2fsprogs, this may happen in the future. There are
latent overflow bugs when calculating the number of inodes in the
filesystem that can trivially be fixed now, rather than waiting for
them to be hit at some point in the future. The block number calcs
are already correct in this code.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 7 Feb 2020 01:09:41 +0000 (18:09 -0700)]
e2fsck: reduce memory usage for many directories
Pack struct dx_dir_info and dx_dirblock_info properly in memory to
avoid holes, and make sure fields are not larger than necessary. This reduces
the memory needed for each hashed dir, according to pahole(1) from:
Andreas Dilger [Fri, 7 Feb 2020 01:09:40 +0000 (18:09 -0700)]
e2fsck: avoid mallinfo() if over 2GB allocated
Don't use mallinfo() for determining the amount of memory used if it
is over 2GB. Otherwise, the signed ints used by this interface can
overflow and return garbage values. This makes the actual amount
of memory used by e2fsck misleading and hard to determine.
Instead, use brk() to get the total amount of memory allocated, and print
this if the more detailed mallinfo() information is not suitable for use.
There does not appear to be a mallinfo64() variant of this function.
There does appear to be an abomination named malloc_info() that writes
XML-formatted malloc stats to a FILE stream that would need to be read
and parsed in order to get these stats, but that doesn't seem worthwhile.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 7 Feb 2020 01:09:38 +0000 (18:09 -0700)]
e2fsck: fix e2fsck_allocate_memory() overflow
e2fsck_allocate_memory() takes an "unsigned int size" argument, which
will overflow for allocations above 4GB. This happens for dir_info
and dx_dir_info arrays when there are more than 350M directories in a
filesystem, and for the dblist array above 180M directories.
There is also a risk of overflow during the binary search in both
e2fsck_get_dir_info() and e2fsck_get_dx_dir_info() when the midpoint
of the array is calculated, if there would be more than 2B directories
in the filesystem and working above the half way point.
Also, in some places inode numbers are "int" instead of "ext2_ino_t",
which can also cause problems with the array size calculations, and
makes it hard to identify where inode numbers are used.
Fix e2fsck_allocate_memory() to take an "unsigned long" argument to
match ext2fs_get_mem(), so that it can do single memory allocations
over 4GB.
Fix e2fsck_get_dir_info() and e2fsck_get_dx_dir_info() to temporarily
use an unsigned long long value to calculate the midpoint (which will
always fit into an ext2_ino_t again afterward).
Change variables that hold inode numbers to be ext2_ino_t, and print
them as unsigned values instead of printing negative inode numbers.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shilong Wang <wshilong@ddn.com>
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-13197
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
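The midpoint calculation can be sketched as below, assuming 32-bit
search bounds. sk_ino_t and midpoint() are hypothetical names used only
for this illustration:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sk_ino_t;	/* stand-in for a 32-bit inode/index type */

/* Compute the midpoint of [low, high] using a wider temporary so the
 * sum cannot wrap; the result always fits back into 32 bits. */
static sk_ino_t midpoint(sk_ino_t low, sk_ino_t high)
{
	unsigned long long sum;

	assert(low <= high);
	sum = (unsigned long long)low + high;
	return (sk_ino_t)(sum / 2);
}
```

Adding two 32-bit values directly would wrap once both bounds pass the
halfway point of the range, which is the case described above.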
Lukas Czerner [Mon, 10 Feb 2020 15:24:59 +0000 (16:24 +0100)]
tst_libext2fs: Avoid multiple definition of global variables
gcc version 10 changed the default from -fcommon to -fno-common and as a
result e2fsprogs make check tests fail because tst_libext2fs.c ends up
with a build error.
This is because it defines two global variables debug_prog_name and
extra_cmds that are already defined in debugfs/debugfs.c. With -fcommon
linker was able to resolve those into the same object, however with
-fno-common it's no longer able to do it and we end up with multiple
definition errors.
Fix the problem by using SKIP_GLOBDEFS macro to skip the variables
definition in debugfs.c. Note that debug_prog_name is also defined in
lib/ext2fs/extent.c when DEBUG macro is used, but this does not work even
with older gcc versions and is never used regardless so I am not going to
bother with it.
Jeremy Visser [Mon, 3 Feb 2020 02:37:41 +0000 (13:37 +1100)]
chattr.1: improve attribute formatting with labels and indented paragraphs
By convention, lists of options in man pages use a label followed by an
indented description, such as this example from the Options section:
-R Recursively change attributes of directories and
their contents.
But the Attributes section places the available attributes mid-sentence,
which makes it visually more difficult to parse:
A file with the 'a' attribute set can only be opened
in append mode for writing. [...]
When a file with the 'A' attribute set is accessed, its
atime record is not modified. [...]
This patch places a label beside each attribute description, which (in
my opinion) improves readability, especially when visually skimming the
list. For example:
a A file with the 'a' attribute set can only be
opened in append mode for writing.
A When a file with the 'A' attribute set is accessed,
its atime record is not modified.
Signed-off-by: Jeremy Visser <jeremyvisser@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the bad block list has been reset in the middle of an inode scan,
it's possible for bb->list[scan->bad_blocks_ptr] to result in an
out-of-bounds read access.
This is highly unlikely to happen under normal circumstances; in
particular, we generally don't use bad block inodes any more. In
addition, this would only happen if the bad block inode itself is
corrupt so e2fsck needs to wipe it out. This might cause e2fsck to
crash, but it will more likely cause a part of the inode table to be
wrongly considered invalid, causing the file system to be incorrectly
fixed.
This was reported by TALOS as TALOS-2020-0974 and CVE-2020-6057, but
after closer examination, we don't believe this can be used in any way
to exploit the system or release information about the system, since
all this can do is to cause part of the inode table to be skipped when
it shouldn't be, and this can't be leveraged since any information
about the ASLR of the process is obsolete once e2fsck exits.
Andreas Dilger [Tue, 14 Jan 2020 21:42:18 +0000 (14:42 -0700)]
mmp: abstract out repeated 'sizeof(buf), buf' usage
The printf("%.*s") format requires both the buffer size and buffer
pointer to be specified for each use. This is repeatedly given
as "(int)sizeof(buf), (char *)buf" for the mmp_nodename and mmp_bdevname
fields, with typecasts to avoid compiler warnings.
Add a helper macro EXT2_LEN_STR() to avoid repeated boilerplate code.
This can also be used for other superblock buffer fields that may not
have NUL-terminated strings (e.g. s_volume_name, s_last_mounted,
s_{first,last}_error_func, s_mount_opts) to simplify code and avoid
the need for temporary buffers for NUL-termination.
Annotate the superblock string fields that may not be NUL-terminated.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
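A sketch of how such a macro is used with "%.*s". The macro body follows
the expansion quoted in the message above; struct sketch_mmp and
format_nodename() are hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expands to the two arguments that "%.*s" expects: length, pointer. */
#define EXT2_LEN_STR(buf)	(int)sizeof(buf), (char *)buf

/* Hypothetical record with a fixed-size field that may lack a NUL. */
struct sketch_mmp {
	unsigned char nodename[8];
};

/* Format the field into out; returns out for convenience. */
static const char *format_nodename(const struct sketch_mmp *m,
				   char *out, size_t outlen)
{
	assert(m != NULL && out != NULL && outlen > 0);
	snprintf(out, outlen, "node=%.*s", EXT2_LEN_STR(m->nodename));
	return out;
}
```

The precision caps the read at the size of the field, so a name that
fills all eight bytes is printed without running past the buffer.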
Andreas Dilger [Tue, 14 Jan 2020 21:42:17 +0000 (14:42 -0700)]
mmp: don't assume NUL termination for MMP strings
Don't assume that mmp_nodename and mmp_bdevname are NUL terminated,
since very long node/device names may completely fill the buffers.
Limit string printing to the maximum buffer size for safety, and
change the field definitions to __u8 to make it more clear that
they are not NUL-terminated strings, as is done with other strings
in the superblock that do not have NUL termination.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Thu, 16 Jan 2020 20:35:29 +0000 (15:35 -0500)]
libext2fs: reserve the error code EXT2_ET_NO_GDESC
This is really only needed in 1.46+, where the EXT2_FLAG_SUPER_ONLY flag
is honored by ext2fs_open to only read the superblock, so that
fs->group_desc can be NULL. We define it in the maint branch so that
we can be sure the error tables are kept in sync (in the unlikely case
that a new error code needs to be assigned in the maint branch).