This patch is a major update to how we decide where to put extended
attributes. The main motivation is to enable creating values in
extended attribute inodes. While doing this, we want to implement a
behavior that is as close to kernel as possible.
Existing set ea code deviates from kernel behavior which makes it harder
to implement ea_inode feature:
- kernel only sorts ea entries in xattr block, e2fsprogs implementation
sorts all entries on every update.
- e2fsprogs implementation shuffled things on every update so the order
of updates does not matter. Kernel does not reshuffle things.
- e2fsprogs could evacuate entries from inode body to xattr block and
vice versa. This behavior does not exist in kernel.
Such differences could lead to inconsistent behavior between fuse2fs and
a kernel mount.
With ea_inode feature, we also need to decide whether to put a value
in an inode or keep it 'inline'. In kernel implementation this
depends on current placement of entries.
To close the behavioral gap, ext2fs_xattr_set() now takes over the
decision about where to place ea values. This also allows it to raise
errors early instead of delaying them to a separate
ext2fs_xattrs_write() call later.
libext2fs: eliminate empty element holes in ext2_xattr_handle->attrs
When an extended attribute is removed, its array element is emptied.
This creates holes in the array so various places that want to walk
filled elements have to do an empty element check.
Have remove operation shift remaining filled elements to the left.
This allows a simple iteration up to ext2_xattr_handle->count to walk
all filled entries, and so empty element checks become unnecessary.
libext2fs: rename ext2_xattr_handle->length to capacity
ext2_xattr_handle has two fields 'count' and 'length' which
represent number of filled elements vs total element count.
They have close meanings so are easy to confuse, thus make code less
readable. Rename length to capacity.
Tahsin Erdogan [Fri, 30 Jun 2017 01:31:59 +0000 (18:31 -0700)]
Use i_size to determine whether a symlink is a fast symlink
Current way of determining whether a symlink is in fast symlink
format is to call ext2fs_inode_data_blocks2(). If number of data
blocks is zero and EXT4_INLINE_DATA_FL flag is not set, then symlink
data must be in inode->i_block.
This heuristic is becoming increasingly hard to maintain because
inode->i_blocks count can also be incremented for blocks used by
extended attributes. Before ea_inode feature, extra block could come
from xattr block, now more blocks can be added because of xattr
inodes.
To address the issue, add a ext2fs_is_fast_symlink() function that
gives a direct answer based on inode->i_size field. This is
equivalent to kernel's ext4_inode_is_fast_symlink() function.
This patch also fixes a few issues related to fast symlink handling:
- Both rdump_symlink() and follow_link() interpreted symlinks with
0 data blocks to always mean fast symlinks. This is incorrect
because symlinks that are stored as inline data also have
0 data blocks. Thus, they try to read everything from
inode->i_block and miss the symlink suffix in inode extra area.
- e2fsck_pass1_check_symlink() had code to handle inode with
EXT4_INLINE_DATA_FL flag twice. The first if block always returns
from the function so the second one is unreachable code.
In some cases, resize2fs needs to move inodes because their inode
number is greater than the maximum allowed. Moving extended attribute
inodes would require updating all the references to them. This
is currently not supported.
ext2fs_xattr_set() currently does not support creating xattr inodes,
so allowing fuse2fs to mount a filesystem with ea_inode feature could
lead to corruption. Refuse to mount if the ea_inode feature is set.
tune2fs: update ea_inode hashes when fs uuid changes
Extended attribute inodes maintain a crc32c hash that is used for
deduplication. The crc seed derives from uuid so ea_inode hashes
must be updated when uuid changes.
The ea_inode hash is also incorporated into the xattr entry e_hash
so the entries that reference the inode also must be updated.
When check_inode_extra_space() detects a problem with the value of
i_extra_isize, it adjusts it and then returns without further validation
of contents in the inode body. Change this so that it will proceed to
check inline extended attributes.
In original implementation of ea_inode feature, each xattr inode had
a single parent. Child inode tracked the parent by storing its inode
number in i_mtime field. Also, child's i_generation matched parent's
i_generation.
With deduplication support, xattr inodes can now be shared so a single
backpointer is not sufficient to achieve strong binding. This is now
replaced by hash validation.
Disabling ea_inode feature would require inlining all the existing
xattr values that are currently stored in external inodes. This is
not always possible. Just disallow it.
Andreas Dilger [Wed, 5 Jul 2017 03:53:59 +0000 (23:53 -0400)]
e2fsck: add support for large xattrs in external inodes
Add support for the INCOMPAT_EA_INODE feature, which stores large
extended attributes into an external inode instead of data blocks.
The inode is referenced by the e_value_inum field (formerly the
unused e_value_block field) from the extent header, and stores the
xattr data starting at byte offset 0 in the inode data block.
The xattr inode stores the referring inode number in its i_mtime,
and the parent i_generation in its own i_generation, so that there
is a solid linkage between the two that e2fsck can verify. The
xattr inode is itself marked with EXT4_EA_INODE_FL as well.
Signed-off-by: Kalpak Shah <kalpak.shah@sun.com> Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Compiling with -fsanitize=undefined -fsanitize=address causes some
warnings of C code that has undefined behavior according to the C
standard bugs. None of the warnings should cause e2fsprogs
malfunction given a sane compiler running on architectures that Linux
can support. Still, it's better to clean up to code than not.
To fix up a complaint of a negative shift in hash function, update the
very dated hash we had been using for the revoke table with the
current generic hash used by the kernel.
Marc Thomas [Mon, 26 Jun 2017 15:39:47 +0000 (16:39 +0100)]
filefrag: fix GCC7.x compiler warning
../../misc/filefrag.c:591:33: warning: comparison between pointer and
zero character constant [-Wpointer-compare]
for (cpp = argv + optind; *cpp != '\0'; cpp++) {
Signed-off-by: Marc Thomas <marc@dragonfly.plus.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Mon, 19 Jun 2017 22:39:55 +0000 (18:39 -0400)]
mke2fs: fix hugefile creation so the hugefile(s) are contiguous
Commit 4f868703f6f2: "libext2fs: use fallocate for creating journals
and hugefiles" introduced a regression for the mke2fs hugefile
feature. The problem is that the fallocate library function
intersperses the extent tree metadata blocks with the data blocks, and
this violates the hugefile guarantee that the created files should be
physically contiguous on disk.
Unfortuantely the m_hugefile regression test was flawed, and didn't
pick up the regression.
This commit fixes the regression test so that it detects the problem
before fixing mke2fs, and also fixes the mke2fs hugefile by reverting
the mke2fs hugefile portion of commit 4f868703f6f2.
Jan Kara [Wed, 7 Jun 2017 13:31:14 +0000 (15:31 +0200)]
libext2fs: fix fsync(2) detection
For some reason lib/config.h.in was missing a definition of HAVE_FSYNC
and as a result lib/config.h never had HAVE_FSYNC defined. As a result
we never called fsync(2) for example from
lib/ext2fs/unix_io.c:unix_flush() when we finished creating filesystem
and could miss IO errors happening during creating of the filesystem.
Test generic/405 exposes this problem.
Fix the problem by defining HAVE_FSYNC in lib/config.h.in.
Fixes: f47f31958578 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 4 Jun 2017 22:37:31 +0000 (18:37 -0400)]
libext2fs: correctly write up the backup superblocks in big endian systems
This bug has been around since we added support for metadata
checksums, but it was unmasked by commit bf9f3b6d5b ("e2fsck: exit
with exit status 0 if no errors were fixed"). The backup superblocks
are not supposed to have the EXT2_VALID_FS or the NEEDS_RECOVERY bits
set, and earlier 1.43.x versions of e2fsprogs were byte swapping the
shadow superblock each time it was written, so that every other backup
superblock was incorrectly byte swapped.
Fortunately the primary backup superblock was correctly written
(modulo having the VALID_FS bit set when it should not have been set)
so for the most part no one noticed. And very few architectures use
big endian byte ordering these days. (Even IBM has seen the light
with the ppcle architecture. :-)
Fortunately commit bf9f3b6d5b caused f_desc_size_bad and
f_resize_inode to fail on a big endian system, which allowed me to
notice the issue and investigate.
Darrick J. Wong [Mon, 15 May 2017 18:37:11 +0000 (11:37 -0700)]
e2freefrag: use GETFSMAP on mounted filesystems
Use GETFSMAP to query mounted filesystems for free space information.
This prevents us from reporting stale free space stats if there happen
to be uncheckpointed block bitmap updates sitting in the journal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Wang shilong [Tue, 30 May 2017 00:36:51 +0000 (20:36 -0400)]
tune2fs: fix BUGs of tuning project quota
There are several problems for project quota enable/disable:
tune2fs -O ^project did not work, because @clear_ok_features
did not include @EXT4_FEATURE_RO_COMPAT_PROJECT.
update_feature_set() works for -O option, but tune2fs -Q prj/^prj
did not work well, because function handle_quota_options()
did not set and clear @EXT4_FEATURE_RO_COMPAT_PROJECT feature very well.
one warning message is removed, because with project feature
enabled, quota feature will be enabled automatically.
Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck: don't flush to device opened in read-only mode
If the e2fsck is called with both the -f and -n options, it will
complete with an exit status of 8 due to an error when trying to flush
the io_channel (which was opened read-only) when built on on Cygwin on
Windows 8.1 and Windows 10. Apparently Cygwin is unhappy when fsync
is called on a file descriptor opened read-only.
Theodore Ts'o [Thu, 25 May 2017 17:11:40 +0000 (13:11 -0400)]
tests: fix expected output for f_detect_junk
The expect files for f_detect_junk had gotten out of sync with the
code base, and since this test is optional (it depends on libmagic
being installed), we hadn't noticed.
Darrick J. Wong [Thu, 25 May 2017 01:56:36 +0000 (21:56 -0400)]
e2fsck: fix sparse bmap to extent conversion
When e2fsck is trying to convert a sparse block-mapped file to an extent
file, we incorrectly merge block mappings that are physically contiguous
but not logically contiguous because of insufficient checking with the
extent we're constructing. Therefore, compare the logical offsets for
contiguity as well.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Thu, 25 May 2017 01:48:41 +0000 (21:48 -0400)]
libext2fs: correctly subtract xattr blocks on bigalloc filesystems
ext2fs_inode_data_blocks2() calculates an inode's data block count by
subtracting the external xattr block, if any, from the total blocks.
But on bigalloc filesystems, the xattr "block" is actually a whole
cluster, so ext2fs_inode_data_blocks2() would return a too-large value.
It seems this could have caused several different problems, but the one
I encountered was that xfstest generic/399 failed in the "bigalloc"
config because e2fsck incorrectly considered a symlink on the filesystem
to be corrupted at the end of the test. This happened because e2fsck
incorrectly calculated a nonzero data block count for a "fast" symlink
with an external xattr block and therefore treated it as a "slow"
symlink, which failed validation.
Fix this by updating ext2fs_inode_data_blocks2() to subtract the cluster
size rather than the block size.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Whitney [Thu, 25 May 2017 01:34:20 +0000 (21:34 -0400)]
e2fsck: fix multiply-claimed block quota accounting when deleting files
As e2fsck processes each file in pass1, the actual file system quota is
increased by the number of blocks discovered in the file. This can
include both non-multiply-claimed and multiply-claimed blocks, if the
latter exist. However, if a file containing multiply-claimed blocks
is then deleted in pass1b, those blocks are not taken into account when
decreasing the actual quota. In this case, the new quota values written
to the file system by e2fsck overstate the space actually consumed.
And, e2fsck must be run twice on the file system to fully correct
quota.
Fix this by counting multiply-claimed blocks as a debit to quota when
deleting files in pass1b.
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Alex Deymo [Thu, 26 Jan 2017 01:47:50 +0000 (17:47 -0800)]
AOSP: Add "libc" to soong static_executable targets.
When building a static exectuable for "arm", the libgcc is automatically
included by the build system *after* libc, but libgcc has some symbol
dependencies on "libc", like for example the "raise" symbol.
libgcc, libatomic and libcompiler_rt-extras are passed in a group
(enclosed by --start-group and --end-group) so they all are included
regardless of the order inside that group. Nevertheless libc only
appears outside this group and before them, so the undefined references
from libgcc are not resolved.
This patch adds "libc" as a explicit static_libs dependency to
static_executable targets forcing it to be included in the group.
Alex Deymo [Thu, 12 Jan 2017 17:48:04 +0000 (09:48 -0800)]
AOSP: Convert e2fsprogs targets to soong.
This patch also removes all the "-host" and "_static" suffix from all
the libraries adding "unique_host_soname: true". This prevents
confusions with the host installed libraries.
A new "libext2_misc" library is introduced to export some files from
the misc/ directory to other binaries in this project.
Nick Kralevich [Wed, 18 Jan 2017 23:17:42 +0000 (15:17 -0800)]
AOSP: HACK: android: exit(1) if selabel_lookup fails
If selabel_lookup fails, the current implementation of set_selinux_xattr
returns -1, but the command line tool e2fsdroid reports success.
There's a bunch of things wrong:
1) -1 does not appear to be a legal errcode_t value. The appropriate
return value appears to be DIRENT_ABORT.
2) A return value of DIRENT_ABORT is ignored by the upper layers of the
code.
3) Attempting to fix the upper layers of the code to not ignore
DIRENT_ABORT results in complaints about not being able to create
/lost+found.
I'm honestly not sure how to fix this, so just throw an exit(1) in
there, to make sure the program dies a horrible death if
selabel_lookup() fails. This is much better than the alternative of
e2fsdroid returning success with an improperly labeled file.
Bug: 34358308
Test: Artifically modify selabel_lookup() to return a failure, and
verify Android doesn't compile.
Test: Verify Android compiles under normal circumstances.
Change-Id: I60e04bc6559a66d3f3202f2c28e2519856385ded
From AOSP commit: 87a7db7cf2ca0feecaccad94bf22f92c726000c3
Jin Qian [Mon, 19 Dec 2016 18:53:20 +0000 (10:53 -0800)]
AOSP: libext2fs: merge contiguous data blocks when writing to sparse file
Sparse IO manager allocates one block at a time. This creates many
blocks in sparse file even though most of them are contiguous. As a
result, fastboot is extremely slow writing that many blocks. Merging
contiguous blocks reduces block count and flash time significantly.
Add an option to read a base_fs file and allocate the blocks according
to the mapping provided by the file.
Test: 1/ Create a normal image and an incremental one.
Compare the number of blocks that have changed.
2/ Create an image.
Create an incremantal image.
The basefs file and the block_list file are the same.
AOSP: e2fsdroid: read and enforce android's permissions
Set the permissions and the extended attributes as defined by fs_config
and selinux.
Test: create an image with make_ext4 and with mke2fs + e2fsdroid
Compare the output of:
for f in `find . | sort`; do
xattr -l "$f"; md5sum "$f" ls -lah "$f"
done
Mike Frysinger [Fri, 19 May 2017 17:25:59 +0000 (13:25 -0400)]
include sys/sysmacros.h as needed
The minor/major/makedev macros are not entirely standard. glibc has had
the definitions in sys/sysmacros.h since the start, and wants to move away
from always defining them implicitly via sys/types.h (as this pollutes the
namespace in violation of POSIX). Other C libraries have already dropped
them. Since the configure script already checks for this header, use that
to pull in the header in files that use these macros.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Mon, 8 May 2017 22:47:57 +0000 (15:47 -0700)]
misc: fix 'zero_hugefiles = false' regression
When mk_hugefiles() was switched to use ext2fs_fallocate(), it was
accidentally changed to ignore the 'zero_hugefiles = false' setting,
which should cause hugefiles to be allocated without initializing their
contents. Fix this by only passing EXT2_FALLOCATE_ZERO_BLOCKS to
ext2fs_fallocate() when zero_hugefiles is true.
Google-Bug-Id: 38037607 Fixes: 4f868703f6f2 ("libext2fs: use fallocate for creating journals and hugefiles") Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
scan_extent_node() did cluster alignment check for every block in an
extent. This is unnecessary and significantly slows down the runtime
when hugefile is used with bigalloc.
tests: skip running long test with "make check" and add "make fullcheck"
Don't run tests which take longer than 20 seconds to run (especially
f_large_dir, whose run time is well over ten minutes) for "make
check". The new "make fullcheck" will run all of the regression tests
for e2fsprogs.
Fix portability problems in the test script for f_large_dir. Also
clean up messages that the script prints while it runs (and
unfortunately it takes a very long time to run).
e2fsck: update quota when optimizing the extent tree
If quota is enabled, optimizing the extent tree wouldn't update the
in-memory quota statistics, so that a subsequent e2fsck run would show
that the quota usage statistics on disk were incorrect.
Test is added that recreate directory (-fD fsck option)
with 47.5k of 255-symbol name files. This amount of files
can not be stored only in 2 hevel htree, so 3 levels are used.
The INCOMPAT_LARGEDIR feature allows larger directories to
be created, both with directory sizes over 2GB and and a
maximum htree depth of 3 instead of the current limit of 2.
These features are needed in order to exceed the currently
limit of approximately 10M entries in a single directory
for 4KB blocksize (~100k for 1KB).
debugfs, e2fsck, ext2fs, mke2fs and tune2fs support is
added.
Joe Richey [Mon, 3 Apr 2017 16:52:50 +0000 (16:52 +0000)]
e4crypt: fix error handling for KEYCTL_GET_KEYRING_ID
Due to some interesting behaviour in keyctl (as described in the
comments), we use KEYCTL_GET_KEYRING_ID to translate the special value
of KEY_SPEC_SESSION_KEYRING to a real keyring id. However, how we
currently do this is flawed in two ways.
First, if KEYCTL_GET_KEYRING_ID fails, we don't detect it as it returns
-1 and zero is used for an error value in get_keyring_id. Second, if the
user specifies "-k @s" the translation never runs and the undesireable
behavior occurs.
These are both fixed by doing the translation outside of get_keyring_id.
Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Whitney [Sun, 2 Apr 2017 16:57:05 +0000 (12:57 -0400)]
e2fsck: fix quota accounting to use cluster units
The quota accounting code in e2fsck's pass 1 and pass 3 uses block units
rather than cluster units when recording the allocated space consumed by
files and directories. In pass 1, this causes a large undercount of
actual quota results and test failures for xfstests generic/383, /384,
/385, and /386 on bigalloc file systems. In pass 3, this results in
quota accounting errors when the lost+found directory is either extended
or recreated on a bigalloc file system.
Use clusters rather than blocks when accounting for allocated space,
and correct a related header comment in the quota code.
Note that pass 1b also contains call sites for quota_data_sub() that
also need to be addressed. However, it appears that more than just
unit conversion may be needed, so that will be left to a future patch.
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Fri, 17 Mar 2017 22:38:36 +0000 (15:38 -0700)]
libext2fs: apply LDFLAGS when building tst_inline_data
If libext2fs was compiled with an external libblkid pointed to by
LDFLAGS, then linking the tst_inline_data program failed because the
requested linker flags were not used. Fix this by adding $(ALL_LDFLAGS)
to the build rule, as is done for the other test programs.
Fixes: 31253488385a ("libext2fs: add a unit test for inline data") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Whitney [Fri, 31 Mar 2017 23:21:59 +0000 (19:21 -0400)]
e2fsck: fix type mismatches in quota warning message
The conversion operations in the format control strings found in the
fprintf call used to print the quota warning message do not match the
types of their corresponding arguments. Although this probably hasn't
generally been a problem, it obfuscates a bigalloc quota accounting bug
where the reported actual quota goes negative.
Clean up the mismatches and some unnecessary casts. While we're at it,
fix a spelling nit in a related comment.
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Thu, 2 Mar 2017 04:52:12 +0000 (20:52 -0800)]
misc: fix all the compiler warnings
Fix the various compiler warnings that have crept in, and only define
__bitwise if the system headers haven't already done so. Linux 4.10
changes the __bitwise definition so that our redefinition here is
just different enough that gcc complains.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>