Eric Sandeen [Tue, 27 Nov 2012 18:35:48 +0000 (12:35 -0600)]
tune2fs: respect quota config option
If we haven't turned --enable-quota on at config time,
I don't think tune2fs should know about the feature either.
Today we can actually tune2fs -O quota even if not
configured on, and then the rest of the tools will
refuse to touch it:
# tune2fs -O quota /dev/sda1
# tune2fs -O ^quota /dev/whatever complains
tune2fs 1.42.3 (14-May-2012)
tune2fs: Filesystem has unsupported read-only feature(s) while trying to open /dev/sda1
# fsck /dev/sda1
fsck from util-linux 2.21.2
e2fsck 1.42.3 (14-May-2012)
/dev/sda1 has unsupported feature(s): quota
e2fsck: Get a newer version of e2fsck!
Ok, so turn it off?
# tune2fs -O ^quota /dev/whatever complains
tune2fs 1.42.3 (14-May-2012)
tune2fs: Filesystem has unsupported read-only feature(s) while trying to open /dev/sda1
Nope. Debugfs? Nope.
# debugfs -w /dev/sda1
debugfs 1.42.3 (14-May-2012)
/dev/sda1: Filesystem has unsupported read-only feature(s) while opening filesystem
Theodore Ts'o [Sun, 25 Nov 2012 00:17:44 +0000 (19:17 -0500)]
e2fsck: optimize pass 5 for CPU utilization
Add a fast path optimization in e2fsck's pass 5 for the common case
where the block bitmap is correct. The optimization works by
extracting each block group's block allocation bitmap into a memory
buffer, and comparing it with the expected allocation bitmap using
memcmp(). If it matches, then we can just update the free block
counts and be on our way, and skip checking each bit individually.
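The fast path described above can be sketched roughly as follows. This is a minimal illustration and not the e2fsck code: group_bitmap_matches() and count_free_bits() are hypothetical helpers, and both bitmaps are assumed to be plain byte arrays with one bit per block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the zero bits (free blocks) in the first nbits bits of a bitmap. */
static unsigned count_free_bits(const unsigned char *bitmap, unsigned nbits)
{
    unsigned free_count = 0;

    for (unsigned i = 0; i < nbits; i++)
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            free_count++;
    return free_count;
}

/*
 * Hypothetical fast path: compare one group's on-disk bitmap with the
 * bitmap the checker computed.  Returns 1 and fills in *free_blocks when
 * they are identical; returns 0 when the caller has to fall back to
 * checking each bit individually.
 */
static int group_bitmap_matches(const unsigned char *actual,
                                const unsigned char *expected,
                                unsigned blocks_per_group,
                                unsigned *free_blocks)
{
    assert(blocks_per_group % 8 == 0);
    if (memcmp(actual, expected, blocks_per_group / 8) != 0)
        return 0;
    *free_blocks = count_free_bits(expected, blocks_per_group);
    return 1;
}
```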
Theodore Ts'o [Sat, 24 Nov 2012 20:40:17 +0000 (15:40 -0500)]
e2fsck: optimize pass1 for CPU time
Optimize e2fsck pass 1 by marking entire extents as being in use at a
time, instead of block by block. This optimization only works for
non-bigalloc file systems for now (it's tricky to handle bigalloc file
systems since this code is also responsible for dealing with blocks
that are not correctly aligned within a cluster). When the
optimization works, the CPU savings can be significant: over a full CPU
minute for a mostly full 4T disk.
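To illustrate why marking a whole extent at once is cheaper than marking block by block, here is a small sketch on a plain in-memory bitmap. The function names are made up for this example, and the real bitmaps in libext2fs are more elaborate than a flat byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mark a single block in a flat bitmap (one bit per block). */
static void mark_block(uint8_t *bitmap, uint64_t blk)
{
    bitmap[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/* Block-by-block marking: one call per block in the extent. */
static void mark_extent_slow(uint8_t *bitmap, uint64_t start, uint64_t len)
{
    for (uint64_t i = 0; i < len; i++)
        mark_block(bitmap, start + i);
}

/* Range marking: fill whole bytes at once, touch single bits only at
 * the unaligned head and tail of the extent. */
static void mark_extent_fast(uint8_t *bitmap, uint64_t start, uint64_t len)
{
    uint64_t end = start + len;
    uint64_t whole;

    assert(end >= start);                   /* no wraparound */
    while (start < end && start % 8 != 0)   /* leading partial byte */
        mark_block(bitmap, start++);
    whole = (end - start) / 8;              /* bytes fully covered */
    memset(bitmap + start / 8, 0xFF, (size_t)whole);
    start += whole * 8;
    while (start < end)                     /* trailing partial byte */
        mark_block(bitmap, start++);
}
```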
Restructure the ext2fs_get_device_size() and blkid_get_dev_size()
code to localize the variables used for different device probing
methods. This at least reduces the #ifdef mess to only one part
of the code for each method, and avoids "unused variable" compiler
warnings added when variables are declared without being #ifdef'd.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 14 Oct 2012 10:34:09 +0000 (06:34 -0400)]
debugfs: teach the htree and ls commands to show directory checksums
In addition, make the directory iterator more robust in the case
where the file system has the metadata checksum feature enabled, but
the directory checksum is not present in a directory block.
Theodore Ts'o [Wed, 10 Oct 2012 02:45:40 +0000 (22:45 -0400)]
e2fsck: only consult inode_dir_map if needed in pass4
In e2fsck_pass4(), we were consulting inode_dir_map using
ext2fs_test_inode_bitmap2() for every single inode in the file system.
However, there were many cases where we never needed the result of the
test --- most notably if the inode is not in use.
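The idea of deferring the bitmap lookup until its result is needed can be sketched like this. It is an illustration only: the flat bitmaps, test_dir_map() and the lookup counter are assumptions made for the example, not the e2fsck data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (comparatively expensive) directory map is read. */
static unsigned long dir_map_lookups;

static int test_dir_map(const unsigned char *dir_map, size_t ino)
{
    dir_map_lookups++;
    return (dir_map[ino / 8] >> (ino % 8)) & 1;
}

/* Count in-use directories, consulting dir_map only for inodes in use. */
static size_t count_dirs_in_use(const unsigned char *used_map,
                                const unsigned char *dir_map,
                                size_t ninodes)
{
    size_t dirs = 0;

    assert(used_map != NULL && dir_map != NULL);
    for (size_t ino = 0; ino < ninodes; ino++) {
        if (!((used_map[ino / 8] >> (ino % 8)) & 1))
            continue;           /* unused inode: never touch dir_map */
        if (test_dir_map(dir_map, ino))
            dirs++;
    }
    return dirs;
}
```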
I was a bit surprised that GCC 4.7 with CFLAGS set to "-g -O2" wasn't
able to optimize this out for us, but here is the pass 4 timing for an
empty 3T file system before this patch:
Theodore Ts'o [Sat, 6 Oct 2012 01:59:40 +0000 (21:59 -0400)]
libext2fs: further optimize rb_test_bit
Profiling shows that rb_test_bit() is now calling ext2fs_rb_next() a
lot, and this function is now the hot spot when running e2freefrag.
If we cache the results of ext2fs_rb_next(), we can eliminate those
extra calls, which further speeds up both e2freefrag and e2fsck by
reducing the amount of CPU time spent in userspace.
Theodore Ts'o [Sat, 6 Oct 2012 00:57:49 +0000 (20:57 -0400)]
libext2fs: remove pointless indirection in rbtree bitmaps
The code was previously allocating a single 4 or 8 byte pointer for
the rcursor and wcursor fields in the ext2fs_rb_private structure;
this added two extra memory allocations (which could fail), and extra
indirections, for no good reason. Removing the extra indirection also
makes the code more readable, so it's all upside and no downside.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
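A rough before/after sketch of the indirection being removed. Only the field names rcursor and wcursor come from the commit message; the struct names, the extent type and the init helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct bmap_extent_sketch;      /* opaque extent node, for illustration */

/* Before: each cursor is a separately allocated pointer-to-pointer. */
struct rb_private_before {
    struct bmap_extent_sketch **wcursor;
    struct bmap_extent_sketch **rcursor;
};

/* After: the cursors live directly in the structure. */
struct rb_private_after {
    struct bmap_extent_sketch *wcursor;
    struct bmap_extent_sketch *rcursor;
};

/* Two extra allocations, each of which can fail. */
static int init_before(struct rb_private_before *bp)
{
    assert(bp != NULL);
    bp->wcursor = malloc(sizeof(*bp->wcursor));
    bp->rcursor = malloc(sizeof(*bp->rcursor));
    if (!bp->wcursor || !bp->rcursor) {
        free(bp->wcursor);
        free(bp->rcursor);
        return -1;
    }
    *bp->wcursor = NULL;
    *bp->rcursor = NULL;
    return 0;
}

/* No allocation, no failure path, one less dereference per access. */
static void init_after(struct rb_private_after *bp)
{
    assert(bp != NULL);
    bp->wcursor = NULL;
    bp->rcursor = NULL;
}
```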
Theodore Ts'o [Fri, 5 Oct 2012 03:38:55 +0000 (23:38 -0400)]
libext2fs: optimize rb_test_bit
Optimize testing for a bit in an rbtree-based bitmap for the case
where the calling application is scanning through the bitmap
sequentially. Previously, we did this for a set of bits which were
inside an allocated extent, but we did not optimize the case where
there was a large number of bits after an allocated extent which were
not in use.
In my tests of a roughly half-filled file system, the run time of
e2freefrag was halved, and the CPU time spent in userspace during
e2fsck's pass 5 was reduced by 30%.
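The sequential-scan optimization can be illustrated with a simplified model. This sketch uses a sorted array of extents in place of the red-black tree, and every name in it is hypothetical. It covers both cases mentioned above: a bit inside the cached extent, and a bit in the unused gap between the cached extent and the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent_sketch { uint64_t start, count; };

struct bitmap_sketch {
    const struct extent_sketch *ext;  /* sorted, non-overlapping */
    size_t n;                         /* number of extents */
    size_t cursor;                    /* extent consulted last time */
    unsigned long searches;           /* full searches performed */
};

static int test_bit_sketch(struct bitmap_sketch *bm, uint64_t bit)
{
    const struct extent_sketch *cur;
    size_t lo = 0, hi;

    assert(bm != NULL);
    if (bm->n == 0)
        return 0;
    cur = &bm->ext[bm->cursor];
    if (bit >= cur->start) {
        if (bit < cur->start + cur->count)
            return 1;                 /* inside the cached extent */
        if (bm->cursor + 1 == bm->n || bit < bm->ext[bm->cursor + 1].start)
            return 0;                 /* in the gap after it */
    }
    /* Slow path: binary search for the last extent starting at or
     * before the bit (stands in for the tree walk). */
    bm->searches++;
    hi = bm->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bm->ext[mid].start <= bit)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;                     /* bit precedes every extent */
    bm->cursor = lo - 1;
    cur = &bm->ext[bm->cursor];
    return bit < cur->start + cur->count;
}
```

In this model a sequential scan only falls back to the full search when it crosses into a new extent (or before the first one); all bits inside an extent or in the gap after it are answered from the cursor.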
Theodore Ts'o [Fri, 5 Oct 2012 03:30:23 +0000 (23:30 -0400)]
e2freefrag: use 64-bit rbtree bitmaps
Enable the use of 64-bit bitmaps, so e2freefrag will work on file
systems with the 64-bit feature enabled. In addition, enable the
rbtree-based bitmaps, which significantly reduces the amount of memory
required (from 97 megs to 1.7 megs for an empty 3T file system) at the
cost of additional CPU overhead (but we will claw back some of the
additional CPU overhead in the next commit).
mke2fs: prohibit file system features not supported by the HURD
There are certain file system features which can not be supported by
the HURD, since they use fields in the inode which have been claimed
by HURD-specific features (such as the author field). We will
mask out those features so they are not enabled by accident, but if the
user tries to explicitly specify them we will issue an error message.
mke2fs: throttle allocating groups progress as well
Throttle updates for the "Allocating Groups" progress updates to once
a second as well. We now do this throttling in libext2fs, so we don't
have to do this for each of mke2fs's progress updates, and because the
updates from ext2fs_allocate_tables() come from within libext2fs
anyway.
Andreas Dilger [Mon, 10 Sep 2012 09:04:47 +0000 (09:04 +0000)]
tests: kill debugfs on interrupted MMP test
If the f_mmp test is interrupted during its test run, then it can
leave debugfs busy-looping in the background. Since f_mmp is a
relatively long-running test, and is likely to be running during
a parallel test run, this can happen fairly often.
Set a signal trap for the f_mmp test script being killed, so that
the background debugfs command will always be killed by the test.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
mke2fs: disable progress reporting in mke2fs.conf for regression tests
Add a configuration knob so the regression tests can disable progress
reporting. This fixes a potential lack of predictability since the
progress reports are now time based (once a second) which is
problematic for regression tests which are comparing the expected
output of mke2fs.
mke2fs: throttle progress updates to once a second
With lazy itable initialization, the progress updates for writing the
inode table happen so quickly that on a serial console, the time to
write the progress updates can be the bottleneck. Fix this by only
updating the progress indicator once a second.
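A once-a-second throttle of this kind might look like the sketch below. The structure and function names are invented for the example, and the current time is passed in as a parameter (a caller would typically pass time(NULL)) so that the behaviour is easy to exercise.

```c
#include <assert.h>
#include <time.h>

struct progress_sketch {
    time_t last_update;     /* second in which we last wrote output */
    int started;            /* nonzero once anything has been written */
    unsigned long writes;   /* how many updates reached the console */
};

/*
 * Returns 1 if the caller should write a progress update now, or 0 if
 * one was already written during the current second.
 */
static int progress_update(struct progress_sketch *p, time_t now)
{
    assert(p != NULL);
    if (p->started && now == p->last_update)
        return 0;
    p->started = 1;
    p->last_update = now;
    p->writes++;
    return 1;
}
```

With this shape, a loop that reports progress thousands of times per second produces at most one console write per second, which is the effect the commit is after.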
resize2fs: grow uninit_bg file systems more efficiently
If the uninit_bg feature is enabled and the kernel supports
lazy_itable_init, skip zeroing the inode table so that the resize
operation can go much more quickly. Also set the itable_unused fields
so that the first e2fsck after the resize will run faster.
resize2fs: enforce restrictions if the kernel doesn't do meta_bg resizing
Enhance the online resizing code to be more nuanced about resizing
restrictions. If the kernel supports meta_bg resizing, then we can
skip all of the restrictions. If the kernel does not support meta_bg
resizing, check more carefully to make sure there are enough reserved
gdt blocks, so that the user gets a clearer error message.
Theodore Ts'o [Fri, 31 Aug 2012 19:31:50 +0000 (15:31 -0400)]
resize2fs: allow meta_bg/64-bit file systems to be online resized
Resize2fs can't handle resizing flex_bg file systems that do not have
the resize inode, but when the kernel adds support for resizing using
the meta_bg layout, we should allow it to resize the file
system.
So move the flex_bg/resize_inode check to just before we start
doing the off-line resize, instead of doing it earlier where it would
prohibit these file systems for both on-line and off-line resizes.
resize2fs: fix overhead calculation for meta_bg file systems
The file system overhead calculation in calculate_minimum_resize_size
was incorrect for meta_bg file systems. This caused the minimum size to
underflow for very large file systems, which threw resize2fs into a
loop which generally lasted longer than the user's patience.
resize2fs: enforce the 16TB limit on 32-bit file systems correctly
The 16TB limit must be enforced regardless of whether the new size is
specified on the command line or implied by the size of the device,
but only if the file system does not support 64-bit block sizes, or
the kernel does not advertise support of meta_bg resizing.
Previously we were unconditionally enforcing it when it was implied by
the device size, but not if the new size was specified on the command
line.
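The rule above can be written down as a small predicate. This is a sketch under stated assumptions: the function names are hypothetical, the limit is modelled as "the block count must fit in 32 bits", and with 4 KiB blocks that works out to 16 TiB. The same check applies no matter whether the new size came from the command line or from the device size.

```c
#include <assert.h>
#include <stdint.h>

/* The limit applies unless the file system has the 64-bit feature and
 * the kernel advertises meta_bg resizing. */
static int must_enforce_32bit_limit(int has_64bit, int kernel_meta_bg_resize)
{
    return !has_64bit || !kernel_meta_bg_resize;
}

/* Returns 1 if a resize to new_blocks may proceed, 0 if it must be
 * refused because the block count does not fit in 32 bits. */
static int resize_allowed(uint64_t new_blocks, int has_64bit,
                          int kernel_meta_bg_resize)
{
    assert(new_blocks > 0);
    if (must_enforce_32bit_limit(has_64bit, kernel_meta_bg_resize) &&
        new_blocks > UINT32_MAX)
        return 0;
    return 1;
}
```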
ext2fs.h: move ext2fs_init_csum_seed() outside of EXT2_CUSTOM_MEMORY_ROUTINES
The function ext2fs_init_csum_seed() has nothing to do with the
ext2fs_get_mem()/ext2fs_get_memzero()/ext2fs_get_array()/ext2fs_get_arrayzero()
functions. (This define is there so that on platforms where we need
to use the standard C functions, they can be replaced --- this is
primarily needed when trying to compile libext2fs for strange,
not-quite-standards-compliant platforms, such as Windows.)
Allow e2fsprogs to be built using the clang (LLVM) frontend
Since clang uses C99 semantics by default, the main changes required
to allow clang to build e2fsprogs were to add support for the C99 inline
semantics, while still allowing us to be built when the legacy (but
still default for gcc) GNU C89 inline semantics are in force.
Akira Fujita [Thu, 30 Aug 2012 11:16:01 +0000 (20:16 +0900)]
mke2fs: recalculate the reserved blocks when the last BG is dropped
The mke2fs -m option can set the reserved blocks ratio up to 50%. But if the
last block group is not big enough to support the necessary data
structures, it gets dropped, and we have to recalculate the number of
reserved blocks so that the reserved blocks match the requested
percentage.
This also avoids a problem where, if the user specifies a reserved blocks
ratio of 50% and the last partial block group is dropped, the number of
reserved blocks ends up greater than 50% and e2fsck complains.
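The arithmetic behind the fix can be sketched as follows. The helper names are hypothetical and the percentage is simplified to an integer. The figures 2621440 and 1310848 are taken from the mke2fs output in this message; the pre-drop size of 2621696 blocks and the 256-block last group are inferred from them and should be read as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved block count for a given total and percentage (0..50). */
static uint64_t reserved_for(uint64_t total_blocks, unsigned percent)
{
    assert(percent <= 50);
    return total_blocks * percent / 100;
}

/*
 * Drop the last (too small) block group and recompute the reserved
 * count from the new total, so it still matches the requested ratio.
 * Returns the new total block count.
 */
static uint64_t drop_last_group(uint64_t total_blocks,
                                uint64_t last_group_blocks,
                                unsigned percent,
                                uint64_t *reserved)
{
    assert(last_group_blocks <= total_blocks);
    total_blocks -= last_group_blocks;
    *reserved = reserved_for(total_blocks, percent);
    return total_blocks;
}
```

Without the recalculation, the stale value 1310848 would be kept for a 2621440-block file system, which is slightly more than half of the blocks.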
Steps to reproduce:
1. Create a FS which has the overhead for the last BG
and specify 50 % for reserved blocks ratio
# mke2fs -m 50 -t ext4 DEV 1025M
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
656640 inodes, 2621440 blocks
1310848 blocks (50.00%) reserved for the super user
~~~~~~~ <-- Reserved blocks exceed 50% of FS blocks count!
2. e2fsck outputs filesystem corruption
# e2fsck DEV
e2fsck 1.42.5 (29-Jul-2012)
Corruption found in superblock. (r_blocks_count = 1310848).
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 32768 <device>
The configure option --enable-relative-symlinks was incorrectly
specified in configure.in, as --enable-symlink-relative-symlinks. Fix
the configure script so that --enable-relative-symlinks works, as well
as the previous incorrect command line option. We will keep the older,
incorrect --enable-symlink-relative-symlinks for at least two years
before removing it.
Andreas Dilger [Tue, 14 Aug 2012 15:33:24 +0000 (11:33 -0400)]
tests: remove unused temporary files for MMP tests
The MMP tests need to be run on a real disk instead of tmpfs, since
the MMP block access is using O_DIRECT. As such, they create their
own test files in the local testing directory instead of using the
temporary file created in /tmp by the test_one script. Delete the
tmpfs file before clobbering TMPFILE, otherwise it will leave the
unused file in /tmp after the test is completed.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Tue, 7 Aug 2012 17:46:13 +0000 (13:46 -0400)]
libext2fs: refactor the quota feature flag in the supported flags mask
Handle EXT4_FEATURE_RO_COMPAT_QUOTA the same way we handle INCOMPAT
features, so we don't have to have two definitions for
EXT2_LIB_FEATURE_RO_COMPAT_SUPP depending on whether or not
CONFIG_QUOTA is enabled.
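One way to get a single definition of the supported mask is to route the optional feature through an intermediate macro that collapses to zero when the option is off. The sketch below uses SKETCH_ names and a reduced set of flags, so it is an illustration of the pattern and not a copy of ext2fs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative feature bits (a reduced set, for the example only). */
#define SKETCH_RO_COMPAT_SPARSE_SUPER 0x0001u
#define SKETCH_RO_COMPAT_LARGE_FILE   0x0002u
#define SKETCH_RO_COMPAT_QUOTA        0x0100u

/* The optional feature contributes its bit only when configured in. */
#ifdef CONFIG_QUOTA
#define SKETCH_LIB_RO_COMPAT_QUOTA SKETCH_RO_COMPAT_QUOTA
#else
#define SKETCH_LIB_RO_COMPAT_QUOTA 0u
#endif

/* A single definition of the supported mask, whatever the config. */
#define SKETCH_LIB_RO_COMPAT_SUPP (SKETCH_RO_COMPAT_SPARSE_SUPER | \
                                   SKETCH_RO_COMPAT_LARGE_FILE | \
                                   SKETCH_LIB_RO_COMPAT_QUOTA)

/* Returns 1 if every ro_compat bit set on the file system is supported. */
static int ro_compat_supported(uint32_t fs_features)
{
    return (fs_features & ~SKETCH_LIB_RO_COMPAT_SUPP) == 0;
}
```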
Theodore Ts'o [Wed, 15 Aug 2012 17:00:14 +0000 (13:00 -0400)]
ext4: fix rehashing of the lost+found directory
Commit 07307114dea didn't correctly handle the lost+found directory
when it added support for metadata checksums. First of all,
e2fsck_get_lost_and_found() assumed that the inode_dir_map bitmap was
initialized, and it wasn't when it was called earlier by a change in
that commit. Secondly, it's important that the lost+found directory is
processed in case its directory checksums are incorrect, but we should
preserve any empty directory blocks so there is space available for e2fsck
to reconnect any orphan inodes.
Fix these problems, to fix test failures: f_holedir2 and f_rehash_dir
Jim Keniston [Mon, 6 Aug 2012 22:46:03 +0000 (18:46 -0400)]
e2fsck: fix potential segv when handling a read error in a superblock
When passed a negative count (indicating a byte count rather than
a block count) e2fsck_handle_read_error() treats the data as a full
block, causing unix_write_blk64() (which can handle negative counts
just fine) to try to write too much. Given a faulty block device,
this resulted in a SEGV when unix_write_blk64() read past the bottom
of the stack copying the data to cache. (check_backup_super_block ->
unix_read_blk64 -> raw_read_blk -> e2fsck_handle_read_error)
Reported-by: Alex Friedman <alexfr@il.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Signed-off-by: Dan Streetman <ddstreet@us.ibm.com>
Reviewed-by: Mingming Cao <mcao@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
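The count convention at the heart of this bug can be shown in a few lines. io_request_bytes() is a hypothetical helper written for this note; it only illustrates that a negative count is a byte count and must not be multiplied by the block size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Translate an I/O "count" into a number of bytes:
 *   count >= 0  ->  count blocks of blocksize bytes each
 *   count <  0  ->  exactly -count bytes
 */
static size_t io_request_bytes(int count, unsigned blocksize)
{
    assert(blocksize > 0);
    if (count < 0)
        return (size_t)(-(long)count);
    return (size_t)count * blocksize;
}
```

Treating a request of -1024 as "one full block" would touch 4096 bytes of a buffer that only holds 1024, which is the kind of overrun described above.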
Theodore Ts'o [Sat, 4 Aug 2012 20:56:55 +0000 (16:56 -0400)]
Put ELF_OTHER_LIBS in the right place for the linker
Commit a7c17431b9 attempted to fix a problem where the system
libraries might get used instead of local libraries for things like
-lcom_err. It tried to accomplish this by moving $(ELF_OTHER_LIBS) to
before $(LDFLAGS).
Unfortunately, this was the wrong fix; $(ELF_OTHER_LIBS) *MUST* be
after the object files, or the linker might not pull in the necessary
library and not include it into the DT_NEEDED section of the shared
library. The proper fix is to add a -L$(LIB) before $(LDFLAGS), and
then remove the -L option from all of the ELF_OTHER_LIBS definitions
in the library Makefiles.
Theodore Ts'o [Fri, 3 Aug 2012 00:47:46 +0000 (20:47 -0400)]
libext2fs: when checking the inode's checksum, allow an all-zero inode
When the kernel writes an inode where all of the other inodes in
the inode table (itable) block are unused, it skips reading the itable
block from disk, and instead uses an all zeros block. This can cause
e2fsck to complain when it iterates over the inodes using
ext2fs_get_next_inode() since the inode apparently has an invalid
checksum. Normally the inode won't be returned at all if it is at the
end of the block group's part of the inode table, thanks to the
bg_itable_unused field. But it's possible for this situation to
happen earlier in the inode table block.
Fix this by changing ext2fs_inode_csum_verify() to allow the inode to
be all zeros; if the checksum fails, and the inode is all zeros,
treat it as a valid checksum.
Reported-by: Tao Ma <boyu.tm@taobao.com>
Reported-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
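The fallback described above amounts to the following logic. This is a sketch with hypothetical helper names; the result of the actual checksum comparison is passed in as a flag so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte of the buffer is zero. */
static int buffer_is_all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/*
 * Accept the inode if its checksum matched, or if the checksum failed
 * but the whole on-disk inode is zero (never written by the kernel).
 */
static int inode_csum_acceptable(const void *raw_inode, size_t inode_size,
                                 int csum_matches)
{
    assert(raw_inode != NULL && inode_size > 0);
    if (csum_matches)
        return 1;
    return buffer_is_all_zero(raw_inode, inode_size);
}
```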
Darrick J. Wong [Fri, 3 Aug 2012 00:47:46 +0000 (20:47 -0400)]
mke2fs: enable metadata_csum on ext4dev filesystems
Enable full-power metadata checksumming by default on 'ext4dev'
filesystems. This should be fairly safe for now, since only
developers should be using this new feature.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Fri, 3 Aug 2012 00:47:46 +0000 (20:47 -0400)]
libext2fs: optimize the CRC32c implementation
The crc32c implementation in the kernel has been refactored a bit to
reduce the amount of code that needs to be maintained, and to speed up
tune2fs/e2fsck on PowerPC by 5-10%. Port the crc32c changes over, and
provide a crc32_be so that we can remove the duplicate functionality
from e2fsck. Also drop crc32c_be and crc32_le since neither got used.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Fri, 3 Aug 2012 00:47:45 +0000 (20:47 -0400)]
mke2fs: warn if not enabling all the features that metadata_csum wants
The metadata_csum feature works best when two features are enabled.
These features are "extents" (because the block map has no space for
checksums) and "64bit" (this enables storage of full 32-bit checksums
in certain fields). Print a warning if the user tries to create a
filesystem without those features.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Fri, 3 Aug 2012 00:47:45 +0000 (20:47 -0400)]
mke2fs: write new group descriptors with the appropriate checksum
Update mke2fs to use the helper function to determine if group
descriptors should have checksums calculated. Since metadata_csum
supersedes uninit_bg, quietly drop uninit_bg if metadata_csum is set,
so that older kernels don't get confused.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
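The relationship between the two feature flags can be sketched with plain bit operations. The SKETCH_ names, the bit values and the helper functions are illustrative assumptions made for this note; they stand in for the library's helper and are not its API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative ro_compat bits for the two checksum-related features. */
#define SKETCH_RO_COMPAT_GDT_CSUM      0x0010u  /* uninit_bg */
#define SKETCH_RO_COMPAT_METADATA_CSUM 0x0400u  /* metadata_csum */

/* Group descriptors carry a checksum if either feature is set. */
static int group_desc_has_csum(uint32_t ro_compat)
{
    return (ro_compat & (SKETCH_RO_COMPAT_GDT_CSUM |
                         SKETCH_RO_COMPAT_METADATA_CSUM)) != 0;
}

/* The two flags are not supposed to be set at the same time. */
static int csum_flags_conflict(uint32_t ro_compat)
{
    return (ro_compat & SKETCH_RO_COMPAT_GDT_CSUM) &&
           (ro_compat & SKETCH_RO_COMPAT_METADATA_CSUM);
}

/* metadata_csum supersedes uninit_bg: quietly drop the older flag. */
static uint32_t drop_superseded_flag(uint32_t ro_compat)
{
    if (ro_compat & SKETCH_RO_COMPAT_METADATA_CSUM)
        ro_compat &= ~SKETCH_RO_COMPAT_GDT_CSUM;
    assert(!csum_flags_conflict(ro_compat));
    return ro_compat;
}
```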
Darrick J. Wong [Fri, 3 Aug 2012 00:47:45 +0000 (20:47 -0400)]
e2fsck: ensure block group checksum uses
Use the helper function to determine if group descriptors have a
checksum. Ensure that metadata_csum and uninit_bg flags are not set
simultaneously, as part of pass 0.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Fri, 3 Aug 2012 00:47:45 +0000 (20:47 -0400)]
libext2fs: block group checksum should use metadata_csum algorithm
Change the block group algorithm to use the same algorithm as the rest
of the metadata_csum. This mostly involves providing a helper
function to tell if group descriptors should have checksums set or
verified, and modifying the gdt checksum code to use the correct
algorithm.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Fri, 3 Aug 2012 00:47:44 +0000 (20:47 -0400)]
libext2fs: calculate and verify superblock checksums
Calculate and verify the superblock checksums. Each copy of the
superblock records the number of the group it's in and the FS UUID, so
we can simply checksum the whole block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
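As a rough illustration of checksumming a whole superblock copy, the sketch below runs a bitwise CRC32C over everything that precedes the checksum field. The struct layout is invented for the example (it is not the ext4 superblock), the all-ones seed is an assumption, and the bit-at-a-time CRC is chosen for brevity over speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sketch(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Made-up 1024-byte layout: the group number and UUID are part of the
 * block, so covering the whole block covers them too. */
struct sb_sketch {
    uint32_t group_nr;
    uint8_t  uuid[16];
    uint8_t  other[1000];
    uint32_t checksum;      /* last field, excluded from the checksum */
};

static uint32_t sb_csum(const struct sb_sketch *sb)
{
    assert(sb != NULL);
    return crc32c_sketch(0xFFFFFFFFu, sb, offsetof(struct sb_sketch, checksum));
}

static int sb_csum_verify(const struct sb_sketch *sb)
{
    return sb->checksum == sb_csum(sb);
}
```

Because each copy stores its own group number inside the checksummed area, a copy restored into the wrong group would fail verification in this model.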
Darrick J. Wong [Fri, 3 Aug 2012 00:47:35 +0000 (20:47 -0400)]
tune2fs: rebuild and checksum directories when necessary
Since all the metadata checksums depend on the fs UUID, tune2fs must
be able to rewrite the checksums of _all_ metadata. It's not that
hard to add in the bits to resize the directory block structures at
the same time.
[ Merged in fix from Zheng Liu where ctx.errcode wasn't getting
cleared in rewrite_directory(). ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Thu, 2 Aug 2012 21:27:43 +0000 (17:27 -0400)]
e2fsck: check directory leaf block checksums
Checks that directory leaf blocks have the necessary fake dir_entry at
the end of the block to hold a checksum and that the checksum is
valid. It will resize the block and/or rebuild the directory if
necessary.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
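The "fake dir_entry at the end of the block" can be pictured as a 12-byte record that looks like an unused directory entry. The sketch below is an assumption-laden model: the field names, the 0xDE marker and the helpers are written for this note, and it ignores on-disk byte order by reading and writing through the same in-memory struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TAIL_FT 0xDE     /* marker stored where file_type would be */

struct dir_tail_sketch {
    uint32_t reserved_zero1;    /* reads as inode 0, i.e. an unused entry */
    uint16_t rec_len;           /* always the size of this record */
    uint8_t  reserved_zero2;    /* reads as name_len 0 */
    uint8_t  reserved_ft;       /* SKETCH_TAIL_FT */
    uint32_t checksum;          /* checksum of the directory block */
};

/* Returns 1 if the last bytes of the block look like a checksum tail. */
static int block_has_tail(const uint8_t *block, size_t blocksize)
{
    struct dir_tail_sketch t;

    assert(blocksize >= sizeof(t));
    memcpy(&t, block + blocksize - sizeof(t), sizeof(t));
    return t.reserved_zero1 == 0 &&
           t.rec_len == (uint16_t)sizeof(t) &&
           t.reserved_zero2 == 0 &&
           t.reserved_ft == SKETCH_TAIL_FT;
}

/* Write an empty tail (checksum left at 0) into the end of the block. */
static void block_init_tail(uint8_t *block, size_t blocksize)
{
    struct dir_tail_sketch t;

    assert(blocksize >= sizeof(t));
    memset(&t, 0, sizeof(t));
    t.rec_len = (uint16_t)sizeof(t);
    t.reserved_ft = SKETCH_TAIL_FT;
    memcpy(block + blocksize - sizeof(t), &t, sizeof(t));
}
```

A checker along these lines would first test for the tail, and only then compare the stored checksum; a block without room for the tail is the case where resizing or rebuilding the directory becomes necessary.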
Darrick J. Wong [Thu, 2 Aug 2012 21:27:30 +0000 (17:27 -0400)]
e2fsck: verify htree root/node checksums
Check htree internal node checksums. If broken, ask user to clear
the htree index and recreate it later.
[ Move the check for not rehashing the lost+found directory to pass1
so that we don't end up truncating lost+found when the metadata
checksum feature is enabled. -- TYT ]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Mon, 30 Jul 2012 23:18:04 +0000 (19:18 -0400)]
e2fsck: verify extent tree blocks and clear the bad ones
When we encounter an extent tree block that passes the header check
but fails the checksum, offer to clear just that extent block instead
of failing the whole tree, which results in the entire inode being
wiped out.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>