Darrick J. Wong [Sat, 13 Sep 2014 22:12:39 +0000 (15:12 -0700)]
misc: add plausibility checks to debugfs/tune2fs/dumpe2fs/e2fsck
If any of these utilities detect a bad superblock magic, call
check_plausibility to see if blkid can identify the passed-in argument
as something else (xfs, partition, etc.) in the hopes of catching a
user error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
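For illustration, a minimal hand-rolled sketch of the idea behind the plausibility check above: when the ext2/3/4 magic is missing, look for another recognizable signature before giving up. The real code asks blkid to identify the device; the `guess_fs` function and the `enum guess` codes here are hypothetical names used only for this sketch, and only the XFS signature is probed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Well-known on-disk constants for ext2/3/4. */
#define EXT2_SB_OFFSET    1024    /* primary superblock starts at byte 1024 */
#define EXT2_MAGIC_OFFSET 56      /* s_magic sits 56 bytes into the superblock */
#define EXT2_MAGIC        0xEF53

/* Hypothetical result codes, used only in this sketch. */
enum guess { GUESS_EXT, GUESS_XFS, GUESS_UNKNOWN };

/* Classify the leading bytes of a device or image file. */
static enum guess guess_fs(const unsigned char *buf, size_t len)
{
    size_t off = EXT2_SB_OFFSET + EXT2_MAGIC_OFFSET;

    if (len >= off + 2) {
        /* s_magic is stored little-endian on disk. */
        uint16_t magic = (uint16_t)(buf[off] | (buf[off + 1] << 8));
        if (magic == EXT2_MAGIC)
            return GUESS_EXT;
    }
    /* An XFS superblock begins with the ASCII bytes "XFSB". */
    if (len >= 4 && memcmp(buf, "XFSB", 4) == 0)
        return GUESS_XFS;
    return GUESS_UNKNOWN;
}
```

A caller that gets GUESS_XFS back can tell the user that the argument looks like an XFS filesystem instead of reporting only a bad superblock magic.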
Darrick J. Wong [Fri, 19 Sep 2014 17:10:21 +0000 (13:10 -0400)]
misc: move check_plausibility into a separate file
Move check_plausibility() into a separate file so that various
programs can use it without having to declare useless global variables
that the util.c functions seem to require.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Fri, 19 Sep 2014 16:15:21 +0000 (12:15 -0400)]
ext2fs: add readahead method to improve scanning
Add a readahead method for prefetching ranges of disk blocks. This is
useful for inode table scanning, and other large contiguous ranges of
blocks, and may also prove useful for random block prefetch, since it
will allow reordering of the IO without waiting synchronously for the
reads to complete.
It is currently using the posix_fadvise(POSIX_FADV_WILLNEED)
interface, as this proved most efficient during our testing.
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
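A minimal sketch of a block-range prefetch built on posix_fadvise(POSIX_FADV_WILLNEED), the interface named in the commit above. The helper name `readahead_blocks` and its parameters are assumptions made for this example; this is not the libext2fs io_channel method itself.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>      /* the helper returns errno-style codes such as EBADF */
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel to start reading `count` blocks
 * beginning at block number `block`.  The call only queues the hint and
 * does not wait for the data to arrive.  Returns 0 on success or an
 * error number (posix_fadvise returns the error directly, not via errno).
 */
static int readahead_blocks(int fd, unsigned long long block,
                            unsigned long long count, unsigned int blocksize)
{
    off_t offset = (off_t)(block * blocksize);
    off_t len = (off_t)(count * blocksize);

    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

Because the hint is advisory, a caller can treat a nonzero return as "no prefetch happened" and fall back to ordinary synchronous reads.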
e2fsck: use ext2fs_get_mem() instead of ext2fs_get_memalign()
There is no reason to request an aligned buffer in
check_{inode,block}_bitmap, and this will cause failures for dietlibc,
which doesn't have support for posix_memalign() or any other way to
request an aligned memory allocation. Fortunately, this is only
needed in very few places where direct I/O is required.
The asm_types.h file needs to include stdio.h and stdlib.h in order to
get integer types included. So add those includes into jfs_user.h to
avoid a build failure under dietlibc.
The create_inode.h header file is pulled in by debugfs, which is not
internationalized. It had no business pulling in nls-enable.h; that
header file should only be used in specific .c files that support
internationalization.
Darrick J. Wong [Fri, 19 Sep 2014 01:46:10 +0000 (21:46 -0400)]
e2fsck: free bh when descriptor block checksum fails
Free the buffer head if the journal descriptor block fails checksum
verification. This has been patched before (see "e2fsck: free bh on
csum verify error in do_one_pass") but apparently the patch was never
committed to jbd2 in the kernel, so when we resync'd the recovery code
with 3.16, the bug came back. Sigh.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Darrick J. Wong [Fri, 19 Sep 2014 01:29:21 +0000 (21:29 -0400)]
e2fsck: fix sliding the directory block down on bigalloc
If we find a hole in a directory on a bigalloc filesystem, we need to
obey the cluster alignment rules when collapsing the gap to avoid
later complaints.
Specifically, the calculation of the new logical cluster number was
incorrect, and we need to ensure that the logical cluster alignment
respects the physical cluster alignment, since we've concluded that
the extent's logical block number is wrong.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
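The alignment rule mentioned above can be sketched as follows, assuming the usual bigalloc convention that a block's offset within its logical cluster must equal its offset within its physical cluster. Both function names are hypothetical, `cluster_ratio` (blocks per cluster) is assumed to be a power of two, and this is not the actual e2fsck fix.

```c
#include <stdint.h>

/* Nonzero when lblk and pblk sit at the same offset inside their clusters. */
static int cluster_aligned(uint64_t lblk, uint64_t pblk, uint32_t cluster_ratio)
{
    uint64_t mask = (uint64_t)cluster_ratio - 1;

    return (lblk & mask) == (pblk & mask);
}

/*
 * Hypothetical helper: the lowest logical block >= min_lblk whose
 * in-cluster offset matches that of pblk.  When sliding a directory
 * block down, this is the closest legal landing spot.
 */
static uint64_t next_aligned_lblk(uint64_t min_lblk, uint64_t pblk,
                                  uint32_t cluster_ratio)
{
    uint64_t mask = (uint64_t)cluster_ratio - 1;
    uint64_t cand = (min_lblk & ~mask) + (pblk & mask);

    if (cand < min_lblk)
        cand += cluster_ratio;   /* offset already passed: use next cluster */
    return cand;
}
```

With 16 blocks per cluster, a physical block at in-cluster offset 3 can be mapped at logical block 3, 19, 35, and so on, but never at logical block 5.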
Darrick J. Wong [Fri, 19 Sep 2014 01:29:19 +0000 (21:29 -0400)]
e2fsck: offer to clear overlapping extents
If in the course of iterating extents we find that an otherwise
valid-seeming second extent maps the same logical blocks as a
previously examined first extent, offer to clear the duplicate
mapping.
The test for this is already in f_extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
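A rough sketch of the overlap test described above, assuming extents are visited in order of increasing logical block. The `struct ext_range` type and the `first_overlap` function are hypothetical and far simpler than e2fsck's extent scanning.

```c
#include <stdint.h>

/* Hypothetical, simplified view of an extent: logical start and length. */
struct ext_range {
    uint64_t lblk;
    uint32_t len;
};

/*
 * Return the index of the first extent that maps logical blocks already
 * covered by an earlier extent, or -1 if there is no overlap.
 */
static int first_overlap(const struct ext_range *e, int n)
{
    uint64_t next_free = 0;   /* first logical block not yet mapped */
    int i;

    for (i = 0; i < n; i++) {
        if (i > 0 && e[i].lblk < next_free)
            return i;         /* duplicate mapping: candidate for clearing */
        next_free = e[i].lblk + e[i].len;
    }
    return -1;
}
```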
Darrick J. Wong [Fri, 19 Sep 2014 01:24:26 +0000 (21:24 -0400)]
misc: zero s_jnl_blocks when adding journal online or removing external journal
Erase s_jnl_blocks when removing an external journal, or adding an
internal journal online. We can't add the backup for the internal
journal because we have no good way to get the indirect block or ETB
addresses, so the best we can do is hope that the user runs e2fsck,
which will correct that. We are motivated to erase during external
journal removal to state emphatically that there's no journal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: thomas_reardon@hotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
lib/ext2fs: fix Makefile to avoid a build splat when building without VPATH
When building in the source tree, the order of the includes caused the
compiling of debugfs/journal.c while in the lib/ext2fs directory to
find the version in lib/ext2fs instead of the desired version in
e2fsck/jfs_user.h.
We need to eventually get rid of this whole mess and have only one
jfs_user.h and build the journal-related functions once in an internal
library which is used only by e2fsprogs programs.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: "Darrick J. Wong" <darrick.wong@oracle.com>
Darrick J. Wong [Thu, 11 Sep 2014 20:17:44 +0000 (13:17 -0700)]
libext2fs: check ea value offset when loading
When reading extended attributes, check e_value_offs to make sure that
it starts in the value area and not the name area. The attached test
case image will crash the kernel if it is mounted and you append more
than 4096 bytes of data to /a, due to insufficient validation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
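A minimal sketch of the kind of bounds check described above. The function `ea_value_ok` and its parameters are hypothetical: `names_end` stands for the offset just past the entry/name table, and `region_size` for the size of the buffer holding the attributes. The upper-bound checks go beyond what the commit message states and are an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 when a value of value_size bytes starting at value_offs lies
 * entirely inside the value area [names_end, region_size), else 0.
 */
static int ea_value_ok(size_t region_size, size_t names_end,
                       uint32_t value_offs, uint32_t value_size)
{
    if (value_offs < names_end)
        return 0;   /* value would start inside the name area */
    if (value_offs > region_size)
        return 0;   /* value would start past the end of the buffer */
    if (value_size > region_size - value_offs)
        return 0;   /* value would run off the end of the buffer */
    return 1;
}
```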
Darrick J. Wong [Thu, 11 Sep 2014 19:44:49 +0000 (12:44 -0700)]
e2fsck: ignore badblocks if it says badblocks inode is bad
If the badblocks list says that the badblocks inode is bad, it's quite
likely that badblocks is broken. Worse yet, if the root inode is in
the same block as the badblocks inode (likely since they're adjacent),
the filesystem becomes unfixable because pass3 notices the bad root
inode and exits.
So... if we encounter this case, just kill the badblocks inode.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Enhance disable_uninit_bg() to return error codes -- if something goes
wrong, we want to flag the FS as needing a fsck and exit. Mr. Reardon
discovered that tune2fs -O ^metadata_csum on a FS with a corrupt
bitmap would leave the FS in a weird state.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Thu, 11 Sep 2014 19:31:41 +0000 (12:31 -0700)]
tests: test e2fsck recovery of corrupt descriptor blocks
Test e2fsck's ability to deal with (a) corrupt descriptor block
checksum; (b) obviously bad journal block tid; and (c) corrupt journal
blocks. These should exercise the journal recovery infinite loop
bugfix earlier in this patchset.
This test also ensures that (with metadata_csum and journal_csum_v3)
journal replay continues past a corrupt journal block.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:13:08 +0000 (16:13 -0700)]
tests: test writing and recovering 64bit csum_v3 journals
Simple tests for the 64bit journal transaction creation code when
journal and metadata_csum are enabled. We test writing (bad) block
bitmaps out through the journal and replaying them via fsck, with a
few twists:
(a) All bitmaps are committed (fs errors reported)
(b) All the bitmap blocks are revoked (no errors)
(c) The transaction is never committed (no errors)
(d) Same as (a), but debugfs gets to do the replay.
We also test:
(a) writing and replaying transactions with multiple
descriptor blocks
(b) same, but with multiple revoke blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:13:01 +0000 (16:13 -0700)]
tests: test writing and recovering checksum-free 32/64bit journals
Simple tests for the journal transaction creation code. We test
writing (bad) block bitmaps out through the journal and replaying them
via fsck, with a few twists:
(a) All bitmaps are committed (fs errors reported)
(b) All the bitmap blocks are revoked (no errors)
(c) The transaction is never committed (no errors)
(d) Same as (a), but debugfs gets to do the replay.
We also test:
(a) writing and replaying transactions with multiple
descriptor blocks
(b) same, but with multiple revoke blocks.
(c) adding the 64bit flag to a journal
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:12:54 +0000 (16:12 -0700)]
debugfs: add the ability to write transactions to the journal
Extend debugfs with the ability to create transactions and replay the
journal. This will eventually be used to test kernel recovery and
metadata_csum recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:12:48 +0000 (16:12 -0700)]
e2fsck: fix minor errors in journal handling
The journal superblock's s_sequence field seems to track the tid of
the tail (oldest) transaction in the log. Therefore, when we release
the journal, set the s_sequence to the tail_sequence, because setting
it to the transaction_sequence means that we're setting the tid to
that of the head of the log. Granted, for replay these two are
usually the same (and s_start == 0 anyway) so thus far we've gotten
lucky and nobody noticed.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:12:42 +0000 (16:12 -0700)]
debugfs: create journal handling routines
Create a journal.c with routines adapted from e2fsck/journal.c to
handle opening and closing the journal, and setting up the
descriptors, and all that. Unlike e2fsck's versions which try to
identify and fix problems, the routines here have no way to repair
anything.
[ Modified by tytso to fold debugfs/jfs_user.h into e2fsck/jfs_user.h,
so we don't have to copy recovery.c and revoke.c into debugfs. --tytso ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:12:35 +0000 (16:12 -0700)]
misc: zero s_jnl_blocks when removing internal journal
When we're removing the internal journal (broken journal, turning it
off, or adding an external journal), zero s_jnl_blocks so that they
can't be picked up by accident later.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:12:02 +0000 (16:12 -0700)]
libext2fs: write_journal_inode should check iterate return value
When creating a journal inode, check the return value from
block_iterate3() because otherwise we fail to capture errors such as
being unable to allocate an extent tree block, which leads to e2fsck
creating broken journals.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 8 Sep 2014 23:11:49 +0000 (16:11 -0700)]
libext2fs: report bad magic over bad sb checksum
We don't want ext2fs_open2() to report bad sb checksum on something
that's not even an ext* superblock. This apparently happens pretty
easily if we try to open an XFS filesystem. Thus, make it so that a
bad magic number code always trumps the sb checksum error code.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
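The error priority described above amounts to testing the magic number before the checksum. A tiny sketch, with hypothetical error codes standing in for the libext2fs ones:

```c
/* Hypothetical error codes, used only in this sketch. */
enum open_err { OPEN_OK = 0, ERR_BAD_MAGIC, ERR_SB_CSUM };

/*
 * Decide which error to report for a candidate superblock.  A bad magic
 * number wins over a bad checksum, because a checksum failure says
 * nothing useful about a block that is not an ext* superblock at all.
 */
static enum open_err check_super(int magic_ok, int csum_ok)
{
    if (!magic_ok)
        return ERR_BAD_MAGIC;
    if (!csum_ok)
        return ERR_SB_CSUM;
    return OPEN_OK;
}
```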
Darrick J. Wong [Mon, 8 Sep 2014 23:11:43 +0000 (16:11 -0700)]
e2fsck/debugfs: fix descriptor block size handling errors with journal_csum
It turns out that there are some serious problems with the on-disk
format of journal checksum v2. The foremost is that the function to
calculate descriptor tag size returns sizes that are too big. This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags. These errors regrettably lead to the journal
corruption reported by Mr. Reardon.
Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.
Add a few function helpers so we don't have to open-code quite so
many pieces.
Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
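A sketch of the "use feature flags, not structure size" rule from the commit above. The field order of `struct tag3_sketch` is an assumption made for illustration (four 32-bit fields, 16 bytes in total), the macro and function names are local to this example, and on-disk big-endian conversion is left out for brevity.

```c
#include <stdint.h>

/* jbd2 incompat feature bits relevant to this sketch. */
#define SK_INCOMPAT_64BIT   0x00000002u
#define SK_INCOMPAT_CSUM_V3 0x00000010u

/* Assumed layout of a checksum v3 tag: four 32-bit fields, 16 bytes. */
struct tag3_sketch {
    uint32_t blocknr;        /* low 32 bits of the block number */
    uint32_t flags;
    uint32_t blocknr_high;   /* high 32 bits, meaningful only with 64bit */
    uint32_t checksum;       /* full 32-bit checksum of the journal block */
};

/* 64-bitness comes from the feature flag, never from the tag size. */
static int journal_is_64bit(uint32_t incompat)
{
    return (incompat & SK_INCOMPAT_64BIT) != 0;
}

/* Combine the halves of the block number according to the feature flags. */
static uint64_t tag3_blocknr(const struct tag3_sketch *t, uint32_t incompat)
{
    uint64_t blk = t->blocknr;

    if (journal_is_64bit(incompat))
        blk |= (uint64_t)t->blocknr_high << 32;
    return blk;
}
```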
If the reallocation of dir_info fails, we will eventually cause e2fsck
to fail with an internal error. So if the realloc fails, print a
message and bail out with a fatal error at the time of the
reallocation failure.
Theodore Ts'o [Wed, 27 Aug 2014 13:27:54 +0000 (09:27 -0400)]
mke2fs: complain if bigalloc and hugefiles_align_disk are incompatible
If the starting partition offset is incompatible with the bigalloc
cluster size, complain and exit, instead of creating a file which
would have a logical to physical block mapping which breaks the
cluster alignment requirement.
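One way to picture the compatibility test above, as a sketch only: the partition's starting offset has to be a whole number of clusters. The function name and the choice of byte units are assumptions of this example; the real mke2fs check may be expressed differently.

```c
#include <stdint.h>

/*
 * Return 1 when a partition that starts part_offset_bytes into the disk
 * begins on a cluster boundary.  blocksize and cluster_ratio (blocks per
 * cluster) must be nonzero.
 */
static int offset_fits_clusters(uint64_t part_offset_bytes,
                                uint32_t blocksize, uint32_t cluster_ratio)
{
    uint64_t cluster_bytes = (uint64_t)blocksize * cluster_ratio;

    return part_offset_bytes % cluster_bytes == 0;
}
```

For example, a partition starting at 1 MiB works with 4 KiB blocks and 16 blocks per cluster, while one starting at sector 63 (32256 bytes) does not.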
Darrick J. Wong [Wed, 27 Aug 2014 03:44:04 +0000 (23:44 -0400)]
e2fsck: fix infinite loop when recovering corrupt journal blocks
When recovering the journal, don't fall into an infinite loop if we
encounter a corrupt journal block. Instead, just skip the block and
proceed with the full filesystem fsck.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Mon, 25 Aug 2014 22:04:42 +0000 (18:04 -0400)]
tests/d_inline_dump: remove version dependency in the expected output
Also add the convenience macro $CLEAN_OUTPUT in test_config which can
be used to run the "sed -e $cmd_dir/filter.sed" command to clean up
e2fsprogs command output before comparing with the expected golden
output.
Theodore Ts'o [Mon, 25 Aug 2014 03:54:37 +0000 (23:54 -0400)]
mke2fs: improve the error message when a non-existent file is specified
If the user does not specify the file system size, and the file does
not exist, give an error message like this:
The file /tmp/foo.img does not exist and no size was specified.
instead of this:
Creating regular file /tmp/foo.img
mke2fs: Device size reported to be zero. Invalid partition specified, or
partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to reboot
to re-read your partition table.
Darrick J. Wong [Mon, 25 Aug 2014 02:02:49 +0000 (22:02 -0400)]
e2fsck: on BE, re-swap everything after a damaged dirent so salvage works correctly
On big-endian systems, if the dirent swap routine finds a rec_len that
it doesn't like, it continues processing the block as if rec_len == 8.
This means that the name field gets byte swapped, which means that
salvage will not detect the correct name length (unless the name has a
length that's an exact multiple of four bytes), and it'll discard the
entry (unnecessarily) and the rest of the dirent block. Therefore,
swap the rest of the block back to disk order, run salvage, and
re-swap anything after the salvaged dirent.
The test case for this is f_inlinedata_repair if you run it on a BE
system.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Mon, 25 Aug 2014 02:01:36 +0000 (22:01 -0400)]
libext2fs: fix problems with LE<->BE conversions on BE platforms
Fix more problems that I found when testing on ppc64:
- Inode swap cut and paste error leads to immutable inodes being
detected as inlinedata inodes, leading to e2fsck incorrectly barfing
on i_block[] contents.
- Superblock csum/verify must be aware of the fs->super byte order
when checking for metadata_csum feature flag. (Hint: in _openfs(),
fs->super is in LE order for the first csum verification)
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 24 Aug 2014 01:55:46 +0000 (21:55 -0400)]
libext2fs: create inlinedata symlinks
Add to ext2fs_symlink the ability to create inline data symlinks.
[ Modified by tytso to add more logging to the test script ]
Suggested-by: Pu Hou <houpu.hp@alibaba-inc.com> Cc: Pu Hou <houpu.hp@alibaba-inc.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Tue, 19 Aug 2014 12:27:59 +0000 (08:27 -0400)]
debugfs: fix set_inode_field block[IND|DIND|TIND]
After we determine that we can't parse the array value as an integer,
we need to restore the square brackets to the field name, so that we
can find a match with block[IND], block[DIND], and block[TIND] in the
inode field table.
Reported-by: Jun He <jhe@cs.wisc.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
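The parsing fix above can be sketched like this: split off a numeric index if there is one, and otherwise put the brackets back so the full name (for example block[IND]) can still be looked up. The function `split_array_index` is a hypothetical helper, not the debugfs code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * If name has the form "field[<integer>]", truncate it to "field", store
 * the integer in *idx and return 1.  Otherwise leave name exactly as it
 * was and return 0.
 */
static int split_array_index(char *name, unsigned long *idx)
{
    char *open = strchr(name, '[');
    char *close, *end;

    if (!open)
        return 0;
    close = strchr(open, ']');
    if (!close)
        return 0;

    *open = '\0';
    *close = '\0';
    *idx = strtoul(open + 1, &end, 0);
    if (end == open + 1 || *end != '\0') {
        /* Not an integer: restore the square brackets. */
        *open = '[';
        *close = ']';
        return 0;
    }
    return 1;
}
```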
Theodore Ts'o [Wed, 13 Aug 2014 19:59:20 +0000 (15:59 -0400)]
filefrag: fix extent count calculation when using FIBMAP
The extent count calculation works correctly with the FIBMAP ioctl in
verbose (-v) mode, but without the verbose option, the calculation was
broken because we weren't properly updating the fm_ext data structures
in non-verbose mode.
Theodore Ts'o [Tue, 12 Aug 2014 18:37:19 +0000 (14:37 -0400)]
tests: convert use of md5sum to crcsum
The following tests were using md5sum: i_e2image, u_mke2fs, and
u_tune2fs. Convert them to use crcsum for better portability (not all
environments have md5sum; some might have sha1sum instead :-)
Darrick J. Wong [Tue, 12 Aug 2014 18:19:37 +0000 (14:19 -0400)]
e2fsck: don't flush the FS unless it's actually dirty
ext2fs_flush2() unconditionally writes the block group descriptors to
disk even if the underlying FS isn't marked dirty. This causes the
following error message on a fsck -n run:
e2fsck 1.43-WIP (09-Jul-2014)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing block 2 (Attempt to write block to filesystem resulted in short write). Ignore error? no
Error writing file system info: Attempt to write block to filesystem resulted in short write
Since ext2fs_close2() only calls flush if the dirty flag is set,
modify e2fsck to exhibit the same behavior so that we don't spit out
write errors for a read only check.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
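The behavior change above boils down to guarding the flush with the dirty flag. A self-contained sketch, in which `struct fs_sketch`, `FS_FLAG_DIRTY`, and both functions are hypothetical stand-ins for the libext2fs handle, its dirty flag, and its flush routine:

```c
/* Hypothetical stand-in for the library's "filesystem is dirty" flag. */
#define FS_FLAG_DIRTY 0x1u

/* Hypothetical, minimal filesystem handle. */
struct fs_sketch {
    unsigned int flags;
    int writes;              /* counts flushes that reached the disk */
};

/* Stand-in for the real flush: write metadata out and clear the flag. */
static void fs_flush(struct fs_sketch *fs)
{
    fs->writes++;
    fs->flags &= ~FS_FLAG_DIRTY;
}

/* Only touch the disk when something actually changed. */
static void flush_if_dirty(struct fs_sketch *fs)
{
    if (fs->flags & FS_FLAG_DIRTY)
        fs_flush(fs);
}
```

On a read-only check nothing sets the flag, so no write is attempted and no short-write error can appear.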
Darrick J. Wong [Sun, 10 Aug 2014 22:50:38 +0000 (18:50 -0400)]
e2fsck: don't set prev after processing '..' on an inline dir
In an inline directory, the '..' entry is compacted down to just the
inode number; there is no full '..' entry. Therefore, it makes no
sense to assign 'prev' to the fake dotdot entry we put on the stack,
as this could confuse a salvage_directory call on a corrupted next
entry into modifying stack contents (the fake dotdot entry).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:49:37 +0000 (18:49 -0400)]
e2fsck: be more careful in assuming inline_data inodes are directories
If a file is marked inline_data but its i_size isn't a multiple of
four, it probably isn't an inline directory, because directory entries
have sizes that are multiples of four.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Darrick J. Wong [Sun, 10 Aug 2014 22:46:53 +0000 (18:46 -0400)]
e2fsck: check inline dir size is a multiple of 4
Directory entries must have a size that's a multiple of 4; therefore
the inline directory structure must also have a size that is a multiple
of 4. Since e2fsck doesn't check this, we should check that now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
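The size rule used by the two commits above can be written as a one-line test. The function names are hypothetical, and the rounding helper is only one possible way to pick a repaired size.

```c
#include <stdint.h>

/* Nonzero when size is a multiple of 4, as directory entries require. */
static int inline_dir_size_ok(uint32_t size)
{
    return (size & 3u) == 0;
}

/* Hypothetical repair choice: round the size down to a multiple of 4. */
static uint32_t inline_dir_size_round_down(uint32_t size)
{
    return size & ~3u;
}
```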