filefrag: Display the number of contiguous, not physical, extents
From a bug report filed by Ibragimov Rinat:
When filefrag uses FIEMAP ioctl its logic differs for ordinary and
verbose (-v) modes. ext4 returns extent on every 32768 block so on
large files it is possible that `filefrag large-file' tells about 4
extents while `filefrag -v large-file' finds only one.
Also when I tried to use generic_block_fiemap function to add
FIEMAP for reiserfs, every block was reported as a new extent
resulting in thousands "extents" for continuous files.
I think filefrag should merge adjacent extents even when -v is not
specified.
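The merging requested above can be sketched as follows. This is only an illustration, not filefrag's actual code: struct simple_extent and count_contiguous_extents() are hypothetical names, and the input is assumed to be sorted by logical block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record (not the kernel's FIEMAP struct). */
struct simple_extent {
    uint64_t logical;   /* first logical block of the extent */
    uint64_t physical;  /* first physical block of the extent */
    uint64_t length;    /* number of blocks in the extent */
};

/*
 * Count extents after merging neighbours that are contiguous both
 * logically and physically.  Assumes ext[] is sorted by logical block.
 */
size_t count_contiguous_extents(const struct simple_extent *ext, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (i > 0 &&
            ext[i - 1].logical + ext[i - 1].length == ext[i].logical &&
            ext[i - 1].physical + ext[i - 1].length == ext[i].physical)
            continue;   /* this record just continues the previous extent */
        count++;
    }
    return count;
}
```

With four records of at most 32768 blocks each that follow one another on disk, the function reports a single extent, which matches what the verbose mode in the report found.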
Darrick J. Wong [Fri, 30 Sep 2011 19:41:26 +0000 (12:41 -0700)]
libext2fs: Always swab the MMP block on big-endian systems
The MMP code in libext2fs tries to gate MMP block swab'ing with this
test, guarded by an #ifdef on EXT2FS_ENABLE_SWAPFS:
if (fs->super->s_magic == ext2fs_swab16(EXT2_SUPER_MAGIC))
However, EXT2FS_ENABLE_SWAPFS never seems to be defined anywhere, and
even if it were, the field fs->super->s_magic is always in host
byteorder, so the test always fails. So, we can change the #ifdef to
WORDS_BIGENDIAN (which is conditionally defined on BE platforms) and
get rid of the broken if test.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Fri, 30 Sep 2011 19:40:05 +0000 (12:40 -0700)]
e2fsck: zero ctx->fs after freeing fs when restarting due to MMP
If MMP is enabled and e2fsck determines that it needs to restart
itself on account of various MMP conditions, it will close the current
fs and jump back to the start of fs checking. However, closing fs
also frees it, which means that we need to set ctx->fs to NULL to
prevent subsequent open code from accessing the old deleted pointer.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Darrick J. Wong [Fri, 30 Sep 2011 19:38:43 +0000 (12:38 -0700)]
libext2fs: Fix various bugs from the metadata checksum integration
Fix several minor errors in structure definitions, the byteswap code,
and Makefiles that result from merging the crc32c and initial parts of
the metadata checksumming patchset.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If the enable_periodic_fsck option is false in /etc/mke2fs.conf (which
is also the default), s_max_mnt_count needs to be set to -1, instead
of 0. Kernels newer than 3.0 will interpret 0 to disable periodic
checks, but older kernels will print a warning message on each mount,
which will annoy users.
Not all of the signals which the signal catcher tries to interpret are
defined on every system. So add #ifdef's around the affected signals
to avoid compilation failures on non-x86 platforms.
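A minimal sketch of the guarding technique, assuming a hypothetical signal_name() helper rather than the actual signal catcher code. SIGSEGV and SIGINT are required by ISO C; the remaining signals are wrapped in #ifdef because some platforms do not define them.

```c
#include <assert.h>
#include <signal.h>

/*
 * Map a signal number to a printable name.  An if-chain is used instead
 * of a switch so that platforms where two names share one number do not
 * produce duplicate case labels.
 */
const char *signal_name(int sig)
{
    if (sig == SIGSEGV)
        return "SIGSEGV";
    if (sig == SIGINT)
        return "SIGINT";
#ifdef SIGBUS
    if (sig == SIGBUS)
        return "SIGBUS";
#endif
#ifdef SIGSTKFLT
    if (sig == SIGSTKFLT)   /* not defined on every architecture */
        return "SIGSTKFLT";
#endif
#ifdef SIGPWR
    if (sig == SIGPWR)
        return "SIGPWR";
#endif
    return "unknown";
}
```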
debian: don't bomb out if DEB_BUILD_OPTIONS contains nostrip
The debugging packages will contain no debugging symbols (since they
are in the unstripped executables and libraries) but at least the
build won't crash.
debian/copyright: update the debian copyright file
Fix up the debian/copyright file so it contains the full information
of the licenses used by all of the libraries. Also use a single
copyright file for e2fsprogs and e2fslibs, to make sure they are kept
in sync.
Andreas Dilger [Sat, 24 Sep 2011 18:13:27 +0000 (14:13 -0400)]
e2fsck: regression tests for INCOMPAT_MMP feature
Add tests for the MMP feature - creating a filesystem with mke2fs
and MMP enabled, enable/disable MMP with tune2fs, disabling the
e2fsck MMP flag with tune2fs after a failed e2fsck, and e2fsck
checking and fixing a corrupt MMP block.
The MMP tests need to be run from a real disk, not tmpfs, because
tmpfs doesn't support O_DIRECT reads, which MMP uses to ensure
that reads from the MMP block are not filled from the page cache.
Using a local disk does not slow down the tests noticeably, since
they wait to detect if the MMP block is being modified.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andreas Dilger [Sat, 24 Sep 2011 17:48:55 +0000 (13:48 -0400)]
ext2fs: add multi-mount protection (INCOMPAT_MMP)
Multi-mount protection is a feature that allows mke2fs, e2fsck, and
others to detect if the filesystem is mounted on a remote node (on
SAN disks) and avoid corrupting the filesystem. For e2fsprogs this
means that it checks the MMP block to see if the filesystem is in use,
and marks the filesystem busy while e2fsck is running on the system.
This is useful on SAN disks that are shared between high-availability
servers, or accessible by multiple nodes that aren't in HA pairs. MMP
isn't intended to serve as a primary HA exclusion mechanism, but as a
failsafe to protect against user, software, or hardware errors.
There is no requirement that e2fsck updates the MMP block at regular
intervals, but e2fsck does this occasionally to provide useful
information to the sysadmin in case of a detected conflict.
For the kernel (since Linux 3.0) MMP adds a "heartbeat" mechanism to
periodically write to disk (every few seconds by default) to notify
other nodes that the filesystem is still in use and unsafe to modify.
Originally-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Sat, 24 Sep 2011 17:17:05 +0000 (13:17 -0400)]
tune2fs: kill external journal if device not found
Continue to remove the external journal device even if the device
cannot be found.
Add a test to verify that the journal device/UUID are actually removed
from the superblock. It isn't possible to use a real journal device
for testing without loopback devices and such (it must be a block device)
and this would invite complexity and failures in the regression test.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andreas Dilger [Sat, 24 Sep 2011 16:59:31 +0000 (12:59 -0400)]
misc: quiet minor compiler errors
Several compiler errors are quieted:
- zero-length gnu_printf format string
- unused variable
- uninitialized variable (though it isn't actually used for anything)
- fixed a bug in ext2fs_stat() if stat64() does not exist
Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
libext2fs: add flag to ext2fs_flush() and ext2fs_close() to avoid fsync
This adds new APIs: ext2fs_flush2 and ext2fs_close2 which take an
extra 'int flags' parameter.
This allows us to pass in an EXT2_FLAG_FLUSH_NO_SYNC flag which avoids
fsync'ing the filesystem when closing it. For the case we have in
mind where we are just constructing a throwaway ext2 filesystem in a
file in order to boot a VM, this saves over 5 seconds during the boot
process and avoids many unnecessary disk writes.
Existing code using ext2fs_flush and ext2fs_close remains unaffected
by this change.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Since the libquota library has namespace contamination issues, don't
build a shared library and link against it statically. Don't include
it as part of the Debian packages.
The DEFS line in MCONFIG had gotten so long that it exceeded 4k, and
this was starting to cause some tools heartburn. It also made "make
V=1" almost useless, since the individual commands run by make were
lost in the noise of all of the defines.
So fix this by putting the configure-generated defines in lib/config.h
and the directory pathnames to lib/dirpaths.h.
In addition, clean up some vestigial defines in configure.in and in the
Makefiles to further shorten the cc command lines.
Eric Sandeen [Fri, 16 Sep 2011 20:49:30 +0000 (15:49 -0500)]
filefrag: Fix uninitialized "expected" value
The "count" variable is only ever set if FIBMAP is used,
due to the -B switch, or a fiemap failure. However,
we use it unconditionally to calculate "expected" for
extN files, so we can end up printing garbage.
Initialize count to 0, and unless we go through the FIBMAP
path, expected will be 0 as well, and in that case do not
print the message.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Fri, 16 Sep 2011 20:49:29 +0000 (15:49 -0500)]
subst: Fix free of uninit pointers
In add_subst(), if the malloc of ent->name fails, we goto fail;
which will free ent->name (which is null, so OK) but also free
ent->value (which is uninitialized). There is no case where
we must free ent->value on an error (it is allocated last, and
if it fails it of course doesn't need to be freed) so just
remove it.
Also "retval" is only assigned once to the constant ENOMEM,
so we can just return that explicitly in the failure case.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Fri, 16 Sep 2011 20:49:28 +0000 (15:49 -0500)]
e2fsprogs: Fix some error cleanup path bugs
In inode_open(), if the allocation of &io fails, we go to cleanup
and dereference io to test io->name, which is a bug.
Similarly in undo_open() if allocation of &data fails, we
go to cleanup and dereference data to test data->real.
In the test_open() case we explicitly set retval to the only
possible error return from ext2fs_get_mem(), so remove that
for tidiness.
The other changes just make earlier returns go through
the error goto for consistency.
In many cases we returned directly from the first error, but
"goto cleanup" etc for every subsequent error. In some
cases this leads to "impossible" tests such as:
if (ptr)
ext2fs_free_mem(&ptr);
on paths where ptr cannot be null because we would have
returned directly earlier, and Coverity flags this.
This isn't really indicative of an error in most cases, but
I think it can be clearer to always exit through the error goto
if it's used later in the function.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
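The single-exit cleanup style described above might look like the following sketch. It is not the e2fsprogs code: struct channel, channel_open() and the injectable allocator are hypothetical, introduced only so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an I/O channel structure. */
struct channel {
    char *name;
};

/* The allocator is passed in so that allocation failure can be simulated. */
typedef void *(*alloc_fn)(size_t);

int channel_open(const char *name, alloc_fn alloc, struct channel **ret)
{
    struct channel *io = NULL;   /* NULL until the first allocation succeeds */

    io = alloc(sizeof(*io));
    if (!io)
        goto cleanup;
    io->name = NULL;             /* make the cleanup path safe from here on */

    io->name = alloc(strlen(name) + 1);
    if (!io->name)
        goto cleanup;
    strcpy(io->name, name);

    *ret = io;
    return 0;

cleanup:
    if (io) {                    /* never dereference io without this test */
        free(io->name);          /* free(NULL) is a no-op */
        free(io);
    }
    return ENOMEM;               /* the only possible error in this sketch */
}

/* Allocator that always fails, for testing the error path. */
void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}
```

Every error exits through the same label, so the NULL test in the cleanup block is always reachable and meaningful.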
Eric Sandeen [Fri, 16 Sep 2011 20:49:20 +0000 (15:49 -0500)]
libext2: move buf variable completely under ifdef
If !WORDS_BIGENDIAN, it is pointless to test whether buf
is NULL, because it is initialized to NULL and never changed.
This makes Coverity complain, so we can just move all handling
of "buf" under the #ifdef.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Fri, 16 Sep 2011 20:49:17 +0000 (15:49 -0500)]
e2fsprogs: Remove impossible name_len tests.
The name_len field in ext2_dir_entry actually comprises the name
length in the lower 8 bits and the filetype in the high 8 bits.
So in places, we mask name_len with 0xFF to get the actual
length.
But once we have masked name_len with 0xFF, there is no point
in testing whether it is greater than EXT2_NAME_LEN, which
is 255 - or 0xFF. So all of these tests are extraneous.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
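A small sketch of why the removed tests could never fire. The helper names and NAME_LEN_MAX (standing in for EXT2_NAME_LEN) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NAME_LEN_MAX 255    /* stands in for EXT2_NAME_LEN */

/* Low 8 bits of the 16-bit field: the real name length. */
unsigned dirent_name_length(uint16_t name_len)
{
    return name_len & 0xFF;
}

/* High 8 bits of the 16-bit field: the file type. */
unsigned dirent_file_type(uint16_t name_len)
{
    return name_len >> 8;
}
```

Whatever 16-bit value is passed in, dirent_name_length() returns at most 0xFF, so comparing its result against NAME_LEN_MAX with "greater than" is always false.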
Eric Sandeen [Fri, 16 Sep 2011 20:49:16 +0000 (15:49 -0500)]
libext2: Fix EXT2_LIB_SOFTSUPP masking
EXT2_LIB_SOFTSUPP_INCOMPAT_* are supposed to be bitmasks
of features which can be opened even though they are
under development. The intent is that these are masked
out of the features list, so that they will be ignored
on open.
However, the code does a logical not vs. a bitwise not:
features &= !EXT2_LIB_SOFTSUPP_INCOMPAT;
which will not have the desired effect...
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
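The difference between the two operators can be seen in this sketch. SOFTSUPP_MASK is a made-up value used only for illustration; it is not the real EXT2_LIB_SOFTSUPP_INCOMPAT definition.

```c
#include <assert.h>
#include <stdint.h>

#define SOFTSUPP_MASK 0x0200u   /* hypothetical mask value */

/* Buggy form: !MASK is 0 for any nonzero mask, so every bit is cleared. */
uint32_t mask_logical_not(uint32_t features)
{
    return features & !SOFTSUPP_MASK;
}

/* Intended form: ~MASK clears only the bits set in the mask. */
uint32_t mask_bitwise_not(uint32_t features)
{
    return features & ~SOFTSUPP_MASK;
}
```

For a feature word of 0x0242, the logical form yields 0 while the bitwise form yields 0x0042.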
debugfs: add 64-bit support to the set_field commands
The set_fields commands (set_super_value, set_inode_field,
set_block_group) now handle values which are split across two fields
in ext4's on-disk format. For example, the superblock fields
s_blocks_count and s_blocks_count_hi.
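How a 64-bit value maps onto a low/high pair can be sketched like this. struct split_count and its helpers are hypothetical; they only mirror the idea of s_blocks_count and s_blocks_count_hi and are not debugfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit on-disk fields holding one 64-bit value. */
struct split_count {
    uint32_t lo;    /* low 32 bits */
    uint32_t hi;    /* high 32 bits */
};

/* Combine both halves into the full 64-bit value. */
uint64_t split_get(const struct split_count *f)
{
    return ((uint64_t)f->hi << 32) | f->lo;
}

/* Store a 64-bit value by setting both halves at once. */
void split_set(struct split_count *f, uint64_t value)
{
    f->lo = (uint32_t)value;
    f->hi = (uint32_t)(value >> 32);
}
```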
The user can either set the low or high part of the field via
"blocks_count_lo" or "blocks_count_hi", or both parts can be set via
"blocks_count".
libext2fs: add metadata checksum and snapshot feature flags
Reserve EXT4_FEATURE_RO_COMPAT_METADATA_CSUM and
EXT2_FEATURE_COMPAT_EXCLUDE_BITMAP. Also reserve fields in the
superblock and the inode for the checksums. In the block group
descriptor, reserve the exclude bitmap field for the snapshot feature,
and checksums for the inode and block allocation bitmaps.
With this commit, the metadata checksum and exclude bitmap features
should have reserved all of the fields they need in ext4's on-disk
format.
This commit also fixes a missing byte swap for s_overhead_blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: Amir Goldstein <amir73il@gmail.com>
Yongqiang Yang [Fri, 16 Sep 2011 13:25:51 +0000 (09:25 -0400)]
e2fsck: fix error in computing blocks of the ending group
If the number of blocks in a filesystem is a multiple of
blocks_per_group, the number of blocks in the last group is computed
incorrectly. Use the new ext2fs_group_blocks_count() helper instead.
Eric Sandeen: Converted to use new blocks per group helper
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
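The arithmetic behind this bug can be illustrated with a simplified sketch. These two functions are assumptions made for the example and are not the ext2fs_group_blocks_count() implementation; they only show why a plain remainder goes wrong when the block count divides evenly.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: yields 0 when the block count is an exact multiple. */
uint64_t last_group_blocks_naive(uint64_t blocks, uint64_t first_data_block,
                                 uint64_t blocks_per_group)
{
    return (blocks - first_data_block) % blocks_per_group;
}

/* Corrected version: a remainder of 0 means the last group is full. */
uint64_t last_group_blocks(uint64_t blocks, uint64_t first_data_block,
                           uint64_t blocks_per_group)
{
    uint64_t rem = (blocks - first_data_block) % blocks_per_group;

    return rem ? rem : blocks_per_group;
}
```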
e2fsck: do not attempt to discard if -n was specified
If the '-n' option is specified, no changes should be made to the file
system, hence we should not attempt to discard the file system. This
commit adds a check to the e2fsck_discard_blocks() condition so that it
skips the discard if the E2F_OPT_NO flag is set.
Currently we need to grep, list or just search for failed tests when
running 'make check' which is annoying. This commit simply prints out
the list of failed test names at the end of the output.
e2fsprogs: Use punch hole as "discard" on regular files
If e2fsprogs tools (mke2fs, e2fsck) are run on a regular file instead
of a block device, we can use punch hole instead of the regular discard
command, which would not work on a regular file anyway. This gives us
several advantages. First, when e2fsck is run with the '-E discard'
parameter it will punch out all unused space from the image, hence
trimming down the file system image. Second, when creating a file
system on a regular file (with '-E discard', which is the default), we
can use punch hole to clear the file content, hence we can skip inode
table initialization, because reads from a sparse area return zeros.
This will result in faster file system creation (without the need to
specify lazy_itable_init) and smaller images.
This commit also fixes some tests that would fail due to mke2fs showing
discard progress, hence the output would differ.
In many places we are using #ifdef HAVE_OPEN64 to determine if we can
use open64(), but that's ugly. This commit creates two new helpers:
ext2fs_open_file() for open() and ext2fs_stat() for stat(). We also need
a new typedef, ext2fs_struct_stat, for struct stat.
mke2fs: check that auto-detected blocksize <= sys_page_size
Block size can be specified manually via the -b option or deduced
automatically. Unfortunately, the check that it is still smaller than
the system page size is only performed right after the command line
options are parsed.
Therefore, if buggy or inappropriately installed/configured hardware
hints that larger block sizes have to be used, mkfs will silently create
a file system which can not be mounted on the system in question.
By moving the check beyond the last assignment to blocksize, it is now
ensured that mkfs will issue a warning even if an inappropriate
blocksize was auto-detected.
The new behavior can be easily tested, by exporting the following
variables before running mkfs:
Amir Goldstein [Fri, 16 Sep 2011 02:23:24 +0000 (22:23 -0400)]
libext2fs: fix the range validation in bitmap_range2 funcs
The condition ((start+num) & ~0xffffffffULL) in bitmap_range2
and generic_bmap_range funcs in get_bitmap64.c was wrong and
inconsistent with the condition (start+num-1 > bmap->real_end)
in generic_bitmap_range funcs in get_bitmap.c.
I got the following error from tune2fs on a 16TB fs:
Illegal block number passed to ext2fs_unmark_block_bitmap #4294967295
for block bitmap for 16TB.img
tune2fs: Invalid argument while reading bitmaps
Fix the condition to ((start+num-1) & ~0xffffffffULL), because
the bit (start+num) is not going to be changed by the funcs.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
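The off-by-one can be checked with the block number from the error message above. The two functions below are illustrative wrappers around the quoted conditions, not the library code, and they assume num is at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Old condition: tests the bit one past the end of the range. */
int range_rejected_old(uint64_t start, uint64_t num)
{
    return ((start + num) & ~0xffffffffULL) != 0;
}

/* Fixed condition: tests the last bit the range actually touches. */
int range_rejected_new(uint64_t start, uint64_t num)
{
    return ((start + num - 1) & ~0xffffffffULL) != 0;
}
```

For start 4294967295 and num 1, the last bit touched is still representable in 32 bits, so only the old condition rejects the range.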
libext2fs: fix binary and source compatibility with the dump program
The dump program relies on fs->frag_size and the
EXT2_FRAGS_PER_BLOCK() macro. Kind of silly for it to do so, but it's
part of the kludgy way the dump program (which was originally written
for the BSD FFS) was ported over to support ext2/3. Given how it
makes assumptions about the ext2/3/4 file system being similar to the
BSD FFS, it's a bit of a miracle it works for ext4 --- or at least
appears to work...
In flush_l2_cache() we are using ext2fs_llseek(), but we do not
properly detect the error code returned from the function because we
are assigning it to an unsigned long long (ULL) variable, hence we will
never see negative values.
Fix this by changing the type of the variable to ext2_loff_t which is
signed and hence will store negative values.
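One way such a failure can go unnoticed is sketched below. fake_seek() is a hypothetical stand-in for a seek call that returns -1 on error; the comparison shown is an assumption about how the check might be written, not a quote of flush_l2_cache().

```c
#include <assert.h>

/* Hypothetical seek: returns -1 on failure, an offset otherwise. */
long long fake_seek(int fail)
{
    return fail ? -1 : 4096;
}

/* Unsigned storage: -1 wraps to a huge value, so "< 0" is never true. */
int seek_error_seen_unsigned(int fail)
{
    unsigned long long pos = (unsigned long long)fake_seek(fail);

    return pos < 0;
}

/* Signed storage: the negative return value is preserved and detected. */
int seek_error_seen_signed(int fail)
{
    long long pos = fake_seek(fail);

    return pos < 0;
}
```

Compilers may warn that the unsigned comparison is always false, which is exactly the symptom being illustrated.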
We are doing ext2fs_flush() twice right now at the end of mke2fs.
First by directly calling ext2fs_flush() which is intended to write
superblock and fs accounting information. And then it is invoked again
when we are calling ext2fs_close(), only this time, because the fs is
not dirty, we are writing out only superblock.
I think it is bad to call it twice because even when writing only super
block it takes some time on bigger file systems and moreover
ext2fs_close() can fail without any reasonable explanation for the user.
Also ext2fs_flush() is printing out progress and it is confusing for the
users.
Fix all this by removing the ext2fs_flush() and leaving it all to
ext2fs_close(). However we need to introduce new variables to store
the check interval and max mount count, because the fs structure is
freed by ext2fs_close() and we really want to print that information
as the last info for the user.
[ Fixed type mismatch in a printf format statement -tytso]
Since e2image can be optionally compiled out, we tested to see if
e2image was built; but using "test -x $E2IMAGE" fails if e2image is
something like "valgrind --simhints=lax-ioctls ../misc/e2image".
Define and use $E2IMAGE_EXE, much like we have done with e2undo and
resize2fs.
Aditya Kali [Wed, 20 Jul 2011 18:40:05 +0000 (11:40 -0700)]
tune2fs: Add support for turning on quota feature
This patch adds support for setting the quota feature in superblock
and allows selectively creating quota inodes (user or group or both)
in the superblock. Currently, modifying the quota feature is only
supported when the filesystem is unmounted.
Also, when setting the quota feature, tune2fs will use aquota.user or
aquota.group file inode number in superblock if these files exist.
Otherwise it will initialize empty quota inodes #3 and #4 and use them.
Here is how it works:
# Set quota feature and initialize both (user and group) quota inodes
$ tune2fs -O quota /dev/ram1
# Enable only one type of quota
$ tune2fs -Q usrquota /dev/ram1
Aditya Kali [Wed, 20 Jul 2011 18:40:04 +0000 (11:40 -0700)]
mke2fs: support creation of filesystem with quota feature
mke2fs also creates the quota inodes (userquota: inode #3 and
groupquota: inode #4) while creating a filesystem when the 'quota'
feature is set.
# To set quota feature and initialize quota inodes during mke2fs:
$ mke2fs -t ext4 -O quota /dev/ram1
Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aditya Kali [Wed, 20 Jul 2011 18:40:06 +0000 (11:40 -0700)]
e2fsck: add support for checking the built-in quota files
This patch adds support for doing quota accounting during full
e2fsck scan if the 'quota' feature was set on the superblock.
If user-visible quota inodes are in use, they will be hidden
and converted to the reserved quota inodes.
Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Aditya Kali [Wed, 20 Jul 2011 18:40:02 +0000 (11:40 -0700)]
e2fsprogs: add quota library to e2fsprogs
This patch adds the quota library (ported from Jan Kara's quota-tools) to
e2fsprogs in order to make quotas a first-class supported feature in Ext4.
This patch also provides an interface in lib/quota/mkquota.h that will be used by
mke2fs, tune2fs, e2fsck, etc. to initialize and update quota files.
This first version of the quota library does not support reading existing quota
files. This support will be added in the near future.
Thanks to Jan Kara for his work on quota-tools. Most of the files in this patch
are taken as-is from quota tools and were simply modified to work with
libext2fs in e2fsprogs.
Signed-off-by: Aditya Kali <adityakali@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Wed, 31 Aug 2011 18:27:21 +0000 (14:27 -0400)]
libext2fs: fix binary search for the icount and badblocks stores
Remove the interpolation because there is a bug in icount which can
cause a core dump if the calculated range gets turned into a NaN, which
then leads to an out-of-bounds array access. We could fix this with
some more tests, but the complexity is such that nuking all of the
interpolation code will be faster than fixing the interpolation.
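A plain binary search with no interpolation step can be sketched as below. find_sorted() is a hypothetical helper over a sorted array of 32-bit keys, not the icount or badblocks code; the point is that the midpoint uses only integer arithmetic, so no floating-point value can ever become a bad index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of key in the sorted array list[], or -1 if absent. */
long find_sorted(const uint32_t *list, size_t count, uint32_t key)
{
    size_t low = 0, high = count;   /* search the half-open range [low, high) */

    while (low < high) {
        size_t mid = low + (high - low) / 2;   /* always within [low, high) */

        if (list[mid] == key)
            return (long)mid;
        if (list[mid] < key)
            low = mid + 1;
        else
            high = mid;
    }
    return -1;
}
```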
Eric Sandeen [Wed, 10 Aug 2011 19:07:41 +0000 (14:07 -0500)]
libext2fs: copy cluster_bits in ext2fs_copy_generic_bmap
The f_lotsbad regression test was failing on some systems
with:
Restarting e2fsck from the beginning...
Pass 1: Checking inodes, blocks, and sizes
+Illegal block number passed to ext2fs_test_block_bitmap #0 for in-use block map
Pass 2: Checking directory structure
Entry 'termcap' in / (2) has deleted/unused inode 12. Clear? yes
Running with valgrind (./test_script --valgrind f_lotsbad) we
see:
+==31409== Conditional jump or move depends on uninitialised value(s)
+==31409== at 0x42927A: ext2fs_test_generic_bmap (gen_bitmap64.c:378)
among others.
Looking at gen_bitmap64.c:
376: arg >>= bitmap->cluster_bits;
377:
378: if ((arg < bitmap->start) || (arg > bitmap->end)) {
A little more debugging showed that it was actually
bitmap->cluster_bits which was uninitialized, because it never
gets copied over in ext2fs_copy_generic_bmap()
Patch below resolves the issue.
Reported-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Andreas Dilger [Mon, 18 Jul 2011 03:13:47 +0000 (23:13 -0400)]
mke2fs: document stripe_width, not stripe-width
For consistency with other multi-word options, document the extended
option stripe_width instead of stripe-width. This also avoids the
complexity of parsing options that have an embedded '-'.
Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck: teach e2fsck how to deal with bigalloc in non-extent-mapped inodes
Currently the bigalloc implementation in the kernel requires extents,
but this restriction might get relaxed in the future. Also, old
versions of mke2fs that supported bigalloc during early testing
created the root and lost+found directories without using
extent-mapped inodes. This makes it possible for e2fsck to better
support these old legacy file systems if it comes across them.
libext2fs: fix block iterator when the callback function modifies an extent
If the callback function modifies a block in the middle of an extent
during a call to the block iterator, causing the extent to be split,
ext2fs_block_iterate3() will end up calling the callback function twice
for some number of blocks. Fix this.