Theodore Ts'o [Sat, 4 Jun 2011 14:20:47 +0000 (10:20 -0400)]
libext2fs: change fs->clustersize to fs->cluster_ratio_bits
The log2 of the ratio of cluster size to block size is far more useful
than just storing the cluster size. So make this change, and then
define basic utility macros: EXT2FS_CLUSTER_RATIO(),
EXT2FS_CLUSTER_MASK(), EXT2FS_B2C(), EXT2FS_C2B(), and
EXT2FS_NUM_B2C().
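As a rough illustration of why storing the log2 ratio is convenient, here is a minimal sketch of helpers of this kind. The struct toy_fs and the lowercase function names are stand-ins invented for this example; they are not the library's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the filesystem handle; it carries only the
 * one field this sketch needs. */
struct toy_fs {
	int cluster_ratio_bits;	/* log2(cluster size / block size) */
};

/* Number of blocks per cluster. */
unsigned long long cluster_ratio(const struct toy_fs *fs)
{
	assert(fs->cluster_ratio_bits >= 0 && fs->cluster_ratio_bits < 32);
	return 1ULL << fs->cluster_ratio_bits;
}

/* Mask selecting a block's offset within its cluster. */
unsigned long long cluster_mask(const struct toy_fs *fs)
{
	return cluster_ratio(fs) - 1;
}

/* Block number -> number of the cluster that contains it. */
unsigned long long b2c(const struct toy_fs *fs, unsigned long long blk)
{
	return blk >> fs->cluster_ratio_bits;
}

/* Cluster number -> first block of that cluster. */
unsigned long long c2b(const struct toy_fs *fs, unsigned long long cluster)
{
	return cluster << fs->cluster_ratio_bits;
}

/* Count of blocks -> count of clusters needed to hold them (rounds up). */
unsigned long long num_b2c(const struct toy_fs *fs, unsigned long long blks)
{
	return (blks + cluster_mask(fs)) >> fs->cluster_ratio_bits;
}
```

With the ratio kept as a shift count, every conversion is a shift or a mask, with no division needed.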
Theodore Ts'o [Sat, 4 Jun 2011 20:40:26 +0000 (16:40 -0400)]
libext2fs: change EXT2_MAX_BLOCKS_PER_GROUP() to be cluster size aware
Change the EXT2_MAX_BLOCKS_PER_GROUP so that it takes the cluster size
into account. This way we can open bigalloc file systems without
ext2fs_open() thinking that they are corrupt.
Theodore Ts'o [Sat, 4 Jun 2011 20:36:19 +0000 (16:36 -0400)]
libext2fs: require cluster size == block_size when opening a !bigalloc fs
In ext2fs_open() check to make sure the cluster size superblock field
is the same as the block size field when the bigalloc feature is not
set. This is necessary since we will start introducing calculations
based on the cluster size field.
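A minimal sketch of such a consistency check follows. The function name and the use of log2 values for both sizes are assumptions of this example, as is the rule applied in the bigalloc branch (a cluster is at least one block).

```c
#include <assert.h>

/* Returns 1 if the block and cluster size fields are consistent, else 0.
 * Both sizes are passed as log2 values (an assumption of this sketch). */
int cluster_size_ok(unsigned log_block_size, unsigned log_cluster_size,
		    int has_bigalloc)
{
	assert(log_block_size < 32 && log_cluster_size < 32);
	if (!has_bigalloc)
		return log_cluster_size == log_block_size;
	/* Assumed here: with bigalloc, a cluster is at least one block. */
	return log_cluster_size >= log_block_size;
}
```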
e2fsprogs: Unify the upper limit of reserved blocks count
In e2fsprogs, the upper limit of reserved blocks count is a half of
filesystem's blocks count. This patch fixes the incorrect checks of
reserved blocks count.
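The unified limit can be sketched as a single predicate. The function name is made up for this example, and treating exactly half as acceptable is an assumption.

```c
#include <assert.h>

/* Returns 1 if the reserved-blocks count is within the limit described
 * above (at most half of the filesystem's block count), else 0. */
int reserved_blocks_ok(unsigned long long reserved, unsigned long long total)
{
	assert(total > 0);
	return reserved <= total / 2;
}
```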
Eric Sandeen [Mon, 4 Apr 2011 19:11:52 +0000 (15:11 -0400)]
mke2fs: don't set stripe/stride to 1 block
Block devices may set minimum or optimal IO hints equal to
blocksize; in this case there is really nothing for ext4
to do with this information (i.e. search for a block-aligned
allocation?) so don't set fs geometry with single-block
values.
Zeev also reported that with a block-sized stripe, the
ext4 allocator spends time spinning in ext4_mb_scan_aligned(),
oddly enough.
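A minimal sketch of the idea, assuming the hints arrive in bytes and that 0 means "leave the geometry unset". The function name is hypothetical and not taken from mke2fs.

```c
#include <assert.h>

/* Convert an I/O hint in bytes to a geometry value in blocks.
 * Returns 0 ("leave unset") when the hint is missing or amounts to a
 * single block or less, since that carries no alignment information. */
unsigned long io_hint_to_blocks(unsigned long hint_bytes,
				unsigned long blocksize)
{
	unsigned long blocks;

	assert(blocksize > 0);
	blocks = hint_bytes / blocksize;
	return blocks > 1 ? blocks : 0;
}
```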
Theodore Ts'o [Sun, 8 May 2011 03:14:06 +0000 (23:14 -0400)]
e2fsck: make the "fs is mounted; continue?" prompt more paranoid
A user received the "file system is mounted; do you really want to
continue" prompt, and then instead of typing "n" for no, forgot that
he hadn't yet answered the continuation question, and typed the up-arrow
key. In his locale, the 'A' in "^[[A" was interpreted as "yes",
and he lost data.
This was clearly the user's fault, but to make e2fsck a bit safer
against user stupidity/carelessness, we will change the "fs is
mounted; continue?" prompt to default to no, and treat the escape
character (along with the return and space characters, currently) as a
request for the default answer.
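The behaviour described above can be sketched roughly as follows. This is only an illustration: the function name is invented, locale-specific yes/no characters are ignored, and the caller is assumed to pass 0 as the default for the "fs is mounted" prompt.

```c
#include <assert.h>

/* Map one keypress to an answer: 1 = yes, 0 = no, -1 = not understood.
 * `def` is the default answer for the prompt being asked. */
int interpret_answer(int key, int def)
{
	assert(def == 0 || def == 1);
	switch (key) {
	case 'y': case 'Y':
		return 1;
	case 'n': case 'N':
		return 0;
	case '\r': case '\n': case ' ':
	case 27:	/* ESC, e.g. the first byte of an arrow-key sequence */
		return def;
	default:
		return -1;
	}
}
```

With the default set to no, the ESC that starts an up-arrow sequence resolves the prompt to "no" before the trailing 'A' is ever considered.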
Eric Sandeen [Thu, 5 May 2011 18:21:08 +0000 (13:21 -0500)]
filefrag: count 0 extents properly when verbose
/boot/a: 0 extents found
works properly, but
Filesystem type is: ef53
Filesystem cylinder groups is approximately 61
File size of a is 0 (0 blocks, blocksize 1024)
ext logical physical expected length flags
a: 1 extent found
yields 1 extent when it should be 0.
Fix this up by special-casing no extents returned in verbose
mode; skip printing the header for the columns too, since there
are no columns to print.
Also, in nonverbose mode we can set fm_extent_count to 0
so that FIEMAP will just query the extent count without gathering
details; clarify this with a comment.
Addresses-RedHat-Bugzilla: 653234
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Fri, 18 Mar 2011 18:47:15 +0000 (14:47 -0400)]
add new superblock field: s_overhead_blocks
It turns out that it's very hard to calculate overheads in the face of
clustered allocation (bigalloc). This is because multiple metadata
blocks from different block groups can end up in the same allocation
cluster. Calculating the exact overhead requires O(all block bitmaps)
in memory, or O(number of block groups**2) in time. So we will
calculate this at mkfs time and stash it in the superblock.
Theodore Ts'o [Sat, 26 Feb 2011 02:43:54 +0000 (21:43 -0500)]
Add basic BIGALLOC support for cluster-based allocation
This adds the superblock fields needed so that dumpe2fs works and the
code points and renames the superblock fields from describing
fragments to clusters.
Aditya Kali [Tue, 15 Feb 2011 22:27:27 +0000 (14:27 -0800)]
e2fsprogs: reserving code points for new ext4 quota feature
This patch adds support for detecting the new 'quota' feature in ext4.
The patch reserves code points for user and group quota inodes and also
for the feature flag EXT4_FEATURE_RO_COMPAT_QUOTA.
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Thu, 17 Feb 2011 21:56:17 +0000 (15:56 -0600)]
e2fsprogs: enable user namespace xattrs by default
User namespace xattrs are generally useful, and I think extN
is the only filesystem requiring a special mount option to
enable them, when xattrs are otherwise available. So this
change sets that mount option into the defaults, via a
mke2fs.conf option.
Note that if xattrs are config'd off, this will lead to a
mostly-harmless:
EXT4-fs (sdc1): (no)user_xattr options not supported
message at mount time...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Thu, 17 Feb 2011 21:55:15 +0000 (15:55 -0600)]
e2fsprogs: turn off enforced fsck intervals by default
The forced fsck often comes at unexpected and inopportune moments,
and even enterprise customers are often caught by surprise when
this happens. Because a filesystem with an error condition will
be marked as requiring fsck anyway, I submit that the time-based
and mount-based checks are not particularly useful, and that
administrators can schedule fscks on their own time, or tune2fs
the enforced intervals if they so choose. This patch disables the
intervals by default, and I've added a new mke2fs.conf option to
turn on the old behavior of random, unexpected, time-consuming
fscks at boot time. ;)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Wed, 16 Feb 2011 18:01:39 +0000 (12:01 -0600)]
e2fsprogs: create com_err.h link in includedir
After debian bug #192277, debian/rules started making a symlink
to com_err.h in /usr/include. Now I have Fedora bug #550889
for the same issue, and perhaps it's time to make this link
by default, rather than fixing it up in packaging steps?
[ Changed by tytso to remove the explicit -s option; this will create
a hard link by default, which is slightly faster. If
people want to use symlinks for all links during the install
process, they can use configure option --enable-symlink-install.
The reason for this change is that some file systems, like AFS,
don't support symlinks, and AFS users complain when they can't build
or install into AFS. So I don't want to use symlinks
unconditionally without a way of switching things back and forth,
and it's easier if we just make all links created during the install
process either hard links or symlinks. ]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sun, 20 Feb 2011 20:19:47 +0000 (15:19 -0500)]
badblocks: Fix up recover_block handling in badblocks
If there was a bad block for block #0, badblocks would never switch
back to testing blocks more efficiently. In addition, we were
double-incrementing the blocks to be tested in the read/write test
because some leftover code had not been removed.
Thanks to Ragnar Kjørstad for pointing these problems out.
Theodore Ts'o [Fri, 18 Feb 2011 03:58:21 +0000 (22:58 -0500)]
badblocks: Only report errors when reading/writing one block at a time
With Direct I/O, the kernel can report 0 bytes read even though the
first block has no errors. So if there are any errors, we need to try to
read/write blocks one at a time to get an accurate report.
Eric Sandeen [Tue, 21 Dec 2010 21:32:05 +0000 (15:32 -0600)]
resize2fs: do not clear resize inode for 0 resvd blocks
I ran into odd behavior where mkfs.ext4 of a 16T filesystem would
create a resize inode with 0 reserved blocks, and mark the resize_inode
feature.
A subsequent slight downward resize of the filesystem would remove
the resize inode, making any further offline resizing impossible.
This is especially odd in light of the fact that a large downward
resize (say, to 8T) will actually add blocks to the resize inode -
so a small resize removes it, a large resize expands it ...
Eric Sandeen [Thu, 16 Dec 2010 04:37:34 +0000 (22:37 -0600)]
resize2fs: handle exactly-16T filesystems in resize2fs
Before we go whole-hog on 64-bit e2fsprogs, I wonder if this
is worth considering as a last-minute addition to the 1.41
stream. Currently, mke2fs will shave a block off an exactly-16T
device to fit*, but resize2fs does not do the same, leading
to some asymmetry. This patch fixes that up, and allows 16T
devices to be handled more gracefully in offline resize.
(in fact resize2fs will not even open a 16T device, today).
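The "shave a block" behaviour can be sketched like this, assuming a 32-bit block count field so that a 16T device with 4k blocks has exactly one block too many (2^32). The function and macro names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Largest block count that fits in a 32-bit field. */
#define MAX_BLOCKS_32 0xFFFFFFFFULL

/* Return the usable block count for a device of the given size.
 * A device that is exactly one block too large is trimmed by one block;
 * anything larger returns 0, meaning "does not fit in 32 bits". */
uint64_t fit_blocks_32(uint64_t device_bytes, uint32_t blocksize)
{
	uint64_t blocks;

	assert(blocksize > 0);
	blocks = device_bytes / blocksize;
	if (blocks == MAX_BLOCKS_32 + 1)
		return MAX_BLOCKS_32;
	return blocks <= MAX_BLOCKS_32 ? blocks : 0;
}
```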
Eric Sandeen [Tue, 14 Dec 2010 19:00:01 +0000 (13:00 -0600)]
e2fsprogs: fix type-punning warnings
Flags used during RHEL/Fedora builds lead to a couple type-punning
warnings:
recovery.c: In function 'do_one_pass':
recovery.c:539: warning: dereferencing type-punned pointer will break strict-aliasing rules
./csum.c: In function 'print_csum':
./csum.c:170: warning: dereferencing type-punned pointer will break strict-aliasing rules
The two changes below fix this up.
Note that the csum test binary output changes slightly, but this does
not break any tests.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bernd Schubert [Fri, 12 Nov 2010 23:09:07 +0000 (00:09 +0100)]
e2fsck: add an option which causes it to only do a journal replay
As recently discussed on linux-ext4@vger.kernel.org, add an option to e2fsck
that replays the journal only. That will allow scripts, such as
Pacemaker's 'Filesystem' RA, to first replay the journal and, if the replay
sets an error state, to check for that error
(dumpe2fs -h | grep "Filesystem state:") and refuse to mount if it shows
an error. It also allows automatic e2fsck scripts to first
replay the journal and then, on a second run that performs the real pass1
to passX checks, to test the return code.
Theodore Ts'o [Mon, 6 Dec 2010 22:07:27 +0000 (17:07 -0500)]
e2fsck: Do blkid interpretation on the external journal specifier
If the user specifies "e2fsck -j UUID=XXX", e2fsck should do blkid
interpretation, since e2fsck does it with the base file system name.
So for the sake of consistency and user convenience, we should do it
here too.
Theodore Ts'o [Mon, 6 Dec 2010 15:10:33 +0000 (10:10 -0500)]
e2fsck: Add the ability to force a problem to not be fixed
The boolean option "force_no" in the problems stanza of e2fsck.conf
allows a particular problem code to be treated as if the user will answer
"no" to the question of whether a particular problem should be fixed
--- even if e2fsck is run with the -y option.
As an example use case, suppose a distribution had widely deployed a
version of the kernel where under some circumstances, the EOFBLOCKS_FL
flag would be left set even though it should not be left set, and a
customer had a workload which exercised the fencepost error all the
time, resulting in a large number of inodes that had EOFBLOCKS_FL
set erroneously. Enough, in fact, that the e2fsck runs were taking too
long. (There was such a bug in the kernel, which was fixed by commit 58590b06d in 2.6.36).
Leaving EOFBLOCKS_FL set when it should not be isn't a huge deal, and
is certainly better than having high availability timeout alerts going off
left and right. So in this case, the best fix might be to put the
following in /etc/e2fsck.conf:
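The stanza itself did not survive in this log. Going by the option described above, it would presumably have roughly the following shape. The problem code 0x010060 is an assumption (believed to be the code for the EOFBLOCKS_FL problem) and should be checked against e2fsck's problem table before use.

```
[problems]
	0x010060 = {
		force_no = true
	}
```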
Andreas Dilger [Mon, 6 Dec 2010 03:20:19 +0000 (22:20 -0500)]
dumpe2fs: fix output for flex_bg bitmap offsets
When running dumpe2fs on a filesystem formatted with flex_bg, it
prints out the relative offsets for the bitmaps and inode table
badly on 64-bit systems, because the offset is computed as a
large positive number instead of being a negative number (which
will not be printed at all):
Group 1: (Blocks 0x8000-0xffff) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 0x0102 (+4294934786), Inode bitmap at 0x0202 (+4294935042)
Inode table at 0x037e-0x03fa (+4294935422)
This commit prints out the relative offsets for flex_bg
groups as the offset within the reported group. This makes it
more clear where the metadata is located, rather than simply
printing some large negative number.
Group 1: (Blocks 0x8000-0xffff) [INODE_UNINIT, ITABLE_ZEROED]
Block bitmap at 0x0102 (bg #0 +258), Inode bitmap at 0x0202 (bg #0 +514)
Inode table at 0x037e-0x03fa (bg #0 +894)
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
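The "offset within the reported group" form can be sketched as a plain divide and remainder, which also avoids the unsigned wrap-around seen in the old output. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

struct bg_offset {
	uint64_t group;		/* block group that contains the block */
	uint64_t offset;	/* offset of the block within that group */
};

/* Locate a metadata block as (group, offset within group). */
struct bg_offset locate_block(uint64_t blk, uint64_t first_data_block,
			      uint64_t blocks_per_group)
{
	struct bg_offset r;

	assert(blocks_per_group > 0 && blk >= first_data_block);
	r.group = (blk - first_data_block) / blocks_per_group;
	r.offset = (blk - first_data_block) % blocks_per_group;
	return r;
}
```

Using the sample output above and assuming a first data block of 0 with 0x8000 blocks per group, block 0x0102 resolves to bg #0 +258.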
Theodore Ts'o [Wed, 1 Dec 2010 23:28:35 +0000 (18:28 -0500)]
mke2fs: Fail if the requested file system type is not defined in mke2fs.conf
If the user passes a file system type which is not defined in
mke2fs.conf (i.e., mke2fs -t xfs ...) change mke2fs so that it prints
a warning and aborts the run. (There is an exception for ext2, since
that file system does not need a special definition in the fs_types
section of the /etc/mke2fs.conf file.)
In addition, print a warning if there are usage types (specified using
the -T option) which are not defined in /etc/mke2fs.conf.
Theodore Ts'o [Sat, 27 Nov 2010 00:25:26 +0000 (19:25 -0500)]
MCONFIG: Fix dependency definitions for the static and profiled blkid library
The dependency definitions for DEPSTATIC_LIBBLKID and
DEPPROFILED_LIBBLKID incorrectly referenced the non-dependency macros
(i.e., STATIC_LIBUUID instead of DEPSTATIC_LIBUUID). This resulted in
-luuid showing up as a Makefile dependency, which is of course wrong.
Theodore Ts'o [Sat, 27 Nov 2010 00:09:43 +0000 (19:09 -0500)]
e2fsck: Fix inode nlink accounting that could cause PROGRAMMING BUG errors
This fixes two possible causes for the error message:
WARNING: PROGRAMMING BUG IN E2FSCK!
OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM.
inode_link_info[X] is Y, inode.i_links_count is Z. They should be the same!
One cause which can trigger this message is when an inode has an
illegal link count > 65500 --- for example, 65535. This was the case
in the Debian Bug report #555456.
Another cause which could trigger this message is if an ext4 directory
previously had more than 65000 subdirectories (thus causing
i_links_count to be set to 1), but then some of the subdirectories were
deleted, such that i_links_count should now be the actual number of
subdirectories.
Lukas Czerner [Thu, 18 Nov 2010 13:38:41 +0000 (14:38 +0100)]
mke2fs: Add discard option into mke2fs.conf
Allow discard to be specified in mke2fs.conf. Also change how the
default value for lazy_itable_init is specified. It is better to have all
this defaulting done in the same place, so do it in the definition (as we do
with discard).
It would be nice to have consistent "discard" options in every system
tool (mount, fsck, mkfs) that takes advantage of discards. Also, "discard"
and "nodiscard" are more descriptive than just "-K" and can easily be
given a default, which is something we cannot do with "-K".
With this commit you need to specify extended option like this:
./mke2fs -T <fstype> -E nodiscard <device>
in order to make a filesystem without discarding the device first. And
./mke2fs -T <fstype> -E discard <device>
respectively.
With this commit the -K option is deprecated and should not be used anymore.
Theodore Ts'o [Mon, 22 Nov 2010 16:09:00 +0000 (11:09 -0500)]
mke2fs: Set logical/physical sector size from environment for debugging
If MKE2FS_DEVICE_SECTSIZE is set, then this will override the logical
sector size, which is the smallest sector size that can be written
atomically by the device. (Previously MKE2FS_DEVICE_SECTSIZE set the
physical sector size, which was incorrect given its historical usage.)
The environment variable MKE2FS_DEVICE_PHYS_SECTSIZE will set the
physical sector size, which is the actual sector size used by the
device in reality.
The logical sector size is always less than or equal to the physical
sector size; and writes smaller than the physical sector size but
greater than or equal to the logical sector size will cause a
read-modify-write cycle within the device firmware (or in some
abstract layer lower than the Linux block I/O subsystem, at any rate).
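Reading a debugging override of this kind from the environment might look like the sketch below. The helper name is hypothetical, and treating non-numeric or non-positive values as "unset" is an assumption of the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the positive integer value of the environment variable `name`,
 * or `fallback` if it is unset or does not parse as a positive number. */
int sectsize_override(const char *name, int fallback)
{
	const char *val;
	int n;

	assert(name != NULL);
	val = getenv(name);
	if (val == NULL)
		return fallback;
	n = atoi(val);
	return n > 0 ? n : fallback;
}
```

A caller would use it once per variable, for example sectsize_override("MKE2FS_DEVICE_SECTSIZE", detected_logical) and sectsize_override("MKE2FS_DEVICE_PHYS_SECTSIZE", detected_physical), where the detected values are whatever the device probe returned.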
Theodore Ts'o [Mon, 22 Nov 2010 15:50:42 +0000 (10:50 -0500)]
mke2fs: Fill in min_io and opt_io with physical sector size
If the device does not have an explicitly specified minimum io_size or
optimal io_size, and the physical sector size is greater than the
block size, then use the physical sector size as a better-than-nothing
hint.
This should help for SSD's that have a physical sector size of 8k or
16k (which reportedly will be coming soon).
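The fallback rule can be sketched as below, assuming all sizes are in bytes and that 0 stands for "no hint". The function name is invented for the example.

```c
#include <assert.h>

/* Pick an I/O size hint in bytes. A hint reported by the device wins;
 * otherwise fall back to the physical sector size when it is larger than
 * the block size. Returns 0 when there is no useful hint. */
unsigned long pick_io_hint(unsigned long reported, unsigned long psector_size,
			   unsigned long blocksize)
{
	assert(blocksize > 0);
	if (reported)
		return reported;
	return psector_size > blocksize ? psector_size : 0;
}
```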
Theodore Ts'o [Sun, 21 Nov 2010 14:56:53 +0000 (09:56 -0500)]
mke2fs: Do not require -F for block size < physical size
There will be SSD's out soon that have 8k or 16k physical block sizes.
So don't require the force option when the block size is less than the
physical block size, and don't give a
warning if the user can't do anything about it (i.e., if the physical
block size is greater than the page size).
Theodore Ts'o [Fri, 1 Oct 2010 14:47:38 +0000 (10:47 -0400)]
mke2fs: Enable lazy_itable_init if the kernel supports this feature
Add check for /sys/fs/ext4/features/lazy_itable_init. If this file
exists, it should be OK to skip initializing the inode table since the
kernel will do it at mount time.
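A check of this kind can be as small as an access() call. The helper name below is hypothetical; the path a caller would pass is the one given above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <unistd.h>

/* Returns 1 if the given feature file exists and is readable, else 0. */
int feature_file_present(const char *path)
{
	assert(path != NULL);
	return access(path, R_OK) == 0;
}
```

A caller would test feature_file_present("/sys/fs/ext4/features/lazy_itable_init") and only skip inode table initialization when it returns 1.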
e2fsck: Open the external journal in exclusive mode
This prevents accidentally replaying and resetting the journal while
it is mounted, due to an accidental attempt to run e2fsck on an LVM
snapshot of a file system with an external journal.
debugfs: Make the extents listing in the stat command more concise
Use "[u]" instead of "[uninit]" and limit the amount of detail printed
for the extent tree blocks, so it is more similar to the format used
for direct/indirect mapped inodes.
Allocate various memory structures to be properly aligned to avoid
needing to use a bounce buffer when doing direct I/O read/writes.
This should also help on FreeBSD systems which require aligned buffers
unconditionally.
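One common way to get suitably aligned buffers is posix_memalign(). The sketch below is a generic illustration of that call, not the allocator the library actually uses, and the wrapper name is made up.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `size` bytes aligned to `align`, which must be a power of two
 * no smaller than sizeof(void *). Returns NULL on failure; release the
 * buffer with free(). */
void *alloc_aligned(size_t align, size_t size)
{
	void *p = NULL;

	assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
	if (posix_memalign(&p, align, size) != 0)
		return NULL;
	return p;
}
```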
ext2fs: Add Direct I/O support to the ext2fs library
This adds the basic support for Direct I/O to unix_io.c, and adds a
new flag EXT2_FLAG_DIRECT_IO which can be passed to ext2fs_open() or
ext2fs_open2() to request Direct I/O support.
Note that device mapper devices in Linux don't support Direct I/O, and
in some circumstances using Direct I/O can actually make performance
*worse*!
Eric Sandeen [Fri, 20 Aug 2010 21:41:14 +0000 (16:41 -0500)]
mke2fs: use lazy inode init on some discard-able devices
If a device supports discard -and- returns 0s for discarded blocks,
then we can skip the inode table initialization -and- the inode table
zeroing at mkfs time, and skip the lazy init as well since they are
already zeroed out.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Sep 2010 14:12:08 +0000 (16:12 +0200)]
e2fsck: Improve error message when device name misspelled
When a device name is misspelled, we output the full text about specifying
alternate superblock. This is slightly misleading because when the device
cannot be opened because of ENOENT, this certainly won't help. So just print
that device does not exist and exit.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
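The decision can be sketched as a single test on errno. This covers only the ENOENT case discussed above; the function name is invented and other error codes are deliberately not special-cased.

```c
#include <assert.h>
#include <errno.h>

/* Returns 1 if the advice about alternate superblocks is worth printing
 * for this open failure, 0 if the device simply does not exist. */
int should_suggest_backup_sb(int open_errno)
{
	assert(open_errno != 0);
	return open_errno != ENOENT;
}
```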
Eric Sandeen [Thu, 29 Jul 2010 16:59:42 +0000 (11:59 -0500)]
mke2fs.8.in: clarify the sign of a block-size constraint.
This bit of the mke2fs manpage is slightly confusing:
-b block-size
Specify the size of blocks in bytes. <snip>
If block-size is negative, then mke2fs will use heuristics
to determine the appropriate block size, with the constraint
that the block size will be at least block-size bytes.
because it sounds like the block size will be at least a negative
number. Clarify just what the negative sign means.
Reported-by: Chris Frost <chris@frostnet.net>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Mon, 12 Jul 2010 18:27:44 +0000 (13:27 -0500)]
resize2fs: relax requirements for -P output a bit
Requiring an immediate pre-fsck before printing a minimum
resize size seems a bit draconian; if the fs isn't clean or marked
with error, then certainly, but for an informational minimum
size, I don't think we need to require a fsck since last mount.
I had simply copied the checks from the actual resize path,
previously.
Installers use this option (-P) to gather minimum resize info,
and requiring an actual fsck before use just seems to go too far.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck, resize2fs: fix a fp precision error that can lead to a seg fault
Commit 641b66b fixed a floating point precision error which can result
in a search algorithm looping forever. It can also result in an array
index being out of bounds and causing a segfault. Here are two more
cases in e2fsck and resize2fs that need to be fixed. I've just used
the same fix from that commit.
e2fsck: Add missing ext2fs_close() call when going back to original superblock
In the case where the original superblock and the backup superblock
are both invalid in some way, e2fsck will try to go back to the
original superblock. To do that, it must close the attempted open
using the backup superblock first (since otherwise the exclusive open
will prevent the subsequent open from succeeding).
Mike Frysinger [Tue, 5 Jan 2010 04:15:32 +0000 (23:15 -0500)]
e2freefrag: Fix getopt bug on machines with unsigned chars
The getopt() function returns an int, not a char. On systems where the
default char is unsigned (like ppc), we get weird behavior where -1 is
truncated to 0xff but compared to (int)-1.
Also fix this same bug for two test programs, test_rel and iscan,
which aren't currently used.
Addresses-Gentoo-Bug: #299386
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
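The pattern being fixed can be illustrated with a small option loop. This is a generic sketch, not the e2freefrag code; the function name and the single -v option are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Count -v flags. The getopt() result is kept in an int so that the -1
 * end marker compares correctly even where plain char is unsigned. */
int count_verbose(int argc, char *const argv[])
{
	int c;		/* int, not char: a char could never equal -1 on ppc */
	int n = 0;

	assert(argc >= 1);
	optind = 1;	/* restart scanning; sufficient for this simple sketch */
	while ((c = getopt(argc, argv, "v")) != -1) {
		if (c == 'v')
			n++;
	}
	return n;
}
```

Had c been declared as an unsigned char, the -1 returned at the end of the options would be stored as 0xff, the comparison against -1 would never succeed, and the loop would not terminate.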
Enhance tst_super_size so that it checks the superblock fields as well
The test now checks to make sure the superblock fields are correctly
aligned and prints them out so they can be manually checked to make
sure they are where we expect them to be.
Theodore Ts'o [Fri, 25 Jun 2010 14:53:13 +0000 (10:53 -0400)]
Add superblock fields which track first and most recent fs errors
Add superblock fields which track where and when the first and most
recent file system errors occurred. These fields are displayed by
dumpe2fs and cleared by e2fsck.