Lukas Czerner [Thu, 18 Nov 2010 03:38:38 +0000 (03:38 +0000)]
e2fsck: Discard free data and inode blocks.
In Pass 5 when we are checking block and inode bitmaps we have great
opportunity to discard free space and unused inodes on the device,
because bitmaps has just been verified as valid. This commit takes
advantage of this opportunity and discards both, all free space and
unused inodes.
I have added new set of options, 'nodiscard' and 'discard'. When the
underlying devices does not support discard, or discard ends with an
error, or when any kind of error occurs on the filesystem, no further
discard attempt will be made and the e2fsck will behave as it would
with nodiscard option provided.
As an addition, when there is any not-yet-zeroed inode table and
discard zeroes data, then inode table is marked as zeroed.
Lukas Czerner [Thu, 18 Nov 2010 03:38:37 +0000 (03:38 +0000)]
e2fsprogs: Add CHANNEL_FLAGS_DISCARD_ZEROES flag for io_manager
When the device have discard support and simultaneously discard zeroes
data (and it is properly advertised), then we can take advantage of such
behavior in several e2fsprogs tools.
Add new flag CHANNEL_FLAGS_DISCARD_ZEROES for struct_io_channel so
each io_manager can take advantage of this. The flag is properly set
according to BLKDISCARDZEROES ioctl in unix_open.
Also remove old mke2fs_discard_zeroes_data() function and substitute it
with helper which test this flag.
Lukas Czerner [Thu, 18 Nov 2010 03:38:36 +0000 (03:38 +0000)]
e2fsprogs: Add discard function into struct_io_manager
In order to provide generic "discard" function for all e2fsprogs tools
add a discard function prototype into struct_io_manager. Specific
function for specific io managers can be crated that way.
This commit also creates unix_discard function which uses BLKDISCARD
ioctl to discard data blocks on the block device and bind it into
unit_io_manager structure to be available for all e2fsprogs tools.
Note that BLKDISCARD is still Linux specific ioctl, however other
unix systems may provide similar functionality. So far the
unix_discard() remains linux specific hence is embedded in #ifdef
__linux__ macro.
Justin Maggard [Mon, 22 Nov 2010 22:32:28 +0000 (17:32 -0500)]
Fix EXT4_FEATURE_RO_COMPAT_HUGE_FILE check
Creating a 4TB file on a filesystem with the 64bit flag set results in
e2fsck consistently complaining about i_blocks being wrong, with
confusing messages like this:
Lukas Czerner [Thu, 18 Nov 2010 13:38:41 +0000 (14:38 +0100)]
mke2fs: Add discard option into mke2fs.conf
Allow to specify discard in mke2fs.conf. Also change the way how to
specify default value for lazy_itable_init. It is better to have all
this defaulting done in the same place so do it in definition (as we do
with discard).
It would be nice to have consistent "discard" options in every system
tool (mount, fsck, mkfs) taking advantage of discards. Also "discard"
and "nodiscard" is more descriptive instead of just "-K" and can be
easily defaulted and it is something we can not do with "-K".
With this commit you need to specify extended option like this:
./mke2fs -T <fstype> -E nodiscard <device>
in order make a filesystem without discarding the device first. And
./mke2fs -T <fstype> -E discard <device>
respectively.
-K option is with this commit deprecated and should not be used anymore.
Theodore Ts'o [Mon, 22 Nov 2010 16:09:00 +0000 (11:09 -0500)]
mke2fs: Set logical/physical sector size from environment for debugging
If MKE2FS_DEVICE_SECTSIZE is set, then this will override the logical
sector size, which is the smallest sector size that can be written
atomically by the device. (Previously MKE2FS_DEVICE_SECTSIZE set the
physical sector size, which was incorrect given its historical usage.)
The environment variable MKE2FS_DEVICE_PHYS_SECTSIZE will set the
physical sector size, which is the actual sector size used by the
device in reality.
The logical sector size is always less than or equal to the physical
sector size; and writes smaller than the physical sector size but
greather than or equal to the logical sector size will cause a
read-modify-write cycle within the device firmware (or in some
abstract layer lower than the Linux block I/O subsystem, at any rate).
Theodore Ts'o [Mon, 22 Nov 2010 15:50:42 +0000 (10:50 -0500)]
mke2fs: Fill in min_io and opt_io with physical sector size
If the device does not have an explicitly specified minimum io_size or
optimal io_size, and the physical sector size is greater than the
block size, then use the physical sector size as a better-than-nothing
hint.
This should help for SSD's that have a physical sector size of 8k or
16k (which are reportedly will be coming soon).
Theodore Ts'o [Sun, 21 Nov 2010 14:56:53 +0000 (09:56 -0500)]
mke2fs: Do not require -F for block size < physical size
There will be SSD's out soon that have 8k or 16k phyiscal block sizes.
So don't enforce a requirement that the block size be less than the
physical block size unless the force option is given, and don't give a
warning if the user can't do anything about it (i.e., if the physical
block size is > than the page size).
Theodore Ts'o [Fri, 1 Oct 2010 14:47:38 +0000 (10:47 -0400)]
mke2fs: Enable lazy_itable_init if the kernel supports this feature
Add check for /sys/fs/ext4/features/lazy_itable_init. If this file
exists, it should be OK to skip initializing the inode table since the
kernel will do it at mount time.
e2fsck: Open the external journal in exclusive mode
This prevents accidentally replaying and resetting the journal while
it is mounted, due to an accidental attempt to run e2fsck on an LVM
snapshot of a file system with an external journal.
e2fsck: Set i_blocks_hi when correcting the i_blocks field in pass #1
For file systems with 64-bit block numbers, we need to make sure we
correct the i_blocks_hi field as well as the i_blocks field when
setting it to the correct value.
debugfs: Make the extents listing in the stat command more concise
Use "[u]" instead of "[uninit]" and limit the amount of detail printed
for the extent tree blocks, so it is more similar to the format used
for direct/indirect mapped inodes.
Allocate various memory structures to be properly aligned to avoid
needing to use a bounce buffer when doing direct I/O read/writes.
This should also help on FreeBSD systems which require aligned buffers
unconditionally.
ext2fs: Add Direct I/O support to the ext2fs library
This adds the basic support for Direct I/O to unix_io.c, and adds a
new flag EXT_FLAG_DIRECT_IO which can be passed to ext2fs_open() or
ext2fs_open2() to request Direct I/O support.
Note that device mapper devices in Linux don't support Direct I/O, and
in some circumstances using Direct I/O can actually make performance
*worse*!
Eric Sandeen [Fri, 20 Aug 2010 21:41:14 +0000 (16:41 -0500)]
mke2fs: use lazy inode init on some discard-able devices
If a device supports discard -and- returns 0s for discarded blocks,
then we can skip the inode table initialization -and- the inode table
zeroing at mkfs time, and skip the lazy init as well since they are
already zeroed out.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Wed, 8 Sep 2010 14:12:08 +0000 (16:12 +0200)]
e2fsck: Improve error message when device name misspelled
When a device name is misspelled, we output the full text about specifying
alternate superblock. This is slightly misleading because when the device
cannot be open because of ENOENT, this certainly won't help. So just print
that device does not exist and exit.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Thu, 29 Jul 2010 16:59:42 +0000 (11:59 -0500)]
mke2fs.8.in: clarify the sign of a block-size constraint.
This bit of the mke2fs manpage is slightly confusing:
-b block-size
Specify the size of blocks in bytes. <snip>
If block-size is negative, then mke2fs will use heuristics
to determine the appropriate block size, with the constraint
that the block size will be at least block-size bytes.
because it sounds like the block size will be at least a negative
number. Clarify just what the negative sign means.
Reported-by: Chris Frost <chris@frostnet.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
libext2fs: fix obvious big-endian bugs introduced by 64-bit changes
These patches fix obvious bone-headed mistakes, so e2fsprogs will now
build and mostly work on powerpc. The m_meta_bg, u_mke2fs, and
u_tune2fs tests are still failing, however, so there's still work to do...
libext2fs: Don't make a copy of the inode in ext2fs_extent_open2()
Previously, ext2fs_extent_open2() copied the passed-in inode structure
into the extent handle, and the extent functions modified the copy of
the inode structure if necessary due to extent splits, etc. Change
ext2fs_extent_open2() so that the extent functions use the inode
structure passed into ext2fs_extent_open2(). Otherwise the passed-in
inode structure could become out of date due to changes made by the
extent functions.
Eric Sandeen [Mon, 12 Jul 2010 18:27:44 +0000 (13:27 -0500)]
resize2fs: relax requirements for -P output a bit
Requiring an immediate pre-fsck before printing a minimum
resize size seems a bit draconian; if the fs isn't clean or marked
with error, then certainly, but for an informational minimum
size, I don't think we need to require a fsck since last mount.
I had simply copied the checks from the actual resize path,
previously.
Installers use this option (-P) to gather minimum resize info,
and requiring an actual fsck before use just seems to go too far.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
e2fsck, resize2fs: fix a fp precision error that can lead to a seg fault
Commit 641b66b fixed a floating point precision error which can result
in a search algorithm looping forever. It can also result in an array
index being out of bounds and causing a segfault. Here are two more
cases in e2fsck and resize2fs that need to be fixed. I've just used
the same fix from the that commit.
e2fsck: Add missing ext2fs_close() call when going back to original superblock
In the case where the original superblock and the backup superblock
are both invalid in some way, e2fsck will try to go back to the
orignal superblock. To do that, it must close the attempted open
using the backup superblock first (since otherwise the exclusive open
will prevent the subsequent open from succeding).
Mike Frysinger [Tue, 5 Jan 2010 04:15:32 +0000 (23:15 -0500)]
e2freefrag: Fix getopt bug on machines with unsigned chars
The getopt() function returns an int, not a char. On systems where the
default char is unsigned (like ppc), we get weird behavior where -1 is
truncated to 0xff but compared to (int)-1.
Also fix this same bug for two test programs, test_rel and iscan,
which aren't currently used at the moment.
Addresses-Gentoo-Bug: #299386
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Enhance tst_super_size so that it checks the superblock fields as well
The test now checks to make sure the superblock fields are correctly
aligned and prints them out so they can be manually checked to make
sure they are where we expect them to be.
Theodore Ts'o [Fri, 25 Jun 2010 14:53:13 +0000 (10:53 -0400)]
Add superblock fields which track first and most recent fs errors
Add superblock fields which track where and when the first and most
recent file system errors occured. These fields are displayed by
dumpe2fs and cleared by e2fsck.
Eric Sandeen [Mon, 14 Jun 2010 01:00:00 +0000 (21:00 -0400)]
libext2fs: make fs->group_desc opaque
To prevent direct array indexing of fs->group_desc[i] (because the
group_desc may be a different size for different filesystems) make it
an opaque pointer that may only be accessed through the accessor
functions in blknum.c. The type itself is still available in a public
header; if we have a group_desc that we know is one type or another,
it's ok to access its fields directly. This change only prevents us
from indexing off fs->group_desc[i] directly.
Old-style applications who don't want to change their source code can
(as a temporary short-term hack) #define EXT2FS_OLD_32_COMPAT before
including ext2fs.h.
Change the accessors in blknum.c to use ext4fs_group_desc(), a version
of ext2fs_group_desc() which returns a ext4_group_desc pointer.
This simplifies and collapses a fair bit of code in blknum.c
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Sun, 13 Jun 2010 18:00:00 +0000 (14:00 -0400)]
mke2fs: Fix up the fs type and feature selection for 64-bit blocks
We need to defer setting the blocks count field in the fs_param
structure until it is known whether 64-bit feature will be enabled
(and whether the blocks count is valid).
We also add a new mke2fs.conf configuration parameter,
auto_64-bit_support which will automatically enable the 64-bit feature
if the number of blocks requires it.
Andreas Dilger [Wed, 2 Jun 2010 02:21:42 +0000 (22:21 -0400)]
Reserve the EXT4_FEATURE_INCOMPAT_DIRDATA feature flag
Reserve the EXT4_FEATURE_INCOMPAT_DIRDATA feature flag for adding
extra file data in ext2_dir_entry_2 entries.
This changes the on-disk layout in the following way.
Firstly, the ext2_dir_entry_2 file_type field now has a mask: that
limits the "filetype" information to the low 4 bits of this field.
Since these values are sequentially assigned, this allows for up to 7
more filetypes to be assigned. When reading the "filetype" field, the
high 4 bits should be masked off when converting to DT_* filetypes for
userspace.
The high 4 bits of "filetype" are used as a bitmask to register up to
4 different "extended" directory entry fields. Extended data fields
are packed without alignment into the directory entry after the "name"
field in order of increasing bitmask value, for each field where bit
is set. In order to avoid the need to "understand" each of the
extended fields, the first byte of each extended data field holds the
size of that data field (including the size itself), so they can be
skipped if not understood. For fields that change the semantics of
the filesystem it is expected that a separate ROCOMPAT or INCOMPAT
field is registered.
There is a single dirent data type defined currently, for Lustre:
which holds a 128-bit file identifier. It is expected that if there
are 64-bit inode values that this will be assigned the 0x20 value.
Should a need ever arise to use all 4 of the extended dirent data
fields, it would be possible to keep the last bit (0x80) for use as a
multiplexor that stores a 1-byte aggregate data size, then a series of
"<u8_size><u8_type><data>" records in the last extended data record.
It is not expected that this will actually be needed in the lifetime
of ext4.
Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Andreas Dilger [Wed, 2 Jun 2010 02:20:22 +0000 (22:20 -0400)]
Reserve the EXT4_INCOMPAT_EA_INODE feature flag
Reserve the EXT4_INCOMPAT_EA_INODE feature flag for use with
large extended attributes that are stored in a separate inode.
This changes the on-disk format in several ways:
First, replace the e_value_block field with e_value_inum, so that
an xattr entry can reference an external inode. This field is
currently unused, as all of the entries live in the same block.
struct ext2_ext_attr_entry {
__u8 e_name_len; /* length of name */
__u8 e_name_index; /* attribute name index */
__le16 e_value_offs; /* offset in disk block of value */
> __le32 e_value_inum; /* inode in which the value is stored */
__le32 e_value_size; /* size of attribute value */
__le32 e_hash; /* hash value of name and value */
char e_name[0]; /* attribute name */
}
Second, add a flag to the inode that indicates it is using a large
(external) extended attribute. This is needed so that when unlinking
an inode the xattrs will be scanned to unlink the xattr inodes
referenced by the main inode.
Third, for inodes that have a number of xattrs that are larger than
a single block, but not large enough to justify an external inode
(less than 64kB total xattr size, due to e_value_offs limitation)
the ext2_ext_attr_header->h_blocks field can grow beyond a single
block to represent a contiguous allocation of blocks for the xattr.
Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Wed, 19 May 2010 16:14:39 +0000 (12:14 -0400)]
tune2fs: Enable uninit_bg to be set without requiring an fsck
Allow the uninit_bg feature to be set without requiring an fsck. The
first full fsck will require scanning all of the inode table blocks,
but subsequent fsck's will be fast. This allows flexibility over
requiring a full fsck after setting this feature, which is what
tune2fs previously mandated.
Theodore Ts'o [Tue, 18 May 2010 03:48:52 +0000 (23:48 -0400)]
Fix dpkg-buildpackage -j2
Add a dependency on subs to all-libs-recursive and all-progs-recursive
to dpkg-buildpackage -j2, since it builds make target 'libs'
explicitly, and we need to make sure the 'subs' target is run first.
Theodore Ts'o [Tue, 18 May 2010 02:54:06 +0000 (22:54 -0400)]
e2fsck: Don't set the group descriptor checksums if the fsck was cancelled
It's a bad idea to set the checksums if e2fsck is aborted by the user,
and it often causes an error message, "Inode bitmap not loaded while
setting block group checksum info".
Theodore Ts'o [Tue, 18 May 2010 02:31:45 +0000 (22:31 -0400)]
mke2fs: account for physical as well as logical sector size
Some devices, notably 4k sector drives, may have a 512 logical
sector size, mapped onto a 4k physical sector size.
When mke2fs is ratcheting down the blocksize for small filesystems,
or when a blocksize is specified on the commandline, we should not
willingly go below the physical sector size of the device.
When a blocksize is specified, we -must- not go below
the logical sector size of the device.
Add a new library function, ext2fs_get_device_phys_sectsize()
to get the physical sector size if possible, and adjust the
logic in mke2fs to enforce the above rules.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 18 May 2010 01:31:56 +0000 (21:31 -0400)]
libe2p, libext2fs: Update file copyright permission states to match COPYING
The top-level COPYING file states that the e2p and ext2fs libraries
are available under the LGPLv2. The files were incorrectly labelled.
Alex Thomas/Luster has been consulted wrt to the ext3_extents.h file;
the rest of the files were primarily authored by Theodore Ts'o.
Theodore Ts'o [Mon, 17 May 2010 23:32:19 +0000 (19:32 -0400)]
Always build namei.o so that building with "configure --disable-debugfs" works
namei.o is also needed by e2initrd_helper.
Long term, if we care about reduced e2fsprogs builds, we need a more
general solution for deciding what .o files are needed for a
particular build. Given that install floppies are going (gone?) the
way the dodo bird, we probably don't care, though.
Theodore Ts'o [Mon, 17 May 2010 23:21:42 +0000 (19:21 -0400)]
Add configure options --enable-symlink-build and --enable-symlink-install
These options allow e2fsprogs to be built using symlinks instead of
hard links, and to be installed using symlinks instead of hard links,
respectively.
Theodore Ts'o [Sat, 15 May 2010 11:48:25 +0000 (07:48 -0400)]
libcom_err: Only output ^M when tty is in raw mode
This fixes a long-standing botch in the com_err library, and solves a
regression test problem for libss that gets tickled by source code
management systems (like Perforce) that don't preserve CRLF line
endings with fidelity.