Iustin Pop [Thu, 12 Jun 2008 07:30:04 +0000 (09:30 +0200)]
badblocks: implement read throttling
Currently, badblocks will read as fast as it can from the drive. While
this is what one wants usually, if badblocks is run in read-only mode on
a drive that is in use, it will greatly degrade the other users of this
disk.
This patch adds a throttling mode for reads where each read will be
delayed by a percentage of the time the previous read took; i.e., an
invocation of '-d 100' will cause the sleep to last as long as the read
did, a value of 200 will cause the sleep to be twice as long, and a
value of 50 will cause it to be half as long. No delay is added if the
previous read had errors, since the hardware may then have hit
timeouts, and sleeping on top of those would slow the scan down too much.
This algorithm helps when the disk is used by other processes: due to
the increased load, the reads take longer, so badblocks correspondingly
sleeps longer and uses less of the drive's bandwidth. This is different
from using ionice, as it is a voluntary (and partial) throttling.
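A minimal C sketch of the delay rule described above, assuming durations are measured in microseconds; throttle_delay_us() and throttle_sleep() are hypothetical names, not the functions in badblocks.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/*
 * Hypothetical helper: how long to sleep (in microseconds) after a read
 * that took read_us microseconds, given the '-d' percentage.
 */
static long throttle_delay_us(long read_us, unsigned int d_percent,
			      int read_had_errors)
{
	/* After an error the hardware may have hit timeouts; add no delay. */
	if (read_had_errors)
		return 0;
	return read_us * (long)d_percent / 100;
}

/* Sleep for the computed delay; a delay of zero or less is a no-op. */
static void throttle_sleep(long delay_us)
{
	struct timespec ts;

	if (delay_us <= 0)
		return;
	ts.tv_sec = delay_us / 1000000;
	ts.tv_nsec = (delay_us % 1000000) * 1000;
	nanosleep(&ts, NULL);
}
```

In a read loop, one would time each read and then call throttle_sleep(throttle_delay_us(elapsed_us, d, had_errors)) before issuing the next one. Integer arithmetic is enough here, since sub-microsecond precision does not matter for throttling.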
Signed-off-by: Iustin Pop <iustin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Iustin Pop [Wed, 11 Jun 2008 15:55:18 +0000 (17:55 +0200)]
badblocks: fix a possible bug in parse_uint
Currently, the parse_uint() function checks errno after the strtoul()
call. But, according to the man page of strtoul():
Since strtoul() can legitimately return 0 or LONG_MAX (LLONG_MAX for
strtoull()) on both success and failure, the calling program
should set errno to 0 before the call, and then determine if an error
occurred by checking whether errno has a nonzero value after the call.
When using locales, the lookup of the locale files may fail, leaving
errno set to a nonzero value. And since argument parsing is one of the
first things done after startup, parse_uint() will wrongly report
errors.
The fix is to simply reset errno to zero before calling strtoul().
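A sketch of a parse_uint()-style helper with the fix applied. parse_uint_sketch() and its return convention (0 on success, -1 on error) are hypothetical; the real badblocks function reports the problem and exits instead:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse str as an unsigned int; returns 0 on success, -1 on error. */
static int parse_uint_sketch(const char *str, unsigned int *out)
{
	char *end;
	unsigned long val;

	errno = 0;			/* the fix: discard any stale errno */
	val = strtoul(str, &end, 0);
	if (end == str || *end != '\0')	/* no digits, or trailing junk */
		return -1;
	if (errno != 0 || val > UINT_MAX)	/* ERANGE, or too big for unsigned int */
		return -1;
	*out = (unsigned int)val;
	return 0;
}
```

Without the errno = 0 line, a stale value such as ENOENT left behind by an earlier failed lookup would make a perfectly valid argument like "42" look like a parse error.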
Signed-off-by: Iustin Pop <iustin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Iustin Pop [Wed, 11 Jun 2008 11:12:17 +0000 (13:12 +0200)]
badblocks: add a max bad blocks count option
Currently, badblocks will continue scanning the device until it reaches
last_block, even if the drive has stopped responding altogether.
This patch introduces a new parameter ('-e') that allows one to specify
the maximum bad block count; once badblocks has found this many bad
blocks, it will abort the test.
While this is not useful for testing a device that will be used for a
filesystem (because we don't get an exhaustive list of bad blocks), it
is useful for checking whether a device has any bad blocks at all: for
example, with a count of 1, badblocks will stop after the first error,
so there is no need to scan the whole device when the only purpose of
the test is to detect bad blocks.
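A sketch of the early-abort logic, assuming the limit is checked right after each bad block is counted (so a count of 1 stops at the first error). The read results are simulated by a hypothetical read_ok[] array, and a limit of 0 means "no limit":

```c
#include <stddef.h>

/*
 * Count bad blocks in read_ok[0..nblocks-1] (0 = read failed).
 * Stops as soon as max_bb bad blocks have been seen; *aborted is set
 * when the limit was reached.
 */
static unsigned int scan_blocks(const int *read_ok, size_t nblocks,
				unsigned int max_bb, int *aborted)
{
	unsigned int bb_count = 0;
	size_t blk;

	*aborted = 0;
	for (blk = 0; blk < nblocks; blk++) {
		if (read_ok[blk])
			continue;
		bb_count++;
		if (max_bb && bb_count >= max_bb) {
			*aborted = 1;	/* limit reached: stop scanning */
			break;
		}
	}
	return bb_count;
}
```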
Signed-off-by: Iustin Pop <iustin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Sat, 7 Jun 2008 12:55:21 +0000 (08:55 -0400)]
Fix ext2fs_swap{16,32,64} for external applications on big-endian machines
The public header files depend on the autoconf defines
WORDS_BIGENDIAN and HAVE_SYS_TYPES_H, so we add them to ext2_types.h
so that external programs which try to use ext2fs_swap*() will work
correctly on big-endian systems. Fortunately, few if any programs
need to use libext2fs's byte-swap functions directly.
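For illustration, a simplified sketch of the kind of conditional code that depends on WORDS_BIGENDIAN. The *_sketch functions are hypothetical stand-ins, not the actual libext2fs headers; the point is that if a program includes such a header on a big-endian host without WORDS_BIGENDIAN defined, the little-endian conversion silently becomes a no-op:

```c
#include <stdint.h>

/* Unconditional byte swaps (hypothetical names). */
static uint16_t swab16_sketch(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swab32_sketch(uint32_t v)
{
	return ((v & 0x000000ffU) << 24) | ((v & 0x0000ff00U) << 8) |
	       ((v & 0x00ff0000U) >> 8) | ((v & 0xff000000U) >> 24);
}

/*
 * Convert a little-endian on-disk value to CPU order. Whether a swap
 * happens is decided at compile time by WORDS_BIGENDIAN, so a big-endian
 * build that never defines it gets the wrong (unswapped) result.
 */
static uint32_t le32_to_cpu_sketch(uint32_t v)
{
#ifdef WORDS_BIGENDIAN
	return swab32_sketch(v);
#else
	return v;
#endif
}
```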
This patch adds ZFS filesystem detection to libblkid.
It probes for VDEV_BOOT_MAGIC in the first 2 ZFS labels in big-endian
and little-endian formats.
Unfortunately the probe table doesn't support probing from the end of
the device; otherwise we could also probe the 3rd and 4th labels (in
case the first 2 labels were accidentally overwritten).
Eventually we would like to set the UUID from the ZFS pool GUID and the
LABEL tag from the pool name, but that requires parsing an XDR encoding
of the pool configuration, which is not trivial.
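A sketch of the dual-endianness check, assuming VDEV_BOOT_MAGIC is 0x2f5b007b10c as in the ZFS sources (worth verifying against vdev_impl.h). The function names are hypothetical, and the caller is assumed to have already read the 8 bytes from the label location:

```c
#include <stdint.h>

/* Assumed value of VDEV_BOOT_MAGIC from the ZFS sources. */
#define VDEV_BOOT_MAGIC_SKETCH 0x2f5b007b10cULL

/* Decode 8 bytes as a big-endian 64-bit integer. */
static uint64_t load_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode 8 bytes as a little-endian 64-bit integer. */
static uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 1 if the 8 bytes at buf hold the boot magic in either byte order. */
static int zfs_magic_matches(const unsigned char *buf)
{
	return load_be64(buf) == VDEV_BOOT_MAGIC_SKETCH ||
	       load_le64(buf) == VDEV_BOOT_MAGIC_SKETCH;
}
```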
Signed-off-by: Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Tue, 3 Jun 2008 00:12:34 +0000 (20:12 -0400)]
e2fsck: Detect unordered extents in an extent node
The logical block numbers must be monotonically increasing, and there
must not be any overlapping extents. If any are found, report them as
filesystem corruption.
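A sketch of the ordering check, using a hypothetical struct that keeps only the logical side of each extent. Requiring each extent to start at or after the end of its predecessor rejects both out-of-order and overlapping extents with a single comparison:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a node's extents (logical side only). */
struct extent_sketch {
	uint32_t lblk;	/* first logical block */
	uint32_t len;	/* number of blocks */
};

/*
 * Returns the index of the first extent that starts before the end of
 * its predecessor (out of order or overlapping), or -1 if the node is fine.
 */
static long find_unordered_extent(const struct extent_sketch *ext, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		/* 64-bit end avoids overflow for extents near 2^32 */
		uint64_t prev_end = (uint64_t)ext[i - 1].lblk + ext[i - 1].len;

		if (ext[i].lblk < prev_end)
			return (long)i;
	}
	return -1;
}
```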
Theodore Ts'o [Mon, 2 Jun 2008 21:27:59 +0000 (17:27 -0400)]
e2fsck: Wire up callback functions for _alloc_block() and _block_alloc_stats()
Wire up callback functions for ext2fs_alloc_block() and
ext2fs_block_alloc_stats() so that we use the ctx->block_found_map
block bitmap to determine which new block we should allocate, and then
to update the block_found_map bitmap if the extent functions need to
allocate or release blocks.
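A sketch of the e2fsck side, with a hypothetical byte-array bitmap standing in for ctx->block_found_map. Allocation only picks a block that the found map says is free; marking it is left to the stats callback, which also unmarks blocks the extent code releases:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a block bitmap such as ctx->block_found_map. */
struct bitmap_sketch {
	uint8_t *bits;
	uint32_t nbits;
};

static int test_bit_sketch(const struct bitmap_sketch *bm, uint32_t b)
{
	return (bm->bits[b / 8] >> (b % 8)) & 1;
}

static void set_bit_sketch(struct bitmap_sketch *bm, uint32_t b, int val)
{
	if (val)
		bm->bits[b / 8] |= (uint8_t)(1u << (b % 8));
	else
		bm->bits[b / 8] &= (uint8_t)~(1u << (b % 8));
}

/* Allocation callback: search from goal (wrapping around) for a free block. */
static int alloc_block_cb(struct bitmap_sketch *found, uint32_t goal,
			  uint32_t *ret)
{
	uint32_t i;

	if (found->nbits == 0)
		return ENOSPC;
	if (goal >= found->nbits)
		goal = 0;
	for (i = 0; i < found->nbits; i++) {
		uint32_t b = (goal + i) % found->nbits;

		if (!test_bit_sketch(found, b)) {
			*ret = b;
			return 0;
		}
	}
	return ENOSPC;
}

/* Stats callback: keep the found map in sync (+1 allocate, -1 free). */
static void block_alloc_stats_cb(struct bitmap_sketch *found, uint32_t blk,
				 int inuse)
{
	set_bit_sketch(found, blk, inuse > 0);
}
```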
Theodore Ts'o [Mon, 2 Jun 2008 21:21:37 +0000 (17:21 -0400)]
libext2fs: Add callback functions for _alloc_block() and _block_alloc_stats()
Add callback functions for ext2fs_alloc_block() and
ext2fs_block_alloc_stats(). This is needed so e2fsck can be informed
when the extent_set_bmap() function needs to allocate or deallocate
blocks.
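A sketch of the hook pattern, with hypothetical types and names rather than the real libext2fs declarations: the filesystem handle carries an optional function pointer, the library allocator defers to it when set, and the setter hands back the previous hook so a caller can restore it. example_hook() stands in for a caller such as e2fsck:

```c
#include <stddef.h>
#include <stdint.h>

struct fs_sketch;

typedef int (*alloc_block_fn)(struct fs_sketch *fs, uint32_t goal,
			      uint32_t *ret);

/* Hypothetical filesystem handle carrying an optional allocation hook. */
struct fs_sketch {
	alloc_block_fn get_alloc_block;	/* NULL: use the default allocator */
	uint32_t next_free;		/* toy state for the default allocator */
};

/* Install a new hook, optionally returning the previous one. */
static void set_alloc_block_callback(struct fs_sketch *fs,
				     alloc_block_fn func, alloc_block_fn *old)
{
	if (old)
		*old = fs->get_alloc_block;
	fs->get_alloc_block = func;
}

/* Library entry point: defer to the hook if one is installed. */
static int alloc_block_sketch(struct fs_sketch *fs, uint32_t goal,
			      uint32_t *ret)
{
	if (fs->get_alloc_block)
		return fs->get_alloc_block(fs, goal, ret);
	(void)goal;
	*ret = fs->next_free++;	/* toy default: hand out blocks in order */
	return 0;
}

/* Example hook: a caller-specific policy (here, just an offset from goal). */
static int example_hook(struct fs_sketch *fs, uint32_t goal, uint32_t *ret)
{
	(void)fs;
	*ret = goal + 1000;
	return 0;
}
```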
Eric Sandeen [Tue, 20 May 2008 15:17:46 +0000 (10:17 -0500)]
libext2fs: add new function ext2fs_extent_set_bmap()
Allows unmapping or remapping single mapped logical blocks,
and mapping currently unmapped blocks.
Also implements ext2fs_extent_fix_parents() to fix parent
index logical starts, if the first index of a node changes
its logical start block.
Currently this can result in unnecessary new single-block extents; I
think perhaps ext2fs_extent_insert should grow a flag to request
merging with a nearby extent?
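A sketch of the parent-fixing idea on a hypothetical in-memory path (path[0] is the root, path[depth] the leaf, and curr is the entry on the current path). A parent's index start only needs updating while the changed entry is the first one in its node, so the walk stops at the first level where it is not:

```c
#include <stdint.h>

/* Hypothetical node on the current path. */
struct level_sketch {
	uint32_t lblk[8];	/* logical start of each entry in this node */
	int curr;		/* index of the entry on the current path */
};

/* Propagate the leaf's current logical start up through the parents. */
static void fix_parents_sketch(struct level_sketch *path, int depth)
{
	uint32_t start = path[depth].lblk[path[depth].curr];
	int lvl;

	for (lvl = depth; lvl > 0; lvl--) {
		/* Only a node's first entry determines its parent's index start. */
		if (path[lvl].curr != 0)
			break;
		path[lvl - 1].lblk[path[lvl - 1].curr] = start;
	}
}
```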
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Sandeen [Tue, 20 May 2008 15:14:20 +0000 (10:14 -0500)]
libext2fs: Teach extent.c how to split nodes
When called for a given handle, the new function extent_node_split()
will split the current node such that half of the node's entries will
be moved to a new tree block. The parent will then be updated to
point to the (now smaller) original node as well as the new node.
If the root node is requested to be split, it will move all
entries out to a new node, and leave a single entry in the
root pointing to that new node.
If the requested split node's parent is full, it will recursively
split up to the root to make room for the new node's insertion.
If you ask to split a non-root node with only one entry,
it will refuse (we'd have an empty node otherwise).
It also updates the i_blocks count when a new block has
successfully been connected to the tree.
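A sketch of just the split step on a hypothetical fixed-size node: the upper half of the entries moves to the new node, and a single-entry node is refused. Updating the parent (inserting an index entry whose start is dst->lblk[0]), the root special case, and recursive splitting are left out:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define NODE_MAX_SKETCH 4

/* Hypothetical node: only the logical start of each entry is kept. */
struct node_sketch {
	int nentries;
	uint32_t lblk[NODE_MAX_SKETCH];
};

/* Move the upper half of src's entries into the empty node dst. */
static int split_node_sketch(struct node_sketch *src, struct node_sketch *dst)
{
	int keep, move;

	if (src->nentries < 2)
		return EINVAL;	/* splitting would leave an empty node */
	keep = src->nentries / 2;	/* lower half stays (rounded down) */
	move = src->nentries - keep;
	memcpy(dst->lblk, src->lblk + keep, (size_t)move * sizeof(src->lblk[0]));
	dst->nentries = move;
	src->nentries = keep;
	return 0;
}
```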
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Theodore Ts'o [Mon, 2 Jun 2008 04:08:19 +0000 (00:08 -0400)]
ext2fs_extent_open: If the inode is empty, initialize the extent tree
If the inode's i_block[] array is completely empty, create an empty
extent tree in the in-core inode and set the EXT4_EXTENT_FL inode
flag. This makes it easy to create a new inode using extents.
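A sketch of the empty-tree initialization with hypothetical struct and constant names, in host byte order (on disk these fields are little-endian). The extent header magic 0xF30A and the 0x80000 "uses extents" inode flag are the ext4 on-disk values; the 60-byte i_block[] array holds the 12-byte header plus four 12-byte entries:

```c
#include <stdint.h>
#include <string.h>

#define EXT_MAGIC_SKETCH	0xF30A		/* extent header magic */
#define EXTENTS_FL_SKETCH	0x00080000	/* inode "uses extents" flag */

struct extent_header_sketch {
	uint16_t eh_magic;
	uint16_t eh_entries;
	uint16_t eh_max;
	uint16_t eh_depth;
	uint32_t eh_generation;
};

struct inode_sketch {
	uint32_t i_flags;
	uint32_t i_block[15];	/* 60 bytes */
};

/* If i_block[] is all zeros, turn it into an empty depth-0 extent tree.
   Returns 1 if the tree was initialized, 0 if i_block was already in use. */
static int init_extent_tree_if_empty(struct inode_sketch *inode)
{
	struct extent_header_sketch eh;
	size_t i;

	for (i = 0; i < 15; i++)
		if (inode->i_block[i] != 0)
			return 0;

	eh.eh_magic = EXT_MAGIC_SKETCH;
	eh.eh_entries = 0;
	/* (60 - 12) / 12 = room for four entries after the header */
	eh.eh_max = (uint16_t)((sizeof(inode->i_block) - sizeof(eh)) / 12);
	eh.eh_depth = 0;
	eh.eh_generation = 0;
	memcpy(inode->i_block, &eh, sizeof(eh));
	inode->i_flags |= EXTENTS_FL_SKETCH;
	return 1;
}
```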
Theodore Ts'o [Fri, 30 May 2008 19:23:24 +0000 (15:23 -0400)]
e2fsck: Don't double count an extent after deleting the last extent
ext2fs_extent_delete() will leave the extent handle pointing at the
next extent, except when the deleted extent was the last one in the
node. To deal with this last case, call ext2fs_extent_get_info() and
stop scanning after processing info->num_entries extents.
Theodore Ts'o [Wed, 28 May 2008 08:54:44 +0000 (04:54 -0400)]
e2fsck: Don't skip an extent after deleting an invalid extent
ext2fs_extent_delete() deletes the current extent and moves to the
next extent (if present). So we need to skip moving to the next
extent, and instead get the (new) current extent and check it before
moving on.
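Both of the fixes above come down to the same iteration rule, sketched here on a plain array standing in for one extent node (delete_entry(), prune_invalid() and is_positive() are hypothetical): after a delete, the current position already holds the next entry, so it is re-checked rather than skipped, and the loop bound is the shrinking entry count, so nothing is visited twice at the end:

```c
#include <string.h>

/* Remove v[i] by shifting the later entries down, as deleting the
   current extent effectively does within a node. */
static void delete_entry(int *v, size_t *n, size_t i)
{
	memmove(&v[i], &v[i + 1], (*n - i - 1) * sizeof(v[0]));
	(*n)--;
}

/* Example validity check. */
static int is_positive(int x)
{
	return x > 0;
}

/* Drop invalid entries in place; returns the new entry count. */
static size_t prune_invalid(int *v, size_t n, int (*is_valid)(int))
{
	size_t i = 0;

	while (i < n) {
		if (!is_valid(v[i])) {
			/* v[i] now holds the next entry: check it, don't advance. */
			delete_entry(v, &n, i);
			continue;
		}
		i++;
	}
	return n;
}
```

Consecutive invalid entries are exactly the case where advancing after a delete would skip one.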
Remove default sizeof sizes in configure script when cross-compiling
Since version 2.50 autoconf fully supports checking sizes of types
(with AC_CHECK_SIZEOF) when cross-compiling. Therefore there is no
need to preset the respective cache variables anymore. The following
patch removes the special case. There is no need to adjust AC_PREREQ
as it's set to 2.50 already.
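As background on why the presets can go: when autoconf cannot run a test program, it can still determine a type's size by compiling probes and checking whether they compile, which, as far as we understand its implementation, relies on a negative array size turning a false condition into a compile error. The sketch below shows that trick for long long; the macro and array names are made up for illustration:

```c
/* Evaluates to 1 if type is at most n bytes, 0 otherwise. */
#define SIZEOF_AT_MOST(type, n) (sizeof(type) <= (n))

/*
 * Each array has size 1 when its condition holds and size -1 (a compile
 * error) otherwise. Together these two only compile if long long is
 * exactly 8 bytes; repeating such probes with different bounds narrows
 * the size down without ever running target code.
 */
static int probe_ll_le_8[1 - 2 * !SIZEOF_AT_MOST(long long, 8)];
static int probe_ll_gt_7[1 - 2 * SIZEOF_AT_MOST(long long, 7)];
```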
Tested successfully cross-building for the mips64el-linux-gnu host on
an i386-linux-gnu build system, removing the following warning
(because of a mismatch for the "long" type):
Sizeof(__U64__TYPEDEF) is 4 should be 8
Problem detected with asm_types.h
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Tue, 20 May 2008 18:51:14 +0000 (14:51 -0400)]
e2fsck: Fix potential data corruptor bug in journal recovery
While synchronizing e2fsck's recovery.c with the latest 2.6 kernel
sources, I discovered a serious bug that apparently had been fixed in
the kernel sometime between December 2003 and April 2005, but which had
not been carried over to e2fsprogs. Specifically, when blocks whose
first 4 bytes are JFS_MAGIC_NUMBER (0xc03b3998) are written into the
journal, the first 4 bytes are zeroed out. A one-character typo meant
that when the blocks were replayed by e2fsck, the JFS_MAGIC_NUMBER
would not be restored.
Oops.
Fortunately, it is *highly* unlikely that ext4 metadata blocks will
contain that magic number in the first four bytes, and data=journal
mode is relatively rarely used.
This commit fixes this bug, as well as updating e2fsck's recovery.c to
be in sync with 2.6.25.
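A simplified sketch of the escape/unescape round trip that the replay code must perform. It does not reproduce the kernel or e2fsprogs code; the *_SKETCH names are hypothetical, and storing the magic big-endian reflects how the journal stores its own fields:

```c
#include <stdint.h>
#include <string.h>

#define JFS_MAGIC_NUMBER_SKETCH	0xc03b3998U
#define TAG_FLAG_ESCAPE_SKETCH	1	/* hypothetical "block was escaped" tag flag */

static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/* Writing side: zero the first 4 bytes if they look like a journal
   magic, and report that in the returned tag flags. */
static unsigned int escape_block(unsigned char *data)
{
	if (load_be32(data) == JFS_MAGIC_NUMBER_SKETCH) {
		memset(data, 0, 4);
		return TAG_FLAG_ESCAPE_SKETCH;
	}
	return 0;
}

/* Replay side: restore the magic in the block being written back. */
static void unescape_block(unsigned char *data, unsigned int tag_flags)
{
	if (tag_flags & TAG_FLAG_ESCAPE_SKETCH)
		store_be32(data, JFS_MAGIC_NUMBER_SKETCH);
}
```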
Eric Sandeen [Mon, 12 May 2008 18:26:51 +0000 (13:26 -0500)]
- fix swap sanity tests in blkid, and blkid tests
Swap is actually native-endian on disk, and with the latest
swapspace sanity checks I added we need to have native swapspace
examples in the blkid tests, so re-mkswap them during testing.
One other change was required, though: mkswap requires at least
10 pages of swap, so the image needs to be increased to 10x64k
if mkswap is to succeed...
Maybe it'd be better to just dd it out on the fly?
Addresses-redhat-bugzilla: 445786
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
With the new mke2fs changes, the output of the
command differs if we run mke2fs on a device that
already has a file system. So erase the file system
before running mke2fs so that the output remains as expected.
Add new option -I <inode_size> to tune2fs. This is used to change the
inode size. The size needs to be a power of 2, and decreasing the
inode size is not allowed.
As a part of increasing the inode size we increase the inode table
size. We also move the used data blocks around and update the
respective inodes to point to the new blocks.
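A sketch of the two restrictions on the new inode size, with a hypothetical function name. A power-of-two test is used for the size check; other limits (such as an upper bound tied to the block size) are not modeled:

```c
#include <errno.h>

/* Returns 0 if new_size is acceptable given the current inode size. */
static int check_new_inode_size(unsigned long new_size, unsigned long cur_size)
{
	/* x & (x - 1) clears the lowest set bit; zero means one bit was set. */
	if (new_size == 0 || (new_size & (new_size - 1)) != 0)
		return EINVAL;	/* not a power of two */
	if (new_size < cur_size)
		return EINVAL;	/* shrinking is not supported */
	return 0;
}
```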
tune2fs uses the undo I/O manager when migrating to a large inode size.
This helps in reverting the changes if the end results are not correct.
The environment variable TUNE2FS_UNDO_DIR is used to indicate the
directory within which the tdb file needs to be created. The file will
be named tune2fs-<device-name>. If TUNE2FS_UNDO_DIR is not set,
/var/lib/e2fsprogs is used.
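A sketch of the undo-file path selection, with a hypothetical helper name. Taking the last path component of the device (so /dev/sdb1 gives tune2fs-sdb1) is an assumption about what <device-name> means:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<dir>/tune2fs-<device-name>" into buf; returns 0, or -1 if it
   does not fit. dir is $TUNE2FS_UNDO_DIR, else /var/lib/e2fsprogs. */
static int undo_file_path(const char *device, char *buf, size_t buflen)
{
	const char *dir = getenv("TUNE2FS_UNDO_DIR");
	const char *slash = strrchr(device, '/');
	const char *name = slash ? slash + 1 : device;	/* assumed: basename */
	int n;

	if (!dir)
		dir = "/var/lib/e2fsprogs";
	n = snprintf(buf, buflen, "%s/tune2fs-%s", dir, name);
	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```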
When running mke2fs, if a file system is detected
on the device, we use the undo I/O manager as the I/O manager.
This helps in reverting the changes made to the filesystem
in case we selected the wrong device.
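A sketch of the detection step, simplified to recognize only an ext2/3/4 superblock (magic 0xEF53, stored little-endian at byte 56 of the superblock, which starts 1024 bytes into the device). The function name is hypothetical, and the real check may recognize other filesystems as well:

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET_SKETCH		1024	/* superblock offset on the device */
#define EXT2_MAGIC_OFFSET_SKETCH	56	/* s_magic offset within the superblock */
#define EXT2_MAGIC_SKETCH		0xEF53

/* Returns 1 if buf (the first len bytes of the device) carries an
   ext2/3/4 superblock magic, i.e. the undo I/O manager should be used. */
static int should_use_undo_io(const unsigned char *buf, size_t len)
{
	size_t off = EXT2_SB_OFFSET_SKETCH + EXT2_MAGIC_OFFSET_SKETCH;
	uint16_t magic;

	if (len < off + 2)
		return 0;
	magic = (uint16_t)(buf[off] | (buf[off + 1] << 8));	/* little-endian */
	return magic == EXT2_MAGIC_SKETCH;
}
```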
mke2fs: New bitmap and inode table allocation for FLEX_BG
Change the way we allocate bitmaps and inode tables if the FLEX_BG
feature is used at mke2fs time. It calculates a new offset for the
bitmaps and inode tables based on the number of groups that the user
wishes to pack together using the new "-G" option. Creating a
filesystem with 32 block groups in a flex group can be done by:
mke2fs -j -I 256 -O flex_bg -G 32 /dev/sdX
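The following is a simplified placement model, not mke2fs's actual allocator (which searches for free space). It assumes each flex group's metadata is packed at the start of its first group, after a hypothetical fixed meta_offset, with all block bitmaps first, then all inode bitmaps, then all inode tables:

```c
#include <stdint.h>

/* Hypothetical layout parameters. */
struct flex_layout_sketch {
	uint32_t flex_size;		/* groups per flex group (-G), power of two */
	uint32_t blocks_per_group;
	uint32_t first_data_block;
	uint32_t meta_offset;		/* blocks reserved at the flex group start */
	uint32_t inode_table_blocks;	/* inode table blocks per group */
};

/* First metadata block of the flex group containing group g. */
static uint32_t flex_start(const struct flex_layout_sketch *l, uint32_t g)
{
	uint32_t first_group = g & ~(l->flex_size - 1);

	return l->first_data_block + first_group * l->blocks_per_group +
	       l->meta_offset;
}

static uint32_t block_bitmap_loc(const struct flex_layout_sketch *l, uint32_t g)
{
	return flex_start(l, g) + (g & (l->flex_size - 1));
}

static uint32_t inode_bitmap_loc(const struct flex_layout_sketch *l, uint32_t g)
{
	return flex_start(l, g) + l->flex_size + (g & (l->flex_size - 1));
}

static uint32_t inode_table_loc(const struct flex_layout_sketch *l, uint32_t g)
{
	return flex_start(l, g) + 2 * l->flex_size +
	       (g & (l->flex_size - 1)) * l->inode_table_blocks;
}
```

Packing the metadata this way keeps it contiguous per flex group, leaving larger runs of free blocks in the other groups.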
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
ext2fs_open_inode_scan: Handle a non-zero bg_itable_unused in block group 0
Previously, the portion of the inode table for block group 0 was
always completely zeroed out, so ext2fs_open_inode_scan() didn't
handle a non-zero bg_itable_unused value for the first block group. Fix
this.
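A sketch of how a per-group inode scan can honor bg_itable_unused, with hypothetical names. It assumes the field is only trusted when group checksums (uninit_bg) are enabled, and it treats block group 0 exactly like every other group:

```c
#include <stdint.h>

/* Number of inodes the scan must read in one group. */
static uint32_t inodes_to_scan(uint32_t inodes_per_group,
			       uint32_t itable_unused, int has_gdt_csum)
{
	/* Ignore the hint if it can't be trusted or is out of range. */
	if (!has_gdt_csum || itable_unused > inodes_per_group)
		return inodes_per_group;
	return inodes_per_group - itable_unused;
}
```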
This fixes some bugs which I introduced recently while revamping the
uninit_bg code. Since mke2fs is no longer calling
ext2fs_set_gdt_csum(), it's important that ext2fs_initialize()
correctly initialize bg_itable_unused for all block group descriptors.
In addition, mke2fs needs to zero out the reserved inodes based on
the values of bg_itable_unused set by ext2fs_initialize().
blkid: Keep cached filesystem information on EACCES and ENOENT errors
When a nonprivileged user uses the blkid command, we want to keep the
cached filesystem information, and opening a device file could result
in an EACCES or ENOENT (if an intervening directory is mode 700). We
were previously testing for EPERM, which was really the wrong error
code to be testing against.
Transfer responsibility of setting the *_UNINIT flags to libext2fs
Mke2fs used to have ugly special-case code in
setup_lazy_bg/setup_uninit_bg which set the flags based on all
sorts of special cases. Change it so that this is done in libext2fs,
and fix mke2fs to use alloc_stats functions which will take care of
clearing the *_UNINIT flags automatically as needed.
This is preparatory work to make the flex_bg allocation patch much
cleaner.
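A sketch of moving the flag handling into the allocation-statistics path, with hypothetical struct names; the flag values 0x1 (INODE_UNINIT) and 0x2 (BLOCK_UNINIT) match the on-disk definitions. Updating the block bitmap itself is omitted:

```c
#include <stdint.h>

#define BG_INODE_UNINIT_SKETCH	0x0001
#define BG_BLOCK_UNINIT_SKETCH	0x0002

struct group_desc_sketch {
	uint16_t bg_flags;
	uint32_t bg_free_blocks_count;
};

struct fs_geom_sketch {
	uint32_t first_data_block;
	uint32_t blocks_per_group;
	struct group_desc_sketch *gd;
};

/* Account for blk being allocated (inuse > 0) or freed (inuse < 0). */
static void block_alloc_stats_sketch(struct fs_geom_sketch *fs, uint32_t blk,
				     int inuse)
{
	uint32_t g = (blk - fs->first_data_block) / fs->blocks_per_group;
	struct group_desc_sketch *gd = &fs->gd[g];

	if (inuse > 0)
		gd->bg_free_blocks_count--;
	else
		gd->bg_free_blocks_count++;
	/* Once a block in the group changes state, its bitmap must be real. */
	gd->bg_flags &= (uint16_t)~BG_BLOCK_UNINIT_SKETCH;
}
```

Because every allocation goes through this one function, callers no longer need to predict which groups will end up with the flag cleared.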
Add an explanation of exactly what ext2fs_super_and_bgd_loc() and
ext2fs_reserve_super_and_bgd_loc() do, and more importantly, exactly
what they return. Note that most callers should *not* rely on the
return value since it's rarely useful, especially once the flex_bg
feature is enabled and inode table and allocation bitmap blocks may
not be in the block group.
ext2fs_set_gdt_csum: Remove setting of BLOCK_UNINIT
This function tried to set BLOCK_UNINIT based on the return value of
ext2fs_super_and_bgd_loc. That's not something that works once we
start allowing flex_bg since the block group metadata might not be
located in the block group itself.
ext2fs_set_gdt_csum: Remove bogus setting of ITABLE_ZEROED
It used to be the case that ext2fs_set_gdt_csum set the ITABLE_ZEROED
flag if INODE_UNINIT was not set. This assumed that the only
caller of ext2fs_set_gdt_csum was e2fsck (which was not true), and
that e2fsck would take care of zeroing the inode table (which was also
not true).