Mark Tinguely [Fri, 16 Aug 2013 18:12:43 +0000 (18:12 +0000)]
xfsprogs: fix inode crash in xfs_repair
Adding the lost+found in phase 6 could allocate an inode from
a new inode chunk. Since this chunk was not around in phase 3
when the inode chunks are verified and added to the avl tree,
the avl tree lookup will return a NULL pointer. This results
in a NULL dereference and segmentation fault.
Add the newly created inode chunk as if found in the chunk
verification phase.
Brian Foster [Fri, 9 Aug 2013 12:32:26 +0000 (12:32 +0000)]
xfsprogs/io: add readdir command
readdir reads the directory entries of an open directory, starting
at the provided offset (or 0 if not specified). On completion,
readdir prints summary information regarding the number of
operations and bytes transferred. Options are available to specify
the starting offset and length, and a verbose mode dumps directory
entry information.
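As a rough illustration only of the kind of summary the command reports,
the following standalone sketch walks a directory with plain libc
readdir() and totals entries and record bytes (this is not the xfs_io
implementation):

#include <dirent.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	unsigned long ops = 0, bytes = 0;
	struct dirent *d;
	DIR *dir;

	if (argc < 2)
		return 1;
	dir = opendir(argv[1]);
	if (!dir)
		return 1;
	/* xfs_io's readdir can additionally start at a caller-supplied
	 * offset; this sketch always starts at 0.
	 */
	while ((d = readdir(dir)) != NULL) {
		ops++;
		bytes += d->d_reclen;	/* record length of this entry */
	}
	printf("read %lu entries, %lu bytes\n", ops, bytes);
	closedir(dir);
	return 0;
}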
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
	/* account for newly allocated blocks in reserved blocks total */
	args->total -= dp->i_d.di_nblocks - nblks;
but every call into this path from proto file parsing started
reserved / args->total as only "1", as passed to newdirent() -
so if we allocate a block, args->total hits 0, and then in
xfs_dir2_node_addname():
	/*
	 * Add the new leaf entry.
	 */
	rval = xfs_dir2_leafn_add(blk->bp, args, blk->index);
	if (rval == 0) {
		...
	} else {
		/*
		 * It didn't work, we need to split the leaf block.
		 */
		if (args->total == 0) {
			ASSERT(rval == ENOSPC);
			goto done;
		}
		/*
		 * Split the leaf block and insert the new entry.
		 */
we hit the args->total == 0 special case, and don't do the next
split, and ENOSPC gets returned all the way up, and we fail.
So rather than calling newdirent with a total of "1" in every case,
which doesn't account for possible tree splits, we should call it
with a more appropriate value: XFS_DIRENTER_SPACE_RES(mp, name->len),
which will handle the maximum nr of block allocations that might be
needed during a directory entry insert.
Since the reservation required doesn't depend on entry type,
just push this down a level, into newdirent() itself.
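A toy model of the reservation countdown described above (not the real
xfs_dir2 code; the helper name and block counts are made up for
illustration):

#include <stdio.h>

/* each block allocation decrements the reservation; the leaf split is
 * refused once the reservation reaches zero, which is the ENOSPC case
 * described above.
 */
static int insert_needing_split(int total)
{
	total--;		/* allocate the new data block */
	if (total == 0)
		return -1;	/* args->total == 0: split skipped, ENOSPC */
	total--;		/* the leaf split allocates another block */
	return 0;
}

int main(void)
{
	printf("total=1: %s\n", insert_needing_split(1) ? "ENOSPC" : "ok");
	printf("total=4: %s\n", insert_needing_split(4) ? "ENOSPC" : "ok");
	return 0;
}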
Reported-by: Boris Ranto <branto@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:25:01 +0000 (22:25 +1000)]
xfs: don't emit v5 superblock warnings on write
We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:
XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!
This spams the logs.
We don't need to check for whether we support v5 superblocks or
whether there are feature bits we don't support set as these are
only relevant when we first mount the filesystem, i.e. on superblock
read. Hence for the write verification we can just skip all the
checks (and hence verbose output) altogether.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:25:00 +0000 (22:25 +1000)]
xfs: rework remote attr CRCs
Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet reached release. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.
The fundamental change is to move from a header per extent in the
attribute to a header per filesystem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.
This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the attribute on disk short of
walking and mapping buffers.
The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.
I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.
It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.
Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.
Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manipulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:59 +0000 (22:24 +1000)]
xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
xfs_attr3_leaf_compact() uses a temporary buffer for compacting
the entries in a leaf. It copies the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer. However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.
Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
it back to the buffer before starting to move entries around.
While debugging this, the _d/_s prefixes weren't sufficient to
remind me which buffer was which, so rename them all _src/_dst.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:58 +0000 (22:24 +1000)]
xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
the entries in two leaves when the destination leaf requires
compaction. The temporary buffer ends up being copied back over the
original destination buffer, so the header in the temporary buffer
needs to contain all the information that is in the destination
buffer.
To make sure the temporary buffer is fully initialised, once we've
set up the temporary incore header appropriately, write it back to
the temporary buffer before starting to move entries around.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:57 +0000 (22:24 +1000)]
xfs: correctly map remote attr buffers during removal
If we don't map the buffers correctly (same as for get/set
operations) then the incore buffer lookup will fail. If a block
number matches but a length is wrong, then debug kernels will ASSERT
fail in _xfs_buf_find() due to the length mismatch. Ensure that we
map the buffers correctly by basing the length of the buffer on the
attribute data length rather than the remote block count.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:56 +0000 (22:24 +1000)]
xfs: remote attribute tail zeroing does too much
When the attribute data does not fill the entire remote block, we
zero the remaining part of the buffer. This, however, needs to take
into account that the buffer has a header, and so the offset where
zeroing starts and the length of zeroing need to take this into
account. Otherwise we end up with zeros over the end of the
attribute value when CRCs are enabled.
While there, make sure we only ask to map an extent that covers the
remaining range of the attribute, rather than asking every time for
the full length of remote data. If the remote attribute blocks are
contiguous with other parts of the attribute tree, it will map those
blocks as well and we can potentially zero them incorrectly. We can
also get buffer size mismatches when trying to read or remove the
remote attribute, and this can lead to not finding the correct
buffer when looking it up in cache.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:55 +0000 (22:24 +1000)]
xfs: remote attribute read too short
Reading a maximally size remote attribute fails when CRCs are
enabled with this verification error:
XFS (vdb): remote attribute header does not match required off/len/owner)
There are two reasons for this, the first being that the
length of the buffer being read is determined from the
args->rmtblkcnt which doesn't take into account CRC headers. Hence
the mapped length ends up being too short and so we need to
calculate it directly from the value length.
The second is that the byte count of valid data within a buffer is
capped by the length of the data and so doesn't take into account
that the buffer might be longer due to headers. Hence we need to
calculate the data space in the buffer first before calculating the
actual byte count of data.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:54 +0000 (22:24 +1000)]
xfs: remote attribute allocation may be contiguous
When CRCs are enabled, there may be multiple allocations made if the
headers cause a length overflow. This, however, does not mean that
the number of headers required increases, as the second and
subsequent extents may be contiguous with the previous extent. Hence
when we map the extents to write the attribute data, we may end up
with fewer extents than allocations made. Hence the assertion that we
consume the number of headers we calculated in the allocation loop
is incorrect and needs to be removed.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:53 +0000 (22:24 +1000)]
xfs: remote attribute lookups require the value length
When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enabled filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:51 +0000 (22:24 +1000)]
xfs: Remote attr validation fixes and optimisations
- optimise the calculation for the number of blocks in a remote xattr.
- check attribute length against MAX_XATTR_SIZE, not MAXPATHLEN
- whitespace fixes
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:10 +0000 (10:26 +1000)]
xfs_repair: support CRC enabled remote symlinks
Add support for verifying the contents of remote symlinks with CRCs.
Factor the remote symlink checking code out of the symlink function
so that it is clear what it is checking. This also reduces the
indentation and makes the code clearer.
Then add support for the CRC format by modelling the checking
function directly on the code that is used in the kernel for reading
and checking both remote symlink formats.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:09 +0000 (10:26 +1000)]
libxfs: fix dir3 freespace block corruption
When the directory freespace index grows to a second block (2017
4k data blocks in the directory), the initialisation of the second
new block header goes wrong. The write verifier fires a corruption
error indicating that the block number in the header is zero. This
was being tripped by xfs/110.
The problem is that the initialisation of the new block is done just
fine in xfs_dir3_free_get_buf(), but the caller then uses a dirv2
structure to zero on-disk header fields that xfs_dir3_free_get_buf()
has already zeroed. These lined up with the block number in the dir
v3 header format.
While looking at this, I noticed that struct xfs_dir3_free_hdr
had 4 bytes of padding in it that wasn't defined as padding or being
zeroed by the initialisation. Add a pad field declaration and fully
zero the on disk and in-core headers in xfs_dir3_free_get_buf() so
that this is never an issue in the future. Note that this doesn't
change the on-disk layout, just makes the 32 bits of padding in the
layout explicit.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:08 +0000 (10:26 +1000)]
xfs_repair: fix btree block magic number mapping
The magic numbers for generic btree blocks were modified some time
ago (before the kernel code was committed) but the xfs_repair
mapping code was not updated to match. It's no longer a simple
mapping, so just make the code a dense array and use the magic
number as the search key.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:04 +0000 (10:26 +1000)]
xfs_mdrestore: recalculate sb CRC before writing
xfs_mdrestore writes the superblock after modifying it, and so the
CRC is not necessarily correct. Make sure the CRC is correct
before we write the superblock back.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:02 +0000 (10:26 +1000)]
mkfs.xfs: validate options for CRCs up front.
With CRC enabled filesystems, certain options are now not optional
and so are always enabled. Validate these options up front and
abort if options are specified that cannot be set.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:59 +0000 (10:25 +1000)]
xfs_repair: make directory freespace table CRC format aware.
We fail to take into account the format of the directory block when
reading the best free space from a directory data block for free
space block verification. This causes occasional failures in
xfstests.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:58 +0000 (10:25 +1000)]
xfs_repair: convert directory parsing to use libxfs structure
It turns out that xfs_repair copies xfs_db in rolling its own
opaque directory types for the different block formats. It has a
little comment about how they are "shared" with xfs_db. Shared by
copy and pasting, rather than a common header, it would appear.
Anyway, same problems, need to use format aware definitions and
abstractions from libxfs so that everything is parsed properly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:57 +0000 (10:25 +1000)]
xfs_db: update field printing for dir crc format changes.
Note that this also requires changing the type parsing to only
allow dir3 data block parsing on CRC enabled filesystems. This is
slightly more complex than it needs to be because of the way the
type table is walked and the assumption that all the entries are in
type number order.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:55 +0000 (10:25 +1000)]
xfs_db: convert directory parsing to use libxfs structure
xfs_db rolls its own "opaque" directory types for the different
block formats. All it cares about is where the headers end and the
data starts, and none of the other details in the structures. Rather
than duplicate this for the dir3 format, we already have perfectly
good headers and abstraction functions for finding this information
in libxfs. Using these means that the dir2 code used for printing
fields, metadump and check need to be modified to use libxfs
definitions.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:51 +0000 (10:25 +1000)]
libxfs: determine inode size from version number, not struct xfs_dinode
xfs_db does not use the same structure types as libxfs when checking
inodes, and so cannot determine the size of the inode core by
passing a struct xfs_dinode to a function. We do, however, know the
raw version number, so we can pass that instead. Convert the code to
passing the inode version rather than a structure.
Note that this should probably be converted in the kernel code as
well.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:50 +0000 (10:25 +1000)]
xfs_db: disable modification for CRC enabled filesystems.
xfs_db does not have the IO infrastructure to calculate metadata
CRCs after modifying metadata. Hence xfs_db can only run in
read-only mode on filesystems with version 5 superblocks.
To fix this, xfs_db needs to have its IO engine converted to use
the buffer based IO provided by libxfs rather than rolling its own
IO routines. That is future work, so until this conversion is done,
only allow xfs_db to run in read-only mode on v5 filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:49 +0000 (10:25 +1000)]
xfsprogs: disable xfs_check for CRC enabled filesystems
Until xfs_db has full metadata CRC support, xfs_check will not be
able to fully verify filesystems in this format. Don't even
bother trying right now, and to make it simple to test full xfsprogs
installs with xfstests, just silently succeed.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:47 +0000 (10:25 +1000)]
xfsprogs: add crc format support to repair
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
[minor whitespace, spelling, and coarse language cleanup -bpm]
Dave Chinner [Fri, 7 Jun 2013 00:25:45 +0000 (10:25 +1000)]
xfsprogs: Add verifiers to libxfs buffer interfaces.
Verifiers need to be used everywhere to enable calculation of CRCs
during writeback of modified metadata. Add them to the libxfs buffer
interfaces and convert the internal use of devices to be buftarg aware.
Verifiers also require that the buffer has a back pointer to the
struct xfs_mount. To make this source level compatible between
kernel and userspace, convert userspace to pass struct xfs_buftargs
around rather than a "device".
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:44 +0000 (10:25 +1000)]
xfs: implement extended feature masks
The version 5 superblock has extended feature masks for compatible,
incompatible and read-only compatible feature sets. Implement the
masking and mount-time checking for these feature masks.
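A minimal sketch of that mount-time policy, assuming the usual
compat/ro-compat/incompat semantics (mask values and function name are
illustrative, not the XFS definitions):

#include <stdio.h>
#include <stdint.h>

#define KNOWN_INCOMPAT	0x0007u		/* illustrative feature masks */
#define KNOWN_ROCOMPAT	0x0001u

/* unknown incompat bits block the mount; unknown ro-compat bits only
 * permit a read-only mount; unknown compat bits are ignored.
 */
static int check_features(uint32_t incompat, uint32_t rocompat, int readonly)
{
	if (incompat & ~KNOWN_INCOMPAT)
		return -1;
	if (!readonly && (rocompat & ~KNOWN_ROCOMPAT))
		return -1;
	return 0;
}

int main(void)
{
	printf("rw mount, unknown incompat bit: %s\n",
	       check_features(0x0008, 0, 0) ? "refused" : "allowed");
	printf("ro mount, unknown ro-compat bit: %s\n",
	       check_features(0, 0x0002, 1) ? "refused" : "allowed");
	return 0;
}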
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:43 +0000 (10:25 +1000)]
xfs: add CRC checks to the superblock
With the addition of CRCs, there is such a wide and varied change to
the on disk format that it makes sense to bump the superblock
version number rather than try to use feature bits for all the new
functionality.
This commit introduces all the new superblock fields needed for all
the new functionality: feature masks similar to ext4, separate
project quota inodes, an LSN field for recovery and the CRC field.
This commit does not bump the superblock version number, however.
That will be done as a separate commit at the end of the series
after all the new functionality is present so we switch it all on in
one commit. This means that we can slowly introduce the changes
without them being active and hence maintain bisectability of the
tree.
This patch is based on a patch originally written by myself back
from SGI days, which was subsequently modified by Christoph Hellwig.
There is relatively little of that patch remaining, but the history
of the patch still should be acknowledged here.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:42 +0000 (10:25 +1000)]
xfs: buffer type overruns blf_flags field
The buffer type passed to log recovery in the buffer log item
overruns the blf_flags field. I had assumed that the flags field was a
32 bit value, and it turns out it is an unsigned short. Therefore
having 19 flags doesn't really work.
Convert the buffer type field to a numeric value, and use the top 5
bits of the flags field for it. We currently have 17 types of
buffers, so using 5 bits gives us plenty of room for expansion in
future....
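A self-contained sketch of the resulting bit layout (the shift and mask
values here are illustrative, not the buffer log format definitions):

#include <stdio.h>
#include <stdint.h>

#define BLFT_SHIFT	11
#define BLFT_MASK	(0x1fu << BLFT_SHIFT)	/* top 5 bits of a 16-bit field */

static void set_type(uint16_t *flags, unsigned int type)
{
	*flags = (*flags & ~BLFT_MASK) | ((type << BLFT_SHIFT) & BLFT_MASK);
}

static unsigned int get_type(uint16_t flags)
{
	return (flags & BLFT_MASK) >> BLFT_SHIFT;
}

int main(void)
{
	uint16_t flags = 0x0021;	/* some unrelated flag bits already set */

	set_type(&flags, 17);		/* 17 buffer types fit easily in 5 bits */
	printf("flags=0x%04x type=%u\n", flags, get_type(flags));
	return 0;
}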
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:40 +0000 (10:25 +1000)]
xfs: add CRC protection to remote attributes
There are two ways of doing this - the first is to add a CRC to the
remote attribute entry in the attribute block. The second is to
treat them similarly to the remote symlink, where each fragment has
its own header and identifies the fragment's location in the attribute.
The problem with the CRC in the remote attr entry is that we cannot
identify the owner of the metadata from the metadata blocks
themselves, or where the blocks fit into the remote attribute. The
down side to this approach is that we never know when the attribute
has been read from disk or not and so we have to verify it every
time it is read, and we must calculate it during the create
transaction and log it. We do not log CRCs for any other metadata,
and so this creates a unique set of coherency problems that, in
general, are best avoided.
Adding an identifying header to each allocated block allows us to
identify each fragment and where in the attribute it is located. It
enables us to rebuild the remote attribute from just the raw blocks
containing the attribute. It also allows us to do per-block CRC
verification at IO time rather than during the transaction context
that creates it or every time it is read into a user buffer. Hence
it avoids all the problems that an external, logged CRC has, and
provides all the benefits of self identifying metadata.
The only complexity is that we have to add a header per fragment,
and we don't know how many fragments will be needed prior to
allocations. If we take the symlink example, the header is 56 bytes
and hence for a 4k block size filesystem, in the worst case 16
headers requires 1 extra block for the 64k attribute data. For 512
byte filesystems the worst case is an extra block for every 9
fragments (i.e. 16 extra blocks in the worst case). This will be
very rare and so it's not really a major concern.
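The overhead numbers above can be checked with a few lines of
arithmetic (56 bytes is the assumed per-fragment header size taken
from the symlink example):

#include <stdio.h>

static unsigned int blocks_for(unsigned int valuelen, unsigned int blocksize,
			       unsigned int hdrsize)
{
	unsigned int space = blocksize - hdrsize;	/* data bytes per block */

	return (valuelen + space - 1) / space;
}

int main(void)
{
	unsigned int len = 64 * 1024;	/* maximum attribute value size */

	/* 4k blocks: 17 blocks vs 16 without headers -> 1 extra block */
	printf("4k blocks:   %u\n", blocks_for(len, 4096, 56));
	/* 512 byte blocks: 144 blocks vs 128 without headers -> 16 extra */
	printf("512b blocks: %u\n", blocks_for(len, 512, 56));
	return 0;
}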
Because allocation is done in two steps - the first finds a hole
large enough in the attribute file, the second does the allocation -
we only need to find a hole big enough for a worst case allocation.
We only need to allocate enough extra blocks for the number of headers
required by the fragments, and we can calculate that as we go....
Hence it really only makes sense to use the same model as for
symlinks - it doesn't add that much complexity, does not require an
attribute tree format change, and does not require logging
calculated CRC values.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:39 +0000 (10:25 +1000)]
xfs: split remote attribute code out
Adding CRC support to remote attributes adds a significant amount of
remote attribute specific code. Split the existing remote attribute
code out into its own file so that all the relevant remote
attribute code is in a single, easy to find place.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:36 +0000 (10:25 +1000)]
xfs: shortform directory offsets change for dir3 format
Because the header size for the CRC enabled directory blocks is
larger, the offset of the first entry into a directory block is
different to the dir2 format. The shortform directory stores the
dirent's offset so that it doesn't change when moving from shortform
to block form and back again, and hence it needs to take into
account the different header sizes to maintain the correct offsets.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:35 +0000 (10:25 +1000)]
xfs: add CRC checking to dir2 leaf blocks
This addition follows the same pattern as the dir2 block CRCs.
Seeing as both LEAF1 and LEAFN types need to be changed at the same
time, this is a pretty large amount of change. Leaf block headers
need to be abstracted away from the on-disk structures (struct
xfs_dir3_icleaf_hdr), as do the base leaf entry locations.
This header abstraction allows the in-core header and leaf entry
location to be passed around instead of the leaf block itself. This
saves a lot of converting individual variables from on-disk format
to host format where they are used, so there's a good chance that
the compiler will be able to produce much more optimal code as it's
not having to byteswap variables all over the place.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:33 +0000 (10:25 +1000)]
xfs: add CRC checking to dir2 free blocks
This addition follows the same pattern as the dir2 block CRCs, but
with a few differences. The main difference is that the free block
header is different between the v2 and v3 formats, so an "in-core"
free block header has been added and _to_disk/_from_disk functions
used to abstract the differences in structure format from the code.
This is similar to the on-disk superblock versus the in-core
superblock setup. The in-core structure is populated when the buffer
is read from disk, all the in memory checks and modifications are
done on the in-core version of the structure which is written back
to the buffer before the buffer is logged.
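A minimal illustration of that in-core/on-disk split (field names and
the magic value are made up; the real layouts are struct
xfs_dir3_free_hdr and its in-core counterpart):

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>			/* ntohl/htonl as be32 helpers */

struct disk_free_hdr {			/* on-disk: big-endian fields */
	uint32_t magic;
	uint32_t firstdb;
	uint32_t nvalid;
	uint32_t nused;
};

struct icfree_hdr {			/* in-core: host-endian fields */
	uint32_t magic, firstdb, nvalid, nused;
};

static void free_hdr_from_disk(struct icfree_hdr *to,
			       const struct disk_free_hdr *from)
{
	to->magic = ntohl(from->magic);
	to->firstdb = ntohl(from->firstdb);
	to->nvalid = ntohl(from->nvalid);
	to->nused = ntohl(from->nused);
}

static void free_hdr_to_disk(struct disk_free_hdr *to,
			     const struct icfree_hdr *from)
{
	to->magic = htonl(from->magic);
	to->firstdb = htonl(from->firstdb);
	to->nvalid = htonl(from->nvalid);
	to->nused = htonl(from->nused);
}

int main(void)
{
	struct disk_free_hdr disk = { htonl(0x46524545), 0, 0, 0 };
	struct icfree_hdr ic;

	free_hdr_from_disk(&ic, &disk);		/* convert once on read */
	ic.nused++;				/* all work is host-endian */
	free_hdr_to_disk(&disk, &ic);		/* convert back before logging */
	printf("nused on disk: %u\n", (unsigned)ntohl(disk.nused));
	return 0;
}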
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:32 +0000 (10:25 +1000)]
xfs: add CRC checks to block format directory blocks
Now that directory buffers are made from a single struct xfs_buf, we
can add CRC calculation and checking callbacks. While there, add all
the fields to the on disk structures for future functionality such
as d_type support, uuids, block numbers, owner inode, etc.
To distinguish between the different on disk formats, change the
magic numbers for the new format directory blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:28 +0000 (10:25 +1000)]
xfsprogs: Support new AGFL format
With the addition of CRCs to the filesystem format, the AGFL has a
new format structure definition. Existing code that pulls freelist
blocks out via dereferencing agfl->agfl_bno no longer works as the
location of the free list is now variable depending on the disk
format in use.
Hence all the users of agfl_bno need to be converted to extract the
location of the first free list entry from the AGFL and grab entries
relative to that first entry. It's a simple change, but needs to be
made in several places as there is very little code reuse within and
between the different utilities in xfsprogs.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 17 May 2013 11:12:57 +0000 (21:12 +1000)]
logprint: fix wrapped log dump issue
When running xfs/295 on a 512 byte block size filesystem, logprint
fails during checking with a "Bad log record header" error. This is
due to the fact that the log has wrapped and there is a partial record
at the start of the log.
logprint doesn't check for this condition, and simply assumes that
the first block in the log contains a log header, and hence aborts
when this case occurs. So we now have a spurious test failure due to
logprint displaying how right this comment is:
/*
 * This code is gross and needs to be rewritten.
 */
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Eric Sandeen [Tue, 16 Jul 2013 02:16:52 +0000 (21:16 -0500)]
xfs_metadump: manpage fix regarding frozen fs
The xfs_metadump manpage states that metadump works
on a frozen filesystem; it does not. In fact, there is
no way to detect a frozen filesystem, so we can't make it
work, either.
So just remove this from the manpage; unmounted or RO
mounted is what is enforced by xfs_metadump.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:24 +0000 (10:25 +1000)]
mkfs: fix realtime device initialisation
The method that libxfs uses for logging inodes is not followed by rtinit().
It fails to join the realtime bitmap inode to the final extent free
transactions, and so mkfs.xfs dies when trying to log changes to the bitmap
inode. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Tue, 18 Jun 2013 03:40:53 +0000 (13:40 +1000)]
xfsprogs: fix make deb
Commit 48212a30 ("xfsprogs: update 'make deb' to use tarball") fixed
a bunch of problems with making the source tarball for releases.
However, it broke the debian package builds in a way I hadn't
noticed until I rewrote my CI system build script.
I noticed that the CI system wasn't building from a pristine
workarea, and instead was just updating the old workarea and running
'make deb'. I added a 'make realclean' to remove all previous state
from the workarea, and then 'make deb' started failing with errors
building the tarball because po/xfsprogs.pot didn't have a build
rule.
The above commit removed the pre-build of the translations target,
and instead made the translation build target a dependency of
building the tarball. Hence the lack of a build rule for the
translations causes the source tarball build to fail.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Michael L. Semon [Fri, 14 Jun 2013 07:00:34 +0000 (03:00 -0400)]
xfsprogs: define umode_t for build if not defined already
umode_t has not been exported to the kernel private headers since
around kernel 3.2.0-rc7. Add a check for umode_t and define it
if it has not been defined already.
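The shape of the shim is roughly the following (the guard macro name is
an assumption; the actual check is generated by configure):

/* only define umode_t if the system headers do not already provide it */
#ifndef HAVE_UMODE_T
typedef unsigned short umode_t;	/* matches the historical kernel typedef */
#endif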
Signed-off-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Thu, 9 May 2013 16:20:13 +0000 (11:20 -0500)]
xfs_logprint: fix continuation transactions
As demonstrated by xfs/295, continuation transactions cause
problems for xfs_logprint. The failure demonstrated by the test is
that the buffer log format structures are variable sized on disk -
the dirty bitmap is sized according to the buffer length, not fixed
to the length of the maximum supported buffer size.
xfs_logprint assumes that the buf log format records are of fixed
size, and so when a short buffer is found it fails to handle it
properly and treats it like a continuation record. This causes the
opheader pointer to be incremented incorrectly and then logprint
wanders off into a dark corner and gets eaten by a grue.
While fixing this, make the xlog_print_record code that does the
transaction opheader walking a little easier to read and stop it
from outputting binary data direct to the console by converting the
no-data-print case to use a hex dumping loop.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 9 May 2013 13:19:06 +0000 (08:19 -0500)]
xfsprogs: update libxlog to current kernel code
Update the libxlog log recovery code to match the current 3.8-rc2
kernel code and ensure all the callers work correctly.
Note: while this introduces CRC validation infrastructure, it is
currently short-circuited as it is not clear what to do with log
buffers that fail CRC checking. We're only reading the log to
determine it is clean/dirty or dumping the contents for analysis, so
it's not clear what to do with CRC validation errors yet, or even
if there is any commonality with the kernel handling. This will need
to be revisited as the situation clarifies.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 9 May 2013 13:18:51 +0000 (08:18 -0500)]
xfsprogs: add CRC32c infrastructure
Pull the generic crc32(c) code from the kernel and add it to libxfs.
Modify it to build in the libxfs environment, and drop the bigendian
CRC version as it is unused by XFS, which uses the little endian
version so that it can be hardware accelerated using native
instructions on x86-64 CPUs.
Also wire up the self-test code in the crc32 module to the build
infrastructure and make passing the self test a build dependency.
This prevents xfsprogs from being built on platforms that the CRC
algorithm does not work on and hence ensures the tools do not write
bad CRCs to disk as a result of a broken calculation.
Also pull the XFS CRC helper functions across in preparation for
using the CRC functions in libxfs.
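For reference, the kind of check the self test performs can be
reproduced with a tiny bitwise CRC32c (Castagnoli) implementation; the
libxfs code itself uses the table-driven kernel version:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)	/* reflected poly 0x82F63B78 */
			crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
	}
	return ~crc;
}

int main(void)
{
	const char *check = "123456789";

	/* the standard CRC-32C check value for "123456789" is 0xE3069283 */
	printf("crc32c = 0x%08X\n", (unsigned)crc32c(0, check, strlen(check)));
	return 0;
}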
XXX: something in the CRC table generation breaks the debian package
build. It fails to build libxfs as a dependency of mkfs.xfs. Works
fine outside the debian build environment, so I'm not sure what the
issue is yet. Most likely it is the execution path of the
gen_crc32table binary that is built...
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 9 May 2013 12:22:15 +0000 (07:22 -0500)]
xfsprogs: Die dir1 Die!
Version 1 directories have never been supported on linux, and we base the
default on pre-Irix 6.2 kernels (~1998). A recent xfs_repair debugging session
on #xfs determined that dir v1 support in xfs_repair has been broken since May
26, 2008 when the ascii case insensitivity support was added to userspace.
Seeing that the code has been broken for roughly 5 years and the first time that
it was noticed was a couple of days ago, it is clearly rarely required, rarely
used and completely untested.
Following the time-honoured X server deprecation model, if it's been broken for
several years and nobody has noticed, then it can and should be removed. So,
rather than trying to fix something we can't test and very, very few people care
about, let's just remove it.
For xfs_repair, some of the checking code is shared with the attribute repair
code. Once all the dirv1 specific code is removed, there isn't a whole lot left,
so move it to attr_repair.c and we can kill the dir.[ch] files completely.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 9 May 2013 12:16:09 +0000 (07:16 -0500)]
xfs_fsr: file reads should be O_DIRECT
When running xfs_fsr on a sparse filesystem image containing
approximately 8 million extents and 80GB of data, I noticed that the
page cache grew and consumed all the memory in the machine. It turns
out that xfs_fsr is using direct IO to write data, but buffered IO
to read data. Convert the read side to use direct IO to prevent page
cache blowouts.
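A stripped-down sketch of a direct IO read loop of that kind (not the
fsr code; the 4096 byte alignment is an assumption, the real code
should query the device and filesystem geometry):

#define _GNU_SOURCE		/* O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSZ	(1 << 20)

int main(int argc, char **argv)
{
	void *buf;
	ssize_t n;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* O_DIRECT needs an aligned buffer and aligned IO sizes */
	if (posix_memalign(&buf, 4096, BUFSZ))
		return 1;
	while ((n = read(fd, buf, BUFSZ)) > 0)
		;			/* data bypasses the page cache */
	free(buf);
	close(fd);
	return n < 0 ? 1 : 0;
}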
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 9 May 2013 12:11:50 +0000 (07:11 -0500)]
xfs_logprint: print all AGI unlinked buckets
When printing buffer contents, the AGI unlinked buckets are not
printed in transactional output. In normal dump format, they are
printed, but that format is generally not useful for log recovery
analysis. Add the same code to the transactional buffer dump.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Thu, 2 May 2013 12:59:20 +0000 (07:59 -0500)]
xfs_repair: validate on-disk extent count better
When scanning a btree format inode, we trust the extent count to be
in range. However, values in the range 2^31 <= cnt < 2^32 are
invalid and can cause problems with signed range checks. This
results in assert failures when validating the extent count, such
as:
Eric Sandeen [Thu, 25 Apr 2013 15:03:47 +0000 (15:03 +0000)]
xfsprogs: Fix manpages for missing or incorrect options
Add valid options which aren't in manpages, and
remove invalid options which are in manpages:
* Document -V (show version and exit) for many manpages.
* Remove -? option from xfs_estimate.8
* Document -p passes, -d (debug) and -g (syslog) in xfs_fsr.8
* Document -n (O_NONBLOCK) in xfs_io.8
* Document -v (print overwrite) in xfs_logprint.8
* Document -m max_extents in xfs_metadump.8
* Document -p (preallocate) in xfs_mkfile.8
Brian Foster [Tue, 19 Mar 2013 13:23:35 +0000 (13:23 +0000)]
xfsprogs: reduce bb_numrecs in bno/cnt btrees when log consumes all agf space
The mkfs code currently creates a single free space extent record
for each of the bno and cnt btrees in each AG. The start block of
the record is pushed forward on the AG that hosts an internal log.
If the log happens to consume all available space in the AG, the
start block becomes equal to sb->sb_agblocks and thus invalid.
This causes xfs_repair to complain.
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
invalid start block 4096 in record 0 of bno btree block 1600/1
invalid start block 4096 in record 0 of cnt btree block 1600/2
- found root inode chunk
...
xfs_repair appears to correct the numrecs value such that subsequent
checks are successful. The sequence above is pulled from xfstests
test #250, which fails due to this behavior.
Modify mkfs.xfs such that we check the block count value of the
free space record for the log AG after the log is accounted for. If
no space is left for the record, reset the record count to 0.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Eric Sandeen [Sat, 9 Mar 2013 15:21:55 +0000 (15:21 +0000)]
xfsprogs: skip freelist scans of corrupt agf
If an agf has bad values in the freelist, this can wreak
havoc if, for example, first > last and the loop
never exits; we index agfl->agfl_bno[i] off into the weeds.
If they're off, warn about it and skip the scan.
This is done both in xfs_check and xfs_db's freespace cmd.
Also fix the uninit'd variable "i" from the previous, similar fix
for xfs_repair.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Eric Sandeen [Sat, 2 Mar 2013 21:23:12 +0000 (21:23 +0000)]
xfsprogs: xfs_repair skip freelist scan of corrupt agf in no-modify mode
In xfs_repair's no-modify mode (-n), verify_set_agf doesn't fix up
bad freelist blocks that it finds. When we get to scan_freelist,
this can wreak havoc if, for example, first > last and the loop
never exits; we index agfl->agfl_bno[i] off into the weeds.
To fix this, re-check the values in no-modify mode, and if
they're off, warn about it and skip the scan.
Reported-by: Ole Tange <tange@binf.ku.dk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Eric Sandeen [Sat, 26 Jan 2013 22:40:31 +0000 (22:40 +0000)]
xfs_fsr: fix attribute no_change_count logic
As it stands today, if no_change_count++ isn't > 10,
we will reset it to 0. There's no way to get above 1
(let alone 10) so this isn't working as intended.
If we see progress (last_forkoff != tbstat.bs_forkoff)
*then* we should reset the no_change_count counter to 0.
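A toy version of the corrected logic (variable names taken from the
text above; not a verbatim copy of fsr.c):

#include <stdio.h>

int main(void)
{
	/* simulated bs_forkoff values seen on successive passes */
	int forkoffs[] = { 13, 13, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15 };
	int last_forkoff = 0, no_change_count = 0;

	for (unsigned int i = 0; i < sizeof(forkoffs) / sizeof(forkoffs[0]); i++) {
		if (last_forkoff != forkoffs[i]) {
			last_forkoff = forkoffs[i];	/* progress: reset */
			no_change_count = 0;
		} else if (++no_change_count > 10) {
			printf("no progress for %d passes, giving up\n",
			       no_change_count);
			break;
		}
	}
	return 0;
}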
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Eric Sandeen [Sat, 26 Jan 2013 22:40:27 +0000 (22:40 +0000)]
libxfs: fix setup_cursor array allocation
setup_cursor() wants an array of xfs_agbno_t's, but
it allocated a multiple of *pointers* to xfs_agbno_t's.
xfs_agbno_t is 4 bytes, so this is harmless other than
allocating twice as much memory as needed on a 64-bit
machine.
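The bug in isolation (xfs_agbno_t stands in for the real typedef):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t xfs_agbno_t;		/* 4 bytes, as noted above */

int main(void)
{
	size_t nlevels = 8;
	/* wrong: sizes the array by the pointer type, 2x too big on 64-bit */
	xfs_agbno_t *a = calloc(nlevels, sizeof(xfs_agbno_t *));
	/* right: size by the element type */
	xfs_agbno_t *b = calloc(nlevels, sizeof(xfs_agbno_t));

	printf("wrong: %zu bytes, right: %zu bytes\n",
	       nlevels * sizeof(xfs_agbno_t *), nlevels * sizeof(xfs_agbno_t));
	free(a);
	free(b);
	return 0;
}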
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Mark Tinguely <tinguely@sgi.com>