git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs_db: write command broken on 64 bit values

convert_args() has problesm with 64 bit fields because it tries to
shift them by 64 bits. The result of doing so is undefined by the C
standard, and so results in the unexpected behaviour of the result
being being the original value unchanged rather than 0. Hence you
can't write 64 bit fields because the code thinks that all values
other than 0 are out of range.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: support more than 25 ACLs

v5 superblock supports many more than 25 ACLs on an inode, but
xfs_repair still thinks that the maximum is 25. This slipped through
becase the reapir code does not share any of the kernel side ACL
code in libxfs, and instead has all it's own internal ACL
definitions.

Fix the repair code to support more than 25 ACLs and update
the ACL definitions to match the kernel definitions. In doing so,
this tickles a off-by-one bug on remote attribute maximum sizes
that is already fixed in the kernel code. So in addition to fixing
the repair code, this patch pulls in parts of the following kernel
commits:

bba719b5 xfs: fix off-by-one error in xfs_attr3_rmt_verify
0a8aa193 xfs: increase number of ACL entries for V5 superblocks

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Tested-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: Repair directory block CRC mismatches

It can happen that just CRC doesn't match for directory blocks. In that
case xfs_repair will just report the error but won't fix anything (as
further checking of the block doesn't reveal any problems). Make sure we
recompute and write out new CRC when we failed verification during
reading.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: add "-m" options to the man page

Because they are missing.

Reported-by: Matthias Schniedermeyer <ms@citd.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix crc field handling in xfs_sb_to/from_disk

If we xfs_mdrestore an image from a non-crc filesystem, lo
and behold the restored image has gained a CRC:

# db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img
# xfs_db -c "sb 0" -c "p crc" /dev/sdc1
crc = 0 (correct)
# xfs_db -c "sb 0" -c "p crc" test.img
crc = 0xb6f8d6a0 (correct)

This is because xfs_sb_from_disk doesn't fill in sb_crc,
but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory
CRC to disk - so we get uninitialized memory on disk.

Fix this by always initializing sb_crc to 0 when we read
the superblock, and masking out the CRC bit from ALL_BITS
when we write it.

This same fix has already been sent for kernelspace.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: don't send null bp to xfs_trans_brelse()

In this case, if bp is null, error is set, and we send
bp to xfs_trans_brelse, which will try to dereference it.

Test whether we actualy have a buffer before we try to
free it.

Same fix as was sent for kernelspace.

Coverity spotted this.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: indicate default mount options in xfs.5 manpage

Not every pair of mount options indicated which was the
default, so add those.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: add mount options to xfs.5 manpage

This is a straight cut and paste from the util-linux
mount manpage to xfs.5.

It's pretty much impossible for util-linux to keep up
with every filesystem out there, and Karel has more than
once expressed a wish that mount options move into fs-specific
manpages.

So, here we go.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: test for more potential failures in packfile()

Test for lseek, ftruncate, and fsync failures in packfile()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: create a cleanup/return target in packfile()

Error handling is a mishmash of closes, frees, etc at every
error point. Create an "out" target that does this all
in one place.

Minor comment/doc update while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: ensure the line we read from leftofffile is null terminated

Ensure that the string we read from leftofffile is NULL
terminated; the buffer gets passed to strchr(), so
it's important that we ensure it ends with NULL.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: fix data corruption of target

The unit of XFS_AGFL_DADDR(mp) is "basic block" whose size is "BBSIZE"
(512 bytes), so when "source_sectorsize" is not 512, it will cause the
target a corrupted filesystem.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: don't call blkid_get_topology on existing regular files

If we encounter a target that's really a regular file,
even without "-d file..." on the cmdline, call
platform_findsizes() instead of blkid_get_topology to
try to discover the "sector size" via the fsgeom() call.

Otherwise mkfs.xfs will try to do direct IO with a default
512 sector size, and if the underlying file has different
DIO requirements, mkfs will fail.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: try to handle mkfs of a file on 4k sector device

Try the xfs geometry ioctl if the mkfs target resides
in a file; this gives us the equivalent of a device
sector size.

If this fails, and there's a sector size mismatch
between the host FS and the filesystem, then mkfs might
fail - but that's no worse than it's been before.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: update polish translation

Jakub provided the polish translation here:

http://qboosh.pl/pl.po/xfsprogs-3.2.0.pl.po

Signed-off-by: Dave Chinner <dchinner@redhat.com>

db: add finobt support to metadump

Include the free inode btree in metadump images. If the source fs
is finobt-enabled, run an additional scan_btree() of the finobt.
Since the private 'agi' scanfunc_ino() parameter is unused, change
the private parameter to a flag that indicates whether the current
scan is for the inobt or finobt. If the latter, we skip copying the
actual inode chunks as this work is already performed by the inobt
scan.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

growfs: report finobt status in fs geometry (xfs_info)

Check and report on the free inode btree status bit in the fs
geometry.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: reconstruct the finobt in phase 5

Support reconstruction of the finobt in phase 5 of xfs_repair. We
create a new cursor for the finobt and write the in-core records
that contain free inodes to the tree. Finally, pass the cursor
along to build_agi() to include the finobt root and level count in
the agi header.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: helpers for finding in-core inode records w/ free inodes

Add the findfirst_free_inode_rec() and next_free_ino_rec() helpers
to assist scanning the in-core inode records for records with at
least one free inode. These will be used to determine what records
are included in the free inode btree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: pull the build_agi() call up out of the inode tree build

Pull the build_agi() call out of build_ino_tree() in phase 5. This
is to prepare for finobt support, in which build_agi() will require
context from multiple inode tree reconstructions (both the inode
allocation tree and free inode tree, when it exists).

Create the new 'agi_stat' structure to carry the requisite state
from the build_ino_tree() operation to build_agi().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: pass btree block magic as param to build_ino_tree()

A minor cleanup to build_ino_tree() to provide the appropriate
magic value for btree block initialization from the caller. This
facilitates use of separate magic values for finobt blocks when
building the free inode btree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: phase 2 finobt scan

If one exists, scan the free inode btree in phase 2 of xfs_repair.
We use the same general infrastructure as for the inobt scan, but
trigger finobt chunk scan logic in in scan_inobt() via the magic
value.

The new scan_single_finobt_chunk() function is similar to the inobt
equivalent with some finobt specific logic. We can expect that
underlying inode chunk blocks are already marked used due to the
previous inobt scan. We can also expect to find every record
tracked by the finobt already accounted for in the in-core tree
with equivalent (and internally consistent) inobt record data.

Spit out a warning on any divergences from the above and add the
inodes referenced by the current finobt record to the appropriate
in-core tree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: account for finobt in ag 0 geometry pre-calculation

Account for the finobt in calc_mkfs().

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: finobt support

Add the AGI finobt fields and fibt layouts.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: finobt mkfs support

Add the 'finobt' metadata option to mkfs to format an fs with free
inode btree support. If enabled, initialize the associated AGI
header fields and btree root block.

Also, do the initialization of the superblock version and feature
bits (including the new finobt flag) a bit earlier. These fields
must now be initialized prior to the use of XFS_PREALLOC_BLOCKS(),
as the latter returns a value that depends on whether a finobt root
btree block is reserved.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: enable the finobt feature on v5 superblocks

Add the finobt feature bit to the list of known features. As of
this point, the kernel code knows how to mount and manage both
finobt and non-finobt formatted filesystems.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: report finobt status in fs geometry

Define the XFS_FSOP_GEOM_FLAGS_FINOBT fs geometry flag and set the
associated bit if the filesystem supports the free inode btree.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: update the finobt on inode free

An inode free operation can have several effects on the finobt. If
all inodes have been freed and the chunk deallocated, we remove the
finobt record. If the inode chunk was previously full, we must
insert a new record based on the existing inobt record. Otherwise,
we modify the record in place.

Create the xfs_ifree_finobt() function to identify the potential
scenarios and update the finobt appropriately.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: refactor xfs_difree() inobt bits into xfs_difree_inobt() helper

Refactor xfs_difree() in preparation for the finobt. xfs_difree()
performs the validity checks against the ag and reads the agi
header. The work of physically updating the inode allocation btree
is pushed down into the new xfs_difree_inobt() helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: use and update the finobt on inode allocation

Replace xfs_dialloc_ag() with an implementation that looks for a
record in the finobt. The finobt only tracks records with at least
one free inode. This eliminates the need for the intra-ag scan in
the original algorithm. Once the inode is allocated, update the
finobt appropriately (possibly removing the record) as well as the
inobt.

Move the original xfs_dialloc_ag() algorithm to
xfs_dialloc_ag_slow() and fall back as such if finobt support is
not enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: insert newly allocated inode chunks into the finobt

A newly allocated inode chunk, by definition, has at least one
free inode, so a record is always inserted into the finobt.

Create the xfs_inobt_insert() helper from existing code to insert
a record in an inobt based on the provided BTNUM. Update
xfs_ialloc_ag_alloc() to invoke the helper for the existing
XFS_BTNUM_INO tree and XFS_BTNUM_FINO tree, if enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: update inode allocation/free transaction reservations for finobt

Create the xfs_calc_finobt_res() helper to calculate the finobt log
reservation for inode allocation and free. Update
XFS_IALLOC_SPACE_RES() to reserve blocks for the additional finobt
insertion on inode allocation. Create XFS_IFREE_SPACE_RES() to
reserve blocks for the potential finobt record insertion on inode
free (i.e., if an inode chunk was previously fully allocated).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: support the XFS_BTNUM_FINOBT free inode btree type

Define the AGI fields for the finobt root/level and add magic
numbers. Update the btree code to add support for the new
XFS_BTNUM_FINOBT inode btree.

The finobt root block is reserved immediately following the inobt
root block in the AG. Update XFS_PREALLOC_BLOCKS() to determine the
starting AG data block based on whether finobt support is enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: reserve v5 superblock read-only compat. feature bit for finobt

Reserve a v5 read-only compatibility feature bit for the finobt and
create the xfs_sb_version_hasfinobt() helper to determine whether
an fs has the feature enabled.

The finobt does not change existing on-disk structures, but must
remain consistent with the ialloc btree. Modifications from older
kernels would violate that constrant. Therefore, we restrict older
kernels to read-only mounts of finobt-enabled filesystems.

Note that this does not yet enable the ability to rw mount a finobt
fs (by setting the feature bit in the XFS_SB_FEAT_RO_COMPAT_ALL
mask).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: refactor xfs_ialloc_btree.c to support multiple inobt numbers

The introduction of the free inode btree (finobt) requires that
xfs_ialloc_btree.c handle multiple trees. Refactor xfs_ialloc_btree.c
so the caller specifies the btree type on cursor initialization to
prepare for addition of the finobt.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: don't let bplist index go negative in prefetch

After:

bbd3275 repair: don't unlock prefetch tree to read discontig buffers

Coverity spotted that it's possible for us to arrive at the loop
below with num == 1, and then we decrement it to 0, and try to
index bplist[num-1].

I think this was possible before the change, i.e. it's probably
not a regression.

Fix this by not trying to shrink the window unless we have
more than one buffer in the array.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix compile error when libxfs header used in C++ code

xfs_ialloc.h:102: error: expected ',' or '...' before 'delete'

Simple parameter rename, no changes to behaviour.

Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove xfs_check

This removes xfs_check and all references to it in
manpages. The DIAGNOSTICS section from xfs_check(8)
has been moved to xfs_db(8).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: use exit() to replace killall()

Sending a SIGKILL signal to child thread will terminate the whole process,
xfs_copy will return an error value 137. This cause confuse for script to
know whether the copy successes.

Calling exit() in main thread can terminate the whole process and return the
right value. Replace killall()+abort() with exit(1) to match the old way
exit in error case. Also remove killall()+pthread_exit(NULL) since return 0
will be followed by an exit(0) to terminate the process.

[ Christoph Hellwig:
Btw, I think the reason for this cruft is that xfs_copy was originally
written using the IRIX sproc interface, and the port to pthreads didn't
remove this gem:

http://marc.info/?l=linux-xfs&m=99535721110020&w=2 ]

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: dont free xfs_inode until complete

Originally, the xfs_inode are released upon the first
call to xfs_trans_cancel, xfs_trans_commit, or
inode_item_done. This code used the log item lock field
to prevent the release of the inode on the next call to
one of the above functions. This is a unusual use of the
log item lock field which is suppose to specify which lock
is to be release on transaction commit or cancel. User
space does not perform locking in transactions..

Unfortunately, this breaks any code that relies on multiple
transaction operations. For example, adding an extended
attribute to an inode that does not have an attribute fork
will fail:

# xfs_db -x XFS_DEVICE
xfs_db> inode INO_NUM
xfs_db> attr_set newattribute

This patch does the following:
1) Removes the iput from the transaction completion and
    requires that the xfs_inode allocators call IRELE()
    when they are done with the pointer. The real time
    inodes are pointed to by the xfs_mount and have a longer
    lifetime.
2) Removes libxfs_trans_iput() because transaction entries
    are removed in transaction commit and cancel.
3) Removes libxfs_trans_ihold() which is an obsolete interface.
4) Removes the now unneeded ili_ilock_flags from the
    xfs_inode_log_item structure.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove unused argument in trans_iput

Remove the unused second argument to xfs_iput() and
xfs_trans_iput().

Introduce the define "IRELE()" and use in place of xfs_iput().

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.0 release

Update all the version and changelog files for v3.2.0.

Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.0-rc3 release

Update all the various version and changelog files for a v3.2.0-rc3
release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: don't grind CPUs with large extent lists

When repairing a large filesystem with fragemented files, xfs_repair
can grind to a halt burning multiple CPUs in blkmap_set_ext().
blkmap_set_ext() inserts extents into the blockmap for the inode
fork and it keeps them in order, even if the inserts are not done in
order.

The ordered insert is highly inefficient - it starts at the first
extent, and simple walks the array to find the insertion point. i.e.
it is an O(n) operation. When we have a fragemented file with a
large number of extents, the cost of the entire mapping operation
is rather costly.

The thing is, we are doing the insertion from an *ordered btree
scan* which is inserting the extents in ascending offset order.
IOWs, we are always inserting the extent at the end of the array
after searching the entire array. i.e. the mapping operation cost is
O(N^2).

Fix this simply by reversing the order of the insert slot search.
Start at the end of the blockmap array when we do almost all
insertions, which brings the overhead of each insertion down to O(1)
complexity. This, in turn, results in the overall map building
operation being reduced to an O(N) operation, and so performance
degrades linearly with increasing extent counts rather than
exponentially.

While there, I noticed that the growing of the blkmap array was only
done 4 extents at a time. When we are dealing with files that may
have hundreds of thousands of extents, growing th map only 4 extents
at a time requires excessive amounts of reallocation. Reduce the
reallocation rate by increasing the grow increment according to how
large the array currently is.

The result is that the test filesystem (27TB, 30M inodes, at ENOSPC)
takes 5m10s to *fully repair* on my test system, rather that getting
15 (of 60) AGs into phase three and sitting there burning 3-4 CPUs
making no progress for over half an hour.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: don't unlock prefetch tree to read discontig buffers

The way discontiguous buffers are currently handled in prefetch is
by unlocking the prefetch tree and reading them one at a time in
pf_read_discontig(), inside the normal loop of searching for buffers
to read in a more optimized fashion.

But by unlocking the tree, we allow other threads to come in and
find buffers which we've already stashed locally on our bplist[].
If 2 threads think they own the same set of buffers, they may both
try to delete them from the prefetch btree, and the second one to
arrive will not find it, resulting in:

fatal error -- prefetch corruption

To fix this, simply abort the buffer gathering loop when we come
across a discontiguous buffer, process the gathered list as per
normal, and then after running the large optimised read, check to
see if the last buffer on the list is a discontiguous buffer.
If is is discontiguous, then issue the discontiguous buffer read
while the locks are not held. We only ever have one discontiguous
buffer per read loop, so it is safe just to check the last buffer in
the list.

The fix is loosely based on a a patch provided by Eric Sandeen, who
did all the hard work of finding the bug and demonstrating how to
fix it.

Reported-by:Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Update debian changelog in preparation for pending release

In particular, add a note to the changelog about the added
dh_autoconf use so the BTS will be updated appropriately.

Signed-off-by: Nathan Scott <nathans@debian.org>

Fix msgfmt warning when building the German translation

Use the same header present in the Polish language files to
resolve the following warning in the de.po build:
[MSGFMT] de.mo
de.po:7: warning: header field 'Language' missing in header

Signed-off-by: Nathan Scott <nathans@debian.org>

Fix 32 bit build warning in libxfs, xfs_daddr_t printing

Add the usual type casts to resolve the following warnings:
rdwr.c: In function 'libxfs_getbufr_map':
rdwr.c:499:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'xfs_daddr_t' [-Wformat]
rdwr.c:499:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'xfs_daddr_t' [-Wformat]

Signed-off-by: Nathan Scott <nathans@debian.org>

xfsprogs: v3.2.0-rc2 release

Update all the various version and changelog files for a v3.2.0-rc2
release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: prevent close(-1) on protofile error path

My previous cleanups introduced this; in the case where
fd=open() failed, the out_fail: path would try to close(-1).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_logprint: Fix error handling in xlog_print_trans_efi

A recent change to xlog_print_trans_efi() led to a leaked
"src_f" on this error return.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: detect and handle attribute tree CRC errors

Currently the attribute code will not detect and correct errors in
the attribute tree. It also fails to validate the CRCs and headers
on remote attribute blocks. Ensure that all the attribute blocks are
CRC checked and that the processing functions understand the correct
block formats for decoding.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: handle remote symlink CRC errors

We can't really repair broken symlink buffer contents, but we can at
least warn about it and correct the CRC error so the symlink is
again readable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: remove more dirv1 leftovers

get_bmapi() and it's children were only called by dirv1 code. There
are no current callers, so remove them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: report AG btree verifier errors

When we scan the filesystem freespace and inode maps in phase 2, we
don't report CRC errors that are found. We don't really care from a
repair perspective, because the trees are completely rebuilt from
the ground up in phase 5, but froma checking perspective we need to
inform the user that we found inconsistencies.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: detect CRC errors in AG headers

repair doesn't currently detect verifier errors in AG header
blocks - apart from the primary superblock they are not detected.
They are, fortunately, corrected in the important cases (AGF, AGI
and AGFL) because these structures are rebuilt in phase 5, but if
you run xfs_repair in checking mode it won't report them as bad.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: detect and correct CRC errors in directory blocks

repair doesn't currently verifier errors in directory blocks - they
cause repair to ignore blocks and hence fail because it can't read
critical blocks from the directory.

Fix this by having the directory buffer read code detect a verifier
error and retry the read without the verifier if the verifier has
detected an error. Then pass the verifer error with the successfully
read buffer back to the caller, so the caller can handle the error
appropriately. In most cases, this is simply marking the directory
as needing a rebuild, so once the directory entries have been
checked and repaired, it will rewrite all the directory buffers
(including the clean ones) and in the process recalculate all the
the CRC on the directory blocks.

Hence pure CRC errors in directory blocks are now handled correctly
by xfs_repair.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: ensure prefetched buffers have CRCs validated

Prefetch currently does not do CRC validation when the IO completes
due to the optimisation it performs and the fact that it does not
know what the type of metadata into the buffer is supposed to be.
Hence, mark all prefetched buffers as "suspect" so that when the
end user tries to read it with a supplied validation function the
validation is run even though the buffer was already in the cache.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: verify buffer on type change

Currently when the type command is run, we simply change the type
associated with the buffer, but don't verify it. This results in
unchecked CRCs being displayed. Hence when changing the type, run
the verifier associated with the type to determine if the buffer
contents is valid or not.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: don't claim unchecked CRCs are correct

Currently xfs_db will claim the CRC on a structure is correct if the
buffer is not marked with an error. However, buffers may have been
read without a verifier, and hence have not had their CRCs
validated. in this case, we shoul dreport "unchecked" rather than
"correct". For example:

xfs_db> fsb 0x6003f
xfs_db> type dir3
xfs_db> p
dhdr.hdr.magic = 0x58444433
dhdr.hdr.crc = 0x2d0f9c9d (unchecked)
....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: v3.2.0-rc1 release

Update all the various version and changelog files for a v3.2.0-rc1
release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

logprint: handle split EFI entry

xfs_logprint does not correctly handle EFI entries that
are split across two log buffers. xfs_efi_copy_format()
falsely interrupts the truncated size of the split entry
as being a corrupt entry.

If the first log entry has enough information, namely the
number of extents in the entry and the identifier, then
display this information and a warning that this entry is
truncated. Otherwise, if there is not enough information in
the first log buffer, then print a message that the EFI decode
was not possible. These messages are similar to split inode
entries.

Example of a continued entry:
Oper (336): tid: f214bdb len: 44 clientid: TRANS flags: CONTINUE
EFI: #regs: 1 num_extents: 2 id: 0xffff880804f63900
EFI free extent data skipped (CONTINUE set, no space)

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove never-read "offset" assignment in readbufr_map & writebufr

libxfs_readbufr_map() & libxfs_writebufr() iterate
over bp->b_map[] and read each chunk.  The loops start
out correctly, getting the offset from bm_bn and the
length from bm_len.  After the IO it correctly
advances the target buffer pointer by len, but then
inexplicably advances "offset" by len as well.  The
whole point of this exercise is to handle discontiguous
ranges - marching offset along by length of IO done
is incorrect.

Thankfully offset is immediately reset to the proper
value again at the top of the loop for the next range,
so this is harmless, other than being confusing.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: fix random pread/pwrite to honor offset

xfs_io's pread & pwrite claim to support a random IO mode
where it will do random IOs between offset & offset+len.

However, offset was ignored, and we did the IOs between 0
and len instead.

Clang caught this by pointing out that the calculated/normalized
"offset" variable was never read.

(NB: If the range is larger than RAND_MAX, these functions don't
work, but that's always been true, so I'll leave it for another
day...)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove write-only assignments

There are many instances where variable assignments are made,
but never read (or are re-assigned before they are read).
The Clang static analyzer finds these.

Here's a chunk of what I think are trivial removals of such
assignments; other detections point to more serious problems
(or are shared w/ kernel code so should probably be fixed there
first).

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: don't use invalid index in ring_f

ring_f() tests for an invalid index which would overrun the
iocur_ring[] array and warns, but then uses it anyway.

Return immediately if it's out of bounds.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs: catch unknown format in protofile parsing

As the code stands today we can't get an unknown format in the
last case statement, but Coverity warns that if we ever do, we'll
use an uninitialized "ip" in the call to libxfs_trans_log_inode().

Adding a default: case to catch unknown formats is defensive and
makes the checker happy.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: address never-true tests in repair/bmap.c on 64 bit

The test "if (new_naexts > BLKMAP_NEXTS_MAX)" is never true
on a 64-bit platform; new_naexts is an int, and BLKMAP_NEXTS_MAX
is INT_MAX. So just wrap the whole thing in the #ifdef.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: remove impossible tests in printpath

printpath() had some cut & paste tests of "c" - but
nothing had set it yet other than c=0, so testing it
is pointless. Just remove tests for non-zero "c"
until we might have set it to something interesting.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: fix too-large memset value in attr code

memset(value, 0xfeedface, valuelen);

seemed to be trying to fill a new attr with some
recognizeable magic, but of course memset can
only set a single byte; switch this to use 'v'

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: annotate a case fallthrough in libxfs_ialloc

This is all working as intended, but add a comment to
make it more obvious to readers and static code
checkers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: free resources in libxfs_alloc_file_space error paths

The bmap freelist & transaction pointer weren't
being freed in libxfs_alloc_file_space error paths;
more or less copy the error handling that exists
in kernelspace to resolve this.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_logprint: fix leak in error path of xlog_print_record()

In 2 error paths we returned without freeing the allocated buf.
Collapse them into a compound test & free buf on the way out.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_quota: fix memory leak in quota_group_type() error path

quota_group_type has some rather contorted logic that's
been around since 2005.

In the (!name) case, if any of the 3 calls setting up ngroups fails,
we fall back to using just one group.

However, if it's the getgroups() call that fails, we overwrite
the allocated gid ptr with &gid, thus leaking that allocated
memory. Worse, we set "dofree" to 1, so will free non-allocated
local var gid. And that last else case is redundant; if we get there,
gids is guaranteed to be non-null.

Refactor it a bit to be more clear (I hope) and correct.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix memory leak in xfs_dir2_node_removename

Fix the leak of kernel memory in xfs_dir2_node_removename()
when xfs_dir2_leafn_remove() returns an error code.

Cross-port of kernel commit 3a8c9208

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxlog: fix memory leak in xlog_recover_add_to_trans

Free the memory in error path of xlog_recover_add_to_trans().
Normally this memory is freed in recovery pass2, but is leaked
in the error path.

Userspace version of kernel commits 519ccb8 & aaaae98

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: trivial buffer frees on error paths

Lots of memory leaks on error paths etc, spotted by
coverity. This patch rolls up the super-straightforward
fixes across xfsprogs.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: refactor fsrall_cleanup

fsrall_cleanup leaked an fd in the non-timeout
case - but the logic was weird and tortured, refactor
it to make more sense and fix the fd leak as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: fix various fd leaks

Coverity spotted these; several paths where we don't
close fds when we return.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix prefetch queue waiting

97b1fcf xfs_repair: fix array overrun in do_inode_prefetch

The thread creation loop has 2 ways to exit; either via
the loop counter based on thread_count, or the break statement
if we've started enough workers to cover all AGs.

Whether or not the loop counter "i" reflects the number of
threads started depends on whether or not we exited via the
break.

The above commit prevented us from indexing off the end
of the queues[] array if we actually advanced "i" all the
way to thread_count, but in the case where we break, "i"
is one *less* than the nr of threads started, so we don't
wait for completion of all threads, and all hell breaks
loose in phase 5.

Just stop with the cleverness of re-using the loop counter -
instead, explicitly count threads that we start, and then use
that counter to wait for each worker to complete.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: fix directory hash ordering bug

Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced
in 3.10 incorrectly converted the btree hash index array pointer in
xfs_da3_fixhashpath(). It resulted in the the current hash always
being compared against the first entry in the btree rather than the
current block index into the btree block's hash entry array. As a
result, it was comparing the wrong hashes, and so could misorder the
entries in the btree.

For most cases, this doesn't cause any problems as it requires hash
collisions to expose the ordering problem. However, when there are
hash collisions within a directory there is a very good probability
that the entries will be ordered incorrectly and that actually
matters when duplicate hashes are placed into or removed from the
btree block hash entry array.

This bug results in an on-disk directory corruption and that results
in directory verifier functions throwing corruption warnings into
the logs. While no data or directory entries are lost, access to
them may be compromised, and attempts to remove entries from a
directory that has suffered from this corruption may result in a
filesystem shutdown. xfs_repair will fix the directory hash
ordering without data loss occuring.

[dchinner: wrote useful a commit message]

Ported from equivalent kernel commit c88547a8.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: add a missing QA header file to install-qa target

Included from "include/libxfs.h".

Signed-off-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: hide debug bbmap output

Most of xfsprogs building with DEBUG enables extra
checks, asserts, etc, but this bunch of printfs was
extra output that's not generally helpful for most
people's runtime experience - and it breaks xfs/290
with all the noise.

I assume it's for actual debugging use, and not
generally useful, so bury it a bit deeper under
it's own #ifdef.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: ensure that unused superblock fields are zeroed

When we grab a superblock off disk via get_sb(), we don't know what
the in-memory superblock we are filling out contained. We need to
ensure that the entire structure is returned in an initialised
state regardless of which fields libxfs_sb_from_disk() populates
from disk. In this case, it doesn't populate the sb_crc field,
and so uninitialised values can escape through to disk on v4
filesystems because of this. This causes xfs/031 to fail on v4
filesystems.

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_io: add support for flink

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: fix use after free in inode_item_done()

Commit "3a19fb7 libxfs: stop caching inode structures"
introduced a use after free.

libxfs_iput() already does the check for ip->i_itemp, and a
kmem_zone_free() if it's present, and then frees the ip pointer.
Re-checking ip->i_itemp after the libxfs_iput call will access
the freed ip pointer, as will setting ip_>i_itemp to NULL.

Simply remove the offending code to fix this up.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix array overrun in do_inode_prefetch

Coverity spotted this:

do_inode_prefetch() does a while loop, creating queues:

for (i = 0; i < thread_count; i++) {
...
create_work_queue(&queues[i], mp, 1);
...
}

and then does this to wait for them all to complete:

for (; i >= 0; i--)
destroy_work_queue(&queues[i]);

But we leave the first for loop with (i == thread_coun)t, and
the second one will try to index queues[] one past the end.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: phase 1 does not handle superblock CRCs

Phase 1 of xfs_repair verifies and corrects the primary
superblock of the filesystem. It does not verify that the CRC of the
superblock that is found is correct, nor does it recalculate the CRC
of the superblock it rewrites.

This happens because phase1 does not use the libxfs buffer cache -
it just uses pread/pwrite on a memory buffer, and works directly
from that buffer. Hence we need to add CRC verification to
verify_sb(), and CRC recalculation to write_primary_sb() so that it
works correctly.

This also enables us to use get_sb() as the method of fetching the
superblock from disk after phase 1 without needing to use the libxfs
buffer cache and guessing at the sector size. This prevents a
verifier error because it attempts to CRC a superblock buffer that
is much longer than the usual sector sizes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: Use EFSBADCRC for CRC validity indication

xfs_db currently gives indication as to whether a buffer CRC is ok
or not. Currently it does this by checking for EFSCORRUPTED in the
b_error field of the buffer. Now that we have EFSBADCRC to indicate
a bad CRC independently of structure corruption, use that instead to
drive the CRC correct/incorrect indication in the structured output.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: modify verifiers to differentiate CRC from other errors

[userspace port]

Modify all read & write verifiers to differentiate
between CRC errors and other inconsistencies.

This sets the appropriate error number on bp->b_error,
and then calls xfs_verifier_error() if something went
wrong. That function will issue the appropriate message
to the user.

Also, fix the silly bug in xfs_buf_ioerror() that this patch
exposes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: add xfs_verifier_error()

[userspace port]

We want to distinguish between corruption, CRC errors,
etc.  In addition, the full stack trace on verifier errors
seems less than helpful; it looks more like an oops than
corruption.

Create a new function to specifically alert the user to
verifier errors, which can differentiate between
EFSCORRUPTED and CRC mismatches.  It doesn't dump stack
unless the xfs error level is turned up high.

Define a new error message (EFSBADCRC) to clearly identify
CRC errors.  (Defined to EBADMSG, bad message)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: add helper for updating checksums on xfs_bufs

[userspace port]

Many/most callers of xfs_update_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args. Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

libxfs: add helper for verifying checksums on xfs_bufs

[userspace port]

Many/most callers of xfs_verify_cksum() pass bp->b_addr and
BBTOB(bp->b_length) as the first 2 args. Add a helper
which can just accept the bp and the crc offset, and work
it out on its own, for brevity.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: Use defines for CRC offsets in all cases

[userspace port]

Some calls to crc functions used useful #defines,
others used awkward offsetof() constructs.

Switch them all to #define to make things a bit cleaner.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: skip pointless CRC updates after verifier failures

[userspace port]

Most write verifiers don't update CRCs after the verifier
has failed and the buffer has been marked in error. These
two didn't, but should.

Add returns to the verifier failure block, since the buffer
won't be written anyway.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: limit superblock corruption errors to actual corruption

[userspace port]

Today, if

xfs_sb_read_verify
  xfs_sb_verify
    xfs_mount_validate_sb

detects superblock corruption, it'll be extremely noisy, dumping
2 stacks, 2 hexdumps, etc.

This is because we call XFS_CORRUPTION_ERROR in xfs_mount_validate_sb
as well as in xfs_sb_read_verify.

Also, *any* errors in xfs_mount_validate_sb which are not corruption
per se; things like too-big-blocksize, bad version, bad magic, v1 dirs,
rw-incompat etc - things which do not return EFSCORRUPTED - will
still do the whole XFS_CORRUPTION_ERROR spew when xfs_sb_read_verify
sees any error at all.  And it suggests to the user that they
should run xfs_repair, even if the root cause of the mount failure
is a simple incompatibility.

I'll submit that the probably-not-corrupted errors don't warrant
this much noise, so this patch removes the warning for anything
other than EFSCORRUPTED returns, and replaces the lower-level
XFS_CORRUPTION_ERROR with an xfs_notice().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: skip verification on initial "guess" superblock read

[userspace port]

When xfs_readsb() does the very first read of the superblock,
it makes a guess at the length of the buffer, based on the
sector size of the underlying storage. This may or may
not match the filesystem sector size in sb_sectsize, so
we can't i.e. do a CRC check on it; it might be too short.

In fact, mounting a filesystem with sb_sectsize larger
than the device sector size will cause a mount failure
if CRCs are enabled, because we are checksumming a length
which exceeds the buffer passed to it.

So always read twice; the first time we read with NULL
buffer ops to skip verification; then set the proper
read length, hook up the proper verifier, and give it
another go.

Once we are sure that we've got the right buffer length,
we can also use bp->b_length in the xfs_sb_read_verify,
rather than the less-trusted on-disk sectorsize for
secondary superblocks. Before this we ran the risk of
passing junk to the crc32c routines, which didn't always
handle extreme values.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: xfs_sb_read_verify() doesn't flag bad crcs on primary sb

[userspace port]

My earlier commit 10e6e65 deserves a layer or two of brown paper
bags. The logic in that commit means that a CRC failure on the
primary superblock will *never* result in an error return.

Hopefully this fixes it, so that we always return the error
if it's a primary superblock, otherwise only if the filesystem
has CRCs enabled.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: sanitize sb_inopblock in xfs_mount_validate_sb

[userspace port]

xfs_mount_validate_sb doesn't check sb_inopblock for sanity
(as does its xfs_repair counterpart, FWIW).

If it's out of bounds, we can go off the rails in i.e.
xfs_inode_buf_verify(), which uses sb_inopblock as a loop
limit when stepping through a metadata buffer.

The problem can be demonstrated easily by corrupting
sb_inopblock with xfs_db and trying to mount the result:

# mkfs.xfs -dfile,name=fsfile,size=1g
# xfs_db -x fsfile
xfs_db> sb 0
xfs_db> write inopblock 512
inopblock = 512
xfs_db> quit

# mount -o loop fsfile mnt
and we blow up in xfs_inode_buf_verify().

With this patch, we get a (very noisy) corruption error,
and fail the mount as we should.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: be more forgiving of a v4 secondary sb w/ junk in v5 fields

[userspace port]

Today, if xfs_sb_read_verify encounters a v4 superblock
with junk past v4 fields which includes data in sb_crc,
it will be treated as a failing checksum and a significant
corruption.

There are known prior bugs which leave junk at the end
of the V4 superblock; we don't need to actually fail the
verification in this case if other checks pan out ok.

So if this is a secondary superblock, and the primary
superblock doesn't indicate that this is a V5 filesystem,
don't treat this as an actual checksum failure.

We should probably check the garbage condition as
we do in xfs_repair, and possibly warn about it
or self-heal, but that's a different scope of work.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>