git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfs: Use consistent logging message prefixes

The second and subsequent lines of multi-line logging messages
are not prefixed with the same information as the first line.

Separate messages with newlines into multiple calls to ensure
consistent prefixing and allow easier grep use.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: remote attribute headers contain an invalid LSN

In recent testing, a system that crashed failed log recovery on
restart with a bad symlink buffer magic number:

XFS (vda): Starting recovery (logdev: internal)
XFS (vda): Bad symlink block magic!
XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060

On examination of the log via xfs_logprint, none of the symlink
buffers in the log had a bad magic number, nor were any other types
of buffer log format headers mis-identified as symlink buffers.
Tracing was used to find the buffer the kernel was tripping over,
and xfs_db identified its contents as:

000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
.....

This is a remote attribute buffer. Remote attribute buffers are notable
in that they are not logged but are instead written synchronously by the
remote attribute code so that they exist on disk before the attribute
transactions are committed to the journal.

The above remote attribute block has an invalid LSN in it - cycle
0xd0020000, block 0 - which means when log recovery comes along to
determine if the transaction that writes to the underlying block
should be replayed, it sees a block that has a future LSN and so
does not replay the buffer data in the transaction. Instead, it
validates the buffer magic number and attaches the buffer verifier
to it. It is this buffer magic number check that is failing in the
above assert, indicating that we skipped replay due to the LSN of
the underlying buffer.

The problem here is that the remote attribute buffers cannot have a
valid LSN placed into them, because the transaction that contains
the attribute tree pointer changes and the block allocation that the
attribute data is being written to hasn't yet been committed. Hence
the LSN field in the attribute block is completely unwritten,
thereby leaving the underlying contents of the block in the LSN
field. It could have any value, and hence a future overwrite of the
block by log recovery may or may not work correctly.

Fix this by always writing an invalid LSN to the remote attribute
block, so that any buffer log recovery needs to write over the remote
attribute block is always replayed. We are protected from having old
data written over the attribute by the fact that freeing the block
before the remote attribute is written will result in the buffer being
marked stale in the log and so all changes prior to the buffer stale
transaction will be cancelled by log recovery.

Hence it is safe to ignore the LSN in the case of synchronously
written, unlogged metadata such as remote attribute blocks, and to
ensure we do that correctly, we need to write an invalid LSN to all
remote attribute blocks to trigger immediate recovery of metadata
that is written over the top.

As a further protection for filesystems that may already have remote
attribute blocks with bad LSNs on disk, change the log recovery code
to always trigger immediate recovery of metadata over remote
attribute blocks.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: Fix uninitialized return value in xfs_alloc_fix_freelist()

xfs_alloc_fix_freelist() can sometimes jump to out_agbp_relse
without ever setting the value of the 'error' variable, which is then
returned. This can happen e.g. when pag->pagf_init is set but the AG
is for metadata and we want to allocate user data.

Fix the problem by initializing 'error' to 0, which is the desired
return value when we decide to skip this group.
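
As a rough illustration of the pattern (not the actual kernel code), the
sketch below uses a hypothetical demo_fix_freelist() with a made-up
struct demo_ag. It only shows why an early jump to the exit label needs
'error' to have a defined value already.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-AG state the real function consults. */
struct demo_ag {
	bool initialised;	/* plays the role of pag->pagf_init */
	bool metadata_only;	/* AG is reserved for metadata */
};

/*
 * Returns 0 when the AG is skipped or prepared, negative on failure.
 * All names here are assumptions made for illustration only.
 */
int demo_fix_freelist(const struct demo_ag *ag, bool want_user_data)
{
	int error = 0;	/* the fix: every early exit returns a defined value */

	/* Skip path: nothing below this point assigns 'error'. */
	if (ag->initialised && ag->metadata_only && want_user_data)
		goto out_skip;

	if (!ag->initialised)
		error = -1;	/* hypothetical: pretend reading the AG failed */

out_skip:
	return error;
}
```

Without the initialiser, the skip path would return whatever happened to
be on the stack.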

CC: xfs@oss.sgi.com
Coverity-id: 1309714
Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: remote attributes need to be considered data

We don't log remote attribute contents, and instead write them
synchronously before we commit the block allocation and attribute
tree update transaction. As a result we are writing to the allocated
space before the allocation has been made permanent.

As a result, we cannot consider this allocation to be a metadata
allocation. Metadata allocation can take blocks from the free list
and so reuse them before the transaction that freed the block is
committed to disk. This behaviour is perfectly fine for journalled
metadata changes as log recovery will ensure the free operation is
replayed before the overwrite, but for remote attribute writes this
is not the case.

Hence we have to consider the remote attribute blocks to contain
data and allocate accordingly. We do this by dropping the
XFS_BMAPI_METADATA flag from the block allocation. This means the
allocation will not use blocks that are on the busy list without
first ensuring that the freeing transaction has been committed to
disk and the blocks removed from the busy list. This ensures we will
never overwrite a freed block without first ensuring that it is
really free.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: xfs_bunmapi() does not need XFS_BMAPI_METADATA flag

xfs_bunmapi() doesn't care what type of extent is being freed and
does not look at the XFS_BMAPI_METADATA flag at all. As such we can
remove the XFS_BMAPI_METADATA from all callers that use it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: don't cast string literals

The commit:

a9273ca5 xfs: convert attr to use unsigned names

added these (unsigned char *) casts, but then the _SIZE macros
return "7" - size of a pointer minus one - not the length of
the string. This is harmless in the kernel, because the _SIZE
macros are not used, but as we sync up with userspace, this will
matter.

I don't think the cast is necessary; i.e. assigning the string
literal to an unsigned char *, or passing it to a function
expecting an unsigned char *, should be ok, right?
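
A small sketch of the sizeof behaviour described above. DEMO_NAME and
the _SIZE macros below are hypothetical and are not the real xfs
definitions; the pointer-sized result depends on the platform (7 on
systems with 8-byte pointers).

```c
#include <stddef.h>

/* Hypothetical attribute name, not one of the real definitions. */
#define DEMO_NAME	"demo_attr"
#define DEMO_NAME_CAST	((unsigned char *)"demo_attr")

/* sizeof on the literal sees a char[10] array: 9 characters plus NUL. */
#define DEMO_NAME_SIZE		(sizeof(DEMO_NAME) - 1)

/* sizeof on the cast expression sees only a pointer. */
#define DEMO_NAME_CAST_SIZE	(sizeof(DEMO_NAME_CAST) - 1)

size_t demo_name_size(void)
{
	return DEMO_NAME_SIZE;		/* string length */
}

size_t demo_name_cast_size(void)
{
	return DEMO_NAME_CAST_SIZE;	/* pointer size minus one */
}
```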

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: Fix file type directory corruption for btree directories

Users have occasionally reported that the file type for some directory
entries is wrong. This mostly happened after updating some libraries.
After some debugging the problem was traced down to
xfs_dir2_node_replace(). The function uses args->filetype as the file
type to store in the replaced directory entry; however, it also calls
xfs_da3_node_lookup_int(), which stores the file type of the current
directory entry in args->filetype. Thus we fail to change the file type
of a directory entry to the proper type.

Fix the problem by storing new file type in a local variable before
calling xfs_da3_node_lookup_int().
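
The shape of the bug and the fix can be sketched roughly as follows.
struct demo_args, demo_lookup() and both replace functions are
hypothetical reductions for illustration, not the libxfs code.

```c
/* Hypothetical, heavily reduced stand-in for the lookup arguments. */
struct demo_args {
	int filetype;	/* in: type to store; overwritten by the lookup */
};

/* Stand-in for the lookup: reports the type of the existing entry. */
static void demo_lookup(struct demo_args *args, int existing_type)
{
	args->filetype = existing_type;
}

/* Buggy shape: reads args->filetype after the lookup overwrote it. */
int demo_replace_buggy(struct demo_args *args, int existing_type)
{
	demo_lookup(args, existing_type);
	return args->filetype;	/* the old type gets written back */
}

/* Fixed shape: stash the requested type before calling the lookup. */
int demo_replace_fixed(struct demo_args *args, int existing_type)
{
	int ftype = args->filetype;

	demo_lookup(args, existing_type);
	return ftype;		/* the new type reaches the entry */
}
```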

Reported-by: Giacomo Comes <comes@naic.edu>
Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove self-assignment in libxfs/util.c

We don't have percpu counters in userspace, so libxfs plays
tricks.  Rather than calling percpu_counter_set() in
xfs_reinit_percpu_counters, we just directly assign
the values in mp->m_sb to the counters in mp.

But this was already handled by #defining the percpu counters
in the mount structure to those in the superblock, i.e.:

#define m_icount        m_sb.sb_icount
#define m_ifree         m_sb.sb_ifree
#define m_fdblocks      m_sb.sb_fdblocks

so we actually end up with pointless self-assignment.

Define away the xfs_reinit_percpu_counters() function,
because it's a no-op.

Addresses-Coverity-Id: 1298009
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: unconditionally free blockmaps when threads complete

blkmap_free() doesn't actually free the block map unless it's
inordinately large; this keeps us from constantly freeing
and re-allocating blockmaps for each inode, which makes sense.

However, once the threads which have allocated these structures
exit, we should actually free them; they can grow up to 2MB
for each of the data and attr maps, for each thread, and not
be freed through the normal blkmap_free() test.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: call IRELE(ip) after libxfs_trans_iget calls

Commit 260c85e libxfs: dont free xfs_inode until complete
changed the alloc/free convention a bit:

    Originally, the xfs_inode are released upon the first
    call to xfs_trans_cancel, xfs_trans_commit, or
    inode_item_done.
    <snip>
    This patch does the following:
     1) Removes the iput from the transaction completion and
        requires that the xfs_inode allocators call IRELE()
        when they are done with the pointer.

But that change missed several callers in xfs_repair phase6;
fix that up.

Addresses-Coverity-Id: 1315100
Addresses-Coverity-Id: 1315101
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: free msgbuf on exit

Just to keep valgrind less noisy, and make it easier to spot
more things that actually matter ...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: fix memory leaks in libxfs_umount()

libxfs_umount was failing to free a handful of resources; fix that
up. Call it from xfs_copy as well, while we're at it; every other
libxfs_mount has a libxfs_umount counterpart, at least on a clean
exit.

[dchinner: fix superblock buffer leak uncovered by adding
libxfs_umount() to xfs_copy. ]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: fix broken EFSBADCRC/EFSCORRUPTED usage with buffer errors

When we encounter CRC or verifier errors, bp->b_error is set to
-EFSBADCRC and -EFSCORRUPTED; note the negative sign. For whatever
reason, repair and db use the positive versions, and therefore fail to
notice the error, so fix all the broken uses.

Note however that db and repair turn the negative codes returned
by libxfs into positive codes that can be used with strerror.
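
A minimal sketch of the sign convention, under stated assumptions:
struct demo_buf is a made-up reduction of the buffer, and DEMO_EFSBADCRC
is mapped onto EBADMSG purely for this example.

```c
#include <errno.h>
#include <string.h>

/* Assumption for this sketch only: map the code onto a POSIX errno. */
#define DEMO_EFSBADCRC	EBADMSG

/* Hypothetical, reduced buffer: only the error field matters here. */
struct demo_buf {
	int b_error;	/* 0 or a negative errno, as the library sets it */
};

/* Compare against the negative code; the positive form never matches. */
int demo_buf_has_bad_crc(const struct demo_buf *bp)
{
	return bp->b_error == -DEMO_EFSBADCRC;
}

/* strerror() expects a positive errno, so flip the sign for reporting. */
const char *demo_buf_strerror(const struct demo_buf *bp)
{
	return strerror(-bp->b_error);
}
```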

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_db: don't crash on a corrupt inode

If the user selects a corrupt inode via the 'inode XXX' command, the
read verifier will fail and the io cursor at the top of the ring will
not have any data attached. When this is the case, we cannot
dereference the NULL pointer or xfs_db will crash. Therefore, check
the buffer pointer before using it.

It's arguable that we ought to retry the read without the verifiers
if the inode is corrupt or fails CRC, since this /is/ a debugging
tool, and maybe you wanted the contents anyway.

[dchinner: fixes xfs/003 on 1k block size failure]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: readahead of dir3 data blocks should use the read verifier

In the dir3 data block readahead function, use the regular read
verifier to check the block's CRC and spot-check the block contents
instead of calling the spot-checking routine directly. This prevents
corrupted directory data blocks from being read into the kernel, which
can lead to garbage ls output and directory loops (if say one of the
entries contains invalid characters).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: fix wrong logic when validating node magic number

Magic number is wrong only when != XFS_DA_NODE_MAGIC and
!= XFS_DA3_NODE_MAGIC.

This is triggered by shared/002 when testing 512 block size XFS.

  Phase 1 - find and verify superblock...
  Phase 2 - using internal log
          - scan filesystem freespace and inode maps...
          - found root inode chunk
  Phase 3 - for each AG...
          - scan (but don't clear) agi unlinked lists...
          - process known inodes and perform inode discovery...
          - agno = 0
  bad magic number febe in block 64 (108) for directory inode 35
  ......

Fix it by changing "||" to "&&".
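
To make the logic error concrete, here is a small sketch. The DEMO_
constants are values assumed for illustration (0xfebe matches the magic
in the output above); check the real headers before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch only. */
#define DEMO_DA_NODE_MAGIC	0xfebe
#define DEMO_DA3_NODE_MAGIC	0x3ebe

/* Buggy: no value equals both constants, so this is always true. */
bool demo_magic_bad_buggy(uint16_t magic)
{
	return magic != DEMO_DA_NODE_MAGIC || magic != DEMO_DA3_NODE_MAGIC;
}

/* Fixed: bad only when the value matches neither magic number. */
bool demo_magic_bad(uint16_t magic)
{
	return magic != DEMO_DA_NODE_MAGIC && magic != DEMO_DA3_NODE_MAGIC;
}
```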

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release v4.2.0-rc2

Update all the release files for a 4.2.0-rc2 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: v3 inodes are only valid on crc-enabled filesystems

xfs_repair was not detecting that version 3 inodes are invalid for
non-CRC filesystems. The result is specific inode corruptions go
undetected and hence aren't repaired if only the version number is
out of range.

The core of the problem is that the XFS_DINODE_GOOD_VERSION() macro
doesn't know that valid inode versions are dependent on a superblock
version number. Fix this in libxfs, and propagate the new function
out into the rest of xfsprogs to fix the issue.
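
A sketch of what a superblock-aware check might look like. The function
name and signature are hypothetical, and it assumes versions 1 and 2 are
the valid ones without CRCs while version 3 is the only valid one with
CRCs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: the set of valid inode versions depends on
 * whether the filesystem has CRCs enabled.
 */
bool demo_dinode_good_version(bool fs_has_crc, uint8_t version)
{
	if (fs_has_crc)
		return version == 3;	/* assumed: only v3 with CRCs */

	return version == 1 || version == 2;	/* v3 invalid without CRCs */
}
```

A macro that only sees the version number cannot make this distinction,
which is why the check needs the filesystem state passed in.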

[dchinner: forward port from 3.2.4 to 4.2.0-rc1, move
xfs_dinode_good_version() to libxfs/xfs_inode-buf.c with all the
other dinode validation functions. ]

Reported-by: Leslie Rhorer <lrhorer@mygrande.net>
Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

doc: Update OS X build info and limitations

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

build: Add fls check into autoconf

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

build: Add mntent.h check into autoconf

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

build: Change OS X-specific CFLAGS/LDFLAGS

OS X uses clang as a default compiler.
So remove incompatible options.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: Fix attr leaf block definition

struct xfs_attr_leafblock contains an 'entries' array which is declared
with size 1 although it can in fact contain many more entries. Since this
array is followed by further struct members, gcc (at least in version
4.8.3) thinks that the array has the fixed size of 1 element and thus
optimizes away all accesses beyond the end of array resulting in
non-working code. In particular this problem was seen with
xfsprogs-3.1.8.

Signed-off-by: Jan Kara <jack@suse.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

mkfs.xfs: fix ftype-vs-crc option combination testing

mkfs.xfs got weird along the way; today it has different outcomes
depending on the order of option specification:

$ mkfs/mkfs.xfs -n ftype=1 -m crc=0 -dfile,name=fsfile,size=16g
cannot specify both crc and ftype
$ mkfs/mkfs.xfs -m crc=0 -n ftype=1 -dfile,name=fsfile,size=16g
<succeeds>

Somehow the tests got written as being constrained on what options
are specified - and in what order! - vs actually testing for
incompatible feature sets.

It's fine to specify both crc & ftype options, as long as it's an
allowed combination, so just test for the incompatible combination
(crc=1 and ftype=0) after all options have been processed.
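
Roughly, the approach is to let option handlers only record values and
to validate the final feature set once. The struct and helpers below
are hypothetical, not the mkfs.xfs code.

```c
#include <stdbool.h>

/* Hypothetical, reduced option state; the real mkfs tracks far more. */
struct demo_mkfs_opts {
	bool crc;	/* -m crc=N */
	bool ftype;	/* -n ftype=N */
};

/* Option handlers only record values; they do not judge combinations. */
void demo_set_crc(struct demo_mkfs_opts *o, bool on)
{
	o->crc = on;
}

void demo_set_ftype(struct demo_mkfs_opts *o, bool on)
{
	o->ftype = on;
}

/* Run once, after every option has been parsed. */
bool demo_opts_compatible(const struct demo_mkfs_opts *o)
{
	/* The only rejected feature set: crc=1 together with ftype=0. */
	return !(o->crc && !o->ftype);
}
```

With this shape the result no longer depends on the order in which the
options were given.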

[dchinner: fix dirftype init value so mkfs default config works]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

libxfs: remove sparse inode mount warning

The sparse inodes experimental feature warning fires multiple times
during mkfs because the warning is emitted as part of the superblock
verifier codepath. The warning is intended as a mount-time warning only
and has been relocated as such in the kernel repo.

Remove the warning from libxfs such that it is not emitted from
userspace.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: fix syntax error in include/buildmacros

There's an extra ";" in include/buildmacros, which causes make
install-dev to fail

    ......
    Installing libhandle-install-dev
    cd ../libhandle/.libs; ... if [ "x/usr/lib64" != "x/usr/lib64"; ]; ...
    /bin/sh: line 0: [: missing `]'
    /bin/sh: ]: command not found
    ......

This was introduced by
02ef543 libhandle: fix installation for symlinked /usr

Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_fsr: Fix parentheses around truth value

Someone in the distant past must have responded to gcc's
warning about parentheses around assignment used as a truth
value by changing:

while (ret = func() == 0)
to:
while ((ret = func() == 0))

While this shuts up gcc, it doesn't yield the proper result.
If func() returns 0, func() == 0 is true, and ret is assigned
a value of 1.

This does keep the while loop going, but it's a very strange
way to go about it, and may someday yield confusing results.

Fix this as:

while ((ret = func()) == 0)

so that ret gets the function return value as expected.
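
The difference in grouping can be seen in a tiny sketch; demo_func() is
a hypothetical callee that always returns 0.

```c
/* Hypothetical callee that reports success as 0. */
static int demo_func(void)
{
	return 0;
}

/* "ret = func() == 0" groups as "ret = (func() == 0)". */
int demo_ret_buggy(void)
{
	int ret;

	ret = (demo_func() == 0);	/* ret holds the comparison result */
	return ret;
}

/* "(ret = func()) == 0" assigns first, then compares. */
int demo_ret_fixed(void)
{
	int ret;
	int matched = ((ret = demo_func()) == 0);

	(void)matched;			/* only ret is of interest here */
	return ret;			/* ret holds the return value */
}
```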

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: prevent LIST_ macros conflicts

BSD 4.4 added some LIST_ macros into system header files, which
causes "macro redefined" warnings. To ensure we use our own macros,
undefine the system ones at first.

The conflicting macros are LIST_HEAD and LIST_HEAD_INIT

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: missing and dummy calls for OS X support

Add and update various APIs, macros and types where something has
changed in OS X or xfsprogs. Most changes are in darwin.h.

Add dummy implementations where native support is nonexistent
and the tools are not expected to work anyway, so all tools can be
at least compiled.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Add includes required for OS X builds

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Add ifdef dirent checks where it was missing

CHANGED: text width fix

Add checks for _DIRENT_HAVE_D_RECLEN/_OFF to read_directory().
In dump_dirent() these checks are already used, but they were
missing in read_directory().

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: undefined variable fix

Typo fix, which wasn't caught earlier due to #ifdef branching. The
'rmnttomname' does not exist anywhere and looks like a hybrid between
rmntfromname and rmntonname. And because the previous if has
'fromname' on both arguments of realpath, I chose the same approach
when fixing it.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: fix uninitialised variable warnings

New versions of gcc barf on the conversion table code in
db/convert.c. Shut it up by initialising the conversion array.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Release v4.2.0-rc1

Update all the release files for a 4.2.0-rc1 release.

Signed-off-by: Dave Chinner <david@fromorbit.com>

Merge branch 'progs-misc-fixes-2' into for-next

Conflicts:
copy/xfs_copy.c

libxfs: fix uuid check during inode allocation

Needs to check sb_meta_uuid now that the sb_uuid can change on v5
filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: use sb_meta_uuid for checking of metadata headers

Now that we can change the uuid on v5 filesystems, we always need to
verify the metadata uuid against sb_meta_uuid, not sb_uuid. This
fixes quite a few xfstests failures when UUIDs are changed before
executing tests.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_repair: Fix malloc size of rt_ext_tree_ptr

rt_ext_tree_ptr points to an avl64tree_desc_t, but we malloc memory
according to the size of avltree_desc_t. Oddly, the latter happens
to be larger, so we're ok, but may as well make it correct.
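
One common way to avoid this class of mismatch is to size the allocation
from the pointer being assigned rather than from a type name. The sketch
below uses a hypothetical struct demo_tree64, not the repair types, and
is not necessarily how the patch itself does it.

```c
#include <stdlib.h>

/* Hypothetical descriptor type used only for this illustration. */
struct demo_tree64 {
	void *root;
	int flags;
};

/* Sizing from the pointee keeps the request tied to the pointer's type. */
struct demo_tree64 *demo_tree64_alloc(void)
{
	struct demo_tree64 *t = calloc(1, sizeof(*t));

	return t;	/* NULL on allocation failure; the caller frees it */
}
```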

Addresses-Coverity-Id: 1297533
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs_copy: fix up initial sb buffer read on CRC fs

My prior commit, aaf90a2 xfs_copy: fix copy of hard 4k devices
causes xfs_copy to emit a CRC error warning when copying a
CRC filesystem.

This is because we are now reading the maximum sector size,
and attempting to verify the CRC based on that (likely incorrect)
length.

In xfs_db, we currently just don't verify this read, so it's
not a problem. In xfs_copy, we almost certainly want to verify.

So, first do the maximal read with no verifier; once it's read,
drop that buffer, and re-read with the proper sector size and
verifier.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Add new sb_meta_uuid field, update userspace tools to manipulate it

This adds a new superblock field, sb_meta_uuid. This allows us to
change the user-visible UUID on crc-enabled filesystems from userspace
if desired, by copying the existing UUID to the new location for
metadata comparisons. If this is done, an incompat flag must be
set to prevent older kernels from mounting the filesystem, but
the original UUID can be restored, and the incompat flag removed,
with a new xfs_db / xfs_admin UUID command, "restore."

Much of this patch mirrors the kernel patch in simply renaming
the field used for metadata uuid comparison; other bits:

* Teach xfs_db to print the new meta_uuid field
* Allow xfs_db to generate a new UUID for CRC-enabled filesystems
* Allow xfs_db to revert to the original UUID and clear the flag
* Fix up xfs_copy to work with CRC-enabled filesystems
* Update the xfs_admin manpage to show the UUID "restore" command

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Merge branch 'progs-header-cleanup' into for-next

xfsprogs: use "unsigned short" instead of ushort

Android's bionic libc doesn't define ushort. There isn't a real
benefit (other than perhaps conciseness) to use ushort over "unsigned
short", and it's only used in a handful of files in xfsprogs. So
change over to using unsigned short everywhere.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: avoid use of si_tid in struct xlog_split_item

In Android's bionic libc (as well as the Linux kernel's
include/uapi/asm-generic/siginfo.h), si_tid is a #define to provide
backwards compatibility for the timerid in the siginfo structure.
This breaks the compile of logprint/log_misc.c. Change this to be
si_xtid in order to avoid a namespace collision

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: define and use BUILD_CC in configure.ac for cross compilation

In order to support cross-compilation, we need to build gen_crc32table
using the C compiler targeted for the build platform, since it is run
as part of the build process.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: pull in libgen.h to get prototype for basename()

The function prototype for basename() is in <libgen.h>, per POSIX.
Without it, the build will throw errors due to the missing
prototype.

On glibc, using libgen.h will force the use of POSIX's basename(),
instead of glibc's basename() with GNU extensions. However, xfsprogs
doesn't depend on any of the GNU extensions, so this is fine.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: define NBBY if not defined by the system header files

Android's bionic libc doesn't define NBBY; this isn't a standard
define, and since all modern/sane platforms have 8 bits per byte, use
this as a default.
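
The fallback amounts to a guarded define along these lines;
demo_bits_in() is a hypothetical helper added only to show a typical
use.

```c
/* Provide NBBY only when the system headers have not already done so. */
#ifndef NBBY
#define NBBY 8	/* number of bits per byte */
#endif

/* Hypothetical helper: number of bits in an object of 'bytes' bytes. */
unsigned int demo_bits_in(unsigned int bytes)
{
	return bytes * NBBY;
}
```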

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Use glibtoolize on osx

OS X doesn't have a libtoolize binary by default, and the available
ports are named "glibtoolize". Autodetect this issue.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Don't Make .po files with gettext disabled

"po" target is added only if gettext binary is found.
Without this patch, Make tried to build the target even
with --enable-gettext=no configure option, which led
to a failing build.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: Search path for utilities unified

Currently, when autoconf is checking for a utility, every utility has
its own paths defined independently. Unify it in a single variable
used for (almost) all utilities.

Also, add /opt/local/bin to the path.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: blkid is now mandatory

Because blkid has been around for a long time, I hereby propose a
patch for removing support for NOT having blkid. The current support
through a set of #ifdefs is prone to errors like making a patch in
just one of the branches, and according to a recent talk between
Christoph and Eric, it is not necessary to keep it supported.

Remove code for checking ENABLE_BLKID, and the code when
ENABLE_BLKID is not defined. The only use of libdisk was in the
removed code, so remove libdisk too. It makes blkid required for
compilation.

[dchinner: also remove include/volume.h and include/dvh.h as
suggested by Christoph during review. ]

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: don't install platform_defs.h

platform_defs.h is a generated header file, which causes all kinds
of problems when installed on multiarch systems, and requires
workarounds in distribution packages. Instead move the small parts
of it needed in the installed xfs.h into xfs.h and keep
platform_defs.h private to xfsprogs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: move __u*/__s* typedefs to per-port headers

Currently we have to install the autoconf-generated platform_defs.h
to get the definitions for these. But they are clearly a feature
of Linux vs non-Linux platforms so move them to the per-port headers
instead.

Note: in the long run it might be a good idea to just use the standard
uint*_t/int*_t types instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: move __be*/__le* types and __arch_pack to xfs_arch.h

These are defines and typedefs only needed for the XFS on disk format,
so there is no need to have them available for every user of xfs.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: simplify internal includes

We don't need the xfs/ prefix for local includes if we just add the
libxfs directory to the include path. Once that is done we only
need to link the installed headers into include/xfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: use <>-style includes in installed headers

Once installed these are system headers, so we need to use <>-style
include statements between them.

[dchinner: fix include/xfs creation as this changes include/xfs
population dependencies]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove install-qa target

Now that we don't install all the libxfs internals but just the disk
format definitions we can install those as part of the normal
install-dev target.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: only install *format.h headers in install-qa

Now that we've properly split up the headers we don't need to install all
the libxfs-internal headers for xfstests. Just install the three headers
defining the on-disk format and xfs_arch.h which is required to compile
them instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove swab.h

The macros in swab.h are only used to implement those in xfs_arch.h, so let's
consolidate the two headers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove unused macros from xfs_arch.h

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: don't install internal header files

All the headers in $(HFILES) are internal to xfsprogs and should not be
installed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

build: create include/xfs before installing headers

Currently the install-headers rule from include/Makefile creates
include/xfs, but there is no guarantee that it will be the first
directory that make executes that rule in. Hence other directories
can race with the creation of include/xfs and fail.

Move the creation of include/xfs to occur before running the
install-headers rules on the subdirectories to avoid any possible
races with creation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>

Merge branch 'libxfs-commit-script' into for-next

Merge branch 'progs-cleanup' into for-next

Merge branch 'libxfs-4.2-rc1-update' into for-next

xfs: check min blks for random debug mode sparse allocations

The inode allocator enables random sparse inode chunk allocations in
DEBUG mode to facilitate testing. Sparse inode allocations are not
always possible, however, depending on the fs geometry. For example,
there is no possibility for a sparse inode allocation on filesystems
where the block size is large enough to fit one or more inode chunks
within a single block.

Fix up the DEBUG mode sparse inode allocation logic to trigger random
sparse allocations only when the geometry of the fs allows it.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: use percpu_counter_read_positive for mp->m_icount

percpu_counter_read() just returns the current counter value, which can be
negative. This makes the "allocated inode counts <= m_maxicount" check
produce false positives. Using percpu_counter_read_positive() solves this
problem and is consistent with the purpose of introducing the percpu
mechanism to xfs.
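
The difference between the two reads can be sketched in plain C. This is
a userspace illustration only: struct counter_sketch and
inode_limit_exceeded_sketch() are hypothetical stand-ins, not the kernel
structures or the actual XFS check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct percpu_counter. */
struct counter_sketch {
	int64_t count;	/* approximate global value, may be transiently negative */
};

/* Mirrors percpu_counter_read(): return the raw approximate value. */
static int64_t counter_read(const struct counter_sketch *c)
{
	return c->count;
}

/* Mirrors percpu_counter_read_positive(): clamp negative values to zero. */
static int64_t counter_read_positive(const struct counter_sketch *c)
{
	int64_t v = c->count;

	return v >= 0 ? v : 0;
}

/*
 * Hypothetical limit check: would allocating newlen more inodes exceed
 * maxicount? A maxicount of zero is treated as "no limit" (an assumption
 * of this sketch).
 */
static int inode_limit_exceeded_sketch(const struct counter_sketch *icount,
				       int64_t newlen, int64_t maxicount)
{
	assert(newlen > 0);
	return maxicount && counter_read_positive(icount) + newlen > maxicount;
}
```

With the clamped read, a transiently negative counter is treated as zero
allocated inodes rather than feeding a negative value into the
comparison.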

Signed-off-by: George Wang <xuw2015@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: clean up XFS_MIN_FREELIST macros

We no longer calculate the minimum freelist size from the on-disk
AGF, so we don't need the macros used for this. That means the
nested macros can be cleaned up and turned into an actual
function so the logic is clear and concise. This will make it much
easier to add support for the rmap btree when the time comes.

This also gets rid of the XFS_AG_MAXLEVELS macro used by these
freelist macros as it is simply a wrapper around a single variable.
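
A function of this kind might look roughly like the sketch below. The
commit message does not spell out the formula, so the rule used here is
an assumption: one block per level of each free space btree, plus one
for a possible new root, capped at the maximum tree height. The
structure and field names are hypothetical, trimmed-down stand-ins.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for the mount structure. */
struct mount_sketch {
	unsigned int ag_maxlevels;	/* maximum height of a free space btree */
};

/* Hypothetical, trimmed-down stand-in for the in-core per-AG structure. */
struct perag_sketch {
	unsigned int bno_levels;	/* current height of the by-block btree */
	unsigned int cnt_levels;	/* current height of the by-size btree */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Assumed rule: a split can consume one block per level plus one for a
 * new root, but a tree can never grow beyond ag_maxlevels.
 */
static unsigned int min_freelist_sketch(const struct mount_sketch *mp,
					const struct perag_sketch *pag)
{
	unsigned int min_free;

	assert(mp->ag_maxlevels > 0);

	/* space needed by the by-block-number free space btree */
	min_free = min_u(pag->bno_levels + 1, mp->ag_maxlevels);
	/* space needed by the by-size free space btree */
	min_free += min_u(pag->cnt_levels + 1, mp->ag_maxlevels);

	return min_free;
}
```

Written as a function, adding a third tree later only needs one more
"min_free +=" line instead of another layer of nested macros.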

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: sanitise error handling in xfs_alloc_fix_freelist

The error handling is currently an inconsistent mess as every error
condition handles return values and releasing buffers individually.
Clean this up by using gotos and a sane error label stack.
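
The general shape of such an error label stack is sketched below. This
is a generic illustration of the idiom, not the code from
xfs_alloc_fix_freelist(): the buffers are plain malloc() allocations,
the fail_at parameter and struct cleanup_log exist only so the unwind
order can be observed, and unlike the real function the sketch also
releases everything on success.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counters recording which cleanups ran. */
struct cleanup_log {
	int agf_released;
	int agfl_released;
};

/*
 * fail_at selects the step that fails:
 *   0 = nothing fails, 1 = first acquire, 2 = second acquire,
 *   3 = the work done after both resources are held.
 */
static int fix_freelist_sketch(int fail_at, struct cleanup_log *log)
{
	void *agf_buf;
	void *agfl_buf;
	int error = 0;

	assert(log != NULL);

	agf_buf = (fail_at == 1) ? NULL : malloc(16);
	if (!agf_buf) {
		error = -1;
		goto out_no_agf;
	}

	agfl_buf = (fail_at == 2) ? NULL : malloc(16);
	if (!agfl_buf) {
		error = -2;
		goto out_agf_relse;
	}

	if (fail_at == 3) {
		error = -3;
		goto out_agfl_relse;
	}

	/* Success falls through the same labels, unwinding in reverse order. */
out_agfl_relse:
	free(agfl_buf);
	log->agfl_released++;
out_agf_relse:
	free(agf_buf);
	log->agf_released++;
out_no_agf:
	return error;
}
```

Each failure jumps to the label that releases exactly what has been
acquired so far, so no error branch has to repeat cleanup code.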

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Merge branch 'progs-misc-fixes-1' into for-next

xfs: factor out free space extent length check

The longest extent length checks in xfs_alloc_fix_freelist() are now
essentially identical. Factor them out into a helper function, so we
know they are checking exactly the same thing before and after we
lock the AGF.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: xfs_alloc_fix_freelist() can use incore perag structures

At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
access and agf buffer based access to freelist and space usage
information. However, once the AGF buffer is locked inside this
function, it is guaranteed that both the in-memory and on-disk
values are identical. xfs_alloc_fix_freelist() doesn't modify the
values in the structures directly, so it is a read-only user of the
information, and hence can use the per-ag structure exclusively for
determining what it should do.

This opens up an avenue for cleaning up a lot of duplicated logic
whose only difference is the structure it gets the data from, and in
doing so removes a lot of needless byte swapping overhead when
fixing up the free list.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: xfs_attr_inactive leaves inconsistent attr fork state behind

xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.

This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.

To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of its state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.

Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.

cc: <stable@vger.kernel.org> # 3.12 to 4.0
Reported-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: always log the inode on unwritten extent conversion

The fsync() requirements for crash consistency on XFS are to flush file
data and force any in-core inode updates to the log. We currently check
whether the inode is pinned to identify whether the log needs to be
forced, since a non-zero pin count generally represents an inode that
has transactions awaiting a flush to the on-disk log.

This is not sufficient in all cases, however. Reports of xfstests test
generic/311 failures on ppc64/s390x hosts have identified failures to
fsync outstanding inode modifications due to the inode not being pinned
at the time of the fsync. This occurs because certain bmap updates can
complete by logging bmapbt buffers but without ever dirtying (and thus
pinning) the core inode. The following is a specific incarnation of this
problem:

$ mount $dev /mnt -o noatime,nobarrier
$ for i in $(seq 0 2 31); do \
xfs_io -f -c "falloc $((i * 32768)) 32k" -c fsync /mnt/file; \
done
$ xfs_io -c "pwrite -S 0 80k 16k" -c fsync -c "pwrite 76k 4k" -c fsync /mnt/file; \
hexdump /mnt/file; \
./xfstests-dev/src/godown /mnt
...
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0013000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0014000 0000 0000 0000 0000 0000 0000 0000 0000
*
00f8000
$ umount /mnt; mount ...
$ hexdump /mnt/file
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
00f8000

In short, the unwritten extent conversion for the last write is lost
despite the fact that an fsync executed before the filesystem was
shut down. Note that this is impossible to reproduce on v5 supers due to
unconditional time callbacks for di_changecount and highly difficult to
reproduce on CONFIG_HZ=1000 kernels due to those same callbacks
frequently updating cmtime prior to the bmap update. CONFIG_HZ=100
reduces timer granularity enough to increase the odds that time updates
are skipped and allows this to reproduce within a handful of attempts.

To deal with this problem, unconditionally log the core in the unwritten
extent conversion path. Fix up logflags after the extent conversion to
keep the extent update code consistent with the other extent update
helpers. This fixup is not necessary for the other (hole, delay) extent
helpers because they execute in the block allocation codepath, which
already logs the inode for other reasons (e.g., for di_nblocks).

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: extent size hints can round up extents past MAXEXTLEN

This results in BMBT corruption, as seen by this test:

# mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc
....
# mount /dev/vdc /mnt/scratch
# xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo

which results in this failure on a debug kernel:

XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211
....
Call Trace:
[<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100
[<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20
[<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120
[<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70
[<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70
[<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0
[<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170
[<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400
[<ffffffff81ddcc29>] ? down_write+0x29/0x60
[<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310
[<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120
[<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570
[<ffffffff811cd680>] vfs_fallocate+0x140/0x260
[<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80
[<ffffffff81ddec09>] system_call_fastpath+0x12/0x17

The tracepoint that indicates the extent that triggered the assert
failure is:

xfs_iext_insert:   idx 0 offset 0 block 16777224 count 2097152 flag 1

This clearly indicates that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:

xfs_alloc_size_done:  agno 1 agbno 8 minlen 2097152 maxlen 2097152
                                            ^^^^^^^        ^^^^^^^

We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in its allocation
sizes and so needs help here.

The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.

Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. Hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact, be the underlying bug that
occasionally causes transaction overruns in delayed allocation
extent conversion, so now that we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.

Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.
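
The partial alignment can be sketched as follows. This is a simplified
model, not the body of xfs_bmap_extsize_align(): it works on plain
block numbers, ignores neighbouring extents and realtime handling, and
the function and macro names are hypothetical. The trimming loop is the
part that corresponds to the fix described above.

```c
#include <assert.h>
#include <stdint.h>

#define MAXEXTLEN_SKETCH 2097151ULL	/* MAXEXTLEN as quoted above */

/*
 * Round the range [*off, *off + *len) outward to multiples of extsz,
 * then trim whole extsz units off the end until the length fits in
 * MAXEXTLEN. The caller is expected to loop and allocate the remainder.
 */
static void extsize_align_sketch(uint64_t extsz, uint64_t *off, uint64_t *len)
{
	uint64_t start, end, alen;

	assert(extsz > 0 && extsz <= MAXEXTLEN_SKETCH);
	assert(*len > 0);

	start = *off - (*off % extsz);		/* round the start down */
	end = *off + *len;
	if (end % extsz)
		end += extsz - (end % extsz);	/* round the end up */
	alen = end - start;

	/* Full alignment overshoots: align only part of the request. */
	while (alen > MAXEXTLEN_SKETCH)
		alen -= extsz;

	*off = start;
	*len = alen;
}
```

Assuming 4k blocks, the 16m hint from the test above is 4096 blocks. A
request of 2097151 blocks at offset 0 would round up to 2097152 blocks,
so the sketch trims it to 2093056 blocks and leaves the tail for the
next pass of the caller's allocation loop.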

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: return a void pointer from xfs_buf_offset

This avoids all kinds of unnecessary casts in an environment like Linux where
we can assume that pointer arithmetic is supported on void pointers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: helper to transition inode blocks to inode state

The state of each block in an inode chunk transitions from free state to
inode state as we process physical inodes on disk. We take care to
detect invalid transitions and warn the user if multiply claimed blocks
are detected.

This block of code is a largish switch statement that is executed twice
due to the implementation details of the inode processing loop. Factor
it into a new helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: helper to import on-disk inobt records to in-core trees

In the common case, the in-core inode state from the on-disk inobt
records is imported from the inobt and validated against the finobt (if
one exists). When both trees exist along with some form of corruption,
it's possible to find inodes in the finobt not tracked by the inobt.
While this is unexpected, we attempt to repair by importing the inodes
from the finobt.

The associated code in the finobt scan function mirrors the associated
code in the inobt scan function. Factor this into a separate helper that
can be called by either tree scan.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: helper for inode chunk alignment and start/end ino number verification

The inobt scan code executes different routines for processing inobt
records and finobt records. While some verification differs between the
trees, much of it is the same. One such example of this is the inode
record alignment and start/end inode number verification. The only
difference between the inobt and finobt verification is the error
message that is generated as a result of failure.

Factor out these alignment checks into a new helper that takes an enum
parameter that identifies which tree is undergoing the scan. Use a new
string array for this function and subsequent common inobt scan helpers
to convert the enum to the name of the tree for the purposes of
including in any resulting warning messages.
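
The enum plus string array pattern can be sketched like this. All names
and the warning text are hypothetical illustrations, not the
identifiers or messages used in xfs_repair, and the check is reduced to
a single alignment test.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical enum identifying the tree being scanned. */
enum inobt_type_sketch {
	INOBT_SKETCH,
	FINOBT_SKETCH,
};

/* Indexed by enum inobt_type_sketch. */
static const char *const inobt_names_sketch[] = {
	"inobt",
	"finobt",
};

/*
 * Shared alignment check. Returns 0 if the record's starting inode is
 * aligned, otherwise formats a warning naming the tree and returns 1.
 */
static int verify_startino_sketch(enum inobt_type_sketch type,
				  unsigned int startino, unsigned int align,
				  char *buf, size_t buflen)
{
	assert(align > 0);

	if (startino % align == 0)
		return 0;

	snprintf(buf, buflen, "badly aligned %s rec (starting inode = %u)",
		 inobt_names_sketch[type], startino);
	return 1;
}
```

Both scan functions can then call the same helper and differ only in
the enum value they pass.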

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: access helpers for on-disk inobt record freecount

The on-disk inobt record has two formats depending on whether sparse
inode support is enabled or not. If so, the freecount field is a single
byte and does not require byte-conversion. Otherwise, it is a 4-byte
field and does.

Create the inorec_[get|set]_freecount() helpers to abstract this detail
away from the core repair code.
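
A minimal sketch of such helpers is shown below. The record layout is a
hypothetical byte-level model of the two formats (a big-endian 32-bit
freecount, or a 16-bit holemask followed by one-byte count and
freecount fields), and the "sparse" flag stands in for whatever feature
check the real helpers perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 4 bytes that differ between the formats. */
struct inorec_sketch {
	union {
		uint8_t full_freecount[4];	/* big-endian 32-bit freecount */
		struct {
			uint8_t holemask[2];	/* big-endian 16-bit holemask */
			uint8_t count;
			uint8_t freecount;	/* single byte, no conversion */
		} sp;
	} u;
};

static uint32_t inorec_get_freecount_sketch(const struct inorec_sketch *rec,
					    int sparse)
{
	const uint8_t *b = rec->u.full_freecount;

	if (sparse)
		return rec->u.sp.freecount;

	/* Full format: assemble the big-endian 32-bit value. */
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static void inorec_set_freecount_sketch(struct inorec_sketch *rec, int sparse,
					uint32_t freecount)
{
	uint8_t *b = rec->u.full_freecount;

	if (sparse) {
		assert(freecount <= UINT8_MAX);
		rec->u.sp.freecount = (uint8_t)freecount;
		return;
	}

	/* Full format: store as big-endian regardless of host byte order. */
	b[0] = (uint8_t)(freecount >> 24);
	b[1] = (uint8_t)(freecount >> 16);
	b[2] = (uint8_t)(freecount >> 8);
	b[3] = (uint8_t)freecount;
}
```

Callers only pass the feature state and never touch the union directly,
so the byte-order handling stays in one place.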

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

metadump: support sparse inode records

xfs_metadump currently uses mp->m_ialloc_blks sized buffers to copy
inode chunks. If a filesystem supports sparse inodes, some clusters
within inode chunks can point to arbitrary data. If the buffer used to
read inodes includes these sparse clusters, inode read verification
fails and prints filesystem corruption warnings.

Update copy_inode_chunks() to support using a cluster sized buffer to
read a full inode chunk in multiple iterations if sparse inodes is
enabled. For each cluster read, check whether the first inode in the
cluster is sparse and skip the cluster if so. This is safe because
sparse records are allocated at cluster granularity.
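
The per-cluster loop can be sketched as below. This models only the
skip decision, not copy_inode_chunks() itself: the sparse state is
represented as a hypothetical 64-bit mask with one bit per inode, and
the function simply counts the cluster reads that would be issued.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK_SKETCH 64

/*
 * Walk a 64-inode chunk one cluster at a time. sparse_mask has one bit
 * per inode (bit set = inode is sparse). Only the first inode of each
 * cluster is tested, which relies on sparse regions covering whole
 * clusters. Returns the number of clusters that would be read.
 */
static int count_cluster_reads_sketch(uint64_t sparse_mask,
				      int inodes_per_cluster)
{
	int ino;
	int reads = 0;

	assert(inodes_per_cluster > 0);
	assert(INODES_PER_CHUNK_SKETCH % inodes_per_cluster == 0);

	for (ino = 0; ino < INODES_PER_CHUNK_SKETCH; ino += inodes_per_cluster) {
		if (sparse_mask & (1ULL << ino))
			continue;	/* sparse cluster: nothing to read */
		reads++;
	}

	return reads;
}
```

For a chunk whose upper 32 inodes are sparse and a cluster size of 16
inodes, the loop issues two reads instead of four.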

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove xfs_caddr_t

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove non-Linux definitions for loff_t

We don't use loff_t anywhere in xfsprogs, so no need to define it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove boolean_t typedef

Only one use of this exists, and it's treated like an int anyway by
both caller and callee.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove the uchar_t typedef

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfsprogs: remove the uint_t typedef

This was only used in a few IRIX platform helpers that can use __uint32_t
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

metadump: reorder inode record sanity checks and inode buffer read

In preparation to support sparse inode records, refactor
copy_inode_chunk() to perform all record sanity checks before the cursor
is set to the inode chunk and the inode buffer is read.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: handle sparse inode alignment

Sparse inode support requires inode alignment to match inode chunk size.
xfs_repair currently expects inode alignment to match the default
cluster size or a scaled factor thereof.

Update sb_validate_ino_align() to consider the superblock valid if
sparse inode support is enabled and alignment matches the chunk size.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: do not prefetch holes in sparse inode chunks

The repair prefetch mechanism reads all inode chunks in advance of
repair processing to improve performance. Inode buffer verification and
processing can occur within the prefetch mechanism such as when
directories are being processed. Prefetch currently assumes fully
populated inode chunks which leads to corruption errors attempting to
verify inode buffers that do not contain inodes.

Update prefetch to check the previously scanned sparse inode bits and
skip inode buffer reads of clusters that are sparse. We check sparse
state per-inode cluster because the cluster size is the minimum allowable
inode chunk hole granularity.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: reconstruct sparse inode records correctly on disk

Phase 5 traverses all of the in-core inode records and regenerates the
inode btrees a record at a time. The record insertion code doesn't
account for sparse inodes which means the ir_holemask and ir_count
fields are not set on-disk and ir_freecount is set with an invalid value
for sparse inode records.

Update build_ino_tree() to handle sparse inode records correctly. We
must account real, allocated inodes only into the ir_freecount field.
The 64-bit in-core sparse inode bitmask must be converted to compressed
16-bit ir_holemask format. Finally, the ir_count field must be set to the
total (non-sparse) inode count of the record.

If the fs does not support sparse inodes, both the ir_holemask and
ir_count field are initialized to zero to preserve backwards
compatibility. These bytes historically landed in the high order bytes
of ir_freecount and must be 0 to be interpreted correctly by older XFS
implementations without sparse inode support.
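
The mask compression and the resulting inode count can be sketched as
follows. The sketch assumes each of the 16 holemask bits covers a group
of 4 inodes (64 / 16) and that holes always cover whole groups. The
function and macro names are hypothetical, not those used by
build_ino_tree().

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK_SKETCH		64
#define HOLEMASK_BITS_SKETCH		16
#define INODES_PER_HOLEMASK_BIT_SKETCH	\
	(INODES_PER_CHUNK_SKETCH / HOLEMASK_BITS_SKETCH)

/* Compress a per-inode sparse mask (bit set = sparse) to 16 bits. */
static uint16_t sparse_to_holemask_sketch(uint64_t sparse)
{
	uint16_t holemask = 0;
	int bit;

	for (bit = 0; bit < HOLEMASK_BITS_SKETCH; bit++) {
		/* the 4 per-inode bits covered by this holemask bit */
		uint64_t group =
			(sparse >> (bit * INODES_PER_HOLEMASK_BIT_SKETCH)) & 0xf;

		/* assumed: a group is either entirely sparse or not at all */
		assert(group == 0 || group == 0xf);
		if (group)
			holemask |= (uint16_t)(1u << bit);
	}

	return holemask;
}

/* Number of physically allocated (non-sparse) inodes in the record. */
static unsigned int holemask_to_count_sketch(uint16_t holemask)
{
	unsigned int count = INODES_PER_CHUNK_SKETCH;
	int bit;

	for (bit = 0; bit < HOLEMASK_BITS_SKETCH; bit++)
		if (holemask & (1u << bit))
			count -= INODES_PER_HOLEMASK_BIT_SKETCH;

	return count;
}
```

A record with no sparse inodes compresses to a holemask of zero, which
matches the zero-filled bytes expected when the feature is disabled.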

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: do not account sparse inodes in phase 5 cursor init.

The inode btrees are reconstructed in phase 5 of xfs_repair. The btree
cursor initialization counts the allocated and free inodes in the
in-core records and calculates the expected geometry of the resulting
btree. The free and total inode counts for each AG are also ultimately
aggregated to update the associated superblock counts.

Update init_ino_cursor() to not assume 64 inode records and not account
sparse inodes into the total or free inode count for each AG.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: factor out sparse inodes from finobt reconstruction

Phase 5 of xfs_repair recreates the on-disk btrees. The free inode btree
(finobt) contains inode records that contain one or more free inodes.
Sparse inodes are marked as free and therefore sparse inode records can
be incorrectly included in the finobt even when no real free inodes are
available in the record.

Update the finobt in-core record traversal helpers to factor out sparse
inodes and only consider inode records with allocated, free inodes for
finobt insertion.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: process sparse inode records correctly

The inode processing phases of xfs_repair (3 and 4) validate the actual
inodes referred to by the previously scanned inode btrees. The physical
inodes are read from disk and internally validated in various ways. The
inode block state is also verified and corrected if necessary.

Sparse inodes are not physically allocated and the associated blocks may
be allocated to any other area of the fs (file data, internal use,
etc.). Attempts to validate these blocks as inode blocks produce noisy
corruption errors.

Update the inode processing mechanism to handle sparse inode records
correctly. Since sparse inodes do not exist, the general approach here
is to simply skip validation of sparse inodes. Update
process_inode_chunk() to skip reads of sparse clusters and set the buf
pointer of associated clusters to NULL. Update the rest of the function
to only verify non-NULL cluster buffers. Also, skip the inode block
state checks for blocks in sparse inode clusters.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: validate ir_count field for sparse format records

Sparse format inobt records contain an additional count field that
records the number of physical inodes tracked by the record. Verify the
count is internally consistent according to the holemask, similar to how
freecount is validated against the free mask.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: scan sparse finobt records correctly

The finobt scan performs checks similar to the inobt scan, including
internal record consistency checks, consistency with inobt records,
inode block state, etc. Various parts of this mechanism also assume
fully allocated inode records and thus lead to false errors with sparse
records.

Update the finobt scan to detect and handle sparse inode records
correctly. As for the inobt, do not assume that blocks associated with
sparse regions are allocated for inodes and do not account sparse inodes
against the freecount. Additionally, verify that sparse state is
consistent with the in-core record and set up any new in-core records
that might have been missing from the inobt correctly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: scan and track sparse inode chunks correctly

Phase 2 of xfs_repair scans the on-disk inobt and creates in-core
records for all inodes in the fs. This also involves marking
free/allocated state of all inodes, internal record verification and
block state management for the inode chunks tracked by inode records.
Various parts of the inobt scan mechanism assume fully allocated inode
records and thus lead to spurious errors when sparse inode records are
encountered.

Update the inobt scan to detect and handle sparse inode records
correctly. Do not set the allocation state of blocks in sparse inode
regions as these blocks do not belong to the record. Do not account
sparse inodes against the ir_freecount as these inodes do not exist and
are not available for allocation by the fs. Finally, track the sparse
status of each individual inode in the in-core inode records for future
reference.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: use ir_count for filesystems with sparse inode support

Repair currently assumes each inobt record covers 64 inodes and uses
this value to validate inode counts in the AGI headers and superblock.
This is not always the case with sparse inode support.

Update scan_inobt() to check for sparse inode support and use the new
ir_count field for inode accounting. ir_count contains the total number
of inodes tracked by the record.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: remove duplicate field from aghdr_cnts

The agicount and icount fields are used in separate parts of the AG scan
but both fields track the same data. agicount is used to compare with
the AGI header and icount is used to calculate the total inode count to
compare with sb_icount.

Use agicount rather than icount in scan_ags() and remove the icount
field.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

repair: handle sparse format inobt record freecount correctly

The sparse inode chunk feature introduces a new inobt record format that
converts ir_freecount from 4 bytes to 1 byte. ir_freecount references
throughout repair currently assume the 'full' format and endian-convert
from the 32-bit value.

Update the xfs_repair inobt scan and tree rebuild codepaths to use the
correct record format for ir_freecount when sparse inodes is enabled.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

growfs: display sparse inode status from xfs_info

Check the sparse inode feature bit of the geometry flags and display
whether sparse inode chunks are supported by the fs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

db: show sparse inodes feature state in version command output

The xfs_db version command prints a string for each of the various
features supported by a filesystem. Include 'SPARSE_INODES' in the
version string when sparse inode chunk allocation is supported by the
fs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>