git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

libxfs: remove libxfs_trans_iget

libxfs_trans_iget no longer has a counterpart in the kernel. Remove it
and make the xfs_iget/xfs_trans_ijoin usage consistent throughout
xfsprogs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: refactor the open-coded libxfs_trans_bjoin calls

Refactor open-coded bjoin code to use libxfs_trans_bjoin.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: hoist bitmap out of scrub

Move the bitmap code to libfrog so that we can use bitmaps in
xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: pass ops through during scan

Pass the buffer ops through scan_sbtree so that we detect finobt blocks
properly and we don't have to keep switching on magic numbers for the
free space btrees.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: refactor buffer ops assignments during phase 5

Refactor the buffer ops assignments in phase 5 to use a helper function
to determine the correct buf_ops instead of open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix uninitialized variable warnings

Fix some uninitialized variable warnings because ASSERT disappears if
DEBUG is not defined.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: bump the irec on-disk nlink when adding lost+found

We increment the nlink of the root directory inode when creating a
"lost+found" directory during phase 6, but we don't update the irec copy
of the root dir nlink.  This normally gets papered over by phase 7, but
this can fail badly if:

1) The root directory had an entry to a busted subdirectory, so
   that root directory will have nlink == 3, but in the ino_tree,
   counted_nlinks == 2 and disk_nlinks == 3.

2) Phase 6 creates lost+found to root the files that were in the busted
   directory, we'll set nlink = 4 and counted_nlinks = 3.  The correct
   nlink is 3 ('.', '..', 'lost+found'), not 4.

3) During phase 7, we see that counted_nlinks == disk_nlinks and so we
   totally fail to correct the on-disk inode.

4) A subsequent run of xfs_repair complains about the nlink being 4
   instead of 3.

To fix this, we have to adjust the irec's disk_nlinks in step 2 so that
phase 7 seeds that counted_nlinks < disk_nlinks and resets nlink to
counted_nlinks.  This can be reproduced somewhat frequently by xfs/117.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: reinitialize the root directory nlink correctly

In mk_root_dir, we reinitialize the root directory inode with a link
count of 1. This differs from mkfs parseproto, which initializes the
root to have a link count of 2. The nlink discrepancy in repair is
caught and corrected during phase 7, but this is unnecessary since we
should set it properly in the first place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: validate extent size hint parameters

Validate extent and cow extent size hints that are passed to mkfs so
that we avoid formatting a filesystem that will never mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: use DIFLAG macros for now to be consistent, fixed later]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix typo in unicrash header file

The no-op #definintion was missing 'fs_' so remained undefined
in the #else case.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: use TYP_FINOBT for finobt metadump

Use the correct xfs_db type for dumping free inode btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: check label for misleading characters

Make sure that we can retrieve the label and that it doesn't contain
anything potentially misleading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: don't close mnt_fd when mnt_fd open fails

If we fail to open the mountpoint during phase 1 of scrub, don't bother
trying to close the file descriptor since it's silly to spray error
messages about that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: one read/verify pool per disk

Simplify the read/verify pool code further by creating one pool per
disk. This enables us to tailor the concurrency levels of each disk to
that specific disk so that if we have a mixed hdd/ssd environment we
don't flood the hdd with a lot of requests.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: don't expose internal pool state

In xfs_scrub, the read/verify pool tries to coalesce the media
verification requests into a smaller number of large IOs. There's no
need to force callers to keep track of this internal state, so just move
all that into read_verify.c.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: use datadev parallelization estimates for thread count

During phases 2-5, xfs_scrub should estimate the level of
parallelization possible on the data device to determine the number of
threads spawned to scrub filesystem metadata, not just blindly using the
number of CPUs. This avoids flooding non-rotational storage with random
reads, which totally destroys performance and makes scrub runtimes
higher.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: rename the global nr_threads

Various functions have nr_threads local variables that shadow the global
one. Since the global one forces the number of threads we use, change
its name to remove this ambiguity and reflect what it really does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub_all.timer: activate after most of the system is up

We really don't want the xfs_scrub_all timer triggering while the system
is booting up because not all the mounts will have finished, networking
might not be up for reporting, and slowing down bootup annoys people.
Therefore, delay the xfs_scrub_all service's activation until after the
system has started all the big pieces it's going to start.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub_all: walk the lsblk device/fs hierarchy correctly

Back when I was designing xfs_scrub_all, I naïvely assumed that the
emitted output would always list physical storage before the virtual
devices stacked atop it. However, this is not actually true when one
omits the "NAME" column, which is crucial to forcing the output (json or
otherwise) to capture the block device hierarchy. If the assumption is
violated, the program crashes with a python exception.

To fix this, force the hierarchal json output and restructure the
discovery routines to walk the json object that we receive, from the top
(physical devices) downwards to wherever there are live xfs filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: document fzero_f -k option in manpage

Perhaps the reason -k was broken is that nobody knew it existed?

Fixes: 938904c4 ("xfs_io: add fzero command for zeroing range via fallocate")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: don't walk off the end of argv in fzero_f

The fzero_f function doesn't check that there are enough non-switch
parameters to supply offset and length arguments to fallocate. As a
result, we can walk off the end of the argv array and crash. A
secondary problem is that we don't use getopt to detect the -k, which is
not how most xfs_io commands work.

Therefore, use getopt to detect the -k argument and rewire the offset
and length interpretation code to check optind and use argv correctly.
This bug is trivially reproduced by "xfs_io -c 'fzero -k 0' /some/file".

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: statx -r should print attributes_mask

We're dumping the raw structure, so we ought to dump everything,
including the attributes_mask field.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: actually check copy file range helper return values

We need to check the return value of copy_src_filesize and
copy_dst_truncate because either could return -1 due to fstat/ftruncate
failure.

Fixes: 628e112afdd98c5 ("xfs_io: implement 'copy_range' command")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

debian: enable parallel make

Use parallel make to speed up dpkg builds.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

debian: don't bypass top level Makefile when building subdirs

The top level Makefile does some processing to set build environment
variables (Q and CHECK_CMD). debian/rules uses -C to build subdirs
directly, which bypases this feature of the top-level makefile, which
causes more build spew than necessary (because Q never gets set to quiet
the build).

Since the top level makefile can be used to build the subdirs
debian/rules cares about, drop the -C and build subdirs via the top
level Makefile to quiet the build.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

debian: drop dangling libhandle.a symlinks in xfslibs-dev

We don't ship static libhandle libraries anymore, so make sure we drop
the symlink.

Fixes: ec1cf08dbeb2d ("Several updates to use more modern Debian packaging")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Nathan Scott <nathans@debian.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

configure.ac: fix alignment of features

Fix the alignment of the feature options in the --help screen.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

configure: use sys/xattr.h for fsetxattr detection

The only user of fsetxattr and HAVE_FSETXATTR is fsr, which includes
sys/xattr.h (from libc). However, the m4 macro to detect fsetxattr
support requires attr/xattr.h (from libattr). libattr dropped xattr.h
last year, so update the check.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: fix repair deadlock due to failed inode flushes.

If inode_item_done() fails to flush an inode after we've grabbed a
reference to the underlying buffer during a transaction commit, we
fail to put the buffer and hence leak it. We then deadlock on the
next lookup ofthe inode buffer as it is still locked and no-one owns
it.

To fix it, put the buffer on error so that it gets unlocked and
can be recovered appropriately in a later phase of repair.

Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Fixes: d15188a1ec14 ("xfs: rework the inline directory verifiers")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: stringify scrub types in ftrace output

Source kernel commit: 86d163dbfe2ac0b30fbb6e256301abbfa9e4549e

Use __print_symbolic to print the scrub type in ftrace output.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: stringify btree cursor types in ftrace output

Source kernel commit: c494213f30080423b70b24b6af7f6da554d9390f

Use __print_symbolic to print the btree type in ftrace output.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move XFS_INODE_FORMAT_STR mappings to libxfs

Source kernel commit: 0357d21a6c9be2870904598b4767c7d424524849

Move XFS_INODE_FORMAT_STR to libxfs so that we don't forget to keep it
updated, and add necessary TRACE_DEFINE_ENUM.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move XFS_AG_BTREE_CMP_FORMAT_STR mappings to libxfs

Source kernel commit: 05c753c4cf53f51a7e35fcfe684500113cf1fd13

Move XFS_AG_BTREE_CMP_FORMAT_STR to libxfs so that we don't forget to
keep it updated, and TRACE_DEFINE_ENUM the values while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix symbolic enum printing in ftrace output

Source kernel commit: 85f8dff00a3193fe5659aa4c91adde31723c0d3d

ftrace's __print_symbolic() has a (very poorly documented) requirement
that any enum values used in the symbol to string translation table be
wrapped in a TRACE_DEFINE_ENUM so that the enum value can be encoded in
the ftrace ring buffer. Fix this unsatisfied requirement.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: cache minimum realtime summary level

Source kernel commit: 355e3532132b487ebf6a4900fad8f3525fa3e137

The realtime summary is a two-dimensional array on disk, effectively:

u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]

rsum[log][bbno] is the number of extents of size 2**log which start in
bitmap block bbno.

xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
rsum[log][bbno] != 0 for any log level. However, the summary array is
stored in row-major order (i.e., like an array in C), so all of these
entries are not adjacent, but rather spread across the entire summary
file. In the worst case (a full bitmap block), xfs_rtany_summary() has
to check every level.

This means that on a moderately-used realtime device, an allocation will
waste a lot of time finding, reading, and releasing buffers for the
realtime summary. In particular, one of our storage services (which runs
on servers with 8 very slow CPUs and 15 8 TB XFS realtime filesystems)
spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
xfs_trans_brelse() called from xfs_rtany_summary().

One solution would be to also store the summary with the dimensions
swapped. However, this would require a disk format change to a very old
component of XFS.

Instead, we can cache the minimum size which contains any extents. We do
so lazily; rather than guaranteeing that the cache contains the precise
minimum, it always contains a loose lower bound which we tighten when we
read or update a summary block. This only uses a few kilobytes of memory
and is already serialized via the realtime bitmap and summary inode
locks, so the cost is minimal. With this change, the same workload only
spends 0.2% of its CPU cycles in the realtime allocator.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: precalculate cluster alignment in inodes and blocks

Source kernel commit: c1b4a321ede083521b91c314e1c4fa233ac33740

Store the inode cluster alignment information in units of inodes and
blocks in the mount data so that we don't have to keep recalculating
them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: precalculate inodes and blocks per inode cluster

Source kernel commit: 83dcdb4469e759f984db92616d7885fc14329841

Store the number of inodes and blocks per inode cluster in the mount
data so that we don't have to keep recalculating them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: add a block to inode count converter

Source kernel commit: 43004b2a8da2652b5ec526269a8acfba7d3d219c

Add new helpers to convert units of fs blocks into inodes, and AG blocks
into AG inodes, respectively. Convert all the open-coded conversions
and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate. The
OFFBNO_TO_AGINO macro is retained for xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_rmap_ag_owner and friends

Source kernel commit: 7280fedaf3a0f9097c0621c7d5b35849954d7f54

Owner information for static fs metadata can be defined readonly at
build time because it never changes across filesystems. This enables us
to reduce stack usage (particularly in scrub) because we can use the
statically defined oinfo structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: const-ify xfs_owner_info arguments

Source kernel commit: 66e3237e724c6650dca03627b40bb00a812d3f7a

Only certain functions actually change the contents of an
xfs_owner_info; the rest can accept a const struct pointer. This will
enable us to save stack space by hoisting static owner info types to
be const global variables.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: streamline defer op type handling

Source kernel commit: 02b100fb83f9b0f8719deef6c4ed973b4d9ce00c

There's no need to bundle a pointer to the defer op type into the defer
op control structure. Instead, store the defer op type enum, which
enables us to shorten some of the lines.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: idiotproof defer op type configuration

Source kernel commit: bc9f2b7c8a732d896753709cc9d495780ba7e9f9

Recently, we forgot to port a new defer op type to xfsprogs, which
caused us some userspace pain. Reorganize the way we make libxfs
clients supply defer op type information so that all type information
has to be provided at build time instead of risky runtime dynamic
configuration.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: zero length symlinks are not valid

Source kernel commit: 43feeea88c9cb2955b9f7ba8152ec5abeea42810

A log recovery failure has been reproduced where a symlink inode has
a zero length in extent form. It was caused by a shutdown during a
combined fstress+fsmark workload.

The underlying problem is the issue in xfs_inactive_symlink(): the
inode is unlocked between the symlink inactivation/truncation and
the inode being freed. This opens a window for the inode to be
written to disk before it xfs_ifree() removes it from the unlinked
list, marks it free in the inobt and zeros the mode.

For shortform inodes, the fix is simple. xfs_ifree() clears the data
fork state, so there's no need to do it in xfs_inactive_symlink().
This means the shortform fork verifier will not see a zero length
data fork as it mirrors the inode size through to xfs_ifree()), and
hence if the inode gets written back and the fork verifiers are run
they will still see a fork that matches the on-disk inode size.

For extent form (remote) symlinks, it is a little more tricky. Here
we explicitly set the inode size to zero, so the above race can lead
to zero length symlinks on disk. Because the inode is unlinked at
this point (i.e. on the unlinked list) and unreferenced, it can
never be seen again by a user. Hence when we set the inode size to
zeor, also change the type to S_IFREG. xfs_ifree() expects S_IFREG
inodes to be of zero length, and so this avoids all the problems of
zero length symlinks ever hitting the disk. It also avoids the
problem of needing to handle zero length symlink inodes in log
recovery to replay the extent free intents and the remaining
deferops to free the extents the symlink used.

Also add a couple of asserts to warn us if zero length symlinks end
up in either the symlink create or inactivation paths.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: libxfs: move xfs_perag_put late

Source kernel commit: fe5ed6c22e94b131ed5608d66ebce1efc39a7edb

The function xfs_alloc_get_freelist calls xfs_perag_put to drop the
reference. However, pag->pagf_btreeblks is read and written after the
put operation. This patch moves the put operation later.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
[darrick: minor changelog edits]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.20.0

Update all the necessary files for a 4.20.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.20.0-rc1

Update all the necessary files for a 4.20.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix incorrect return value in namecheck()

Obviously a directory entry with a '/' in the name should return
1, i.e. failure. This was just a dumb thinko.

Fixes: 45571fd5885d ("xfs_repair: allow '/' in attribute names")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Remove barrier/nobarrier mount options from xfs.5

Remove the now-removed barrier/nobarrier mount options from
the xfs.5 manpage.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: add -d to short help for write command

And note in the man page that -c and -d are exclusive.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: remove generated scrub files under clean target

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: initialize non-leaf finobt blocks with correct magic

The free inode btree construction code in xfs_repair has a bug where
any non-leaf nodes outside of the leftmost block at the associated
level in the tree are incorrectly initialized with the inobt magic
value. Update the prop_ino_cursor() path responsible for growing the
non-leaf portion of the inode btrees to use the btnum of the
specific tree being generated rather than the hardcoded inode btree
type.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reported-by: Lucas Stach <l.stach@pengutronix.de>
Root-caused-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: fix finobt record decoding when sparse inodes enabled

Use the sparse inobt record field decoder (inobt_spcrc_hfld) to decode
finobt records when sparse inodes are enabled. Otherwise, xfs_db
prints out bogus things like:

recs[1] = [startino,freecount,free]
1:[214720,16429,0xfffffffffff80000]

There can never be 16429 records in an inode btree record; instead it
should print:

recs[1] = [startino,holemask,count,freecount,free]
1:[214720,0,64,45,0xfffffffffff80000]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix off by one error when rebuilding high keys

Fix an off-by-one error when scanning a rmap btree block for high keys
as part of rebuilding rmap btrees during phase 5. This causes
xfs_repair to emit a corrupt filesystem, which is bad.

This can be reproduced pretty easily by exporting
TEST_XFS_REPAIR_REBUILD=1 and running generic/051 with a 1k block size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: fix false error reporting of project inheritance flag is not set

After kernel commit:

9336e3a7 "xfs: project id inheritance is a directory only flag"

xfs stopped setting the project inheritance flag on regular files, but
userspace quota code still checks for it and will now issue the error:

"project inheritance flag is not set"

for every regular file during quotacheck. Fix this by only checking
for the flag on directories.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1663502
Reported-by: Steven Gardner <sgardner@redhat.com>
Signed-off-by: Achilles Gaikwad <agaikwad@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: allow '/' in attribute names

For some reason, since the earliest days of XFS, a '/' character
in an extended attribute name has been treated as corruption by
xfs_repair. This despite nothing in other userspace tools or the
kernel having this restriction.

My best guess is that this was an unintentional leftover from
common code between dirs & attrs in the "da" code, and there has
never been a good reason for it.

Since userspace and kernelspace allow such a name to be set,
listed, and read, it seems wrong to flag it as corruption.
So, make this test conditional on whether we're validating a name
in a dir, as opposed to the name of an attr.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: Zap dev inodes

Signed-off-by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: Zap unused space in inode btrees

Signed-off-by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: Zap freeindex blocks in directory inodes

Signed-off-by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: Zap multi fsb blocks

Using basically the same code as in process_single_fsb_objects.

Signed-off-by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: Extend data zapping to XFS_DIR{2,3}_LEAFN_MAGIC blocks

Signed-off-by: Stefan Ring <stefanrin@gmail.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: teach the frag command about sparse inode chunks

This is the equivalent of:

ea8a48f xfs_check: process sparse inode chunks correctly

for the frag_f() command in xfs_db.

Without this, the xfs_db frag command shows corruption as it
wanders into blocks that are not inodes.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201823
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs: null-terminate symlinks created via protofile

Now that we have a symlink verifier which checks that in-memory
symlink names are null-terminated, be sure we do that when we
create them via the mkfs protofile.

We only want to null-terminate inline data if it's a symlink;
we only ever /call/ newfile() with "dolocal" for symlinks, so
rename that function argument for clarity.

Then, rather than open-coding all this, just call
xfs_init_local_fork which handles it properly.

Zorro found this by running xfs/019 on an s390x machine, it
failed with:

Metadata corruption detected at 0x101214a, inode 0x89 data fork

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: don't install xfs_scrub man pages if config'd off

Don't install man pages for xfs_scrub if it's turned off
in the config.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: copy_file_range length is a size_t

copy_file_range() takes a size_t as it's length, not a "long long".
Therefore we need to be able to pass sizes larger than 8EB to it
to be able to test the interface fully and that requires copy_range
to accept all values except an explicit error value of "-1LL".

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: allow open file permissions to be changed

I need to be able to open a file read-write, then change the
permissions on the file to read-only to check that copy_file_range
returns EPERM correctly in that case. This can't be done as root,
because root ignores file permissions, but as a normal user we can't
open a 0444 file for writing and so can't actually test writing to
a read-only file without some method of "open read-write, change
permissions to read-only, try to write to file through open
read-write file".

So, allow adding or removing write permissions on an open file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[sandeen: Move man page entry to FILE section]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

io: open pipes in non-blocking mode

So that O_RDONLY open commands (such as from copy_range) do not
block forever waiting on a non-existent writer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Tulak <jtulak@redhat.com>
[sandeen: initialize st per djwong's suggestion]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: add missing string name for DBM_COWDATA

In db/check.c, typename[] is supposed to have strings for every DBM_
type, but we forgot one. Add it now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: move everything to /usr/sbin

Recently, it was pointed out that xfs_scrub{,_all} depend on components
and libraries (libicu, python) that live in /usr. /sbin binaries
shouldn't depend on /usr, so let's move the scrub binaries to /usr/sbin.

Reported-by: xfs@tlinx.org
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix fractional reporting of single inodes

When there are fewer than 1024 inodes in the filesystem, scrub reports
fractional inodes in its final report:

35.2MiB data used; 5.0 inodes used.
34.2MiB data found; 5.0 inodes found.
5.0 inodes counted; 5.0 inodes checked.

Inodes are indivisible, so only report the fractional part when we have
a large enough number of inodes to perform a unit conversion:

35.2MiB data used; 5 inodes used.
34.2MiB data found; 5 inodes found.
5 inodes counted; 5 inodes checked.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: handle totally empty inode chunks

We try to load a single inobt record with each FSINUMBERS call. If the
chunk is totally empty (which can happen when there are more than one
inobt record per block) we should skip to the next INUMBERS call since
there are no inodes to bulkstat.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: skip block reservation when fixing freelist

AGFL blocks are considered to be part of the fdblocks count, so there's
no need to obtain a block reservation when fixing the AGFL as part of
repair. Asking for a reservation can cause repair to fail if the
superblock claims zero fdblocks because we haven't gotten far enough
into phase 5 to have reset the superblock counters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rebuild directory when non-root leafn blocks claim block 0

As explained in

71a6af8 Revert "xfs_repair: treat zero da btree pointers as corruption"

a single root LEAFN block can exist in a directory until it grows
further.

This is why, normally, we skip directories with a root marked
XFS_DIR2_LEAFN_MAGIC, as detected by the left-most leaf block being
found at file block 0.

However, if we traversed any level of a btree to get here (as
indicated by da_cursor.active > 0), then a leaf block claiming block
0 indicates corruption, and we should handle it as such, and rebuild
the directory.

This was found by repair repeatedly rebuilding a directory containing a
single leafn block (xfs/495).

Fixes: 67a79e2cc932 ("xfs_repair: treat zero da btree pointers as corruption")
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: clarify commit log, refer to revert commit]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io.8: rearrange command listings by section

Most of the commands listed under "OTHER COMMANDS" apply to files or
filesystems. Create two new sections for that and populate them
appropriately.

Here's what moves:

fsmap: moves from file io commands to filesystem commands
utimes: moves from file io commands to file commands

>From the OTHER COMMANDS section:

lsattr/chattr: moves to file commands
flink: moves to file commands
stat/statx: moves to file commands
lsproj/chproj: moves to file commands
parent: moves to file commands
[gs]et_encpolicy: moves to file io commands
freeze/thaw: move to filesystem commands
inject: move to filesystem commands
resblks: move to filesystem commands
shutdown: move to filesystem commands
statfs: move to filesystem commands
label: move to filesystem commands

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: drop notion of file io commands]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix inverted return from xfs_btree_sblock_verify_crc

Source kernel commit: 7d048df4e9b05ba89b74d062df59498aa81f3785

xfs_btree_sblock_verify_crc is a bool so should not be returning
a failaddr_t; worse, if xfs_log_check_lsn fails it returns
__this_address which looks like a boolean true (i.e. success)
to the caller.

(interestingly xfs_btree_lblock_verify_crc doesn't have the issue)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: delalloc -> unwritten COW fork allocation can go wrong

Source kernel commit: 9230a0b65b47fe6856c4468ec0175c4987e5bede

Long saga. There have been days spent following this through dead end
after dead end in multi-GB event traces. This morning, after writing
a trace-cmd wrapper that enabled me to be more selective about XFS
trace points, I discovered that I could get just enough essential
tracepoints enabled that there was a 50:50 chance the fsx config
would fail at ~115k ops. If it didn't fail at op 115547, I stopped
fsx at op 115548 anyway.

That gave me two traces - one where the problem manifested, and one
where it didn't. After refining the traces to have the necessary
information, I found that in the failing case there was a real
extent in the COW fork compared to an unwritten extent in the
working case.

Walking back through the two traces to the point where the CWO fork
extents actually diverged, I found that the bad case had an extra
unwritten extent in it. This is likely because the bug it led me to
had triggered multiple times in those 115k ops, leaving stray
COW extents around. What I saw was a COW delalloc conversion to an
unwritten extent (as they should always be through
xfs_iomap_write_allocate()) resulted in a /written extent/:

xfs_writepage:        dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0
xfs_iext_remove:      dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real
xfs_bmap_pre_update:  dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real
xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex

Basically, Cow fork before:

0 1            32          52
+H+DDDDDDDDDDDD+UUUUUUUUUUU+
   PREV RIGHT

COW delalloc conversion allocates:

  1        32
  +uuuuuuuuuuuu+
  NEW

And the result according to the xfs_bmap_post_update trace was:

0 1            32          52
+H+wwwwwwwwwwwwwwwwwwwwwwww+
   PREV

Which is clearly wrong - it should be a merged unwritten extent,
not an unwritten extent.

That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG
case in xfs_bmap_add_extent_delay_real(), and sure enough, there's
the bug.

It takes the old delalloc extent (PREV) and adds the length of the
RIGHT extent to it, takes the start block from NEW, removes the
RIGHT extent and then updates PREV with the new extent.

What it fails to do is update PREV.br_state. For delalloc, this is
always XFS_EXT_NORM, while in this case we are converting the
delayed allocation to unwritten, so it needs to be updated to
XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so
the resultant extent is always written.

And that's the bug I've been chasing for a week - a bmap btree bug,
not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced
with the recent in core extent tree scalability enhancements.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: finobt AG reserves don't consider last AG can be a runt

Source kernel commit: c08768977b9a65cab9bcfd1ba30ffb686b2b7c69

The last AG may be very small comapred to all other AGs, and hence
AG reservations based on the superblock AG size may actually consume
more space than the AG actually has. This results on assert failures
like:

XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
[   48.932891]  xfs_ag_resv_init+0x1bd/0x1d0
[   48.933853]  xfs_fs_reserve_ag_blocks+0x37/0xb0
[   48.934939]  xfs_mountfs+0x5b3/0x920
[   48.935804]  xfs_fs_fill_super+0x462/0x640
[   48.936784]  ? xfs_test_remount_options+0x60/0x60
[   48.937908]  mount_bdev+0x178/0x1b0
[   48.938751]  mount_fs+0x36/0x170
[   48.939533]  vfs_kern_mount.part.43+0x54/0x130
[   48.940596]  do_mount+0x20e/0xcb0
[   48.941396]  ? memdup_user+0x3e/0x70
[   48.942249]  ksys_mount+0xba/0xd0
[   48.943046]  __x64_sys_mount+0x21/0x30
[   48.943953]  do_syscall_64+0x54/0x170
[   48.944835]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Hence we need to ensure the finobt per-ag space reservations take
into account the size of the last AG rather than treat it like all
the other full size AGs.

Note that both refcountbt and rmapbt already take the size of the AG
into account via reading the AGF length directly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix overflow in xfs_attr3_leaf_verify

Source kernel commit: 837514f7a4ca4aca06aec5caa5ff56d33ef06976

generic/070 on 64k block size filesystems is failing with a verifier
corruption on writeback or an attribute leaf block:

[   94.973083] XFS (pmem0): Metadata corruption detected at xfs_attr3_leaf_verify+0x246/0x260, xfs_attr3_leaf block 0x811480
[   94.975623] XFS (pmem0): Unmount and run xfs_repair
[   94.976720] XFS (pmem0): First 128 bytes of corrupted metadata buffer:
[   94.978270] 000000004b2e7b45: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
[   94.980268] 000000006b1db90b: 00 00 00 00 00 81 14 80 00 00 00 00 00 00 00 00  ................
[   94.982251] 00000000433f2407: 22 7b 5c 82 2d 5c 47 4c bb 31 1c 37 fa a9 ce d6  "{\.-\GL.1.7....
[   94.984157] 0000000010dc7dfb: 00 00 00 00 00 81 04 8a 00 0a 18 e8 dd 94 01 00  ................
[   94.986215] 00000000d5a19229: 00 a0 dc f4 fe 98 01 68 f0 d8 07 e0 00 00 00 00  .......h........
[   94.988171] 00000000521df36c: 0c 2d 32 e2 fe 20 01 00 0c 2d 58 65 fe 0c 01 00  .-2.. ...-Xe....
[   94.990162] 000000008477ae06: 0c 2d 5b 66 fe 8c 01 00 0c 2d 71 35 fe 7c 01 00  .-[f.....-q5.|..
[   94.992139] 00000000a4a6bca6: 0c 2d 72 37 fc d4 01 00 0c 2d d8 b8 f0 90 01 00  .-r7.....-......
[   94.994789] XFS (pmem0): xfs_do_force_shutdown(0x8) called from line 1453 of file fs/xfs/xfs_buf.c. Return address = ffffffff815365f3

This is failing this check:

end = ichdr.freemap[i].base + ichdr.freemap[i].size;
if (end < ichdr.freemap[i].base)
>>>>>                   return __this_address;
if (end > mp->m_attr_geo->blksize)
return __this_address;

And from the buffer output above, the freemap array is:

freemap[0].base = 0x00a0
freemap[0].size = 0xdcf4 end = 0xdd94
freemap[1].base = 0xfe98
freemap[1].size = 0x0168 end = 0x10000
freemap[2].base = 0xf0d8
freemap[2].size = 0x07e0 end = 0xf8b8

These all look valid - the block size is 0x10000 and so from the
last check in the above verifier fragment we know that the end
of freemap[1] is valid. The problem is that end is declared as:

uint16_t end;

And (uint16_t)0x10000 = 0. So we have a verifier bug here, not a
corruption. Fix the verifier to use uint32_t types for the check and
hence avoid the overflow.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=201577
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Add attibute remove and helper functions

Source kernel commit: 068f985a9e5ec70fde58d8f679994fdbbd093a36

This patch adds xfs_attr_remove_args. These sub-routines remove
the attributes specified in @args. We will use this later for setting
parent pointers as a deferred attribute operation.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Add attibute set and helper functions

Source kernel commit: 2f3cd8091963810d85e6a5dd6ed1247e10e9e6f2

This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
These sub-routines set the attributes specified in @args.
We will use this later for setting parent pointers as a deferred
attribute operation.

[dgc: remove attr fork init code from xfs_attr_set_args().]
[dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
[dgc: correct sf add error handling.]

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Add helper function xfs_attr_try_sf_addname

Source kernel commit: 4c74a56b9de76bb6b581274b76b52535ad77c2a7

This patch adds a subroutine xfs_attr_try_sf_addname
used by xfs_attr_set. This subrotine will attempt to
add the attribute name specified in args in shortform,
as well and perform error handling previously done in
xfs_attr_set.

This patch helps to pre-simplify xfs_attr_set for reviewing
purposes and reduce indentation. New function will be added
in the next patch.

[dgc: moved commit to helper function, too.]

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h

Source kernel commit: e2421f0b5ff3ce279573036f5cfcb0ce28b422a9

This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
since xfs_attr.c is in libxfs. We will need these later in
xfsprogs.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove suport for filesystems without unwritten extent flag

Source kernel commit: daa79baefc47293c753fed191d722f7ef605a303

The option to enable unwritten extents was made default in 2003,
removed from mkfs in 2007, and cannot be disabled in v5. We also
rely on it for a lot of common functionality, so filesystems without
it will run a completely untested and buggy code path. Enabling the
support also is a simple bit flip using xfs_db, so legacy file
systems can still be brought forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.19.0

Update all the necessary files for a 4.19.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

make: don't spray static check failures all over the subdir  build

Debian package building is special -- it directly calls make -C libxfs
when building the debian-installer packages.  This means that any
variables we define in the top level Makefile don't get passed down to
subdir make processes.

This means that the new static checker support effectively runs the
first argument in $(CFLAGS) as a command, which is surprising.  Fix up
buildrules to patch out CHECK_CMD if nobody's defined it, so that direct
subdir make works again.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v4.19.0-rc1

Update all the necessary files for a 4.19.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't dirty inodes unconditionally when testing unlinked state

I noticed phase 4 writing back lots of inode buffers during recent
testing. The recent rework of clear_inode() in commit 0724d0f4cb53
("xfs_repair: clear_dinode should simply clear, not check contents")
accidentally caught a call to clear_inode_unlinked() as well,
resulting in all inodes being marked dirty whether then needed
updating or not.

Fix it by making clear_inode_unlinked unconditionally do the clear
(as was done for clear_inode), and move the test to the caller.
Add warnings as well so that this corruption is no longer silently
fixed.

Fixes: 0724d0f ("xfs_repair: clear_dinode should simply clear, ...")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[sandeen: rework so clear_inode_unlinked is unconditional]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Revert "xfs_repair: treat zero da btree pointers as corruption"

This reverts commit 67a79e2cc9320aaf269cd00e9c8d16892931886d.

A root LEAFN block can exist in a directory. When we convert from
leaf format (LEAF1 - internal free list) to node format (LEAFN -
external free list) the only change to the single root leaf block is
that it's magic number is changed from LEAF1 to LEAFN.

We don't actually end up with DA nodes in the tree until the LEAFN
node is split, and that requires a couple more dirents to be added
to the directory to fill the LEAFN block up completely. Then it will
split and create a DA node root block pointing to multiple LEAFN
leaf blocks.

Hence restore the old behaviour where we skip the DA node tree
rebuild if there is a LEAFN root block found as there is no tree to
rebuild.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: add crc32 self test

Add a self test for crc32c into xfs_io so that xfstests can check the
operation of the (potentially cross-compiled) package binaries by
isolating the self test code to a header file that can be included by
the build system self test and xfs_io.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: add missing agfl free deferred op type

When we added a new defer op type for agfl block freeing to the kernel,
we forgot to add that type to the userspace side of things. This will
cause spontaneous combustion in xfs_repair if we try to rebuild
directories.

Fixes: d5c1b462232 ("xfs: defer agfl block frees when dfops is available")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: print freecount in xfs_inobt_rec as unsigned

"freecount" in the xfs_inobt_rec is unsigned, so xfs_db should
print it as such.

Not doing so tickles a bug in getbitval() where we try to handle
sign extension for signed fields and fail badly on big endian
machines, causing us to incorrectly report negative numbers when
printing structures even when the number is nowhere near the
signed maximum value.

So this fix works around that bug by properly marking this field
as unsigned, because I have yet to convince myself of the proper
fix for the underlying bug.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201453
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: kick processing thread if ra_count is at limit

Zorro hit an xfs_repair hang on a 500T filesystem where
all the prefetch threads were sleeping and nothing progressed.

The problem is that if every buffer we tried to read ahead in
phase6 was already up to date, pf_start_io_workers has no effect;
there is no io to do, and the sem_wait in pf_queuing_worker waits
forever.

Kick the processing thread to avoid this situation.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201173
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: continue after xfs_bunmapi deadlock avoidance

xfs_bunmapi can legitimately return before all work is done, to
avoid deadlocks across AGs.

Sadly nobody told xfs_repair, so it fires an assert if this happens:

phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.

Fix this by calling back in until all work is done, as we do
in the kernel.

Fixes: 5a8bcc ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: initialize realloced bplist in longform_dir2_entry_check

If we need to realloc the bplist[] array holding buffers for a given
directory, we don't initialize the new slots. This causes a problem
if the directory has holes, because those slots never get filled in.

At the end of the function we call libxfs_putbuf for every non-null
slot, and any uninitialized slots are segfault landmines.

Make sure we initialize all new slots to NULL for this reason.

Reported-by: Oleg Davydov <burunduk3@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: lack of kernel support is not a service failure

Don't treat a lack of kernel support for scrubbing as an automated
service failure because we have not actually determined that there's
anything wrong with the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

misc: only build with lto if explicitly enabled

Change the LTO default to off from probe because it wastes build time on
developer machines. Anyone who really wants it for release builds or
whatever can still turn it on.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: remove retpoline support

When it came up that xfsprogs was using retpolines by default,
the gcc folks inside Red Hat expressed ... alarm. I'm not sure
of all the details, but I think the concern was that userspace
support for this is not really quite baked.

Unless/until there is a demonstrated side-channel which would
warrant retpolines here, let's just remove it for now.

Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: enable sparse checking with make C=2

Enable "make C=2" sparse checking of all files without rebuilding them.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: enable sparse checking with make C=1

Enable "make C=1" sparse checking when files get rebuilt. To check
all files, run "make clean" first.

This is a bit simpler than redefining CC for the whole build, which
requires extra commandline definitions and apparently is enough of a
barrier that nobody's doing sparse checking.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: change project entity variable scope to local/static.

The project quota code used a global variable "p" for getprent() and
getprpathent(), presumably to keep the interface analogous to getpwent()
etc. However, other functions had their own local "p" which led to shadow
variable warnings from sparse.

Rather than a global, make it a static variable within the project
functions. Same behavior, same interface, less confusion, and retains an
interface similar that of getpwent etc.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_metadump: remove shadow variable

"length" is used in 2 nested inner scopes, which is fairly unclear and
generates a sparse warning. Rename the inner scope variable for clarity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>