git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfsprogs: Release v5.8.0-rc0

Update all the necessary files for a 5.8.0-rc0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: more lockdep whackamole with kmem_alloc*

Source kernel commit: 6dcde60efd946e38fac8d276a6ca47492103e856

Dave Airlie reported the following lockdep complaint:

>  ======================================================
>  WARNING: possible circular locking dependency detected
>  5.7.0-0.rc5.20200515git1ae7efb38854.1.fc33.x86_64 #1 Not tainted
>  ------------------------------------------------------
>  kswapd0/159 is trying to acquire lock:
>  ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3},
>  at: xfs_ilock+0xde/0x2c0 [xfs]
>
>  but task is already holding lock:
>  ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
>  __fs_reclaim_acquire+0x5/0x30
>
>  which lock already depends on the new lock.
>
>
>  the existing dependency chain (in reverse order) is:
>
>  -> #1 (fs_reclaim){+.+.}-{0:0}:
>         fs_reclaim_acquire+0x34/0x40
>         __kmalloc+0x4f/0x270
>         kmem_alloc+0x93/0x1d0 [xfs]
>         kmem_alloc_large+0x4c/0x130 [xfs]
>         xfs_attr_copy_value+0x74/0xa0 [xfs]
>         xfs_attr_get+0x9d/0xc0 [xfs]
>         xfs_get_acl+0xb6/0x200 [xfs]
>         get_acl+0x81/0x160
>         posix_acl_xattr_get+0x3f/0xd0
>         vfs_getxattr+0x148/0x170
>         getxattr+0xa7/0x240
>         path_getxattr+0x52/0x80
>         do_syscall_64+0x5c/0xa0
>         entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
>  -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
>         __lock_acquire+0x1257/0x20d0
>         lock_acquire+0xb0/0x310
>         down_write_nested+0x49/0x120
>         xfs_ilock+0xde/0x2c0 [xfs]
>         xfs_reclaim_inode+0x3f/0x400 [xfs]
>         xfs_reclaim_inodes_ag+0x20b/0x410 [xfs]
>         xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
>         super_cache_scan+0x190/0x1e0
>         do_shrink_slab+0x184/0x420
>         shrink_slab+0x182/0x290
>         shrink_node+0x174/0x680
>         balance_pgdat+0x2d0/0x5f0
>         kswapd+0x21f/0x510
>         kthread+0x131/0x150
>         ret_from_fork+0x3a/0x50
>
>  other info that might help us debug this:
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(fs_reclaim);
>                                 lock(&xfs_nondir_ilock_class);
>                                 lock(fs_reclaim);
>    lock(&xfs_nondir_ilock_class);
>
>   *** DEADLOCK ***
>
>  4 locks held by kswapd0/159:
>   #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
>  __fs_reclaim_acquire+0x5/0x30
>   #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at:
>  shrink_slab+0x115/0x290
>   #2: ffff9b39f07a50e8
>  (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
>   #3: ffff9b39f077f258
>  (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at:
>  xfs_reclaim_inodes_ag+0x82/0x410 [xfs]

This is a known false positive because inodes cannot simultaneously be
getting reclaimed and the target of a getxattr operation, but lockdep
doesn't know that.  We can (selectively) shut up lockdep until either
it gets smarter or we change inode reclaim not to require the ILOCK by
applying a stupid GFP_NOLOCKDEP bandaid.

Reported-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: force writes to delalloc regions to unwritten

Source kernel commit: a5949d3faedf492fa7863b914da408047ab46eb0

When writing to a delalloc region in the data fork, commit the new
allocations (of the da reservation) as unwritten so that the mappings
are only marked written once writeback completes successfully. This
fixes the problem of stale data exposure if the system goes down during
targeted writeback of a specific region of a file, as tested by
generic/042.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: always return -ENOSPC on project quota reservation failure

Source kernel commit: dcf1ccc99e6db06a3a3cc9f72161f7d084a38d40

XFS project quota treats project hierarchies as "mini filesysems" and
so rather than -EDQUOT, the intent is to return -ENOSPC when a quota
reservation fails, but this behavior is not consistent.

The only place we make a decision between -EDQUOT and -ENOSPC
returns based on quota type is in xfs_trans_dqresv().

This behavior is currently controlled by whether or not the
XFS_QMOPT_ENOSPC flag gets passed into the quota reservation. However,
its use is not consistent; paths such as xfs_create() and xfs_symlink()
don't set the flag, so a reservation failure will return -EDQUOT for
project quota reservation failures rather than -ENOSPC for these sorts
of operations, even for project quota:

# mkdir mnt/project
# xfs_quota -x -c "project -s -p mnt/project 42" mnt
# xfs_quota -x -c 'limit -p isoft=2 ihard=3 42' mnt
# touch mnt/project/file{1,2,3}
touch: cannot touch ‘mnt/project/file3’: Disk quota exceeded

We can make this consistent by not requiring the flag to be set at the
top of the callchain; instead we can simply test whether we are
reserving a project quota with XFS_QM_ISPDQ in xfs_trans_dqresv and if
so, return -ENOSPC for that failure. This removes the need for the
XFS_QMOPT_ENOSPC altogether and simplifies the code a fair bit.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: cleanup xfs_idestroy_fork

Source kernel commit: ef8385128d4b31a382d496b1c433697993bd0bfb

Move freeing the dynamically allocated attr and COW fork, as well
as zeroing the pointers where actually needed into the callers, and
just pass the xfs_ifork structure to xfs_idestroy_fork. Also simplify
the kmem_free calls by not checking for NULL first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move the fork format fields into struct xfs_ifork

Source kernel commit: f7e67b20ecbbcb9180c888a5c4fde267935e075f

Both the data and attr fork have a format that is stored in the legacy
idinode. Move it into the xfs_ifork structure instead, where it uses
up padding.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move the per-fork nextents fields into struct xfs_ifork

Source kernel commit: daf83964a3681cf1f1f255ad6095c0b60cba7dca

There are there are three extents counters per inode, one for each of
the forks.  Two are in the legacy icdinode and one is directly in
struct xfs_inode.  Switch to a single counter in the xfs_ifork structure
where it uses up padding at the end of the structure.  This simplifies
various bits of code that just wants the number of extents counter and
can now directly dereference it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the XFS_DFORK_Q macro

Source kernel commit: 09c38edd54c16657093a73a3169342f9f9080bb3

Just checking di_forkoff directly is a little easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the NULL fork handling in xfs_bmapi_read

Source kernel commit: 4b516ff4e772993a99fc9bf36503d23ce5bd5ba9

Now that we fully verify the inode forks before they are added to the
inode cache, the crash reported in

https://bugzilla.kernel.org/show_bug.cgi?id=204031

can't happen anymore, as we'll never let an inode that has inconsistent
nextents counts vs the presence of an in-core attr fork leak into the
inactivate code path. So remove the work around to try to handle the
case, and just return an error and warn if the fork is not present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the special COW fork handling in xfs_bmapi_read

Source kernel commit: 1a1c57b2826f8b408feb733d3321490591a6e4c9

We don't call xfs_bmapi_read for the COW fork anymore, so remove the
special casing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: improve local fork verification

Source kernel commit: 0f45a1b20cd8f9cfc985a1f91a1e7a86e5e14dd6

Call the data/attr local fork verifiers as soon as we are ready for them.
This keeps them close to the code setting up the forks, and avoids a
few branches later on. Also open code xfs_inode_verify_forks in the
only remaining caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor xfs_inode_verify_forks

Source kernel commit: 7c7ba2186305d6bee5eb5b8fb95a61d8de14de4f

The split between xfs_inode_verify_forks and the two helpers
implementing the actual functionality is a little strange. Reshuffle
it so that xfs_inode_verify_forks verifies if the data and attr forks
are actually in local format and only call the low-level helpers if
that is the case. Handle the actual error reporting in the low-level
handlers to streamline the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_ifork_ops

Source kernel commit: 1934c8bd81bee4c239478b03a59addf5fe8e2785

xfs_ifork_ops add up to two indirect calls per inode read and flush,
despite just having a single instance in the kernel. In xfsprogs
phase6 in xfs_repair overrides the verify_dir method to deal with inodes
that do not have a valid parent, but that can be fixed pretty easily
by ensuring they always have a valid looking parent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_iread

Source kernel commit: bb8a66af4fff1cecb7631c68af761ea8e1a41ac2

There is not much point in the xfs_iread function, as it has a single
caller and not a whole lot of code. Move it into the only caller,
and trim down the overdocumentation to just documenting the important
"why" instead of a lot of redundant "what".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't reset i_delayed_blks in xfs_iread

Source kernel commit: 7f0290123506e2b248fe06fa7cdc17c1b5b603b5

i_delayed_blks is set to 0 in xfs_inode_alloc and can't have anything
assigned to it until the inode is visible to the VFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: call xfs_dinode_verify from xfs_inode_from_disk

Source kernel commit: 2d6051d4965308c3367bf5a2468dff969872a96e

Keep the code dealing with the dinode together, and also ensure we verify
the dinode in the owner change log recovery case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: handle unallocated inodes in xfs_inode_from_disk

Source kernel commit: 0bce8173fdcf203c92a4d57dc7d3bb642ed478a1

Handle inodes with a 0 di_mode in xfs_inode_from_disk, instead of partially
duplicating inode reading in xfs_iread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: split xfs_iformat_fork

Source kernel commit: 9229d18e801bdbdf79d963d8c944980fc77b5d6b

xfs_iformat_fork is a weird catchall. Split it into one helper for
the data fork and one for the attr fork, and then call both helper
as well as the COW fork initialization from xfs_inode_from_disk. Order
the COW fork initialization after the attr fork initialization given
that it can't fail to simplify the error handling.

Note that the newly split helpers are moved down the file in
xfs_inode_fork.c to avoid the need for forward declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: call xfs_iformat_fork from xfs_inode_from_disk

Source kernel commit: cb7d58594412fff106cde550dd9e0a7999cc2a0c

We always need to fill out the fork structures when reading the inode,
so call xfs_iformat_fork from the tail of xfs_inode_from_disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: xfs_bmapi_read doesn't take a fork id as the last argument

Source kernel commit: b90c2a9c8b4422bb9398b50fe3d6163e46dcddec

The last argument to xfs_bmapi_raad contains XFS_BMAPI_* flags, not the
fork. Given that XFS_DATA_FORK evaluates to 0 no real harm is done,
but let's fix this anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix the warning message in xfs_validate_sb_common()

Source kernel commit: 14506f7a91d8f4d13fc07126ac8d14c6519f00e3

Fix this error message to complain about project and group quota flag
bits instead of "PUOTA" and "QUOTA".

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use ordered buffers to initialize dquot buffers during quotacheck

Source kernel commit: 78bba5c812cc651cee51b64b786be926ab7fe2a9

While QAing the new xfs_repair quotacheck code, I uncovered a quota
corruption bug resulting from a bad interaction between dquot buffer
initialization and quotacheck.  The bug can be reproduced with the
following sequence:

# mkfs.xfs -f /dev/sdf
# mount /dev/sdf /opt -o usrquota
# su nobody -s /bin/bash -c 'touch /opt/barf'
# sync
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            3      0      0  00 [------]
nobody          1      0      0  00 [------]

# xfs_io -x -c 'shutdown' /opt
# umount /opt
# mount /dev/sdf /opt -o usrquota
# touch /opt/man2
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            1      0      0  00 [------]
nobody          1      0      0  00 [------]

# umount /opt

Notice how the initial quotacheck set the root dquot icount to 3
(rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
xfs_quota reports that the root dquot has only 1 icount.  We haven't
deleted anything from the filesystem, which means that quota is now
under-counting.  This behavior is not limited to icount or the root
dquot, but this is the shortest reproducer.

I traced the cause of this discrepancy to the way that we handle ondisk
dquot updates during quotacheck vs. regular fs activity.  Normally, when
we allocate a disk block for a dquot, we log the buffer as a regular
(dquot) buffer.  Subsequent updates to the dquots backed by that block
are done via separate dquot log item updates, which means that they
depend on the logged buffer update being written to disk before the
dquot items.  Because individual dquots have their own LSN fields, that
initial dquot buffer must always be recovered.

However, the story changes for quotacheck, which can cause dquot block
allocations but persists the final dquot counter values via a delwri
list.  Because recovery doesn't gate dquot buffer replay on an LSN, this
means that the initial dquot buffer can be replayed over the (newer)
contents that were delwritten at the end of quotacheck.  In effect, this
re-initializes the dquot counters after they've been updated.  If the
log does not contain any other dquot items to recover, the obsolete
dquot contents will not be corrected by log recovery.

Because quotacheck uses a transaction to log the setting of the CHKD
flags in the superblock, we skip quotacheck during the second mount
call, which allows the incorrect icount to remain.

Fix this by changing the ondisk dquot initialization function to use
ordered buffers to write out fresh dquot blocks if it detects that we're
running quotacheck.  If the system goes down before quotacheck can
complete, the CHKD flags will not be set in the superblock and the next
mount will run quotacheck again, which can fix uninitialized dquot
buffers.  This requires amending the defer code to maintaine ordered
buffer state across defer rolls for the sake of the dquot allocation
code.

For regular operations we preserve the current behavior since the dquot
items require properly initialized ondisk dquot records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: don't fail verifier on empty attr3 leaf block

Source kernel commit: f28cef9e4daca11337cb9f144cdebedaab69d78c

The attr fork can transition from shortform to leaf format while
empty if the first xattr doesn't fit in shortform. While this empty
leaf block state is intended to be transient, it is technically not
due to the transactional implementation of the xattr set operation.

We historically have a couple of bandaids to work around this
problem. The first is to hold the buffer after the format conversion
to prevent premature writeback of the empty leaf buffer and the
second is to bypass the xattr count check in the verifier during
recovery. The latter assumes that the xattr set is also in the log
and will be recovered into the buffer soon after the empty leaf
buffer is reconstructed. This is not guaranteed, however.

If the filesystem crashes after the format conversion but before the
xattr set that induced it, only the format conversion may exist in
the log. When recovered, this creates a latent corrupted state on
the inode as any subsequent attempts to read the buffer fail due to
verifier failure. This includes further attempts to set xattrs on
the inode or attempts to destroy the attr fork, which prevents the
inode from ever being removed from the unlinked list.

To avoid this condition, accept that an empty attr leaf block is a
valid state and remove the count check from the verifier. This means
that on rare occasions an attr fork might exist in an unexpected
state, but is otherwise consistent and functional. Note that we
retain the logic to avoid racing with metadata writeback to reduce
the window where this can occur.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Use the correct style for SPDX License Identifier

Source kernel commit: 508578f2f5601816ea29bec5cda00ea7d95a856d

This patch corrects the SPDX License Identifier style in header files
related to XFS File System support. For C header files
Documentation/process/license-rules.rst mandates C-like comments.
(opposed to C source files where C++ style should be used).

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Replace zero-length array with flexible-array

Source kernel commit: ee4064e56cd81cd3126805159122f53cf4f12ae6

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: move log recovery buffer cancellation code to xfs_buf_item_recover.c

Source kernel commit: 17d29bf271ea48b253c93969a590a11a51c19c1f

Move the helpers that handle incore buffer cancellation records to
xfs_buf_item_recover.c since they're not directly related to the main
log recovery machinery. No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor releasing finished intents during log recovery

Source kernel commit: 154c733a33d9cdaabec42ae76ca1189044d0447e

Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished. We add a new
->iop_match method to decide if an intent item matches a supplied ID.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor log recovery buffer item dispatch for pass2 commit functions

Source kernel commit: 1094d3f12363474b2a3d1a6c06124bec25dd1555

Move the log buffer item pass2 commit code into the per-item source code
files and use the dispatch function to call it. We do these one at a
time because there's a lot of code to move. No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor log recovery item dispatch for pass1 commit functions

Source kernel commit: 3304a4fabd099820df99de1acac345dd6fe16d1d

Move the pass1 commit code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor log recovery item dispatch for pass2 readhead functions

Source kernel commit: 8ea5682d07115b422e923bb4f55fe081964f484a

Move the pass2 readhead code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor log recovery item sorting into a generic dispatch structure

Source kernel commit: 86ffa471d9ce6ac3fda66f704c3143c3d55181f5

Create a generic dispatch structure to delegate recovery of different
log item types into various code modules. This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.

The first operation we virtualize is the log item sorting.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert xfs_log_recover_item_t to struct xfs_log_recover_item

Source kernel commit: 35f4521fd3a001fb290a1780f8beeffb06d99a04

Remove the old typedefs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused iget_flags param from xfs_imap_to_bp()

Source kernel commit: c199507993ede3f63d0deae7e2cbc2f5462c6452

iget_flags is unused in xfs_imap_to_bp(). Remove the parameter and
fix up the callers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: random buffer write failure errortag

Source kernel commit: 7376d74547344598008d00419eae0caa5f50f4f0

Introduce an error tag to randomly fail async buffer writes. This is
primarily to facilitate testing of the XFS error configuration
mechanism.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unnecessary shutdown check from xfs_iflush()

Source kernel commit: 15fab3b9be2255be70ba1c598a11622fa03c9d5e

The shutdown check in xfs_iflush() duplicates checks down in the
buffer code. If the fs is shut down, xfs_trans_read_buf_map() always
returns an error and falls into the same error path. Remove the
unnecessary check along with the warning in xfs_imap_to_bp()
that generates excessive noise in the log if the fs is shut down.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: spell out the parameter name for ->cancel_item

Source kernel commit: 2f88f1efd02ddf76cb5973abc42474c4dac2b03a

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use a xfs_btree_cur for the ->finish_cleanup state

Source kernel commit: 3ec1b26c04d4910f37cdaad26d14b403c0240e30

Given how XFS is all based around btrees it doesn't make much sense
to offer a totally generic state when we can just use the btree cursor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: turn dfp_done into a xfs_log_item

Source kernel commit: f09d167c20332ad1298ff82a6f538b4c7ea3fe1b

All defer op instance place their own extension of the log item into
the dfp_done field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Also use the opportunity to improve the ->finish_item calling conventions
to place the done log item as the higher level structure before the
list_entry used for the individual items.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor xfs_defer_finish_noroll

Source kernel commit: bb47d79750f1a68a75d4c7defc2da934ba31de14

Split out a helper that operates on a single xfs_defer_pending structure
to untangle the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: turn dfp_intent into a xfs_log_item

Source kernel commit: 13a8333339072b8654c1d2c75550ee9f41ee15de

All defer op instance place their own extension of the log item into
the dfp_intent field. Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: merge the ->diff_items defer op into ->create_intent

Source kernel commit: d367a868e46b025a8ced8e00ef2b3a3c2f3bf732

This avoids a per-item indirect call, and also simplifies the interface
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: merge the ->log_item defer op into ->create_intent

Source kernel commit: c1f09188e8de0ae65433cb9c8ace4feb66359bcc

These are aways called together, and my merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: factor out a xfs_defer_create_intent helper

Source kernel commit: e046e949486ec92d83b2ccdf0e7e9144f74ef028

Create a helper that encapsulates the whole logic to create a defer
intent. This reorders some of the work that was done, but none of
that has an affect on the operation as only fields that don't directly
interact are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the xfs_inode_log_item_t typedef

Source kernel commit: fd9cbe51215198ccffa64169c98eae35b0916088

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: stop misusing an onstack inode

The onstack inode in xfs_check's process_inode is a potential landmine
since it's not a /real/ incore inode. The upcoming 5.8 merge will make
this messier wrt inode forks, so just remove the onstack inode and
reference the ondisk fields directly. This also reduces the amount of
thinking that I have to do w.r.t. future libxfs porting efforts.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

repair: remove custom dir2 sf fork verifier from phase6

The custom verifier exists to catch the case of repair setting a
dummy parent value of zero on directory inodes and temporarily
replace it with a valid inode number so the rest of the directory
verification can proceed. The custom verifier is no longer needed
now that the rootino is used as a dummy value for invalid on-disk
parent values.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

repair: use fs rootino for dummy parent value instead of zero

If a directory inode has an invalid parent ino on disk, repair
replaces the invalid value with a dummy value of zero in the buffer
and NULLFSINO in the in-core parent tracking. The zero value serves
no functional purpose as it is still an invalid value and the parent
must be repaired by phase 6 based on the in-core state before the
buffer can be written out. A consequence of using an invalid dummy
value is that phase 6 requires custom verifier infrastructure to
detect the invalid parent inode and temporarily replace it while the
core fork verifier runs. If we use a valid inode number as a dummy
value earlier in repair, this workaround can be removed.

An obvious choice for a valid dummy parent inode value is the
orphanage inode. However, the orphanage inode is not allocated until
much later in repair when the filesystem structure is established as
sound and placement of orphaned inodes is imminent. In this case, it
is too early to know for sure whether the associated inodes are
orphaned because a directory traversal later in repair can locate
references to the inode and repair the parent value based on the
structure of the directory tree.

Given all of this, escalate the preexisting workaround from the
custom verifier in phase 6 and set the root inode value as a dummy
parent for shortform directories with an invalid on-disk parent. The
in-core parent is still tracked as NULLFSINO and so forces repair to
either update the parent or orphan the inode before repair
completes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

repair: don't double check dir2 sf parent in phase 4

The shortform parent ino verification code runs once in phase 3
(ino_discovery == true) and once in phase 4 (ino_discovery ==
false). This is unnecessary and leads to duplicate error messages if
repair replaces an invalid parent value with zero because zero is
still an invalid value. Skip the check in phase 4.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[sandeen: add comments suggested by Darrick during review]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

repair: set the in-core inode parent in phase 3

The inode processing code checks and resets invalid parent values on
physical inodes in phase 3 but waits to update the parent value in
the in-core tracking until phase 4. There doesn't appear to be any
specific reason for the latter beyond caution. In reality, the only
reason this doesn't cause problems is that phase 3 replaces an
invalid on-disk parent with another invalid value, so the in-core
parent returned by phase 4 translates to NULLFSINO.

This is subtle and fragile. To eliminate this duplicate processing
behavior and break the subtle dependency of requiring an invalid
dummy value in physical directory inodes, update the in-core parent
tracking structure at the same point in phase 3 that physical inodes
are updated. Invalid on-disk parent values will still translate to
NULLFSINO for the in-core tracking to be identified by later phases.
This ensures that if a valid dummy value is placed in a physical
inode (such as rootino) with an invalid parent in phase 3, phase 4
won't mistakenly return the valid dummy value to be incorrectly set
in the in-core tracking over the NULLFSINO value that represents the
broken on-disk state.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Remove redundant setting/check for lsattr/stat command

lsattr/stat command can check exclusive options by argmax = 1
so the related setting/check is redundant.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Make -D and -R options incompatible explicitly

-D and -R options are mutually exclusive actually but many commands
can accept them at the same time and process them differently(e.g.
chattr alway chooses -D option or cowextsize accepts the last one
specified), so make these commands have the consistent behavior that
don't accept them concurrently.

1) Make them incompatible by setting argmax to 1 if commands can accept
single option(i.e. lsattr, lsproj).
2) Make them incompatible by adding check if commands can accept multiple
options(i.e. chattr, chproj, extsize, cowextsize).

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: skip mount time quotacheck if our quotacheck was ok

If we verified that the incore quota counts match the ondisk quota
contents, we can leave the CHKD flags set so that the next mount doesn't
have to repeat the quotacheck.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: check quota values if quota was loaded

If the filesystem looks like it had up to date quota information, check
it against what's in the filesystem and report if we find discrepancies.
This closes one of the major gaps in corruptions that are detected by
xfs_check vs. xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix clearing of quota CHKD flags

XFS_ALL_QUOTA_CHKD, being a OR of [UGP]QUOTA_CHKD, is a bitset of the
possible *incore* quota checked flags.  This means that it cannot be
used in a comparison with the *ondisk* quota checked flags because V4
filesystems set OQUOTA_CHKD, not the [GU]QUOTA_CHKD flags (which are V5
flags).

If you have a V4 filesystem with user quotas disabled but either group
or project quotas enabled, xfs_repair will /not/ claim that the quota
info will be regenerated on the next mount like it does in any other
situation.  This is because the ondisk qflags field has OQUOTA_CHKD set
but repair fails to notice.

Worse, if you have a V4 filesystem with user and group quotas enabled
and mild corruption, repair will claim that the quota info will be
regenerated.  If you then mount the fs with only group quotas enabled,
quotacheck will not run to correct the data because repair failed to
clear OQUOTA_CHKD properly.

These are fairly benign and unlikely scenarios, but when we add
quotacheck capabilities to xfs_repair, it will complain about the
incorrect quota counts, which causes regressions in xfs/278.

Fixes: 342aef1ec0ec ("xfsprogs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: alphabetize HFILES and CFILES

Convert the definitions of HFILES and CFILES to lists that can be sorted
easily (in vim, anyway), then fix the alphabetization of the makefile
targets.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.7.0

Update all the necessary files for a 5.7.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Document '-q' option for sendfile command

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.7.0-rc1

Update all the necessary files for a 5.7.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_io: Document '-q' option for pread/pwrite command

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_copy: flush target devices before exiting

Flush the devices we're copying to before exiting, so that we can report
any write errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: try to fill the AGFL before we fix the freelist

In commit 9851fd79bfb1, we added a slight amount of slack to the free
space btrees being reconstructed so that the initial fix_freelist call
(which is run against a totally empty AGFL) would never have to split
either free space btree in order to populate the free list.

The new btree bulk loading code in xfs_repair can re-create this
situation because it can set the slack values to zero if the filesystem
is very full. However, these days repair has the infrastructure needed
to ensure that overestimations of the btree block counts end up on the
AGFL or get freed back into the filesystem at the end of phase 5.

Fix this problem by reserving extra blocks in the bnobt reservation, and
checking that there are enough overages in the bnobt/cntbt fakeroots to
populate the AGFL with the minimum number of blocks it needs to handle a
split in the bno/cnt/rmap btrees.

Note that we reserve blocks for the new bnobt/cntbt/AGFL at the very end
of the reservation steps in phase 5, so the extra allocation should not
cause repair to fail if it can't find blocks for btrees.

Fixes: 9851fd79bfb1 ("repair: AGFL rebuild fails if btree split required")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: simplify free space btree calculations in init_freespace_cursors

Add a summary variable to the bulkload structure so that we can track
the number of blocks that have been reserved for a particular (btree)
bulkload operation. Doing so enables us to simplify the logic in
init_freespace_cursors that deals with figuring out how many more blocks
we need to fill the bnobt/cntbt properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: complain about ag header crc errors

Repair doesn't complain about crc errors in the AG headers, and it
should. Otherwise, this gives the admin the wrong impression about the
state of the filesystem after a nomodify check.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: use bitmap to track blocks lost during btree construction

Use the incore bitmap structure to track blocks that were lost
during btree construction. This makes it somewhat more efficient.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: remove old btree rebuild support code

This code isn't needed anymore, so get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rebuild refcount btrees with bulk loader

Use the btree bulk loading functions to rebuild the refcount btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rebuild reverse mapping btrees with bulk loader

Use the btree bulk loading functions to rebuild the reverse mapping
btrees and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[sandeen: adjust for prior inclusion of "fix rebuilding btree block..."]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rebuild inode btrees with bulk loader

Use the btree bulk loading functions to rebuild the inode btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rebuild free space btrees with bulk loader

Use the btree bulk loading functions to rebuild the free space btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[sandeen: move phase5_func changes into this patch]
[sandeen: adjust for prior inclusion of "fix rebuilding btree block..."]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: create a new class of btree rebuild cursors

Create some new support structures and functions to assist phase5 in
using the btree bulk loader to reconstruct metadata btrees. This is the
first step in removing the open-coded AG btree rebuilding code.

Note: The code in this patch will not be used anywhere until the next
patch, so warnings about unused symbols are expected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[for all but the slack_node change from 0 to 2:]
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: inject lost blocks back into the fs no matter the owner

In repair phase 5, inject_lost_blocks takes the blocks that we allocated
but didn't use for constructing the new AG btrees and puts them back in
the filesystem by adding them to the free space. The only btree that
can overestimate like that are the free space btrees, but in principle,
any of the btrees can do that. If the others did, the rmap record owner
for those blocks won't necessarily be OWNER_AG, and if it isn't, repair
will fail.

Get rid of this logic bomb so that we can use it for /any/ block count
overestimation, and then we can use it to clean up after all
reconstruction of any btree type.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: remove gratuitous code block in phase5

A commit back in 2008 removed a "for" loop ahead of this code block, but
left the indented code block in place. Remove it for clarity and reflow
comments & lines as needed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: make container for btree bulkload root and block reservation

Create appropriate data structures to manage the fake btree root and
block reservation lists needed to stage a btree bulkload operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: rename the agfl index loop variable in build_agf_agfl

The variable 'i' is used to index the AGFL block list, so change the
name to make it clearer what this is to be used for.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: drop lostblocks from build_agf_agfl

We don't do anything with this parameter, so get rid of it.

Fixes: ef4332b8 ("xfs_repair: add freesp btree block overflow to the free space")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: complain about any nonzero inprogress value, not just 1

Complain about the primary superblock having any non-zero sb_inprogress
value, not just 1. This brings repair's behavior into alignment with
xfs_check and the kernel.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: complain about extents in unknown state

During phase 4, if we find any extents that are unaccounted for, report
the entire extent, not just the first block.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: complain about free space only seen by one btree

During the free space btree walk, scan_allocbt claims in a comment that
we'll catch FREE1 blocks (i.e. blocks that were seen by only one free
space btree) later. This never happens, with the result that xfs_repair
in dry-run mode can return 0 on a filesystem with corrupt free space
btrees.

Found by fuzzing xfs/358 with numrecs = middlebit (or sub).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: mark entire free space btree record as free1

In scan_allocbt, we iterate each free space btree record (of both bnobt
and cntbt) in the hopes of pushing all the free space from UNKNOWN to
FREE1 to FREE. Unfortunately, the first time we see a free space record
we only set the first block of that record to FREE1, which means that
the second time we see the record, the first block will get set to FREE,
but the rest of the free space will only make it to FREE1. This is
incorrect state, so we need to fix that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: remove verify_aginum

Replace this homegrown inode pointer verification function with the
libxfs checking helper. This one is a little tricky because this
function (unlike all of its verify_* siblings) returned 1 for bad and 0
for good, so we must invert the checking logic.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: remove verify_dfsbno

Replace this homegrown helper with its libxfs equivalent.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: refactor verify_dfsbno_range

Refactor this function to use libxfs type checking helpers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: convert to libxfs_verify_agbno

Convert the homegrown verify_agbno callers to use the libxfs function,
as needed. In some places we drop the "bno != 0" checks because those
conditionals are checking btree roots; btree roots should never be
zero if the corresponding feature bit is set; and repair skips the if
clause entirely if the feature bit is disabled.

In effect, this strengthens repair to validate that AG btree pointers
neither point to the AG headers nor past the end of the AG.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: complain about bad interior btree pointers

Actually complain about garbage btree node pointers, don't just silently
ignore them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: tag inobt vs finobt errors properly

Amend the generic inode btree block scanner function to tag correctly
which tree it's complaining about. Previously, dubious finobt headers
would be attributed to the "inode btree", which is at best ambiguous
and misleading at worst.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix rmapbt record order check

The rmapbt record order checks here don't quite work properly. For
non-shared filesystems, we fail to check that the startblock of the nth
record comes entirely after the previous record.

However, for filesystems with shared blocks (reflink) we correctly check
that the startblock/owner/offset of the nth record comes after the
previous one.

Therefore, make the reflink fs checks use "laststartblock" to preserve
that functionality while making the non-reflink fs checks use
"lastblock" to fix the problem outlined above.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: check for out-of-order inobt records

Make sure that the inode btree records are in order.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix bnobt and refcountbt record order checks

The bnobt and refcountbt scanners attempt to check that records are in
the correct order. However, the lastblock variable in both functions
ought to be set to the end of the previous record (instead of the start)
because otherwise we fail to catch overlapping records, which are not
allowed in either btree type.

Found by running xfs/410 with recs[1].blockcount = middlebit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: check for AG btree records that would wrap around

For AG btree types, make sure that each record's length is not so huge
that integer wraparound would happen.

Found via xfs/358 fuzzing recs[1].blockcount = ones.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: warn when we would have rebuilt a directory

longform_dir2_entry_check should warn the user when we would have
rebuilt a directory had -n not been given on the command line. The
missing warning results in repair returning 0 (all clean) when in fact
there were things that it would have fixed.

Found by running xfs/496 against lents[0].hashval = middlebit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix missing dir buffer corruption checks

The da_read_buf() function operates in "salvage" mode, which means that
if the verifiers fail, it will return a buffer with b_error set.  The
callers of da_read_buf, however, do not adequately check for verifier
errors, which means that repair can fail to flag a corrupt filesystem.

Fix the callers to do this properly.  The dabtree block walker and the
dabtree path checker functions to complain any time the da node / leafn
verifiers fail.  Fix the directory block walking functions to complain
about EFSCORRUPTED, since they already dealt with EFSBADCRC.

Found by running xfs/496 against lhdr.stale = middlebit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: fix unsigned int id comparisons

Fix compiler warnings about unsigned int comparisons by replacing them
with an explicit check for the one possible invalid value (-1U).
id_from_string sets exitcode to nonzero when it sees this value, so the
call sites don't have to do that.

Coverity-id: 1463855, 1463856, 1463857
Fixes: 67a73d6139d0 ("xfs_quota: refactor code to generate id from name")
Fixes: 36dc471cc9bb ("xfs_quota: allow individual timer extension")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix rebuilding btree block less than minrecs

In production, we found that sometimes xfs_repair phase 5
rebuilds freespace node block with pointers less than minrecs
and if we trigger xfs_repair again it would report such
the following message:

bad btree nrecs (39, min=40, max=80) in btbno block 0/7882

The background is that xfs_repair starts to rebuild AGFL
after the freespace btree is settled in phase 5 so we may
need to leave necessary room in advance for each btree
leaves in order to avoid freespace btree split and then
result in AGFL rebuild fails. The old mathematics uses
ceil(num_extents / maxrecs) to decide the number of node
blocks. That would be fine without leaving extra space
since minrecs = maxrecs / 2 but if some slack was decreased
from maxrecs, the result would be larger than what is
expected and cause num_recs_pb less than minrecs, i.e:

num_extents = 79, adj_maxrecs = 80 - 2 (slack) = 78

so we'd get

num_blocks = ceil(79 / 78) = 2,
num_recs_pb = 79 / 2 = 39, which is less than
minrecs = 80 / 2 = 40

OTOH, btree bulk loading code behaves in a different way.
As in xfs_btree_bload_level_geometry it wrote

num_blocks = floor(num_extents / maxrecs)

which will never go below minrecs. And when it goes above
maxrecs, just increment num_blocks and recalculate so we
can get the reasonable results.

Later, btree bulk loader will replace the current repair code.
But we may still want to look for a backportable solution
for stable versions. Hence, keep the same logic to avoid
the freespace as well as rmap btree minrecs underflow for now.

Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Eric Sandeen <sandeen@sandeen.net>
Fixes: 9851fd79bfb1 ("repair: AGFL rebuild fails if btree split required")
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

metadump: remove redundant bracket and show right SYNOPSIS

The bracket is meaningless, so remove it and show right SYNOPSIS.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: simplify the configured sector sizes setting in validate_sectorsize

There are two places that set the configured sector sizes in
validate_sectorsize, actually we can simplify them and combine into one
if statement. Use the default value structure to set the topology sectors
when probing fails.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: allow individual timer extension

The only grace period which can be set via xfs_quota today is for id 0,
i.e. the default grace period for all users. However, setting an
individual grace period is useful; for example:

Alice has a soft quota of 100 inodes, and a hard quota of 200 inodes
Alice uses 150 inodes, and enters a short grace period
Alice really needs to use those 150 inodes past the grace period
The administrator extends Alice's grace period until next Monday

vfs quota users such as ext4 can do this today, with setquota -T

xfs_quota can now accept an optional user id or name (symmetric with
how warn limits are specified), in which case that user's grace period
is extended to expire the given amount of time from now().

To maintain compatibility with old command lines, if none of
[-d|id|name] are specified, default limits are set as before.

(kernelspace requires updates to enable all this as well.)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: refactor code to generate id from name

There's boilerplate for setting limits and warnings, where we have
a case statement for each of the 3 quota types, and from there call
3 different functions to configure each of the 3 types, each of which
calls its own version of id to string function...

Refactor this so that the main function can call a generic id to string
conversion routine, and then call a common action. This save a lot of
LOC.

I was looking at allowing xfs to bump out individual grace periods like
setquota can do, and this refactoring allows us to add new actions like
that without copying all the boilerplate again.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix progress reporting

The Fixes: commit tried to avoid a segfault in case the progress timer
went off before the first message type had been set up, but this
had the net effect of short-circuiting the pthread start routine,
and so the timer didn't get set up at all and we lost all fine-grained
progress reporting.

The initial problem occurred when log zeroing took more time than the
timer interval.

So, make a new log zeroing progress item and initialize it when we first
set up the timer thread, to be sure that if the timer goes off while we
are still zeroing the log, it will be initialized and correct.

(We can't offer fine-grained status on log zeroing, so it'll go from
zero to $LOGBLOCKS with nothing in between, but it's unlikely that log
zeroing will take so long that this really matters.)

Reported-by: Leonardo Vaz <lvaz@redhat.com>
Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Donald Douwsma <ddouwsma@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

debian: replace libreadline with libedit

Now that upstream has dropped libreadline support entirely, switch the
debian package over to libedit.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: fix rdbmap_boundscheck

This predicate should check the a rt block number against number of
rtblocks, not the number of AG blocks. Ooops.

Fixes: 7161cd21b3ed ("xfs_db: bounds-check access to the dbmap array")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>