git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

xfsprogs: Release v5.16.0-rc0

Update all the necessary files for a 5.16.0-rc0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: Fix the free logic of state in xfs_attr_node_hasname

Source kernel commit: a1de97fe296c52eafc6590a3506f4bbd44ecb19a

When testing xfstests xfs/126 on the latest upstream kernel, it hangs on some machines.
After adding a getxattr operation once the xattr has been corrupted, I can reproduce it 100% of the time.

The deadlock is shown below:
[  983.923403] task:setfattr        state:D stack:    0 pid:17639 ppid: 14687 flags:0x00000080
[  983.923405] Call Trace:
[  983.923410]  __schedule+0x2c4/0x700
[  983.923412]  schedule+0x37/0xa0
[  983.923414]  schedule_timeout+0x274/0x300
[  983.923416]  __down+0x9b/0xf0
[  983.923451]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923453]  down+0x3b/0x50
[  983.923471]  xfs_buf_lock+0x33/0xf0 [xfs]
[  983.923490]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923508]  xfs_buf_get_map+0x4c/0x320 [xfs]
[  983.923525]  xfs_buf_read_map+0x53/0x310 [xfs]
[  983.923541]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923560]  xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[  983.923575]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923590]  xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923606]  xfs_da3_node_read+0x1f/0x40 [xfs]
[  983.923621]  xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[  983.923624]  ? kmem_cache_alloc+0x12e/0x270
[  983.923637]  xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[  983.923651]  xfs_has_attr+0x6e/0xd0 [xfs]
[  983.923664]  xfs_attr_set+0x273/0x320 [xfs]
[  983.923683]  xfs_xattr_set+0x87/0xd0 [xfs]
[  983.923686]  __vfs_removexattr+0x4d/0x60
[  983.923688]  __vfs_removexattr_locked+0xac/0x130
[  983.923689]  vfs_removexattr+0x4e/0xf0
[  983.923690]  removexattr+0x4d/0x80
[  983.923693]  ? __check_object_size+0xa8/0x16b
[  983.923695]  ? strncpy_from_user+0x47/0x1a0
[  983.923696]  ? getname_flags+0x6a/0x1e0
[  983.923697]  ? _cond_resched+0x15/0x30
[  983.923699]  ? __sb_start_write+0x1e/0x70
[  983.923700]  ? mnt_want_write+0x28/0x50
[  983.923701]  path_removexattr+0x9b/0xb0
[  983.923702]  __x64_sys_removexattr+0x17/0x20
[  983.923704]  do_syscall_64+0x5b/0x1a0
[  983.923705]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  983.923707] RIP: 0033:0x7f080f10ee1b

When getxattr calls xfs_attr_node_get, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because xfs/126 has used blocktrash to randomize the xattr metadata.
xfs_attr_node_hasname then frees the state internally, so xfs_attr_node_get skips its buffer release work.

Then subsequent removexattr will hang because of it.

This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines").
That commit added the xfs_attr_node_hasname helper and said the caller would be responsible for freeing the state
in this case. However, xfs_attr_node_hasname frees the state itself, instead of leaving it to the caller, if
xfs_da3_node_lookup_int fails.

Fix this bug by moving the freeing of the state into the caller.

Also, use "goto error/out" instead of returning the error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup, because we should free the state ourselves.

Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
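
The ownership rule restored by the xfs_attr_node_hasname fix above can be sketched in plain C. This is only an illustration under stated assumptions: struct lookup_state, node_hasname and node_get are hypothetical stand-ins rather than the real XFS functions, and EIO stands in for EFSCORRUPTED. The helper never frees the state it allocates, and the caller releases it on every path through a single exit label.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the lookup state passed between helper and caller. */
struct lookup_state {
	int	found;
};

static int live_states;	/* states allocated but not yet freed */

static struct lookup_state *state_alloc(void)
{
	struct lookup_state *state = calloc(1, sizeof(*state));

	if (state)
		live_states++;
	return state;
}

static void state_free(struct lookup_state *state)
{
	if (!state)
		return;
	live_states--;
	free(state);
}

/*
 * Helper: allocates the state and hands it to the caller, even when the
 * lookup fails. It never frees the state itself.
 */
static int node_hasname(int simulate_corruption, struct lookup_state **statep)
{
	struct lookup_state *state = state_alloc();

	if (!state)
		return -ENOMEM;
	*statep = state;
	if (simulate_corruption)
		return -EIO;	/* lookup failed; the caller still owns *statep */
	state->found = 1;
	return 0;
}

/* Caller: every exit goes through "out", so the state is always released. */
static int node_get(int simulate_corruption)
{
	struct lookup_state *state = NULL;
	int error;

	error = node_hasname(simulate_corruption, &state);
	if (error)
		goto out;
	assert(state->found);
out:
	state_free(state);
	return error;
}
```

Calling node_get(1) returns the error and leaves live_states at zero, which is the property the fix is after: a failed lookup no longer changes who is responsible for cleanup.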

xfs: #ifdef out perag code for userspace

Source kernel commit: 29f11fce211c7fcf32713457c031e71785fb6088

The xfs_perag structure and initialization is unused in userspace,
so #ifdef it out with __KERNEL__ to facilitate the xfsprogs sync
and build.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use swap() to make dabtree code cleaner

Source kernel commit: 5b068aadf62da006891383f6b23e47bc3ad49995

Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid
opencoding it.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Yang Guang <yang.guang5@zte.com.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
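
For readers unfamiliar with the idiom in the swap() commit above, here is a small sketch of what replacing an open-coded exchange with a swap macro looks like. The SWAP macro below is a local illustration that assumes the GCC/Clang __typeof__ extension; it is not copied from include/linux/minmax.h, and the two helper functions are hypothetical.

```c
#include <assert.h>

/*
 * Illustrative generic swap, similar in spirit to the kernel's swap().
 * Assumes the __typeof__ extension provided by GCC and Clang.
 */
#define SWAP(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Open-coded version: a temporary and three assignments at the call site. */
static void order_open_coded(int *lo, int *hi)
{
	assert(lo && hi);
	if (*lo > *hi) {
		int tmp = *lo;

		*lo = *hi;
		*hi = tmp;
	}
}

/* Same logic expressed with the macro. */
static void order_with_swap(int *lo, int *hi)
{
	assert(lo && hi);
	if (*lo > *hi)
		SWAP(*lo, *hi);
}
```

Both functions leave the smaller value in *lo; the macro version simply removes the temporary from the call site.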

xfs: remove unused parameter from refcount code

Source kernel commit: c04c51c524697cd68d668d595f8ebc381ffe426b

The owner info parameter is always NULL, so get rid of the parameter.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: reduce the size of struct xfs_extent_free_item

Source kernel commit: b3b5ff412ab04afd99173bb12d3cc146ee478ae7

We only use EFIs to free metadata blocks -- not regular data/attr fork
extents. Remove all the fields that we never use, for a net reduction
of 16 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename xfs_bmap_add_free to xfs_free_extent_later

Source kernel commit: c201d9ca5392b20f04882848a071025b0e194c17

xfs_bmap_add_free isn't a block mapping function; it schedules deferred
freeing operations for a later point in a compound transaction chain.
While it's primarily used by bunmapi, its use has expanded beyond that.
Move it to xfs_alloc.c and rename the function since it's now general
freeing functionality. Bring the slab cache bits in line with the
way we handle the other intent items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: create slab caches for frequently-used deferred items

Source kernel commit: f3c799c22c661e181c71a0d9914fc923023f65fb

Create slab caches for the high-level structures that coordinate
deferred intent items, since they're used fairly heavily.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: compact deferred intent item structures

Source kernel commit: 9e253954acf53227f33d307f5ac5ff94c1ca5880

Rearrange these structs to reduce the amount of unused padding bytes.
This saves eight bytes for each of the three structs changed here, which
means they're now all (rmap/bmap are 64 bytes, refc is 32 bytes) even
powers of two.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
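
The padding effect described in the commit above can be shown with a toy pair of structs. The field names and layouts here are hypothetical and are not the real rmap/bmap/refcount intent structures; the sketch only shows why reordering fields can shrink a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small fields scattered between 8-byte fields. */
struct item_loose {
	uint8_t		type;
	uint64_t	startblock;
	uint8_t		flags;
	uint64_t	blockcount;
};

/* Same fields, largest first, so the small ones share one padded slot. */
struct item_packed {
	uint64_t	startblock;
	uint64_t	blockcount;
	uint8_t		type;
	uint8_t		flags;
};

static_assert(sizeof(struct item_packed) <= sizeof(struct item_loose),
	      "reordering must not grow the struct");

/* Bytes saved by the reordering on the current ABI. */
static size_t bytes_saved(void)
{
	return sizeof(struct item_loose) - sizeof(struct item_packed);
}
```

On common 64-bit ABIs where uint64_t is 8-byte aligned, the first layout occupies 32 bytes and the second 24, so bytes_saved() reports 8. Other ABIs may give different numbers.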

xfs: rename _zone variables to _cache

Source kernel commit: 182696fb021fc196e5cbe641565ca40fcf0f885a

Now that we've gotten rid of the kmem_zone_t typedef, rename the
variables to _cache since that's what they are.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove kmem_zone typedef

Source kernel commit: e7720afad068a6729d9cd3aaa08212f2f5a7ceff

Remove these typedefs by referencing kmem_cache directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use separate btree cursor cache for each btree type

Source kernel commit: 9fa47bdcd33b117599e9ee3f2e315cb47939ac2d

Now that we have the infrastructure to track the max possible height of
each btree type, we can create a separate slab cache for cursors of each
type of btree. For smaller indices like the free space btrees, this
means that we can pack more cursors into a slab page, improving slab
utilization.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
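
The slab utilization argument in the commit above comes down to simple arithmetic, sketched below. The struct layouts are hypothetical, and the calculation ignores any per-object overhead or alignment that a real slab allocator adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-level bookkeeping kept by a cursor. */
struct cur_level {
	void		*bp;	/* buffer for this level */
	unsigned short	ptr;	/* current record index */
	unsigned short	ra;	/* readahead state */
};

/* Hypothetical fixed part of a cursor, followed by one entry per level. */
struct cursor {
	unsigned char		nlevels;
	unsigned char		maxlevels;
	struct cur_level	levels[];
};

/* Size of a cursor able to walk a btree of up to maxlevels levels. */
static size_t cursor_size(unsigned int maxlevels)
{
	assert(maxlevels > 0);
	return sizeof(struct cursor) + maxlevels * sizeof(struct cur_level);
}

/* How many such cursors fit in one slab page of page_size bytes. */
static size_t cursors_per_page(unsigned int maxlevels, size_t page_size)
{
	return page_size / cursor_size(maxlevels);
}
```

With these toy layouts, a cache sized for 5-level trees fits more objects into a 4096-byte page than one sized for 9-level trees, which is the effect the commit message describes for the smaller indices.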

xfs: compute absolute maximum nlevels for each btree type

Source kernel commit: 0ed5f7356daee74244b02e100b3cc043e886e686

Add code for all five btree types so that we can compute the absolute
maximum possible btree height for each btree type. This is a setup for
the next patch, which makes every btree type have its own cursor cache.

The functions are exported so that we can have xfs_db report the
absolute maximum btree heights for each btree type, rather than making
everyone run their own ad-hoc computations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: kill XFS_BTREE_MAXLEVELS

Source kernel commit: bc8883eb775dd18d8b84733d8b3a3955b72d103a

Nobody uses this symbol anymore, so kill it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: stop using XFS_BTREE_MAXLEVELS

Use the precomputed per-btree-type max height values.

[sandeen: note that >= changes to > here; The maximal value is
fine, but with the precomputed value specific to this filesystem,
our new limit is the actual acceptable max, vs. XFS_BTREE_MAXLEVELS
which was an absolute design max and was larger than most filesystems
could create.]

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: stop using XFS_BTREE_MAXLEVELS

Use the precomputed per-btree-type max height values.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: compute the maximum height of the rmap btree when reflink enabled

Source kernel commit: 9ec691205e7d4a11190519df6561a168ae6af3a4

Instead of assuming that the hardcoded XFS_BTREE_MAXLEVELS value is big
enough to handle the maximally tall rmap btree when all blocks are in
use and maximally shared, let's compute the maximum height assuming the
rmapbt consumes as many blocks as possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: clean up xfs_btree_{calc_size,compute_maxlevels}

Source kernel commit: 1b236ad7ba800bc3e9994881a8a453eb8bf5ca0f

During review of the next patch, Dave remarked that he found these two
btree geometry calculation functions lacking in documentation and that
they performed more work than was really necessary.

These functions take the same parameters and have nearly the same logic;
the only real difference is in the return values. Reword the function
comment to make it clearer what each function does, and move them to be
adjacent to reinforce their relation.

Clean up both of them to stop opencoding the howmany functions, stop
using the uint typedefs, and make them both support computations for
more than 2^32 leaf records, since we're going to need all of the above
for files with large data forks and large rmap btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
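
A minimal sketch of the two geometry calculations discussed in the commit above, written from the description rather than copied from the kernel. The names howmany64, compute_maxlevels and calc_size are hypothetical; limits[0] is assumed to be the minimum number of records per leaf block and limits[1] the minimum number of keys/pointers per node block.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division on 64-bit values, written to avoid overflow. */
static uint64_t howmany64(uint64_t x, uint64_t y)
{
	assert(y > 0);
	return x / y + (x % y != 0);
}

/* Worst-case height of a btree holding the given number of leaf records. */
static unsigned int compute_maxlevels(const unsigned int *limits,
				      uint64_t records)
{
	uint64_t level_blocks;
	unsigned int height = 1;

	assert(limits[0] > 0 && limits[1] > 1);
	level_blocks = howmany64(records, limits[0]);
	while (level_blocks > 1) {
		level_blocks = howmany64(level_blocks, limits[1]);
		height++;
	}
	return height;
}

/* Total blocks needed for the same tree: the sum of blocks at every level. */
static uint64_t calc_size(const unsigned int *limits, uint64_t records)
{
	uint64_t level_blocks;
	uint64_t blocks;

	assert(limits[0] > 0 && limits[1] > 1);
	level_blocks = howmany64(records, limits[0]);
	blocks = level_blocks;
	while (level_blocks > 1) {
		level_blocks = howmany64(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}
```

The two functions walk the same level-by-level loop and differ only in what they accumulate, which matches the observation that the only real difference is in the return values. Using 64-bit counters lets the record count exceed 2^32.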

xfs: compute maximum AG btree height for critical reservation calculation

Source kernel commit: b74e15d720d0764345934ebb599a99a077c52533

Compute the actual maximum AG btree height for deciding if a per-AG
block reservation is critically low. This only affects the sanity check
condition, since we /generally/ will trigger on the 10% threshold. This
is a long-winded way of saying that we're removing one more usage of
XFS_BTREE_MAXLEVELS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename m_ag_maxlevels to m_allocbt_maxlevels

Source kernel commit: 7cb3efb4cfdd4f3eb1f36b0ce39254b848ff2371

Years ago when XFS was thought to be much more simple, we introduced
m_ag_maxlevels to specify the maximum btree height of per-AG btrees for
a given filesystem mount. Then we observed that inode btrees don't
actually have the same height and split that off; and now we have rmap
and refcount btrees with much different geometries and separate
maxlevels variables.

The 'ag' part of the name doesn't make much sense anymore, so rename
this to m_alloc_maxlevels to reinforce that this is the maximum height
of the *free space* btrees. This sets us up for the next patch, which
will add a variable to track the maximum height of all AG btrees.

(Also take the opportunity to improve adjacent comments and fix minor
style problems.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: dynamically allocate cursors based on maxlevels

Source kernel commit: c940a0c54a2e9333478f1d87ed40006a04fcec7e

To support future btree code, we need to be able to size btree cursors
dynamically for very large btrees. Switch the maxlevels computation to
use the precomputed values in the superblock, and create cursors that
can handle a certain height. For now, we retain the btree cursor cache
that can handle up to 9-level btrees, though a subsequent patch
introduces separate caches for each btree type, where each cache's
objects will be exactly tall enough to handle the specific btree type.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: encode the max btree height in the cursor

Source kernel commit: c0643f6fdd6d3c448142ed1492a9a6b6505f9afb

Encode the maximum btree height in the cursor, since we're soon going to
allow smaller cursors for AG btrees and larger cursors for file btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: refactor btree cursor allocation function

Source kernel commit: 56370ea6e5fe3e3d6e1ca2da58f95fb0d5e1779f

Refactor btree allocation to a common helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rearrange xfs_btree_cur fields for better packing

Source kernel commit: 69724d920e7c30ca4421af615c499e92cfcc550b

Reduce the size of the btree cursor structure some more by rearranging
fields to eliminate unused space. While we're at it, fix the ragged
indentation and a spelling error.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: prepare xfs_btree_cur for dynamic cursor heights

Source kernel commit: 6ca444cfd663545e9e1c19ad2695836ffafad0a6

Split out the btree level information into a separate struct and put it
at the end of the cursor structure as a VLA. Files with huge data forks
(and in the future, the realtime rmap btree) will require the ability to
support many more levels than a per-AG btree cursor, which means that
we're going to create per-btree type cursor caches to conserve memory
for the more common case.

Note that a subsequent patch actually introduces dynamic cursor heights.
This one merely rearranges the structure to prepare for that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
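
The layout change described in the commit above, per-level data moved into a trailing array (a flexible array member in standard C terms), can be sketched as follows. The struct and function names are hypothetical and much simpler than struct xfs_btree_cur; the capacity check mirrors the idea of never letting the level count exceed what the cursor was sized for.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-level state. */
struct level_info {
	void	*bp;
	int	ptr;
};

struct btree_cursor {
	unsigned int		nlevels;	/* levels currently in use */
	unsigned int		maxlevels;	/* capacity of levels[] */
	struct level_info	levels[];	/* must be the last member */
};

/* Allocate a cursor with room for exactly maxlevels levels. */
static struct btree_cursor *cursor_alloc(unsigned int maxlevels)
{
	struct btree_cursor *cur;

	assert(maxlevels > 0);
	cur = calloc(1, sizeof(*cur) + maxlevels * sizeof(cur->levels[0]));
	if (!cur)
		return NULL;
	cur->maxlevels = maxlevels;
	cur->nlevels = 1;
	return cur;
}

/* Grow the tree by one level; refuse to exceed the cursor's capacity. */
static int cursor_add_level(struct btree_cursor *cur)
{
	if (cur->nlevels >= cur->maxlevels)
		return -1;
	cur->nlevels++;
	return 0;
}
```

A caller that needs a short cursor for a per-AG btree and a tall one for a large data fork would call cursor_alloc() with different heights, paying only for the levels each tree can actually reach.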

xfs: reduce the size of nr_ops for refcount btree cursors

Source kernel commit: efb79ea31067ae3dd0f348eb06e6b9a5e9907078

We're never going to run more than 4 billion btree operations on a
refcount cursor, so shrink the field to an unsigned int to reduce the
structure size. Fix whitespace alignment too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_btree_cur.bc_blocklog

Source kernel commit: cc411740472d958b718b9c6a7791ba00d88f7cef

This field isn't used by anyone, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix perag reference leak on iteration race with growfs

Source kernel commit: 892a666fafa19ab04b5e948f6c92f98f1dafb489

The for_each_perag*() set of macros are hacky in that some (i.e.
those based on sb_agcount) rely on the assumption that perag
iteration terminates naturally with a NULL perag at the specified
end_agno. Others allow for the final AG to have a valid perag and
require the calling function to clean up any potential leftover
xfs_perag reference on termination of the loop.

Aside from providing a subtly inconsistent interface, the former
variant is racy with growfs because growfs can create discoverable
post-eofs perags before the final superblock update that completes
the grow operation and increases sb_agcount. This leads to the
following assert failure (reproduced by xfs/104) in the perag free
path during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195

This occurs because one of the many for_each_perag() loops in the
code that is expected to terminate with a NULL pag (and thus has no
post-loop xfs_perag_put() check) raced with a growfs and found a
non-NULL post-EOFS perag, but terminated naturally based on the
end_agno check without releasing the post-EOFS perag.

Rework the iteration logic to lift the agno check from the main for
loop conditional to the iteration helper function. The for loop now
purely terminates on a NULL pag and xfs_perag_next() avoids taking a
reference to any perag beyond end_agno in the first place.

Fixes: f250eedcf762 ("xfs: make for_each_perag... a first class citizen")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
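
The reworked iteration described in the commit above can be modelled with a toy reference-counted table. Everything here is a simplified, hypothetical stand-in (struct mount, struct perag, MAX_AGS, the helper names) and not the real XFS code; the point is only that the range check lives in the "next" helper, so the loop ends on NULL and never holds a reference past end_agno.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AGS 8	/* hypothetical table size */

struct perag {
	unsigned int	agno;
	int		ref;		/* active references */
	int		present;	/* nonzero once discoverable */
};

struct mount {
	struct perag	ags[MAX_AGS];
};

/* Take a reference if the perag exists, else return NULL. */
static struct perag *perag_get(struct mount *mp, unsigned int agno)
{
	if (agno >= MAX_AGS || !mp->ags[agno].present)
		return NULL;
	mp->ags[agno].ref++;
	return &mp->ags[agno];
}

static void perag_put(struct perag *pag)
{
	assert(pag->ref > 0);
	pag->ref--;
}

/*
 * Drop the current reference, then look up the next perag only if it is
 * still within end_agno.
 */
static struct perag *perag_next(struct mount *mp, struct perag *pag,
				unsigned int *agno, unsigned int end_agno)
{
	*agno = pag->agno + 1;
	perag_put(pag);
	if (*agno > end_agno)
		return NULL;
	return perag_get(mp, *agno);
}

/* The loop itself terminates purely on a NULL perag. */
#define for_each_perag_range(mp, agno, end_agno, pag) \
	for ((pag) = perag_get((mp), (agno)); \
	     (pag) != NULL; \
	     (pag) = perag_next((mp), (pag), &(agno), (end_agno)))

/* Walk [0, end_agno] and return the number of references left behind. */
static int leaked_refs_after_walk(struct mount *mp, unsigned int end_agno,
				  unsigned int *visited)
{
	struct perag *pag;
	unsigned int agno = 0;
	int leaked = 0;

	*visited = 0;
	for_each_perag_range(mp, agno, end_agno, pag)
		(*visited)++;
	for (agno = 0; agno < MAX_AGS; agno++)
		leaked += mp->ags[agno].ref;
	return leaked;
}
```

If the table contains an extra entry just past end_agno, standing in for a perag created by a concurrent growfs, the walk still visits only the entries inside the range and leaves every reference count at zero.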

xfs: terminate perag iteration reliably on agcount

Source kernel commit: 8ed004eb9d07a5d6114db3e97a166707c186262d

The for_each_perag_from() iteration macro relies on sb_agcount to
process every perag currently within EOFS from a given starting
point. It's perfectly valid to have perag structures beyond
sb_agcount, however, such as if a growfs is in progress. If a perag
loop happens to race with growfs in this manner, it will actually
attempt to process the post-EOFS perag where ->pag_agno ==
sb_agcount. This is reproduced by xfs/104 and manifests as the
following assert failure in superblock write verifier context:

XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22

Update the corresponding macro to only process perags that are
within the current sb_agcount.

Fixes: 58d43a7e3263 ("xfs: pass perags around in fsmap data dev functions")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename the next_agno perag iteration variable

Source kernel commit: f1788b5e5ee25bedf00bb4d25f82b93820d61189

Rename the next_agno variable to be consistent across the several
iteration macros and shorten line length.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fold perag loop iteration logic into helper function

Source kernel commit: bf2307b195135ed9c95eebb38920d8bd41843092

Fold the loop iteration logic into a helper in preparation for
further fixups. No functional change in this patch.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the xfs_dqblk_t typedef

Source kernel commit: 11a83f4c393040dc3a6a368c6399785dbfae7602

Remove the few leftover instances of the xfs_dqblk_t typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the xfs_dsb_t typedef

Source kernel commit: ed67ebfd7c4061b4b505ac42eb00e08dd09f4d38

Remove the few leftover instances of the xfs_dsb_t typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove the xfs_dinode_t typedef

Source kernel commit: de38db7239c4bd2f37ebfcb8a5f22b4e8e657737

Remove the few leftover instances of the xfs_dinode_t typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: check that bc_nlevels never overflows

Source kernel commit: 4c175af2ccd3e0d618b2af941e656fabc453c4af

Warn if we ever bump nlevels higher than the allowed maximum cursor
height.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove xfs_btree_cur_t typedef

Source kernel commit: ae127f087dc22b6e37edc870079abf0721a6aed0

Get rid of this old typedef before we start changing other things.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: fix maxlevels comparisons in the btree staging code

Source kernel commit: 78e8ec83a404d63dcc86b251f42e4ee8aff27465

The btree geometry computation function has an off-by-one error in that
it does not allow maximally tall btrees (nlevels == XFS_BTREE_MAXLEVELS).
This can result in repairs failing unnecessarily on very fragmented
filesystems. Subsequent patches to remove MAXLEVELS usage in favor of
the per-btree type computations will make this a much more likely
occurrence.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: port the defer ops capture and continue to resource capture

Source kernel commit: 512edfac85d243ed6a5a5f42f513ebb7c2d32863

When log recovery tries to recover a transaction that had log intent
items attached to it, it has to save certain parts of the transaction
state (reservation, dfops chain, inodes with no automatic unlock) so
that it can finish single-stepping the recovered transactions before
finishing the chains.

This is done with the xfs_defer_ops_capture and xfs_defer_ops_continue
functions. Right now they open-code this functionality, so let's port
this to the formalized resource capture structure that we introduced in
the previous patch. This enables us to hold up to two inodes and two
buffers during log recovery, the same way we do for regular runtime.

With this patch applied, we'll be ready to support atomic extent swap
which holds two inodes, and logged xattrs, which hold one inode and one
xattr leaf buffer.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: formalize the process of holding onto resources across a defer roll

Source kernel commit: c5db9f937b2971c78d6c6bbaa61a6450efa8b845

Transaction users are allowed to flag up to two buffers and two inodes
for ownership preservation across a deferred transaction roll. Hoist
the variables and code responsible for this out of xfs_defer_trans_roll
so that we can use it for the defer capture mechanism.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: use kmem_cache_free() for kmem_cache objects

Source kernel commit: c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93

For kmalloc() allocations, SLOB prepends a 4-byte header to each block
and stores the size of the allocated block in that header.
Blocks allocated with kmem_cache_alloc() do not have that
header.

SLOB explodes when you allocate memory with kmem_cache_alloc() and then
try to free it with kfree() instead of kmem_cache_free().
SLOB will assume that there is a header when there is none, read
garbage into the size variable, and corrupt the adjacent objects, which
eventually leads to a hang or panic.

Let's make XFS work with SLOB by using the proper free function.

Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free")
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix AG header btree level comparisons

It's not an error if repair encounters a btree with the maximal
height, so don't print warnings. Also, we don't allow zero-height
btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
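
The two conditions in the xfs_repair commit above (a maximally tall btree is fine, a zero-height btree is not) amount to an inclusive range check. The helpers below are hypothetical and only illustrate the boundary behaviour; the second one shows a comparison with both problems for contrast.

```c
#include <assert.h>
#include <stdbool.h>

/* A height is valid from 1 up to and including the computed maximum. */
static bool btree_levels_ok(unsigned int nlevels, unsigned int maxlevels)
{
	assert(maxlevels >= 1);
	return nlevels >= 1 && nlevels <= maxlevels;
}

/*
 * For contrast: a strict comparison rejects a maximally tall tree and
 * accepts a zero-height one.
 */
static bool btree_levels_ok_off_by_one(unsigned int nlevels,
				       unsigned int maxlevels)
{
	return nlevels < maxlevels;
}
```

With maxlevels set to 9, the first helper accepts 9 and rejects both 0 and 10, while the second rejects 9 and accepts 0.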

xfs_db: fix metadump level comparisons

It's not an error if metadump encounters a btree with the maximal
height, so don't print warnings.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.15.0

Update all the necessary files for a 5.15.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: increase the minimum log size to 64MB when possible

(Commit log from Darrick J. Wong):

Recently, the upstream maintainers have been taking a lot of heat on
account of writer threads encountering high latency when asking for log
grant space when the log is small.  The reported use case is a heavily
threaded indexing product logging trace information to a filesystem
ranging in size between 20 and 250GB.  The meetings that result from the
complaints about latency and stall warnings in dmesg both from this use
case and also a large well known cloud product are now consuming 25% of
the maintainer's weekly time and have been for months.

For small filesystems, the log is small by default because we have
defaulted to a ratio of 1:2048 (or even less).  For grown filesystems,
this is even worse, because big filesystems generate big metadata.
However, the log size is still insufficient even if it is formatted at
the larger size.

On a 220GB filesystem, the 99.95% latencies observed with a 200-writer
file synchronous append workload running on a 44-AG filesystem (with 44
CPUs) spread across 4 hard disks showed:

        99.5%
Log(MB) Latency(ms) BW (MB/s) xlog_grant_head_wait
10      520         243       1875
20      220         308       540
40      140         360       6
80      92          363       0
160     86          364       0

For 4 NVME, the results were:

Log(MB) Latency(ms) BW (MB/s) xlog_grant_head_wait
10      201         409       898
20      177         488       144
40      122         550       0
80      120         549       0
160     121         545       0

This shows pretty clearly that we could reduce the amount of time that
threads spend waiting on the XFS log by increasing the log size to at
least 40MB regardless of size.  We then repeated the benchmark with a
cloud system and an old machine to see if there were any ill effects on
less stable hardware.

For cloudy iscsi block storage, the results were:

Log(MB) Latency(ms) BW (MB/s) xlog_grant_head_wait
10      390         176       2584
20      173         186       357
40      37          187       0
80      40          183       0
160     37          183       0

A decade-old machine w/ 24 CPUs and a giant spinning disk RAID6 array
produced this:

Log(MB) Latency(ms) BW (MB/s) xlog_grant_head_wait
10      55          5.4       0
20      40          5.9       0
40      62          5.7       0
80      66          5.7       0
160     25          5.4       0

From the first three scenarios, it is clear that there are gains to be
had by sizing the log somewhere between 40 and 80MB -- the long tail
latency drops quite a bit, and programs are no longer blocking on the
log's transaction space grant heads.  Split the difference and set the
log size floor to 64MB.

This patch/behavior was originally proposed by Darrick Wong, rewritten
to avoid extra heuristics and dependencies on other pending changes.

Inspired-by: Darrick J. Wong <djwong@kernel.org>
Commit-log-stolen-from: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: move "at least 64mb" threshold to 300mb filesystem]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
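
The sizing rule in the mkfs commit above can be approximated in a few lines. This is a deliberately simplified sketch built only from the numbers in the message (the 1:2048 ratio, the 64MB floor, and the 300MB filesystem threshold); the function name is hypothetical, and the real mkfs.xfs logic applies further constraints that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define MiB	(1024ULL * 1024ULL)

/*
 * Start from a 1:2048 ratio, then raise the result to 64MB once the
 * filesystem is at least 300MB. Other limits are intentionally omitted.
 */
static uint64_t default_log_bytes(uint64_t fs_bytes)
{
	uint64_t log_bytes = fs_bytes / 2048;

	if (fs_bytes >= 300 * MiB && log_bytes < 64 * MiB)
		log_bytes = 64 * MiB;
	assert(log_bytes <= fs_bytes);	/* the log never exceeds the fs */
	return log_bytes;
}
```

Under this model a 20GB filesystem, which the plain ratio would give a 10MB log, gets 64MB instead, while a 220GB filesystem keeps its ratio-derived 110MB.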

xfs_scrub: retry scrub (and repair) of items that are ok except for XFAIL

Sometimes a metadata object will pass all of the obvious scrubber
checks, but we won't be able to cross-reference the object's records
with other metadata objects (e.g. a file data fork and a free space
btree both claim ownership of an extent). When this happens during the
checking phase, we should queue the object for a repair, which means
that phase 4 will keep re-evaluating the object as repairs proceed.
Eventually, the hope is that we'll fix the filesystem and everything
will scrub cleanly; if not, we recommend running xfs_repair as a second
attempt to fix the inconsistency.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix xfrog_scrub_metadata error reporting

Commit de5d20ec converted xfrog_scrub_metadata to return negative error
codes directly, but forgot to fix up the str_errno calls to use
str_liberror. This doesn't result in incorrect error reporting
currently, but (a) the calls in the switch statement are inconsistent,
and (b) this will matter in future patches where we can call library
functions in between xfrog_scrub_metadata and str_liberror.

Fixes: de5d20ec ("libfrog: convert scrub.c functions to negative error codes")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
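
The reason the reporting style matters, as point (b) of the commit above notes, is that the global errno can change between the failing call and the report. A small sketch with hypothetical function names (these are not the libfrog or xfs_scrub APIs):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical library call in the "negative error code" style. */
static int lib_scrub_metadata(int should_fail)
{
	return should_fail ? -ENOENT : 0;
}

/* Some unrelated call that happens to overwrite the global errno. */
static void unrelated_library_call(void)
{
	errno = EINVAL;
}

/*
 * Describe a failure from the returned code, never from the global errno,
 * which anything that ran in between may have changed.
 */
static const char *describe_failure(int ret)
{
	assert(ret <= 0);
	return ret ? strerror(-ret) : "ok";
}
```

After lib_scrub_metadata(1) followed by unrelated_library_call(), errno holds EINVAL, yet describe_failure() still reports the ENOENT text because it works from the return value.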

xfsprogs: Release v5.15.0-rc1

Update all the necessary files for a 5.15.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: fix up dump and report documentation

Documentation for these commands was a bit of a mess.

1) The help args were respecified in the _help() functions, overwriting
the strings which had been set up in the _init functions as all
other commands do. Worse, in the report case, they differed.

2) The -L/-U dump options were not present in either short help string.

3) The -L/-U dump options were not documented in the xfs_quota manpage.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: don't guess about failure reason in phase6

There are many error messages in phase 6 which say
"filesystem may be out of space," when in reality the failure could
have been corruption or some other issue. Rather than guessing, and
emitting a confusing and possibly-wrong message, use the existing
res_failed() for any xfs_trans_alloc failures, and simply print the
error number in the other cases.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: don't exit on fs_table_insert_project_path failure

If "project -p" fails in fs_table_insert_project_path, it
calls exit() today which is quite unfriendly. Return an error
and return to the command prompt as expected.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: move fprintf to caller per request]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs.xfs(8): remove incorrect default inode allocator description

The "maxpct" section of the mkfs.xfs manpage has a gratuitous and
incorrect description of the default inode allocator mode.

inode64 has been the default since 2012, as of

08bf540412ed xfs: make inode64 as the default allocation mode

so the description is wrong. In addition, imaxpct is only
tangentially related to inode allocator behavior, so this section
of the man page is really the wrong place for discussion.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_quota: document unit multipliers used in limit command

The units used to set limits are never specified in the xfs_quota
man page, and in fact for block limits, the standard k/m/g/...
units are accepted. Document all of this.
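
As an illustration of what accepting k/m/g suffixes means, here is a small parsing sketch. It assumes binary multiples of 1024 and treats a bare number as a plain count; both are assumptions of this sketch, not statements from the commit message, and `parse_limit` is a hypothetical name.

```python
# Assumed binary multipliers; the commit message only says k/m/g/... are accepted.
MULTIPLIERS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_limit(text):
    """Convert a string such as '10m' into a plain number."""
    text = text.strip().lower()
    if text and text[-1] in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)
```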

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: add a config file for x86_64 pmem filesystems

We have a handful of users who continually ping the maintainer with
questions about why pmem and dax don't work quite the way they want
(which is to say 2MB extents and PMD mappings) because they copy-pasted
some garbage from Google that's wrong. Encode the correct defaults into
a mkfs config file and ship that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: fix reporting if we can't open raw block devices

The error checking logic for opening the data, log, and rt device is
totally broken. Fix this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_scrub: report optional features in version string

Ted Ts'o reported brittleness in the fstests logic in generic/45[34] to
detect whether or not xfs_scrub is capable of detecting Unicode mischief
in directory and xattr names. This is a compile-time feature, since we
do not assume that all distros will want to ship xfsprogs with libicu.

Rather than relying on ldd tests (which don't work at all if xfs_scrub
is compiled statically), let's have -V print whether or not the feature
is built into the tool. Phase 5 still requires the presence of "UTF-8"
in LC_MESSAGES to enable Unicode confusable detection; this merely makes
the feature easier to discover.

Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: enable inobtcount and bigtime by default

Enable the inode btree counters and large timestamp features by default.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: document sample configuration file location

Update the documentation to note where one can find sample configuration
files. While we're at it, add -c to the topmost list of mkfs.xfs
options.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: add configuration files for the last few LTS kernels

Add some sample mkfs configuration files that capture the mkfs feature
defaults at the time of the release of the last four upstream LTS
kernels.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

mkfs: prevent corruption of passed-in suboption string values

Eric and I were trying to play with mkfs configuration files, when I
spotted this (with the libini package from Ubuntu 20.04):

# cat << EOF > /tmp/r
[data]
su=2097152
sw=1
EOF
# mkfs.xfs -f -c options=/tmp/r /dev/sda
Parameters parsed from config file /tmp/r successfully
-d su option requires a value

It turns out that libini's parser uses stack variables(!) to store the
value of a key=value pair that it parses, and passes this stack array to
the parse_cfgopt function.  If the particular option calls getstr(),
then we save the value of that pointer (not its contents) to the
cli_params.  Being a stack array, the contents will be overwritten by
other function calls, which means that our value of '2097152' has been
destroyed by the time we actually call getnum when we're validating the
new fs config.

We never noticed this until now because the only other caller was
getsubopt on the argv array, which gets chopped up but left intact in
memory.  The solution is to make a private copy of those strings if we
ever save them for later.  For now we'll be lazy and let the memory
leak, since mkfs is not a long-running process.
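
The failure mode can be modelled outside C. The sketch below is a Python analogy, not the mkfs code: a reused `bytearray` stands in for the parser's stack array, and `parse_pairs` and `keep_both` are hypothetical names. Saving the buffer itself loses the earlier value once the buffer is overwritten, while saving a private copy keeps it.

```python
def parse_pairs(lines, save):
    """Parse key=value lines, reusing one scratch buffer for every value."""
    scratch = bytearray()            # stands in for the parser's stack array
    for line in lines:
        key, _, value = line.partition("=")
        scratch[:] = value.encode()  # overwrite the buffer in place
        save(key, scratch)

by_reference = {}
by_copy = {}

def keep_both(key, buf):
    by_reference[key] = buf          # saves the buffer itself (the bug)
    by_copy[key] = bytes(buf)        # saves a private copy (the fix)

parse_pairs(["su=2097152", "sw=1"], keep_both)
```

After the second line is parsed, `by_reference["su"]` no longer holds `2097152`, whereas `by_copy["su"]` does.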

Fixes: 33c62516 ("mkfs: add initial ini format config file parsing support")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: update secondary superblocks after changing features

When we add features to an existing filesystem, make sure we update the
secondary superblocks to reflect the new geometry so that if we lose the
primary super in the future, repair will recover correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: fix indentation problems in upgrade_filesystem

Indentation is supposed to be tabs, not spaces. Fix that, and unindent
the bwrite clause because do_error aborts the program.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: use format specifier for directory inode numbers in do_warn

Use a format specifier for the ondisk directory inode argument to
do_warn when complaining about corrupt directories. This avoids
build warnings on armv7l.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: it's a format specifier not a cast, edit commitlog]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_repair: explicitly cast resource usage counts in do_warn

Explicitly cast the ondisk dquot counter argument to do_warn when
complaining about incorrect quota counts. This avoids build warnings on
ppc64le.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_db: fix nbits parameter in fa_ino[48] functions

Use the proper macro to convert ino4 and ino8 field byte sizes to a bit
count in the functions that navigate shortform directories. This just
happens to work correctly for ino4 entries, but omits the upper 4 bytes
of an ino8 entry. Note that the entries display correctly; it's just
the command "addr u3.sfdir3.list[X].inumber.i8" that won't.

Found by running smatch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs-apply: support filterdiff >= 0.4.2 only

We currently require filterdiff v0.3.4 as a minimum for handling git
based patches. This was the first version to handle git diff
metadata well enough to do patch reformatting. It was, however, very
buggy and required several workarounds to get it to do what we
needed.

However, these bugs have been fixed and on a machine with v0.4.2,
the workarounds result in libxfs-apply breaking and creating corrupt
patches. Rather than try to carry around workarounds for a broken
filterdiff version and one that just works, just increase the
minimum required version to 0.4.2 and remove all the workarounds for
the bugs in 0.3.4.
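
For illustration only, a minimum-version check of this kind can be sketched as follows. libxfs-apply itself is a shell script, so this Python version shows just the comparison; `parse_version` and `filterdiff_is_supported` are hypothetical names.

```python
def parse_version(text):
    """Turn '0.4.2' into (0, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

MINIMUM = parse_version("0.4.2")

def filterdiff_is_supported(version_text):
    """True if the given filterdiff version meets the minimum."""
    return parse_version(version_text) >= MINIMUM
```

Comparing tuples of integers avoids the trap of string comparison, where "0.10.0" would sort before "0.4.2".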

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

misc: add a crc32c self test to mkfs and repair

Enhance mkfs and xfs_repair to run the crc32c self test when they start
up, and refuse to continue if the self test fails. We don't want to
format a filesystem if the checksum algorithm produces incorrect
results, and we especially don't want repair to tear a filesystem apart
because it thinks the checksum is wrong.
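
A sketch of the idea, assuming a simple bitwise CRC-32C and the standard check value for the ASCII string "123456789". The real self test uses its own test vectors and C implementation; `crc32c_self_test` and `start_tool` are hypothetical names.

```python
def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc32c_self_test():
    """Compare against the standard check value for b'123456789'."""
    return crc32c(b"123456789") == 0xE3069283

def start_tool():
    """Refuse to continue if the checksum code gives a wrong answer."""
    if not crc32c_self_test():
        raise SystemExit("crc32c self test failed, refusing to continue")
    return "ok"
```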

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libfrog: always use the kernel GETFSMAP definitions

The GETFSMAP ioctl has been a part of the kernel since 4.12. We have no
business shipping a stale copy of kernel header contents in the xfslibs
package, so get rid of it. This means that xfs_scrub now has a hard
dependency on the build system having new kernel headers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: don't leave dangling perag references from xfs_buf

When we're preparing to move a list of xfs_buf(fers) to the freelist, be
sure to detach the perag reference so that we don't leak the reference
or leave dangling pointers. Currently this has no negative effects
since we only call libxfs_bulkrelse while exiting programs, but let's
not be sloppy.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: shut down filesystem if we xfs_trans_cancel with deferred work items

While debugging some very strange rmap corruption reports in connection
with the online directory repair code, I root-caused the error to the
following incorrect sequence:

<start repair transaction>
<expand directory, causing a deferred rmap to be queued>
<roll transaction>
<cancel transaction>

Obviously, we should have committed the transaction instead of
cancelling it.  Thinking more broadly, however, xfs_trans_cancel should
have warned us that we were throwing away work items that we already
committed to performing.  This is not correct, and we need to shut down
the filesystem.

Change xfs_trans_cancel to complain in the loudest manner if we're
cancelling any transaction with deferred work items attached.
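
The rule can be modelled in a few lines. This is a toy Python model with hypothetical `Mount` and `Transaction` classes, not the libxfs implementation: cancelling a transaction that still carries deferred work marks the filesystem as shut down.

```python
class Mount:
    def __init__(self):
        self.shutdown = False

class Transaction:
    def __init__(self, mount):
        self.mount = mount
        self.deferred = []   # work items we have committed to performing

    def defer(self, item):
        self.deferred.append(item)

    def commit(self):
        """Finish all deferred work and return what was done."""
        finished, self.deferred = self.deferred, []
        return finished

    def cancel(self):
        if self.deferred:
            # Dropping committed-to work would leave metadata inconsistent.
            self.mount.shutdown = True
        self.deferred = []
```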

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxcmd: use emacs mode for command history editing

Prior to xfsprogs 5.7.0, we built xfsprogs with libreadline support by
default.  In its default configuration, that library interpreted various
keystrokes in a direct manner (e.g. backspace deletes the character to
the left of the cursor), which seems consistent with how emacs behaves.

However, libeditline's default keybindings are consistent with vim,
which means that users are suddenly presented with a different line
editing interface from the one they had before.  Since libeditline is
configurable (put "bind -v" in editrc if you really want vim mode),
let's put things back the way they were.  At least as much as we can.

Fixes: bbe12eb9 ("xfsprogs: remove libreadline support")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfsprogs: Release v5.15.0-rc0

Update all the necessary files for a 5.15.0-rc0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: rename buffer cache index variable b_bn

To stop external users from using b_bn as the disk address of the
buffer, rename it to b_rhash_key to indicate that it is the buffer
cache index, not the block number of the buffer. Code that needs the
disk address should use xfs_buf_daddr() to obtain it.

Inspired-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: moved the xfs_buf_daddr changes to prior patch]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert bp->b_bn references to xfs_buf_daddr()

Source kernel commit: 9343ee76909e3f6466d85c9ebb0e343cdf54de71

Stop directly referencing b_bn in code outside the buffer cache, as
b_bn is supposed to be used only as an internal cache index.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: introduce xfs_buf_daddr()

Source kernel commit: 04fcad80cd068731a779fb442f78234732683755

Introduce a helper function xfs_buf_daddr() to extract the disk
address of the buffer from the struct xfs_buf. This will replace
direct accesses to bp->b_bn and bp->b_maps[0].bm_bn, as well as
the XFS_BUF_ADDR() macro.

This patch introduces the helper function and replaces all uses of
XFS_BUF_ADDR() as this is just a simple sed replacement.
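
A Python model of such an accessor, assuming the disk address is taken from the first buffer mapping. `Buf`, `BufMap` and `buf_daddr` are hypothetical stand-ins for the C structures and helper.

```python
from dataclasses import dataclass, field

@dataclass
class BufMap:
    bm_bn: int    # disk address of this mapping
    bm_len: int   # length of this mapping

@dataclass
class Buf:
    b_maps: list = field(default_factory=list)

def buf_daddr(bp):
    """Return the disk address of the buffer's first mapping."""
    return bp.b_maps[0].bm_bn
```

Callers that go through the accessor do not need to know which field holds the address, which is what allows the cache index to be renamed later.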

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: remove b_maps[0].bm_bn assignment in alloc_write_buf now]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: replace XFS_BUF_SET_ADDR with a function

Replace XFS_BUF_SET_ADDR with a new function that will set the buffer
block number correctly, then port the two users to it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: leave b_maps[0].bm_bn until next patch]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: always initialize internal buffer map

The __initbuf function is responsible for initializing the fields of an
xfs_buf.  Buffers are always required to have a mapping, though in the
typical case there's only one mapping, so we can use the internal one.

The single-mapping b_maps init code at the end of the function doesn't
quite get this right though -- if a single-mapping buffer in the cache
was allowed to expire and now is being repurposed, it'll come out with
b_maps == &__b_map, in which case we incorrectly skip initializing the
map.  This has gone unnoticed until now because (AFAICT) the code paths
that use b_maps are the same ones that are called with multi-mapping
buffers, which are initialized correctly.

Anyway, the improperly initialized single-mappings will cause problems
in upcoming patches where we turn b_bn into the cache key and require
the use of b_maps[0].bm_bn for the buffer LBA.  Fix this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: clean up remaining LIBXFS_MOUNT flags

Now that userspace libxfs also uses m_opstate to track operational
state, the LIBXFS_MOUNT_* flags are only used for the flags argument
passed to libxfs_mount(). Update the comment to reflect this, and clean
up the flags and function declaration while we're at it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

Get rid of these flags and the m_flags field, since none of them do
anything anymore.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[sandeen: keep xfs_set_inode_alloc similar to kernel for now]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: use opstate flags and functions for libxfs mount options

Port the three LIBXFS_MOUNT flags that actually do anything to set
opstate flags in preparation for removing m_flags in a later patch.
Retain the LIBXFS_MOUNT #defines so that libxfs clients can pass them
into libxfs_mount.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: drop now-unused flags arg from rtmount_init]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: kill xfs_sb_version_has_v3inode()

Source kernel commit: cf28e17c9186c83e7e8702f844bc40b6e782ce6c

All callers to xfs_dinode_good_version() and XFS_DINODE_SIZE() in
both the kernel and userspace have a xfs_mount structure available
which means they can use mount features checks instead of looking
directly at the superblock.

Convert these functions to take a mount and use a xfs_has_v3inodes()
check and move it out of the libxfs/xfs_format.h file as it really
doesn't have anything to do with the definition of the on-disk
format.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: introduce xfs_sb_is_v5 helper

Source kernel commit: d6837c1aab42e70141fd3875ba05eb69ffb220f0

Rather than open coding XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5
checks everywhere, add a simple wrapper to encapsulate this and make
the code easier to read.

This allows us to remove the xfs_sb_version_has_v3inode() wrapper
which is only used in xfs_format.h now and is just a version number
check.

There are a couple of places where we should be checking the mount
feature bits rather than the superblock version (e.g. remount), so
those are converted to use xfs_has_crc(mp) instead.
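
A sketch of the wrapper idea in Python. The constant values are quoted from memory of xfs_format.h and should be treated as assumptions; `sb_version_num` and `sb_is_v5` are illustrative names that operate on the raw version field.

```python
XFS_SB_VERSION_NUMBITS = 0x000F  # assumed: low bits hold the version number
XFS_SB_VERSION_5 = 5             # assumed value for the v5 format

def sb_version_num(sb_versionnum):
    """Extract the version number from the version field."""
    return sb_versionnum & XFS_SB_VERSION_NUMBITS

def sb_is_v5(sb_versionnum):
    """Wrapper replacing open-coded version-number comparisons."""
    return sb_version_num(sb_versionnum) == XFS_SB_VERSION_5
```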

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused xfs_sb_version_has wrappers

Source kernel commit: 2beb7b50ddd429f47b6cabd186b3102d2a6aa505

The vast majority of these wrappers are now unused. Remove them
leaving just the small subset of wrappers that are used to either
add feature bits or make the mount features field setup code
simpler.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs_{copy,db,logprint,repair}: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against mp->m_features.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxfs: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against mp->m_features.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: drop one hunk that wasn't really a conversion]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

libxlog: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against mp->m_features.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: drop hunk to libxfs/init.c that's not a straight replace]
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert xfs_sb_version_has checks to use mount features

Source kernel commit: ebd9027d088b3a4e49d294f79e6cadb7b7a88b28

This is a conversion of the remaining xfs_sb_version_has..(sbp)
checks to use xfs_has_..(mp) feature checks.

This was largely done with a vim replacement macro that did:

:0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>

A couple of other variants were also used, and the rest touched up
by hand.

$ size -t fs/xfs/built-in.a
           text    data     bss     dec     hex filename
before  1127533  311352     484 1439369  15f689 (TOTALS)
after   1125360  311352     484 1437196  15ee0c (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: open code sb verifier feature checks

Source kernel commit: fe08cc5044486096bfb5ce9d3db4e915e53281ea

The superblock verifiers are one of the last places that use the sb
version functions to do feature checks. These are all quite simple
uses, and there aren't many of them so open code them all.

Also, move the good version number check into xfs_sb.c instead of it
being an inline function in xfs_format.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert xfs_fs_geometry to use mount feature checks

Source kernel commit: 03288b19093b9bcff72f0d5f90c578daf053f759

Reporting filesystem features to userspace is currently superblock
based. Now we have a general mount-based feature infrastructure,
switch to using the xfs_mount rather than the superblock directly.

This reduces the size of the function by over 300 bytes.

$ size -t fs/xfs/built-in.a
           text    data     bss     dec     hex filename
before  1127855  311352     484 1439691  15f7cb (TOTALS)
after   1127535  311352     484 1439371  15f68b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown

Source kernel commit: 75c8c50fa16a23f8ac89ea74834ae8ddd1558d75

Remove the shouty macro and instead use the inline function that
matches other state/feature check wrapper naming. This conversion
was done with sed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert remaining mount flags to state flags

Source kernel commit: 2e973b2cd4cdb993be94cca4c33f532f1ed05316

The remaining mount flags kept in m_flags are actually runtime state
flags. These change dynamically, so they really should be updated
atomically so we don't potentially lose an update due to racing
modifications.

Convert these remaining flags to be stored in m_opstate and use
atomic bitops to set and clear the flags. This also adds a couple of
simple wrappers for common state checks - read only and shutdown.
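
The pattern can be sketched as follows. This is a Python model in which a lock stands in for atomic bitops; the bit numbers and the names `Mount`, `is_shutdown` and `is_readonly` are hypothetical.

```python
import threading

OPSTATE_SHUTDOWN = 0   # hypothetical bit numbers
OPSTATE_READONLY = 1

class Mount:
    def __init__(self):
        self._opstate = 0
        self._lock = threading.Lock()  # stands in for atomic bitops

    def set_bit(self, bit):
        with self._lock:
            self._opstate |= 1 << bit

    def clear_bit(self, bit):
        with self._lock:
            self._opstate &= ~(1 << bit)

    def test_bit(self, bit):
        return bool(self._opstate & (1 << bit))

def is_shutdown(mp):
    """Wrapper for the common shutdown check."""
    return mp.test_bit(OPSTATE_SHUTDOWN)

def is_readonly(mp):
    """Wrapper for the common read-only check."""
    return mp.test_bit(OPSTATE_READONLY)
```

Because each update is a single protected read-modify-write, two racing modifications of different bits cannot overwrite one another.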

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: convert mount flags to features

Source kernel commit: 0560f31a09e523090d1ab2bfe21c69d028c2bdf2

Replace m_flags feature checks with xfs_has_<feature>() calls and
rework the setup code to set flags in m_features.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: add small_inums to userspace unsupported features]
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: replace xfs_sb_version checks with feature flag checks

Source kernel commit: 38c26bfd90e1999650d5ef40f90d721f05916643

Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.
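
For readers who want to try the substitution on sample lines, here is a Python rendering of it. It assumes the sed expression captures the feature name as the first group and the expression pointing at the mount as the second; `convert` is a hypothetical helper name.

```python
import re

# Group 1 is the feature name, group 2 is the expression that
# points at the mount structure.
PATTERN = re.compile(r"xfs_sb_version_has(.*)\(&(.*)->m_sb\)")

def convert(line):
    """Rewrite one source line from the old check to the new one."""
    return PATTERN.sub(r"xfs_has_\1(\2)", line)
```

For example, `convert("if (xfs_sb_version_hascrc(&mp->m_sb))")` yields `if (xfs_has_crc(mp))`, and lines that do not match are returned unchanged.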

The result is a lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
           text    data     bss     dec     hex filename
before  1130866  311352     484 1442702  16038e (TOTALS)
after   1127727  311352     484 1439563  15f74b (TOTALS)
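
The "bit over 3kB" figure can be checked directly from the table. The variable names in this short sketch are illustrative only.

```python
before = {"text": 1130866, "data": 311352, "bss": 484}
after = {"text": 1127727, "data": 311352, "bss": 484}

# Only the text segment changed; data and bss are identical.
text_saved = before["text"] - after["text"]

# The "dec" column is the sum of text, data and bss.
dec_before = sum(before.values())
dec_after = sum(after.values())
```

Here `text_saved` is 3139 bytes, and the two sums match the "dec" column of the table.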

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: reflect sb features in xfs_mount

Source kernel commit: a1d86e8dec8c1325d301c9d5594bb794bc428fc3

Currently on-disk feature checks require decoding the superblock
fields and so can be non-trivial. We have almost 400
individual feature checks in the XFS code, so this is a significant
amount of code. To reduce runtime check overhead, pre-process all
the version flags into a features field in the xfs_mount at mount
time so we can convert all the feature checks to a simple flag
check.

There is also a need to convert the dynamic feature flags to update
the m_features field. This is required for attr, attr2 and quota
features. New xfs_mount based wrappers are added for this.
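
A small model of the approach, assuming a single version field and two features. The bit values and every name below are chosen for illustration and are not taken from the patch.

```python
# Hypothetical on-disk bit values, used only for this illustration.
SB_VERSION_ATTRBIT = 0x0010
SB_VERSION_QUOTABIT = 0x0040

# In-memory feature flags.
FEAT_ATTR = 1 << 0
FEAT_QUOTA = 1 << 1

def sb_version_to_features(sb_versionnum):
    """Decode the on-disk version flags once, at mount time."""
    features = 0
    if sb_versionnum & SB_VERSION_ATTRBIT:
        features |= FEAT_ATTR
    if sb_versionnum & SB_VERSION_QUOTABIT:
        features |= FEAT_QUOTA
    return features

class Mount:
    def __init__(self, sb_versionnum):
        self.m_features = sb_version_to_features(sb_versionnum)

def has_attr(mp):
    return bool(mp.m_features & FEAT_ATTR)

def has_quota(mp):
    return bool(mp.m_features & FEAT_QUOTA)

def add_attr(mp):
    """Dynamic feature: update m_features when it is enabled later."""
    mp.m_features |= FEAT_ATTR
```

Every later check is then a single mask test against `m_features`, with no superblock decoding involved.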

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rework attr2 feature and mount options

Source kernel commit: e23b55d537c9be60ae918fa6c3be0d699986f346

The attr2 feature is somewhat unique in that it has both a superblock
feature bit to enable it and mount options to enable and disable it.

Back when it was first introduced in 2005, attr2 was disabled unless
either the attr2 superblock feature bit was set, or the attr2 mount
option was set. If the superblock feature bit was not set but the
mount option was set, then when the first attr2 format inode fork
was created, it would set the superblock feature bit. This is as it
should be - the superblock feature bit indicated the presence of the
attr2 on disk format.

The noattr2 mount option, however, did not affect the superblock
feature bit. If noattr2 was specified, the on-disk superblock
feature bit was ignored and the code always just created attr1
format inode forks.  If neither the attr2 nor the noattr2 mount
option was specified, then the behaviour was determined by the
superblock feature bit.

This was all pretty sane.

Fast forward 3 years, and we are dealing with fallout from the
botched sb_features2 addition and having to deal with feature
mismatches between the sb_features2 and sb_bad_features2 fields. The
attr2 feature bit was one of these flags. The reconciliation was
done well after mount option parsing and, unfortunately, the feature
reconciliation had a bug where it ignored the noattr2 mount option.

For reasons lost to the mists of time, it was decided that resolving
this issue in commit 7c12f296500e ("[XFS] Fix up noattr2 so that it
will properly update the versionnum and features2 fields.") required
noattr2 to clear the superblock attr2 feature bit.  This greatly
complicated the attr2 behaviour and broke rules about feature bits
needing to be set when those specific features are present in the
filesystem.

By complicated, I mean that it introduced problems due to feature
bit interactions with log recovery. All of the superblock feature
bit checks are done prior to log recovery, but if we crash after
removing a feature bit, then on the next mount we see the feature
bit in the unrecovered superblock, only to have it go away after the
log has been replayed.  This means our mount time feature processing
could be all wrong.

Hence you can mount with noattr2, crash shortly afterwards, and
mount again without attr2 or noattr2 and still have attr2 enabled
because the second mount sees attr2 still enabled in the superblock
before recovery runs and removes the feature bit. It's just a mess.

Further, this is all legacy code as the v5 format requires attr2 to
be enabled at all times and it cannot be disabled.  i.e. the noattr2
mount option returns an error when used on v5 format filesystems.

To straighten this all out, this patch reverts the attr2/noattr2
mount option behaviour back to the original behaviour. There is no
reason for disabling attr2 these days, so we will only do this when
the noattr2 mount option is set. This will not remove the superblock
feature bit. The superblock bit will provide the default behaviour
and only track whether attr2 is present on disk or not. The attr2
mount option will enable the creation of attr2 format inode forks,
and if the superblock feature bit is not set it will be added when
the first attr2 inode fork is created.
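
The restored behaviour amounts to a small decision table, sketched here in Python. The function names are hypothetical and the model ignores v5 filesystems, where attr2 is always enabled.

```python
def attr2_enabled(sb_has_attr2, opt_attr2=False, opt_noattr2=False):
    """Should new inode forks use the attr2 format?"""
    if opt_noattr2:
        return False          # attr1 forks; superblock bit is left alone
    if opt_attr2:
        return True
    return sb_has_attr2       # no option given: superblock bit decides

def create_attr_fork(sb_has_attr2, opt_attr2=False, opt_noattr2=False):
    """Return (fork format, new value of the superblock attr2 bit)."""
    if attr2_enabled(sb_has_attr2, opt_attr2, opt_noattr2):
        return "attr2", True  # bit is added with the first attr2 fork
    return "attr1", sb_has_attr2
```

Note that no path in this model clears the superblock bit, which is the property the patch restores.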

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: rename xfs_has_attr()

Source kernel commit: 51b495eba84dee8c1df4abfc26fc134ea190e28f

xfs_has_attr() is poorly named. It has global scope as it is defined
in a header file, but its name gives no indication of which object is
being checked for attributes. It's not even clear what "has_attr"
means, because what it is actually doing is an attribute fork lookup
to see if the attribute exists.

Upcoming patches use this "xfs_has_<foo>" namespace for global
filesystem features, which conflicts with this function.

Rename xfs_has_attr() to xfs_attr_lookup() and make it a static
function, freeing up the "xfs_has_" namespace for global scope
usage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: sb verifier doesn't handle uncached sb buffer

Source kernel commit: 8cf07f3dd56195316be97758cb8b4e1d7183ea84

The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match
the primary superblock buffer, but the primary superblock is an
uncached buffer and so bp->b_bn is always -1ULL. Hence this never
matches and the CRC error reporting is wholly dependent on the
mount superblock already being populated so CRC feature checks pass
and allow CRC errors to be reported.

Fix this so that the primary superblock CRC error reporting is not
dependent on already having read the superblock into memory.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: resolve fork names in trace output

Source kernel commit: f93f85f77aa80f3e4d5bada01248c98da32933c5

Emit whichfork values as text strings in the ftrace output.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: constify btree function parameters that are not modified

Source kernel commit: 32816fd7920b32c24e1720ce387482fb430959fc

Constify the rest of the btree functions that take structure and union
pointers and are not supposed to modify them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: make the start pointer passed to btree update_lastrec functions const

Source kernel commit: 60e265f7f85a3d91c368f9284dc6501fa1f41e50

This btree function is called when updating a record in the rightmost
block of a btree so that we can update the AGF's longest free extent
length field. Neither parameter is supposed to be updated, so mark them
both const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

xfs: make the start pointer passed to btree alloc_block functions const

Source kernel commit: deb06b9ab6dfa167c280a68d5acb2f12e007073f

The @start pointer passed to each per-AG btree type's ->alloc_block
function isn't supposed to be modified, since it's a hint about the
location of the btree block being split that is to be fed to the
allocator, so mark the parameter const.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>