git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

]> git.ipfire.org Git - thirdparty/xfsprogs-dev.git/log

projects / thirdparty / xfsprogs-dev.git / log

Eric Sandeen [Mon, 16 Nov 2020 20:27:37 +0000 (15:27 -0500)]

xfsprogs: Release v5.10.0-rc0

Update all the necessary files for a 5.10.0-rc0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Sat, 14 Nov 2020 17:09:13 +0000 (12:09 -0500)]

xfs: fix rmap key and record comparison functions

Source kernel commit: 6ff646b2ceb0eec916101877f38da0b73e3a5b7f

Keys for extent interval records in the reverse mapping btree are
supposed to be computed as follows:

(physical block, owner, fork, is_btree, is_unwritten, offset)

This provides users the ability to look up a reverse mapping from a bmbt
record -- start with the physical block; then if there are multiple
records for the same block, move on to the owner; then the inode fork
type; and so on to the file offset.

However, the key comparison functions incorrectly remove the
fork/btree/unwritten information that's encoded in the on-disk offset.
This means that lookup comparisons are only done with:

(physical block, owner, offset)

This means that queries can return incorrect results. On consistent
filesystems this hasn't been an issue because blocks are never shared
between forks or with bmbt blocks; and are never unwritten. However,
this bug means that online repair cannot always detect corruption in the
key information in internal rmapbt nodes.

Found by fuzzing keys[1].attrfork = ones on xfs/371.

Fixes: 4b8ed67794fe ("xfs: add rmap btree operations")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Sat, 14 Nov 2020 17:08:47 +0000 (12:08 -0500)]

xfs: fix flags argument to rmap lookup when converting shared file rmaps

Source kernel commit: ea8439899c0b15a176664df62aff928010fad276

Pass the same oldext argument (which contains the existing rmapping's
unwritten state) to xfs_rmap_lookup_le_range at the start of
xfs_rmap_convert_shared. At this point in the code, flags is zero,
which means that we perform lookups using the wrong key.

Fixes: 3f165b334e51 ("xfs: convert unwritten status of reverse mappings for shared files")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:39:58 +0000 (17:39 -0500)]

xfs: set xefi_discard when creating a deferred agfl free log intent item

Source kernel commit: 2c334e12f957cd8c6bb66b4aa3f79848b7c33cab

Make sure that we actually initialize xefi_discard when we're scheduling
a deferred free of an AGFL block. This was (eventually) found by the
UBSAN while I was banging on realtime rmap problems, but it exists in
the upstream codebase. While we're at it, rearrange the structure to
reduce the struct size from 64 to 56 bytes.

Fixes: fcb762f5de2e ("xfs: add bmapi nodiscard flag")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:22:04 +0000 (17:22 -0500)]

xfs: fix high key handling in the rt allocator's query_range function

Source kernel commit: d88850bd5516a77c6f727e8b6cefb64e0cc929c7

Fix some off-by-one errors in xfs_rtalloc_query_range.  The highest key
in the realtime bitmap is always one less than the number of rt extents,
which means that the key clamp at the start of the function is wrong.
The 4th argument to xfs_rtfind_forw is the highest rt extent that we
want to probe, which means that passing 1 less than the high key is
wrong.  Finally, drop the rem variable that controls the loop because we
can compare the iteration point (rtstart) against the high key directly.

The sordid history of this function is that the original commit (fb3c3)
incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter
to xfs_rtfind_forw.  This was wrong because the "high key" is supposed
to be the largest key for which the caller wants result rows, not the
key for the first row that could possibly be outside the range that the
caller wants to see.

A subsequent attempt (8ad56) to strengthen the parameter checking added
incorrect clamping of the parameters to the number of rt blocks in the
system (despite the bitmap functions all taking units of rt extents) to
avoid querying ranges past the end of rt bitmap file but failed to fix
the incorrect _rtfind_forw parameter.  The original _rtfind_forw
parameter error then survived the conversion of the startblock and
blockcount fields to rt extents (a0e5c), and the most recent off-by-one
fix (a3a37) thought it was patching a problem when the end of the rt
volume is not in use, but none of these fixes actually solved the
original problem that the author was confused about the "limit" argument
to xfs_rtfind_forw.

Sadly, all four of these patches were written by this author and even
his own usage of this function and rt testing were inadequate to get
this fixed quickly.

Original-problem: fb3c3de2f65c ("xfs: add a couple of queries to iterate free extents in the rtbitmap")
Not-fixed-by: 8ad560d2565e ("xfs: strengthen rtalloc query range checks")
Not-fixed-by: a0e5c435babd ("xfs: fix xfs_rtalloc_rec units")
Fixes: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:54 +0000 (17:21 -0500)]

xfs: only relog deferred intent items if free space in the log gets low

Source kernel commit: 74f4d6a1e065c92428c5b588099e307a582d79d9

Now that we have the ability to ask the log how far the tail needs to be
pushed to maintain its free space targets, augment the decision to relog
an intent item so that we only do it if the log has hit the 75% full
threshold. There's no point in relogging an intent into the same
checkpoint, and there's no need to relog if there's plenty of free space
in the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:44 +0000 (17:21 -0500)]

xfs: periodically relog deferred intent items

Source kernel commit: 4e919af7827a6adfc28e82cd6c4ffcfcc3dd6118

There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail.  Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:

Caller hands us a transaction t0 with D0-D3 attached.  The defer ops
chain will look like the following if the transaction rolls succeed:

t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0.

The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length.  Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward.  Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or are on filesystems with small logs and a lot of ongoing
metadata updates.

Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length.  Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.

The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:40 +0000 (17:21 -0500)]

xfs: change the order in which child and parent defer ops are finished

Source kernel commit: 27dada070d59c28a441f1907d2cec891b17dcb26

The defer ops code has been finishing items in the wrong order -- if a
top level defer op creates items A and B, and finishing item A creates
more defer ops A1 and A2, we'll put the new items on the end of the
chain and process them in the order A B A1 A2.  This is kind of weird,
since it's convenient for programmers to be able to think of A and B as
an ordered sequence where all the sub-tasks for A must finish before we
move on to B, e.g. A A1 A2 D.

Right now, our log intent items are not so complex that this matters,
but this will become important for the atomic extent swapping patchset.
In order to maintain correct reference counting of extents, we have to
unmap and remap extents in that order, and we want to complete that work
before moving on to the next range that the user wants to swap.  This
patch fixes defer ops to satsify that requirement.

The primary symptom of the incorrect order was noticed in an early
performance analysis of the atomic extent swap code.  An astonishingly
large number of deferred work items accumulated when userspace requested
an atomic update of two very fragmented files.  The cause of this was
traced to the same ordering bug in the inner loop of
xfs_defer_finish_noroll.

If the ->finish_item method of a deferred operation queues new deferred
operations, those new deferred ops are appended to the tail of the
pending work list.  To illustrate, say that a caller creates a
transaction t0 with four deferred operations D0-D3.  The first thing
defer ops does is roll the transaction to t1, leaving us with:

t1: D0(t0), D1(t0), D2(t0), D3(t0)

Let's say that finishing each of D0-D3 will create two new deferred ops.
After finish D0 and roll, we'll have the following chain:

t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1)

d4 and d5 were logged to t1.  Notice that while we're about to start
work on D1, we haven't actually completed all the work implied by D0
being finished.  So far we've been careful (or lucky) to structure the
dfops callers such that D1 doesn't depend on d4 or d5 being finished,
but this is a potential logic bomb.

There's a second problem lurking.  Let's see what happens as we finish
D1-D3:

t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2)
t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3)
t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)

Let's say that d4-d11 are simple work items that don't queue any other
operations, which means that we can complete each d4 and roll to t6:

t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
...
t11: d10(t4), d11(t4)
t12: d11(t4)
<done>

When we try to roll to transaction #12, we're holding defer op d11,
which we logged way back in t4.  This means that the tail of the log is
pinned at t4.  If the log is very small or there are a lot of other
threads updating metadata, this means that we might have wrapped the log
and cannot get roll to t11 because there isn't enough space left before
we'd run into t4.

Let's shift back to the original failure.  I mentioned before that I
discovered this flaw while developing the atomic file update code.  In
that scenario, we have a defer op (D0) that finds a range of file blocks
to remap, creates a handful of new defer ops to do that, and then asks
to be continued with however much work remains.

So, D0 is the original swapext deferred op.  The first thing defer ops
does is rolls to t1:

t1: D0(t0)

We try to finish D0, logging d1 and d2 in the process, but can't get all
the work done.  We log a done item and a new intent item for the work
that D0 still has to do, and roll to t2:

t2: D0'(t1), d1(t1), d2(t1)

We roll and try to finish D0', but still can't get all the work done, so
we log a done item and a new intent item for it, requeue D0 a second
time, and roll to t3:

t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2)

If it takes 48 more rolls to complete D0, then we'll finally dispense
with D0 in t50:

t50: D<fifty primes>(t49), d1(t1), ..., d102(t50)

We then try to roll again to get a chain like this:

t51: d1(t1), d2(t1), ..., d101(t50), d102(t50)
...
t152: d102(t50)
<done>

Notice that in rolling to transaction #51, we're holding on to a log
intent item for d1 that was logged in transaction #1.  This means that
the tail of the log is pinned at t1.  If the log is very small or there
are a lot of other threads updating metadata, this means that we might
have wrapped the log and cannot roll to t51 because there isn't enough
space left before we'd run into t1.  This is of course problem #2 again.

But notice the third problem with this scenario: we have 102 defer ops
tied to this transaction!  Each of these items are backed by pinned
kernel memory, which means that we risk OOM if the chains get too long.

Yikes.  Problem #1 is a subtle logic bomb that could hit someone in the
future; problem #2 applies (rarely) to the current upstream, and problem
#3 applies to work under development.

This is not how incremental deferred operations were supposed to work.
The dfops design of logging in the same transaction an intent-done item
and a new intent item for the work remaining was to make it so that we
only have to juggle enough deferred work items to finish that one small
piece of work.  Deferred log item recovery will find that first
unfinished work item and restart it, no matter how many other intent
items might follow it in the log.  Therefore, it's ok to put the new
intents at the start of the dfops chain.

For the first example, the chains look like this:

t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

For the second example, the chains look like this:

t1: D0(t0)
t2: d1(t1), d2(t1), D0'(t1)
t3: d2(t1), D0'(t1)
t4: D0'(t1)
t5: d1(t4), d2(t4), D0''(t4)
...
t148: D0<50 primes>(t147)
t149: d101(t148), d102(t148)
t150: d102(t148)
<done>

This actually sucks more for pinning the log tail (we try to roll to t10
while holding an intent item that was logged in t1) but we've solved
problem #1.  We've also reduced the maximum chain length from:

sum(all the new items) + nr_original_items

to:

max(new items that each original item creates) + nr_original_items

This solves problem #3 by sharply reducing the number of defer ops that
can be attached to a transaction at any given time.  The change makes
the problem of log tail pinning worse, but is improvement we need to
solve problem #2.  Actually solving #2, however, is left to the next
patch.

Note that a subsequent analysis of some hard-to-trigger reflink and COW
livelocks on extremely fragmented filesystems (or systems running a lot
of IO threads) showed the same symptoms -- uncomfortably large numbers
of incore deferred work items and occasional stalls in the transaction
grant code while waiting for log reservations.  I think this patch and
the next one will also solve these problems.

As originally written, the code used list_splice_tail_init instead of
list_splice_init, so change that, and leave a short comment explaining
our actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:30 +0000 (17:21 -0500)]

xfs: fix an incore inode UAF in xfs_bui_recover

Source kernel commit: ff4ab5e02a0447dd1e290883eb6cd7d94848e590

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:20 +0000 (17:21 -0500)]

xfs: xfs_defer_capture should absorb remaining transaction reservation

Source kernel commit: 929b92f64048d90d23e40a59c47adf59f5026903

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

Doing this means that the log item recovery functions get to determine
the transaction reservation instead of abusing tr_itruncate in yet
another part of xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:10 +0000 (17:21 -0500)]

xfs: xfs_defer_capture should absorb remaining block reservations

Source kernel commit: 4f9a60c48078c0efa3459678fa8d6e050e8ada5d

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the remaining block reservations so
that when we continue the dfops chain, we can reserve the same number of
blocks to use. We capture the reservations for both data and realtime
volumes.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue. On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:21:01 +0000 (17:21 -0500)]

xfs: proper replay of deferred ops queued during log recovery

Source kernel commit: e6fff81e487089e47358a028526a9f63cdbcd503

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823cc9c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:20:57 +0000 (17:20 -0500)]

xfs: remove xfs_defer_reset

Source kernel commit: b80b29d602a8879829fbf89115e9e6877806a2da

Remove this one-line helper since the assert is trivially true in one
call site and the rest obscures a bitmask operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:20:52 +0000 (17:20 -0500)]

xfs: avoid shared rmap operations for attr fork extents

Source kernel commit: d7884e6e90da974b50dc2c3bf50e03b70750e5f1

During code review, I noticed that the rmap code uses the (slower)
shared mappings rmap functions for any extent of a reflinked file, even
if those extents are for the attr fork, which doesn't support sharing.
We can speed up rmap a tiny bit by optimizing out this case.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Kaixu Xia [Thu, 12 Nov 2020 22:20:42 +0000 (17:20 -0500)]

xfs: code cleanup in xfs_attr_leaf_entsize_{remote,local}

Source kernel commit: 61ef5230518a3ad224549a50a01b73989acb94b9

Cleanup the typedef usage, the unnecessary parentheses, the unnecessary
backslash and use the open-coded round_up call in
xfs_attr_leaf_entsize_{remote,local}.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Kaixu Xia [Thu, 12 Nov 2020 22:20:38 +0000 (17:20 -0500)]

xfs: remove the redundant crc feature check in xfs_attr3_rmt_verify

Source kernel commit: 3feb4ffbf69321284dc78ac6ca43b4a2afadf243

We already check whether the crc feature is enabled before calling
xfs_attr3_rmt_verify(), so remove the redundant feature check in that
function.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Kaixu Xia [Thu, 12 Nov 2020 22:20:28 +0000 (17:20 -0500)]

xfs: fix some comments

Source kernel commit: a647d109e08ac0961ca0fd511b013d962d256987

Fix the comments to help people understand the code.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
[darrick: fix the indenting problems too]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Kaixu Xia [Thu, 12 Nov 2020 22:20:18 +0000 (17:20 -0500)]

xfs: use the existing type definition for di_projid

Source kernel commit: 9c0fce4c16fc8d4d119cc3a20f1e5ce870206706

We have already defined the project ID type prid_t, so maybe should
use it here.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:20:08 +0000 (17:20 -0500)]

xfs: log new intent items created as part of finishing recovered intent items

Source kernel commit: 93293bcbde93567efaf4e6bcd58cad270e1fcbf5

During a code inspection, I found a serious bug in the log intent item
recovery code when an intent item cannot complete all the work and
decides to requeue itself to get that done.  When this happens, the
item recovery creates a new incore deferred op representing the
remaining work and attaches it to the transaction that it allocated.  At
the end of _item_recover, it moves the entire chain of deferred ops to
the dummy parent_tp that xlog_recover_process_intents passed to it, but
fail to log a new intent item for the remaining work before committing
the transaction for the single unit of work.

xlog_finish_defer_ops logs those new intent items once recovery has
finished dealing with the intent items that it recovered, but this isn't
sufficient.  If the log is forced to disk after a recovered log item
decides to requeue itself and the system goes down before we call
xlog_finish_defer_ops, the second log recovery will never see the new
intent item and therefore has no idea that there was more work to do.
It will finish recovery leaving the filesystem in a corrupted state.

The same logic applies to /any/ deferred ops added during intent item
recovery, not just the one handling the remaining work.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 22:19:52 +0000 (17:19 -0500)]

xfs: don't free rt blocks when we're doing a REMAP bunmapi call

Source kernel commit: 8df0fa39bdd86ca81a8d706a6ed9d33cc65ca625

When callers pass XFS_BMAPI_REMAP into xfs_bunmapi, they want the extent
to be unmapped from the given file fork without the extent being freed.
We do this for non-rt files, but we forgot to do this for realtime
files. So far this isn't a big deal since nobody makes a bunmapi call
to a rt file with the REMAP flag set, but don't leave a logic bomb.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Carlos Maiolino [Thu, 12 Nov 2020 22:18:52 +0000 (17:18 -0500)]

xfs: Convert xfs_attr_sf macros to inline functions

Source kernel commit: e01b7eed5d0a9b101da53701e92136c3985998af

xfs_attr_sf_totsize() requires access to xfs_inode structure, so, once
xfs_attr_shortform_addname() is its only user, move it to xfs_attr.c
instead of playing with more #includes.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Carlos Maiolino [Thu, 12 Nov 2020 21:50:12 +0000 (16:50 -0500)]

xfs: Use variable-size array for nameval in xfs_attr_sf_entry

Source kernel commit: c418dbc9805dbd215586454f0c5729333219aa63

nameval is a variable-size array, so, define it as it, and remove all
the -1 magic number subtractions

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Carlos Maiolino [Thu, 12 Nov 2020 21:50:02 +0000 (16:50 -0500)]

xfs: Remove typedef xfs_attr_shortform_t

Source kernel commit: 47e6cc100054c8c6b809e25c286a2fd82e82bcb7

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Carlos Maiolino [Thu, 12 Nov 2020 21:49:52 +0000 (16:49 -0500)]

xfs: remove typedef xfs_attr_sf_entry_t

Source kernel commit: 6337c84466c250d5da797bc5d6941c501d500e48

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 21:49:42 +0000 (16:49 -0500)]

xfs: widen ondisk quota expiration timestamps to handle y2038+

Source kernel commit: 4ea1ff3b49681af45a4a8c14baf7f0b3d11aa74a

Enable the bigtime feature for quota timers. We decrease the accuracy
of the timers to ~4s in exchange for being able to set timers up to the
bigtime maximum.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 12 Nov 2020 01:08:14 +0000 (20:08 -0500)]

xfs: widen ondisk inode timestamps to deal with y2038+

Source kernel commit: f93e5436f0ee5a85eaa3a86d2614d215873fb18b

Redesign the ondisk inode timestamps to be a simple unsigned 64-bit
counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the
32-bit unix time epoch). This enables us to handle dates up to 2486,
which solves the y2038 problem.

sandeen: update xfs_flags2diflags2() as well, to match

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 11 Nov 2020 18:48:47 +0000 (13:48 -0500)]

xfs: redefine xfs_ictimestamp_t

Source kernel commit: 30e05599219f3c15bd5f24190af0e33cdb4a00e5

Redefine xfs_ictimestamp_t as a uint64_t typedef in preparation for the
bigtime functionality. Preserve the legacy structure format so that we
can let the compiler take care of the masking and shifting.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 21:29:40 +0000 (16:29 -0500)]

xfs: redefine xfs_timestamp_t

Source kernel commit: 5a0bb066f60fa02f453d7721844eae59f505c06e

Redefine xfs_timestamp_t as a __be64 typedef in preparation for the
bigtime functionality. Preserve the legacy structure format so that we
can let the compiler take care of masking and shifting.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:13:50 +0000 (15:13 -0500)]

xfs: move xfs_log_dinode_to_disk to the log recovery code

Source kernel commit: 88947ea0ba713c9b74b212755b3b58242f0e7a56

Move this function to xfs_inode_item_recover.c since there's only one
caller of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:12:50 +0000 (15:12 -0500)]

xfs: refactor quota timestamp coding

Source kernel commit: 9f99c8fe551a056c0929dff13cbce62b6b150156

Refactor quota timestamp encoding and decoding into helper functions so
that we can add extra behavior in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:43 +0000 (15:11 -0500)]

xfs: refactor default quota grace period setting code

Source kernel commit: ccc8e771aa7a80eb047fc263780816ca76dd02a6

Refactor the code that sets the default quota grace period into a helper
function so that we can override the ondisk behavior later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:09 +0000 (15:11 -0500)]

xfs: refactor quota expiration timer modification

Source kernel commit: 11d8a9190275855f79d62093d789e962cc7228fb

Define explicit limits on the range of quota grace period expiration
timeouts and refactor the code that modifies the timeouts into helpers
that clamp the values appropriately. Note that we'll refactor the
default grace period timer separately.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:09 +0000 (15:11 -0500)]

xfs: explicitly define inode timestamp range

Source kernel commit: 876fdc7c4f366a709ac272ef3336ae7dce58f2af

Formally define the inode timestamp ranges that existing filesystems
support, and switch the vfs timetamp ranges to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:09 +0000 (15:11 -0500)]

xfs: support inode btree blockcounts in online repair

Source kernel commit: 11f744234f052922db4ed77dad35862b3d3164cf

Add the necessary bits to the online repair code to support logging the
inode btree counters when rebuilding the btrees, and to support fixing
the counters when rebuilding the AGI.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:09 +0000 (15:11 -0500)]

xfs: use the finobt block counts to speed up mount times

Source kernel commit: 1ac35f061af011442eeb731632f6daae991ecf7c

Now that we have reliable finobt block counts, use them to speed up the
per-AG block reservation calculations at mount time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 20:11:09 +0000 (15:11 -0500)]

xfs: store inode btree block counts in AGI header

Source kernel commit: 2a39946c984464e4aac82c556ba9915589be7323

Add a btree block usage counters for both inode btrees to the AGI header
so that we don't have to walk the entire finobt at mount time to create
the per-AG reservations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Christoph Hellwig [Tue, 10 Nov 2020 20:11:05 +0000 (15:11 -0500)]

xfs: simplify xfs_trans_getsb

Source kernel commit: cead0b10f557a2331e0e131ce52aaf7ed7f5355f

Remove the mp argument as this function is only called in transaction
context, and open code xfs_getsb given that the function already accesses
the buffer pointer in the mount point directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Christoph Hellwig [Tue, 10 Nov 2020 19:52:42 +0000 (14:52 -0500)]

xfs: remove xlog_recover_iodone

Source kernel commit: 22c10589a10baea8e4695fcf38437b89a41e70a1

The log recovery I/O completion handler does not substancially differ from
the normal one except for the fact that it:

a) never retries failed writes
b) can have log items that aren't on the AIL
c) never has inode/dquot log items attached and thus don't need to
handle them

Add conditionals for (a) and (b) to the ioend code, while (c) doesn't
need special handling anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Christoph Hellwig [Tue, 10 Nov 2020 19:52:42 +0000 (14:52 -0500)]

xfs: move the buffer retry logic to xfs_buf.c

Source kernel commit: 664ffb8a429a800c51964b94c15c6a92c8d8334c

Move the buffer retry state machine logic to xfs_buf.c and call it once
from xfs_ioend instead of duplicating it three times for the three kinds
of buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Carlos Maiolino [Tue, 10 Nov 2020 19:52:42 +0000 (14:52 -0500)]

xfs: remove kmem_realloc()

Source kernel commit: 771915c4f68889b8c41092a928c604c9cd279927

Remove kmem_realloc() function and convert its users to use MM API
directly (krealloc())

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 19:52:31 +0000 (14:52 -0500)]

libxfs: refactor NSEC_PER_SEC

Clean up all the open-coded and duplicate definitions of time unit
conversion factors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 10 Nov 2020 17:05:32 +0000 (12:05 -0500)]

libxfs: create a real struct timespec64

Create a real struct timespec64 that supports 64-bit seconds counts.
The C library struct timespec doesn't support this on 32-bit
architectures and we cannot lose the upper bits in the incore inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Tue, 20 Oct 2020 15:41:03 +0000 (11:41 -0400)]

xfsprogs: Release v5.9.0

Update all the necessary files for a 5.9.0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Tue, 13 Oct 2020 16:30:33 +0000 (12:30 -0400)]

xfsprogs: Release v5.9.0-rc1

Update all the necessary files for a 5.9.0-rc1 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Gao Xiang [Mon, 12 Oct 2020 19:40:02 +0000 (15:40 -0400)]

xfsprogs: allow i18n to xfs printk

In preparation to a common stripe validation helper,
allow i18n to xfs_{notice,warn,err,alert} so that
xfsprogs can share code with kernel.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Ian Kent [Mon, 12 Oct 2020 19:40:01 +0000 (15:40 -0400)]

xfsprogs: ignore autofs mount table entries

Some of the xfsprogs utilities read the mount table via. getmntent(3).

The mount table may contain (almost always these days since /etc/mtab is
symlinked to /proc/self/mounts) autofs mount entries. During processing
of the mount table entries statfs(2) can be called on mount point paths
which will trigger an automount if those entries are direct or offset
autofs mount triggers (indirect autofs mounts aren't affected).

This can be a problem when there are a lot of autofs direct or offset
mounts because real mounts will be triggered when statfs(2) is called.
This can be particularly bad if the triggered mounts are NFS mounts and
the server is unavailable leading to lengthy boot times or worse.

Simply ignoring autofs mount entries during getmentent(3) traversals
avoids the statfs() call that triggers these mounts. If there are
automounted mounts (real mounts) at the time of reading the mount table
these will still be seen in the list so they will be included if that
actually matters to the reader.

Recent glibc getmntent(3) can ignore autofs mounts but that requires the
autofs user to configure autofs to use the "ignore" pseudo mount option
for autofs mounts. But this isn't yet the autofs default (to prevent
unexpected side effects) so that can't be used.

The autofs direct and offset automount triggers are pseudo file system
mounts and are more or less useless in terms on file system information
so excluding them doesn't sacrifice useful file system information
either.

Consequently excluding autofs mounts shouldn't have any adverse side
effects.

Changes since v1:
- drop hunk from fsr/xfs_fsr.c.

Signed-off-by: Ian Kent <raven@themaw.net>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Mon, 12 Oct 2020 19:40:01 +0000 (15:40 -0400)]

xfsprogs: fix ioctl_xfs_geometry manpage naming

Somehow "fsop_/FSOP_" snuck into this manpage's name, filename, and
ioctl name. It's not XFS_IOC_FSOP_GEOMETRY, it's XFS_IOC_FSGEOMETRY
so change all references, including the man page name, filename, and
references from xfsctl(3).

(the structure and flags do have the fsop_ string, which certainly
makes this a bit confusing)

Fixes: b427c816847e ("man: create a separate GEOMETRY ioctl manpage")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 12 Oct 2020 19:40:01 +0000 (15:40 -0400)]

xfs_repair: coordinate parallel updates to the rt bitmap

Actually take the rt lock before updating the bitmap from multiple
threads. This fixes an infrequent corruption problem when running
generic/013 and rtinherit=1 is set on the root dir.

Fixes: 2556c98bd9e6 ("Perform true sequential bulk read prefetching in xfs_repair Merge of master-melb:xfs-cmds:29147a by kenmcd.")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 12 Oct 2020 19:39:52 +0000 (15:39 -0400)]

xfs_scrub: don't use statvfs to collect filesystem summary counts

The function scrub_scan_estimate_blocks naïvely uses the statvfs counts
to estimate the size and free blocks on the data volume. Unfortunately,
it fails to account for the fact that statvfs can return the size and
free counts for the realtime volume if the root directory has the
rtinherit flag set, which leads to phase 7 reporting totally absurd
quantities.

Eric pointed out a further problem with statvfs, which is that the file
counts are clamped to the current user's project quota inode limits.
Therefore, we must not use statvfs for querying the filesystem summary
counts.

The XFS_IOC_FSCOUNTS ioctl returns all the data we need, so use that
instead.

Fixes: 604dd3345f35 ("xfs_scrub: filesystem counter collection functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 12 Oct 2020 15:59:19 +0000 (11:59 -0400)]

libhandle: fix potential unterminated string problem

gcc 10.2 complains about the strncpy call here, since it's possible that
the source string is so long that the fspath inside the fdhash structure
will end up without a null terminator. Work around strncpy braindamage
yet again by forcing the string to be terminated properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 12 Oct 2020 15:59:19 +0000 (11:59 -0400)]

libfrog: fix a potential null pointer dereference

Apparently, gcc 10.2 thinks that it's possible for either of the calloc
arguments to be zero here, in which case it will return NULL with a zero
errno. I suppose it's possible to do that via integer overflow in the
macro, though I find it unlikely unless someone passes in a yuuuge value.

Nevertheless, just shut up the warning by hardcoding the error number
so I can move on to nastier bugs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 16:45:13 +0000 (12:45 -0400)]

libxfs: disallow filesystems with reverse mapping and reflink and realtime

Neither the kernel nor the code in xfsprogs support filesystems that
have (either reverse mapping btrees or reflink) enabled and a realtime
volume configured. The kernel rejects such combinations and mkfs
refuses to format such a config, but xfsprogs doesn't check and can do
Bad Things, so port those checks before someone shreds their filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Anthony Iliopoulos [Wed, 30 Sep 2020 15:10:16 +0000 (11:10 -0400)]

mkfs: remove a couple of unused function parameters

initialise_mount does not use mkfs_params, and initialise_ag_headers
does not use the xfs_sb param, remove them.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Wed, 30 Sep 2020 15:10:09 +0000 (11:10 -0400)]

mkfs.xfs: remove comment about needed future work

Remove comment about the need to sync this function with the
kernel; that was mostly taken care of with:

7b7548052 ("mkfs: use libxfs to write out new AGs")

There's maybe a little more samey-samey that we could do here,
but it's not egregiously cut & pasted as it was before.

Fixes: 7b7548052d12 ("mkfs: use libxfs to write out new AGs")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

xfs_repair: don't flag RTINHERIT files when no rt volume

Don't flag directories with the RTINHERIT flag set when the filesystem
doesn't have a realtime volume configured. The kernel has let us set
RTINHERIT without a rt volume for ages, so it's not an invalid state.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

mkfs: don't allow creation of realtime files from a proto file

If someone runs mkfs with rtinherit=1, a realtime volume configured, and
a protofile that creates a regular file in the filesystem, mkfs will
error out with "Function not implemented" because userspace doesn't know
how to allocate extents from the rt bitmap. Catch this specific case
and hand back a somewhat nicer explanation of what happened.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

libxfs: don't propagate RTINHERIT -> REALTIME when there is no rtdev

When creating a file inside a directory that has RTINHERIT set, only
propagate the REALTIME flag to the file if the filesystem actually has a
realtime volume configured. Otherwise, we end up writing inodes that
trip the verifiers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

libxfs: refactor inode flags propagation code

Hoist the code that propagates di_flags from a parent to a new child
into a separate function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

mkfs: set required parts of the realtime geometry before computing log geometry

The minimum log size depends on the transaction reservation sizes, which
in turn depend on the realtime device geometry. Therefore, we need to
set up some of the rt geometry before we can compute the real minimum
log size.

This fixes a problem where mkfs, given a small data device and a
realtime volume, formats a filesystem with a log that is too small to
pass the mount time log size checks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

mkfs: fix reflink/rmap logic w.r.t. realtime devices and crc=0 support

mkfs has some logic to deal with situations where reflink or rmapbt are
turned on and the administrator has configured a realtime device or a V4
filesystem; such configurations are not allowed.

The logic ought to disable reflink and/or rmapbt if they're enabled due
to being the defaults, and it ought to complain and abort if they're
enabled because the admin explicitly turned them on.

Unfortunately, the logic here doesn't do that and makes no sense at all
since usage() exits the program. Fix it to follow what everything else
does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

mkfs.xfs: tweak wording of external log device size complaint

If the external log device is too small to satisfy minimum requirements,
mkfs will complain about "external log device 512 too small...". That
doesn't make any sense, so add a few missing words to clarify what we're
talking about:

"external log device size 512 blocks too small..."

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

man: install all manpages that redirect to another manpage

Some of the ioctl manpages do not contain any information other than a
pointer to a different manpage. These aren't picked up by the install
scripts, so fix them so that they do.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:15 +0000 (10:59 -0400)]

xfs_repair: use libxfs_verify_rtbno to verify rt extents

Use the existing realtime block validation function to check the first
and last block of an extent in a realtime file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Wed, 30 Sep 2020 14:59:05 +0000 (10:59 -0400)]

xfs_repair: throw away totally bad clusters

If the filesystem supports sparse inodes, we detect that an entire
cluster buffer has no detectable inodes at all, and we can easily mark
that part of the inode chunk sparse, just drop the cluster buffer and
forget about it. This makes repair less likely to go to great lengths
to try to save something that's totally unsalvageable.

This manifested in recs[2].free=zeroes in xfs/364, wherein the root
directory claimed to own block X and the inobt also claimed that X was
inodes; repair tried to create rmaps for both owners, and then the whole
mess blew up because the rmap code aborts on those kinds of anomalies.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 21:35:37 +0000 (17:35 -0400)]

xfs_repair: fix handling of data blocks colliding with existing metadata

Prior to commit a406779bc8d8, any blocks in a data fork extent that
collided with existing blocks would cause the entire data fork extent to
be rejected.  Unfortunately, the patch to add data block sharing support
suppressed checking for any collision, including metadata.  What we
really wanted to do here during a check_dups==1 scan is to is check for
specific collisions and without updating the block mapping data.

So, move the check_dups test after the for-switch construction.  This
re-enables detecting collisions between data fork blocks and a
previously scanned chunk of metadata, and improves the specificity of
the error message that results.

This was found by fuzzing recs[2].free=zeroes in xfs/364, though this
patch alone does not solve all the problems that scenario presents.

Fixes: a406779bc8d8 ("xfs_repair: handle multiple owners of data blocks")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 21:35:37 +0000 (17:35 -0400)]

xfs_repair: complain about unwritten extents when they're not appropriate

We don't allow unwritten extents in the attr fork, and we don't allow
them in the data fork except for regular files. Check that this is the
case.

Found by manually fuzzing the extentflag field of an attr fork to one.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 21:35:37 +0000 (17:35 -0400)]

xfs_repair: junk corrupt xattr root blocks

If reading the root block of an extended attribute structure fails due
to a corruption error, we should junk the block since we know it's bad.
There's no point in moving on to the (rather insufficient) checks in the
attr code.

Found by fuzzing hdr.freemap[1].base = ones in xfs/400.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 21:35:37 +0000 (17:35 -0400)]

xfs_repair: fix error in process_sf_dir2_fixi8

The goal of process_sf_dir2_fixi8 is to convert an i8 shortform
directory into a (shorter) i4 shortform directory. It achieves this by
duplicating the old sf directory contents (as oldsfp), zeroing i8count
in the caller's directory buffer (i.e. newsfp/sfp), and reinitializing
the new directory with the old directory's entries.

Unfortunately, it copies the parent pointer from sfp (the buffer we've
already started changing), not oldsfp. This leads to directory
corruption since at that point we zeroed i8count, which means that we
save only the upper four bytes from the parent pointer entry.

This was found by fuzzing u3.sfdir3.hdr.i8count = ones in xfs/384.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Mon, 28 Sep 2020 21:35:37 +0000 (17:35 -0400)]

xfs_repair: don't crash on partially sparse inode clusters

While running xfs/364 to fuzz the middle bit of recs[2].holemask, I
observed a crash in xfs_repair stemming from the fact that each sparse
bit accounts for 4 inodes, but inode cluster buffers can map to more
than four inodes.

When the first inode in an inode cluster is marked sparse,
process_inode_chunk won't try to load the inode cluster buffer.
Unfortunately, if the holemask indicates that there are inodes present
anywhere in the rest of the cluster buffer, repair will try to check the
corresponding cluster buffer, even if we didn't load it. This leads to
a null pointer dereference, which crashes repair.

Avoid the null pointer dereference by marking the inode sparse and
moving on to the next inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Pavel Reichl [Mon, 28 Sep 2020 21:31:18 +0000 (17:31 -0400)]

mkfs.xfs: fix ASSERT on too-small device with stripe geometry

When a too-small device is created with stripe geometry, we hit an
assert in align_ag_geometry():

mkfs.xfs: xfs_mkfs.c:2834: align_ag_geometry: Assertion `cfg->agcount != 0' failed.

This is because align_ag_geometry() finds that the size of the last
(only) AG is too small, and attempts to trim it off. Obviously 0
AGs is invalid, and we hit the ASSERT.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Fri, 18 Sep 2020 18:23:53 +0000 (14:23 -0400)]

xfsprogs: Release v5.9.0-rc0

Update all the necessary files for a 5.9.0-rc0 release.

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Thu, 17 Sep 2020 14:16:02 +0000 (10:16 -0400)]

xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files

Source kernel commit: d0c20d38af135b2b4b90aa59df7878ef0c8fbef4

The realtime flag only applies to the data fork, so don't use the
realtime block number checks on the attr fork of a realtime file.

Fixes: 30b0984d9117 ("xfs: refactor bmap record validation")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Darrick J. Wong [Tue, 15 Sep 2020 19:59:38 +0000 (15:59 -0400)]

xfs: initialize the shortform attr header padding entry

Source kernel commit: 125eac243806e021f33a1fdea3687eccbb9f7636

Don't leak kernel memory contents into the shortform attr fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Eric Sandeen [Tue, 15 Sep 2020 19:59:38 +0000 (15:59 -0400)]

xfs: fix boundary test in xfs_attr_shortform_verify

Source kernel commit: f4020438fab05364018c91f7e02ebdd192085933

The boundary test for the fixed-offset parts of xfs_attr_sf_entry in
xfs_attr_shortform_verify is off by one, because the variable array
at the end is defined as nameval[1] not nameval[].
Hence we need to subtract 1 from the calculation.

This can be shown by:

# touch file
# setfattr -n root.a file

and verifications will fail when it's written to disk.

This only matters for a last attribute which has a single-byte name
and no value, otherwise the combination of namelen & valuelen will
push endp further out and this test won't fail.

Fixes: 1e1bbd8e7ee06 ("xfs: create structure verifier function for shortform xattrs")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Brian Foster [Tue, 15 Sep 2020 19:59:38 +0000 (15:59 -0400)]

xfs: fix off-by-one in inode alloc block reservation calculation

Source kernel commit: 657f101930bc6c5b41bd7d6c22565c4302a80d33

The inode chunk allocation transaction reserves inobt_maxlevels-1
blocks to accommodate a full split of the inode btree. A full split
requires an allocation for every existing level and a new root
block, which means inobt_maxlevels is the worst case block
requirement for a transaction that inserts to the inobt. This can
lead to a transaction block reservation overrun when tmpfile
creation allocates an inode chunk and expands the inobt to its
maximum depth. This problem has been observed in conjunction with
overlayfs, which makes frequent use of tmpfiles internally.

The existing reservation code goes back as far as the Linux git repo
history (v2.6.12). It was likely never observed as a problem because
the traditional file/directory creation transactions also include
worst case block reservation for directory modifications, which most
likely is able to make up for a single block deficiency in the inode
allocation portion of the calculation. tmpfile support is relatively
more recent (v3.15), less heavily used, and only includes the inode
allocation block reservation as tmpfiles aren't linked into the
directory tree on creation.

Fix up the inode alloc block reservation macro and a couple of the
block allocator minleft parameters that enforce an allocation to
leave enough free blocks in the AG for a full inobt split.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Jan Kara [Tue, 15 Sep 2020 19:59:38 +0000 (15:59 -0400)]

writeback: Drop I_DIRTY_TIME_EXPIRE

Source kernel commit: 5fcd57505c002efc5823a7355e21f48dd02d5a51

The only use of I_DIRTY_TIME_EXPIRE is to detect in
__writeback_single_inode() that inode got there because flush worker
decided it's time to writeback the dirty inode time stamps (either
because we are syncing or because of age). However we can detect this
directly in __writeback_single_inode() and there's no need for the
strange propagation with I_DIRTY_TIME_EXPIRE flag.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree

Randy Dunlap [Tue, 15 Sep 2020 19:59:38 +0000 (15:59 -0400)]

xfs: delete duplicated words + other fixes

Source kernel commit: b63da6c8dfa9b2ab3554e8c59ef294d1f28bb9bd

Delete repeated words in fs/xfs/.
{we, that, the, a, to, fork}
Change "it it" to "it is" in one location.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: linux-fsdevel@vger.kernel.org
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>

commit | commitdiff | tree