xlog_recover_print_quotaoff() was using a static buffer to aggregate
quota option strings to be printed at the end. The buffer size was
miscalculated and when printing all 3 flags, a buffer overflow occurs
crashing xfs_logprint, like:
Fix this by removing the static buffer and using printf() directly to
print each flag. Also add a trailling space before each flag, so they
are a bit more readable on the output.
Reported-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: scale duplicate name checking in phase 6.
phase 6 on large directories is cpu bound on duplicate name checking
due to the algorithm having effectively O(n^2) scalability. Hence
when the duplicate name hash table size is far smaller than the
number of directory entries, we end up with long hash chains that
are searched linearly on every new entry that is found in the
directory to do duplicate detection.
The in-memory hash table size is limited to 64k entries. Hence when
we have millions of entries in a directory, duplicate entry lookups
on the hash table have substantial overhead. Scale this table out to
larger sizes so that we keep the chain lengths short and hence the
O(n^2) scalability impact is limited because N is always small.
For a 10M entry directory consuming 400MB of directory data, the
hash table now sizes at 6.4 million entries instead of ~64k - it is
~100x larger. While the hash table now consumes ~50MB of RAM, the
xfs_repair footprint barely changes as it's using already consuming
~9GB of RAM at this point in time. IOWs, the incremental memory
usage change is noise, but the directory checking time:
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: convert the dir byaddr hash to a radix tree
Phase 6 uses a hash table to track the data segment addresses of the
entries it has seen in a directory. This is indexed by the offset
into the data segment for the dirent, and is used to check if the
entry exists, is a duplicate or has a bad hash value. The lookup
operations involve walking long hash chains on large directories and
they are done for every entry in the directory. This means certain
operations have O(n^2) scalability (or worse!) and hence hurt on
very large directories.
It is also used to determine if the directory has unseen entries,
which involves a full hash traversal that is very expensive on large
directories. Hence the directory checking for unseen ends up being
roughly a O(n^2 + n) algorithm.
Switch the byaddr indexing to a radix tree. While a radix tree will
burn more memory than the linked list, it gives us O(log n) lookup
operations instead of O(n) on large directories, and use for tags
gives us O(1) determination of whether all entries have been seen or
not. This brings the "entry seen" algorithm scalability back to
O(nlog n) and so is a major improvement for processing large
directories.
Given a filesystem with 10M empty files in a single directory, we
see:
So there's some improvement, but we are clearly still CPU bound due
to the O(n^2) scalability of the duplicate name checking algorithm.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: don't duplicate names in phase 6
The name hash in phase 6 is constructed by using names that point
directly into the directory buffers. Hence before the buffers can be
released, the constructed name hash has to duplicate all those names
into meory it owns via dir_hash_dup_names().
Given that the structure that holds the name is dynamically
allocated, it makes no sense to store a pointer to the name
dir_hash_add() and then later have dynamically allocate the name.
Extend the name hash allocation to contain space for the name
itself, and copy the name into the name hash structure in
dir_hash_add(). This allows us to get rid of dir_hash_dup_names(),
and the directory checking code no longer needs to hold all the
directory buffers in memory until the entire directory walk is
complete and the names duplicated.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: parallelise phase 6
A recent metadump provided to us caused repair to take hours in
phase6. It wasn't IO bound - it was fully CPU bound the entire time.
The only way to speed it up is to make phase 6 run multiple
concurrent processing threads.
The obvious way to do this is to spread the concurrency across AGs,
like the other phases, and while this works it is not optimal. When
a processing thread hits a really large directory, it essentially
sits CPU bound until that directory is processed. IF an AG has lots
of large directories, we end up with a really long single threaded
tail that limits concurrency.
Hence we also need to have concurrency /within/ the AG. This is
realtively easy, as the inode chunk records allow for a simple
concurrency mechanism within an AG. We can simply feed each chunk
record to a workqueue, and we get concurrency within the AG for
free. However, this allows prefetch to run way ahead of processing
and this blows out the buffer cache size and can cause OOM.
However, we can use the new workqueue depth limiting to limit the
number of inode chunks queued, and this then backs up the inode
prefetching to it's maximum queue depth. Hence we prevent having the
prefetch code queue the entire AG's inode chunks on the workqueue
blowing out memory by throttling the prefetch consumer.
This takes phase 6 from taking many, many hours down to:
And burning 20-30 cpus that entire time on my test rig.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: protect inode chunk tree records with a mutex
Phase 6 accesses inode chunk records mostly in an isolated manner.
However, when it finds a corruption in a directory or there are
multiple hardlinks to an inode, there can be concurrent access
to the inode chunk record to update state.
Hence the inode record itself needs a mutex. This protects all state
changes within the inode chunk record, as well as inode link counts
and chunk references. That allows us to process multiple chunks at
once, providing concurrency within an AG as well as across AGs.
The inode chunk tree itself is not modified in the directory
scanning and rebuilding part of phase 6 which we are making
concurrent, hence we do not need to worry about locking for AVL tree
lookups to find the inode chunk records themselves. Therefore
internal locking is all we need here.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
repair: Protect bad inode list with mutex
To enable phase 6 parallelisation, we need to protect the bad inode
list from concurrent modification and/or access. Wrap it with a
mutex and clean up the nasty typedefs.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Thu, 15 Apr 2021 19:44:49 +0000 (15:44 -0400)]
workqueue: bound maximum queue depth
Existing users of workqueues have bound maximum queue depths in
their external algorithms (e.g. prefetch counts). For parallelising
work that doesn't have an external bound, allow workqueues to
throttle incoming requests at a maximum bound. Bounded workqueues
also need to distribute work over all worker threads themselves as
there is no external bounding or worker function throttling
provided.
Existing callers are not throttled and retain direct control of
worker threads, only users of the new create interface will be
throttled and concurrency managed.
Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Carlos Maiolino [Thu, 15 Apr 2021 19:44:10 +0000 (15:44 -0400)]
Add dax mount option to man xfs(5)
Details are already in kernel's documentation, but make dax mount option
information accessible through xfs(5) manpage.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 15 Apr 2021 19:44:10 +0000 (15:44 -0400)]
xfs_admin: pick up log arguments correctly
In commit ab9d8d69, we added support to xfs_admin to pass an external
log to xfs_db and xfs_repair. Unfortunately, we didn't do this
correctly -- by appending the log arguments to DB_OPTS, we now guarantee
an invocation of xfs_db when we don't have any work for it to do.
Brian Foster noticed that this results in xfs/764 hanging fstests
because xfs_db (when not compiled with libeditline) will wait for input
on stdin. I didn't notice because my build includes libeditline and my
test runner script does silly things with pipes such that xfs_db would
exit immediately.
Reported-by: Brian Foster <bfoster@redhat.com> Fixes: ab9d8d69 ("xfs_admin: support adding features to V5 filesystems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Because the iomap code using PF_MEMALLOC_NOFS to detect transaction
recursion in XFS is just wrong. Remove it from the iomap code and
replace it with XFS specific internal checks using
current->journal_info instead.
[djwong: This change also realigns the lifetime of NOFS flag changes to
match the incore transaction, instead of the inconsistent scheme we have
now.]
Fixes: 9070733b4efa ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The assert in xfs_btree_del_cursor() checks that the bmapbt block
allocation field has been handled correctly before the cursor is
freed. This field is used for accurate calculation of indirect block
reservation requirements (for delayed allocations), for example.
generic/019 reproduces a scenario where this assert fails because
the filesystem has shutdown while in the middle of a bmbt record
insertion. This occurs after a bmbt block has been allocated via the
cursor but before the higher level bmap function (i.e.
xfs_bmap_add_extent_hole_real()) completes and resets the field.
Update the assert to accommodate the transient state if the
filesystem has shutdown. While here, clean up the indentation and
comments in the function.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
As xfs supports the feature of inode btree block counters now, expose
this feature flag in xfs geometry, for userspace can check if the
inobtcnt is enabled or not.
Signed-off-by: Zorro Lang <zlang@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Make it so that we can reserve rt blocks with the xfs_trans_alloc_inode
wrapper function, then convert a few more callsites.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Create a new helper xfs_trans_alloc_inode that allocates a transaction,
locks and joins an inode to it, and then reserves the appropriate amount
of quota against that transction. Then replace all the open-coded
idioms with a single call to this helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Modify xfs_trans_reserve_quota_nblks so that we can reserve data and
realtime blocks from the dquot at the same time. This change has the
theoretical side effect that for allocations to realtime files we will
reserve from the dquot both the number of rtblocks being allocated and
the number of bmbt blocks that might be needed to add the mapping.
However, since the mount code disables quota if it finds a realtime
device, this should not result in any behavior changes.
Now that we've moved the inode creation callers away from using the
_nblks function, we can repurpose the (now unused) ninos argument for
realtime blocks, so make that change. This also replaces the flags
argument with a boolean parameter to force the reservation since we
don't need to distinguish between data and rt quota reservations any
more, and the only flag being passed in was FORCE_RES.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Create a couple of convenience wrappers for creating and deleting quota
block reservations against future changes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Convert a few xfs_trans_*reserve* callsites that are open-coding other
convenience functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
With both CONFIG_XFS_DEBUG and CONFIG_XFS_WARN disabled, the only reference to
local variable "error" in xfs_bmap_compute_alignments() gets eliminated during
pre-processing stage of the compilation process. This causes the compiler to
generate a "set but not used" warning.
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This commit adds XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag which
helps userspace test programs to get xfs_bmap_btalloc() to always
allocate minlen sized extents.
This is required for test programs which need a guarantee that minlen
extents allocated for a file do not get merged with their existing
neighbours in the inode's BMBT. "Inode fork extent overflow check" for
Directories, Xattrs and extension of realtime inodes need this since the
file offset at which the extents are being allocated cannot be
explicitly controlled from userspace.
One way to use this error tag is to,
1. Consume all of the free space by sequentially writing to a file.
2. Punch alternate blocks of the file. This causes CNTBT to contain
sufficient number of one block sized extent records.
3. Inject XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT error tag.
After step 3, xfs_bmap_btalloc() will issue space allocation
requests for minlen sized extents only.
ENOSPC error code is returned to userspace when there aren't any "one
block sized" extents left in any of the AGs.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This commit moves over the code in xfs_bmap_btalloc() which is
responsible for processing an allocated extent to a new function. Apart
from xfs_bmap_btalloc(), the new function will be invoked by another
function introduced in a future commit.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This commit moves over the code which computes stripe alignment and
extent size hint alignment into a separate function. Apart from
xfs_bmap_btalloc(), the new function will be used by another function
introduced in a future commit.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The check for verifying if the allocated extent is from an AG whose
index is greater than or equal to that of tp->t_firstblock is already
done a couple of statements earlier in the same function. Hence this
Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This commit adds XFS_ERRTAG_REDUCE_MAX_IEXTENTS error tag which enables
userspace programs to test "Inode fork extent count overflow detection"
by reducing maximum possible inode fork extent count to 10.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Removing an initial range of source/donor file's extent and adding a new
extent (from donor/source file) in its place will cause extent count to
increase by 1.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Moving an extent to data fork can cause a sub-interval of an existing
extent to be unmapped. This will increase extent count by 1. Mapping in
the new extent can increase the extent count by 1 again i.e.
| Old extent | New extent | Old extent |
Hence number of extents increases by 2.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
A write to a sub-interval of an existing unwritten extent causes
the original extent to be split into 3 extents
i.e. | Unwritten | Real | Unwritten |
Hence extent count can increase by 2.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be
added. One extra extent for dabtree in case a local attr is large enough
to cause a double split. It can also cause extent count to increase
proportional to the size of a remote xattr's value.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
A rename operation is essentially a directory entry remove operation
from the perspective of parent directory (i.e. src_dp) of rename's
source. Hence the only place where we check for extent count overflow
for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real()
returns -ENOSPC when it detects a possible extent count overflow and in
response, the higher layers of directory handling code do the following:
1. Data/Free blocks: XFS lets these blocks linger until a future remove
operation removes them.
2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf
space and unmaps the last block.
For target_dp, there are two cases depending on whether the destination
directory entry exists or not.
When destination directory entry does not exist (i.e. target_ip ==
NULL), extent count overflow check is performed only when transaction
has a non-zero sized space reservation associated with it. With a
zero-sized space reservation, XFS allows a rename operation to continue
only when the directory has sufficient free space in its data/leaf/free
space blocks to hold the new entry.
When destination directory entry exists (i.e. target_ip != NULL), all
we need to do is change the inode number associated with the already
existing entry. Hence there is no need to perform an extent count
overflow check.
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Directory entry removal must always succeed; Hence XFS does the
following during low disk space scenario:
1. Data/Free blocks linger until a future remove operation.
2. Dabtree blocks would be swapped with the last block in the leaf space
and then the new last block will be unmapped.
This facility is reused during low inode extent count scenario i.e. this
that the above mentioned behaviour is exercised causing no change to the
directory's extent count.
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Directory entry addition can cause the following,
1. Data block can be added/removed.
A new extent can cause extent count to increase by 1.
2. Free disk block can be added/removed.
Same behaviour as described above for Data block.
3. Dabtree blocks.
XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
can be new extents. Hence extent count can increase by
XFS_DA_NODE_MAXDEPTH.
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The extent mapping the file offset at which a hole has to be
inserted will be split into two extents causing extent count to
increase by 1.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When adding a new data extent (without modifying an inode's existing
extents) the extent count increases only by 1. This commit checks for
extent count overflow in such cases.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
3. The incore inode fork extent counter is a signed 32-bit
quantity. However the on-disk extent counter is an unsigned 16-bit
quantity and hence cannot hold 98511 extents.
4. The following incorrect value is stored in the attr extent counter,
# xfs_db -f -c 'inode 131' -c 'print core.naextents' /dev/loop0
core.naextents = -32561
This commit adds a new helper function (i.e.
xfs_iext_count_may_overflow()) to check for overflow of the per-inode
data and xattr extent counters. Future patches will use this function to
make sure that an FS operation won't cause the extent counter to
overflow.
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked. If the file
being unlinked loses its one nlink, we then have to put the inode on the
unlinked list.
This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (which is where whiteouts are created) and to grab
the AGI buffer of the file being deleted. If the whiteout was created
in a higher numbered AG than the file being deleted, we'll lock the AGIs
in the wrong order and deadlock.
Therefore, grab all the AGI locks we think we'll need ahead of time, and
in order of increasing AG number per the locking rules.
Reported-by: wenli xie <wlxie7296@gmail.com> Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 12 Mar 2021 19:31:03 +0000 (14:31 -0500)]
debian: Build-depend on libinih-dev with udeb package
The first libinih Debian package version with udeb binary package is 53-1.
Debian bug #981662 documents the need for it:
xfsprogs-udeb depends on libinih1, not libinih1-udeb
Link: https://bugs.debian.org/981662 Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 12 Mar 2021 19:31:03 +0000 (14:31 -0500)]
debian: Regenerate config.guess using debhelper
This is a change introduced in 5.10.0-2ubuntu2 with the changelog:
> xfsprogs upstream has regressed config.guess, so use
> dh_update_autotools_config.
The 5.10.0 tarball has a config.guess that breaks builds on RISC-V:
...
UNAME_MACHINE = riscv64
UNAME_RELEASE = 5.0.0+
UNAME_SYSTEM = Linux
UNAME_VERSION = #2 SMP Sat Mar 9 22:34:53 UTC 2019
configure: error: cannot guess build type; you must specify one
make[1]: *** [Makefile:131: include/builddefs] Error 1
...
Reported-by: Steve Langasek <steve.langasek@ubuntu.com> Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Mar 2021 18:57:27 +0000 (13:57 -0500)]
xfs_admin: don't hide the xfs_repair output when upgrading
Currently, xfs_admin suppresses the output from xfs_repair when it tries
to upgrade a filesystem, and prints a rather unhelpful message if the
upgrade fails.
Neither of these behaviors are useful -- repair can fail for reasons
outside of the filesystem being mounted, and if it does, the admin will
never know what actually happened.
Worse yet, if repair finds corruptions on disk, the upgrade script
silently throws all that away, which means that nobody will ever be able
to report what happened if an upgrade trashes a filesystem.
Therefore, allow the console to capture all of repair's stdout/stderr
reports.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Mar 2021 18:57:27 +0000 (13:57 -0500)]
xfs_admin: don't add '=1' when building repair command line for -O
Geert Hendrickx reported an inconsistency between the xfs_admin manpage
and its behavior -- the documentation says that users must provide the
status explicitly, but the script injects '=1' anyway. While this seems
to work with the glibc getsubopt, it's a bit ugly and isn't consistent
with the docs.
So, get rid of that extra two bytes.
Reported-by: Geert Hendrickx <geert@hendrickx.be> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 22:38:46 +0000 (17:38 -0500)]
man: document XFS_XFLAG_APPEND behavior for directories
For directories, the APPEND flag means that files cannot be unlinked
from the directory. Files can be linked in or created, just not
unlinked. Document this behavior, since it's been in the VFS for years
though not explicitly mentioned. This patch is in preparation for
trying to hoist the fsgetxattr ioctl documentation to the man-pages
project.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 01:20:57 +0000 (20:20 -0500)]
xfs_repair: enable bigtime upgrade via repair
Upgrade existing V5 filesystems to support large timestamps up to 2486.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 01:20:57 +0000 (20:20 -0500)]
xfs_repair: enable inobtcount upgrade via repair
Use xfs_repair to add the inode btree counter feature to a filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 01:20:57 +0000 (20:20 -0500)]
xfs_admin: support adding features to V5 filesystems
Teach the xfs_admin script how to add features to V5 filesystems.
Technically speaking we could add lazycount to the list, but that option
is only useful for the V4 format which is deprecated.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 01:20:42 +0000 (20:20 -0500)]
xfs_repair: allow upgrades on v5 filesystems
Add some helper functions so that we can allow users to upgrade V5
filesystems in a sane manner. This just lands the boilerplate; the
actual feature validation and whatnot will land in the next patches.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
[sandeen: change subject slightly] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:35:55 +0000 (19:35 -0500)]
man: mark all deprecated V4 format options
Update the manual pages for the most popular tools to note which options
are only useful with the V4 XFS format, and that the V4 format is
deprecated and will be removed no later than September 2030.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
[sandeen: tweak wording and formatting a little] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:10:00 +0000 (19:10 -0500)]
xfs_repair: add post-phase error injection points
Create an error injection point so that we can simulate repair failing
after a certain phase.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:10:00 +0000 (19:10 -0500)]
xfs_repair: factor phase transitions into a helper
Create a helper function to centralize all the stuff we do at the end of
a repair phase (which for now is limited to reporting progress). The
next patch will add more interesting things to this helper.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:10:00 +0000 (19:10 -0500)]
libxfs: simulate system failure after a certain number of writes
Add an error injection knob so that we can simulate system failure after
a certain number of disk writes. This knob is being added so that we
can check repair's behavior after an arbitrary number of tests.
Set LIBXFS_DEBUG_WRITE_CRASH={ddev,logdev,rtdev}=nn in the environment
to make libxfs SIGKILL itself after nn writes to the data, log, or rt
devices. Note that this only applies to xfs_buf writes and zero_range.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:10:00 +0000 (19:10 -0500)]
xfs_repair: set NEEDSREPAIR the first time we write to a filesystem
Add a hook to the buffer cache so that xfs_repair can intercept the
first write to a V5 filesystem to set the NEEDSREPAIR flag. In the
event that xfs_repair dirties the filesystem and goes down, this ensures
that the sysadmin will have to re-start repair before mounting.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_repair: clear the needsrepair flag
Clear the needsrepair flag, since it's used to prevent mounting of an
inconsistent filesystem. We only do this if we make it to the end of
repair with a non-zero error code, and all the rebuilt indices and
corrected metadata are persisted correctly.
Note that we cannot combine clearing needsrepair with clearing the quota
checked flags because we need to clear the quota flags even if
reformatting the log fails, whereas we can't clear needsrepair if the
log reformat fails.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_repair: clear quota CHKD flags on the incore superblock too
At the end of a repair run, xfs_repair clears the superblock's quota
checked flags if it found mistakes in the quota accounting to force a
quotacheck at the next mount. This is currently the last time repair
modifies the primary superblock, so it is sufficient to update the
ondisk buffer and not the incore mount structure.
However, we're about to introduce code to clear the needsrepair feature
at the very end of repair, after all metadata blocks have been written
to disk and all disk caches flush. Since the convention everywhere else
in xfs is to update the incore superblock, call libxfs_sb_to_disk to
translate that into the ondisk buffer, and then write the buffer to
disk, switch the quota CHKD code to use this mechanism too.
(Get rid of dsb too, since the incore super should be in sync with the
ondisk super.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_repair: fix unmount error message to have a newline
Add a newline so that this is consistent with the other error messages.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_db: don't allow label/uuid setting if the needsrepair flag is set
The NEEDSREPAIR flag can be set on filesystems where we /know/ that
there's something wrong with the metadata and want to force the sysadmin
to run xfs_repair before the next mount. The goal here is to prevent
non-repair changes to a filesystem when we are confident of its
instability. Normally we wouldn't bother with such safety checks for
the debugger, but the label and uuid functions can be called from
xfs_admin, so we should prevent these administrative tasks until the
filesystem can be repaired.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_db: report the needsrepair flag in check and version commands
Teach the version and check commands to report the presence of the
NEEDSREPAIR flag.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_admin: support filesystems with realtime devices
Add a -r option to xfs_admin so that we can pass the name of the
realtime device to xfs_repair.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 24 Feb 2021 00:09:28 +0000 (19:09 -0500)]
xfs_admin: clean up string quoting
Clean up the string quoting in this script so that we don't trip over
users feeding us arguments like "/dev/sd ha ha ha lol".
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:06 +0000 (17:23 -0500)]
xfs_scrub: fix weirdness in directory name check code
Remove the redundant second check of fd and ISDIR in check_inode_names,
and rework the comment to describe why we can't run phase 5 if we found
other corruptions in the filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:06 +0000 (17:23 -0500)]
xfs_repair: check dquot id and type
Make sure that we actually check the type and id of an ondisk dquot.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:06 +0000 (17:23 -0500)]
xfs_scrub: handle concurrent directory updates during name scan
The name scanner in xfs_scrub cannot lock a namespace (dirent or xattr)
and the kernel does not provide a stable cursor interface, which means
that we can see the same byte sequence multiple times during a scan.
This isn't a confusing name error since the kernel enforces uniqueness
on the byte sequence, so all we need to do here is update the old entry.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:06 +0000 (17:23 -0500)]
xfs_scrub: load and unload libicu properly
Make sure we actually load and unload libicu properly. This isn't
strictly required since the library can bootstrap itself, but unloading
means fewer things for valgrind to complain about.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfs_scrub: detect infinite loops when scanning inodes
During an inode scan (aka phase 3) when we're scanning the inode btree
to find files to check, make sure that each invocation of inumbers
actually gives us an inobt record with a startino that's at least as
large as what we asked for so that we always make forward progress.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
misc: fix valgrind complaints
Zero the memory that we pass to the kernel via ioctls so that we never
pass userspace heap/stack garbage around. This silences valgrind
complaints about uninitialized padding areas.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfs_db: add an ls command
Add to xfs_db the ability to list a directory.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfs_db: add a directory path lookup command
Add a command to xfs_db so that we can navigate to inodes by path.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Chandan Babu R [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfsprogs: xfs_fsr: Verify bulkstat version information in qsort's cmp()
This commit introduces a check to verify that correct bulkstat structures are
being processed by qsort's cmp() function.
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Chandan Babu R [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfsprogs: xfs_fsr: Limit the scope of cmp()
cmp() function is being referred to from within fsr/xfs_fsr.c. Hence
this commit limits its scope to the current file.
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Chandan Babu R [Fri, 12 Feb 2021 22:23:05 +0000 (17:23 -0500)]
xfsprogs: xfs_fsr: Interpret arguments of qsort's compare function correctly
The first argument passed to qsort() in fsrfs() is an array of "struct
xfs_bulkstat". Hence the two arguments to the cmp() function must be
interpreted as being of type "struct xfs_bulkstat *" as against "struct
xfs_bstat *" that is being used to currently typecast them.
Signed-off-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 12 Feb 2021 21:35:22 +0000 (16:35 -0500)]
debian: Prevent installing duplicate changelog
The doc/CHANGES file is both processed by dh_installdocs and
dh_installchangelogs. So it ends up as changelog.gz and CHANGES.gz.
Prevent that by excluding it from dh_installdocs.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 12 Feb 2021 21:35:22 +0000 (16:35 -0500)]
debian: Only build for Linux
Use architecture linux-any to exclude kfreebsd and hurd from building
the package. Those will always fail.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 12 Feb 2021 21:35:22 +0000 (16:35 -0500)]
debian: Drop unused dh-python from Build-Depends
xfsprogs participates in dependency loops relevant to architecture
bootstrap. Identifying easily droppable dependencies, it was found
that xfsprogs does not use dh-python in any way.
Reported-by: Helmut Grohne <helmut@subdivi.de> Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Mon, 18 Jan 2021 21:07:35 +0000 (16:07 -0500)]
debian: new changelog entry
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Mon, 18 Jan 2021 21:07:35 +0000 (16:07 -0500)]
debian: add missing copyright info
For binary distribution, the copyright info in (L)GPL licensed, compiled
files needs to be retained in the copyright file.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Mon, 18 Jan 2021 21:07:35 +0000 (16:07 -0500)]
debian: use Package-Type over its predecessor
The debian/control file contains an XC-Package-Type field.
As of dpkg-dev 1.15.7, the dpkg development utilities recognize
Package-Type as an official field name.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Mon, 18 Jan 2021 21:07:35 +0000 (16:07 -0500)]
debian: remove "Priority: extra"
Priority "extra" was replaced by "optional" which is already used by the
package in general. There is one Priority extra left, so remove it.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Mon, 18 Jan 2021 21:07:35 +0000 (16:07 -0500)]
debian: remove dependency on essential util-linux
Essential packages must not be part of Depends.
Signed-off-by: Bastian Germann <bastiangermann@fishpost.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Prepare for kernel xfs_buf alignment by getting rid of the
xfs_buf_t typedef from userspace.
[darrick: This patch is a port of a userspace patch removing the
xfs_buf_t typedef in preparation to make the userspace xfs_buf code
behave more like its kernel counterpart.]
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
It's enough to just use return code, and get rid of an argument.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch explicitly separates free inode chunk allocation and
inode allocation into two individual high level operations.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Get rid of the confusing ialloc_context and failure handling around
xfs_dialloc() by moving xfs_dialloc_roll() into xfs_dialloc().
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
So xfs_ialloc() will only address in-core inode allocation then,
Also, rename xfs_ialloc() to xfs_dir_ialloc_init() in order to
keep everything in xfs_inode.c under the same namespace.
[sandeen: make equivalent change in xfsprogs]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Introduce a helper to make the on-disk inode allocation rolling
logic clearer in preparation of the following cleanup.
[sandeen: update xfsprogs struct xfs_trans to match]
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We don't yet support dax on reflinked files, but that is in the works.
Further, having the flag set does not automatically mean that the inode
is actually "in the CPU direct access state," which depends on several
other conditions in addition to the flag being set.
As such, we should not catch this as corruption in the verifier - simply
not actually enabling S_DAX on reflinked files is enough for now.
Fixes: 4f435ebe7d04 ("xfs: don't mix reflink and DAX mode for now") Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix the scrubber too] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Since *init_cursor() can always return a valid cursor, the NULL check
in caller is unneeded. So clean them up.
This also keeps the behavior consistent with other callers.
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Introduce a common helper to consolidate stripe validation process.
Also make kernel code xfs_validate_sb_common() use it first.
Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
There are no callers of the XFS_B_FSB_OFFSET macro, so remove it.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Nowadays the only things that the XFS_TRANS_DQ_DIRTY flag seems to do
are indicates the tp->t_dqinfo->dqs[XFS_QM_TRANS_{USR,GRP,PRJ}] values
changed and check in xfs_trans_apply_dquot_deltas() and the unreserve
variant xfs_trans_unreserve_and_mod_dquots(). Actually, we also can
use the tp->t_dqinfo value instead of the XFS_TRANS_DQ_DIRTY flag, that
is to say, we allocate the new tp->t_dqinfo only when the qtrx values
changed, so the tp->t_dqinfo value isn't NULL equals the XFS_TRANS_DQ_DIRTY
flag is set, we only need to check if tp->t_dqinfo == NULL in
xfs_trans_apply_dquot_deltas() and its unreserve variant to determine
whether lock all of the dquots and join them to the transaction.
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>