Chandan Babu R [Sat, 24 Feb 2024 04:48:39 +0000 (10:18 +0530)]
Merge tag 'in-memory-btrees-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: support in-memory btrees
Online repair of the reverse-mapping btrees presens some unique
challenges. To construct a new reverse mapping btree, we must scan the
entire filesystem, but we cannot afford to quiesce the entire filesystem
for the potentially lengthy scan.
For rmap btrees, therefore, we relax our requirements of totally atomic
repairs. Instead, repairs will scan all inodes, construct a new reverse
mapping dataset, format a new btree, and commit it before anyone trips
over the corruption. This is exactly the same strategy as was used in
the quotacheck and nlink scanners.
Unfortunately, the xfarray cannot perform key-based lookups and is
therefore unsuitable for supporting live updates. Luckily, we already a
data structure that maintains an indexed rmap recordset -- the existing
rmap btree code! Hence we port the existing btree and buffer target
code to be able to create a btree using the xfile we developed earlier.
Live hooks keep the in-memory btree up to date for any resources that
have already been scanned.
This approach is not maximally memory efficient, but we can use the same
rmap code that we do everywhere else, which provides improved stability
without growing the code base even more. Note that in-memory btree
blocks are always page sized.
This patchset modifies the kernel xfs buffer cache to be capable of
using a xfile (aka a shmem file) as a backing device. It then augments
the btree code to support creating btree cursors with buffers that come
from a buftarg other than the data device (namely an xfile-backed
buftarg). For the userspace xfs buffer cache, we instead use a memfd or
an O_TMPFILE file as a backing device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'in-memory-btrees-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: launder in-memory btree buffers before transaction commit
xfs: support in-memory btrees
xfs: add a xfs_btree_ptrs_equal helper
xfs: support in-memory buffer cache targets
xfs: teach buftargs to maintain their own buffer hashtable
Chandan Babu R [Sat, 24 Feb 2024 04:44:43 +0000 (10:14 +0530)]
Merge tag 'buftarg-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: buftarg cleanups
Clean up the buffer target code in preparation for adding the ability to
target tmpfs files. That will enable the creation of in memory btrees.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'buftarg-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: move setting bt_logical_sectorsize out of xfs_setsize_buftarg
xfs: remove xfs_setsize_buftarg_early
xfs: remove the xfs_buftarg_t typedef
Chandan Babu R [Sat, 24 Feb 2024 04:41:25 +0000 (10:11 +0530)]
Merge tag 'btree-readahead-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: btree readahead cleanups
Minor cleanups for the btree block readahead code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-readahead-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: split xfs_buf_rele for cached vs uncached buffers
xfs: move and rename xfs_btree_read_bufl
xfs: remove xfs_btree_reada_bufs
xfs: remove xfs_btree_reada_bufl
Chandan Babu R [Sat, 24 Feb 2024 04:38:27 +0000 (10:08 +0530)]
Merge tag 'btree-check-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: btree check cleanups
Minor cleanups for the btree block pointer checking code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-check-cleanups-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: factor out a __xfs_btree_check_lblock_hdr helper
xfs: rename btree helpers that depends on the block number representation
xfs: consolidate btree block verification
xfs: tighten up validation of root block in inode forks
xfs: remove the crc variable in __xfs_btree_check_lblock
xfs: misc cleanups for __xfs_btree_check_sblock
xfs: consolidate btree ptr checking
xfs: open code xfs_btree_check_lptr in xfs_bmap_btree_to_extents
xfs: simplify xfs_btree_check_lblock_siblings
xfs: simplify xfs_btree_check_sblock_siblings
Chandan Babu R [Sat, 24 Feb 2024 04:34:39 +0000 (10:04 +0530)]
Merge tag 'btree-remove-btnum-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: remove bc_btnum from btree cursors
From Christoph Hellwig,
This series continues the migration of btree geometry information out of
the cursor structure and into the ops structure. This time around, we
replace the btree type enumeration (btnum) with an explicit name string
in the btree ops structure. This enables easy creation of /any/ new
btree type without having to mess with libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-remove-btnum-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: remove xfs_btnum_t
xfs: pass a 'bool is_finobt' to xfs_inobt_insert
xfs: split xfs_inobt_init_cursor
xfs: split xfs_inobt_insert_sprec
xfs: remove the which variable in xchk_iallocbt
xfs: remove the btnum argument to xfs_inobt_count_blocks
xfs: remove xfs_inobt_cur
xfs: split xfs_allocbt_init_cursor
xfs: refactor the btree cursor allocation logic in xchk_ag_btcur_init
xfs: add a sick_mask to struct xfs_btree_ops
xfs: add a name field to struct xfs_btree_ops
xfs: split the agf_roots and agf_levels arrays
xfs: remove xfs_bmbt_stage_cursor
xfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor
xfs: make staging file forks explicit
xfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor
xfs: remove xfs_rmapbt_stage_cursor
xfs: fold xfs_rmapbt_init_common into xfs_rmapbt_init_cursor
xfs: remove xfs_refcountbt_stage_cursor
xfs: fold xfs_refcountbt_init_common into xfs_refcountbt_init_cursor
xfs: remove xfs_inobt_stage_cursor
xfs: fold xfs_inobt_init_common into xfs_inobt_init_cursor
xfs: remove xfs_allocbt_stage_cursor
xfs: fold xfs_allocbt_init_common into xfs_allocbt_init_cursor
xfs: don't override bc_ops for staging btrees
xfs: add a xfs_btree_init_ptr_from_cur
xfs: move comment about two 2 keys per pointer in the rmap btree
Chandan Babu R [Sat, 24 Feb 2024 04:31:16 +0000 (10:01 +0530)]
Merge tag 'btree-geometry-in-ops-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: move btree geometry to ops struct
This patchset prepares the generic btree code to allow for the creation
of new btree types outside of libxfs. The end goal here is for online
fsck to be able to create its own in-memory btrees that will be used to
improve the performance (and reduce the memory requirements of) the
refcount btree.
To enable this, I decided that the btree ops structure is the ideal
place to encode all of the geometry information about a btree. The btree
ops struture already contains the buffer ops (and hence the btree block
magic numbers) as well as the key and record sizes, so it doesn't seem
all that farfetched to encode the XFS_BTREE_ flags that determine the
geometry (ROOT_IN_INODE, LONG_PTRS, etc).
The rest of the patchset cleans up the btree functions that initialize
btree blocks and btree buffers. The bulk of this work is to replace
btree geometry related function call arguments with a single pointer to
the ops structure, and then clean up everything else around that. As a
side effect, we rename the functions.
Later, Christoph Hellwig and I merged together a bunch more cleanups
that he wanted to do for a while. All the btree geometry information is
now in the btree ops structure, we've created an explicit btree type
(ag, inode, mem) and moved the per-btree type information to a separate
union.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'btree-geometry-in-ops-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: create predicate to determine if cursor is at inode root level
xfs: split the per-btree union in struct xfs_btree_cur
xfs: split out a btree type from the btree ops geometry flags
xfs: store the btree pointer length in struct xfs_btree_ops
xfs: factor out a btree block owner check
xfs: factor out a xfs_btree_owner helper
xfs: move the btree stats offset into struct btree_ops
xfs: move lru refs to the btree ops structure
xfs: set btree block buffer ops in _init_buf
xfs: remove the unnecessary daddr paramter to _init_block
xfs: btree convert xfs_btree_init_block to xfs_btree_init_buf calls
xfs: rename btree block/buffer init functions
xfs: initialize btree blocks using btree_ops structure
xfs: extern some btree ops structures
xfs: turn the allocbt cursor active field into a btree flag
xfs: consolidate the xfs_alloc_lookup_* helpers
xfs: remove bc_ino.flags
xfs: encode the btree geometry flags in the btree ops structure
xfs: fix imprecise logic in xchk_btree_check_block_owner
xfs: drop XFS_BTREE_CRC_BLOCKS
xfs: set the btree cursor bc_ops in xfs_btree_alloc_cursor
xfs: consolidate btree block allocation tracepoints
xfs: consolidate btree block freeing tracepoints
Chandan Babu R [Sat, 24 Feb 2024 04:28:28 +0000 (09:58 +0530)]
Merge tag 'repair-fscounters-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: online repair for fs summary counters
A longstanding deficiency in the online fs summary counter scrubbing
code is that it hasn't any means to quiesce the incore percpu counters
while it's running. There is no way to coordinate with other threads
are reserving or freeing free space simultaneously, which leads to false
error reports. Right now, if the discrepancy is large, we just sort of
shrug and bail out with an incomplete flag, but this is lame.
For repair activity, we actually /do/ need to stabilize the counters to
get an accurate reading and install it in the percpu counter. To
improve the former and enable the latter, allow the fscounters online
fsck code to perform an exclusive mini-freeze on the filesystem. The
exclusivity prevents userspace from thawing while we're running, and the
mini-freeze means that we don't wait for the log to quiesce, which will
make both speedier.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-fscounters-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair summary counters
Chandan Babu R [Sat, 24 Feb 2024 04:25:02 +0000 (09:55 +0530)]
Merge tag 'indirect-health-reporting-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: indirect health reporting
This series enables the XFS health reporting infrastructure to remember
indirect health concerns when resources are scarce. For example, if a
scrub notices that there's something wrong with an inode's metadata but
memory reclaim needs to free the incore inode, we want to record in the
perag data the fact that there was some inode somewhere with an error.
The perag structures never go away.
The first two patches in this series set that up, and the third one
provides a means for xfs_scrub to tell the kernel that it can forget the
indirect problem report.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'indirect-health-reporting-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: update health status if we get a clean bill of health
xfs: remember sick inodes that get inactivated
xfs: add secondary and indirect classes to the health tracking system
Chandan Babu R [Sat, 24 Feb 2024 04:21:32 +0000 (09:51 +0530)]
Merge tag 'corruption-health-reports-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: report corruption to the health trackers
Any time that the runtime code thinks it has found corrupt metadata, it
should tell the health tracking subsystem that the corresponding part of
the filesystem is sick. These reports come primarily from two places --
code that is reading a buffer that fails validation, and higher level
pieces that observe a conflict involving multiple buffers. This
patchset uses automated scanning to update all such callsites with a
mark_sick call.
Doing this enables the health system to record problem observed at
runtime, which (for now) can prompt the sysadmin to run xfs_scrub, and
(later) may enable more targetted fixing of the filesystem.
Note: Earlier reviewers of this patchset suggested that the verifier
functions themselves should be responsible for calling _mark_sick. In a
higher level language this would be easily accomplished with lambda
functions and closures. For the kernel, however, we'd have to create
the necessary closures by hand, pass them to the buf_read calls, and
then implement necessary state tracking to detach the xfs_buf from the
closure at the necessary time. This is far too much work and complexity
and will not be pursued further.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'corruption-health-reports-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: report XFS_IS_CORRUPT errors to the health system
xfs: report realtime metadata corruption errors to the health system
xfs: report quota block corruption errors to the health system
xfs: report inode corruption errors to the health system
xfs: report symlink block corruption errors to the health system
xfs: report dir/attr block corruption errors to the health system
xfs: report btree block corruption errors to the health system
xfs: report block map corruption errors to the health tracking system
xfs: report ag header corruption errors to the health tracking system
xfs: report fs corruption errors to the health tracking system
xfs: separate the marking of sick and checked metadata
Chandan Babu R [Sat, 24 Feb 2024 04:17:39 +0000 (09:47 +0530)]
Merge tag 'scrub-nlinks-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: online repair of file link counts
Now that we've created the infrastructure to perform live scans of every
file in the filesystem and the necessary hook infrastructure to observe
live updates, use it to scan directories to compute the correct link
counts for files in the filesystem, and reset those link counts.
This patchset creates a tailored readdir implementation for scrub
because the regular version has to cycle ILOCKs to copy information to
userspace. We can't cycle the ILOCK during the nlink scan and we don't
need all the other VFS support code (maintaining a readdir cursor and
translating XFS structures to VFS structures and back) so it was easier
to duplicate the code.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'scrub-nlinks-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: teach repair to fix file nlinks
xfs: track directory entry updates during live nlinks fsck
xfs: teach scrub to check file nlinks
xfs: report health of inode link counts
Chandan Babu R [Sat, 24 Feb 2024 04:14:28 +0000 (09:44 +0530)]
Merge tag 'repair-quotacheck-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: online repair of quota counters
This series uses the inode scanner and live update hook functionality
introduced in the last patchset to implement quotacheck on a live
filesystem. The quotacheck scrubber builds an incore copy of the
dquot resource usage counters and compares it to the live dquots to
report discrepancies.
If the user chooses to repair the quota counters, the repair function
visits each incore dquot to update the counts from the live information.
The live update hooks are key to keeping the incore copy up to date.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-quotacheck-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair dquots based on live quotacheck results
xfs: repair cannot update the summary counters when logging quota flags
xfs: track quota updates during live quotacheck
xfs: implement live quotacheck inode scan
xfs: create a sparse load xfarray function
xfs: create a helper to count per-device inode block usage
xfs: create a xchk_trans_alloc_empty helper for scrub
xfs: report the health of quota counts
Chandan Babu R [Sat, 24 Feb 2024 04:10:39 +0000 (09:40 +0530)]
Merge tag 'repair-inode-mode-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.9-mergeC
xfs: repair inode mode by scanning dirs
One missing piece of functionality in the inode record repair code is
figuring out what to do with a file whose mode is so corrupt that we
cannot tell us the type of the file. Originally this was done by
guessing the mode from the ondisk inode contents, but Christoph didn't
like that because it read from data fork block 0, which could be user
controlled data.
Therefore, I've replaced all that with a directory scanner that looks
for any dirents that point to the file with the garbage mode. If so,
the ftype in the dirent will tell us exactly what mode to set on the
file. Since users cannot directly write to the ftype field of a dirent,
this should be safe.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-inode-mode-6.9_2024-02-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: repair file modes by scanning for a dirent pointing to us
xfs: create a macro for decoding ftypes in tracepoints
xfs: create a predicate to determine if two xfs_names are the same
xfs: create a static name for the dot entry too
xfs: iscan batching should handle unallocated inodes too
xfs: cache a bunch of inodes for repair scans
xfs: stagger the starting AG of scrub iscans to reduce contention
xfs: allow scrub to hook metadata updates in other writers
xfs: implement live inode scan for scrub
xfs: speed up xfs_iwalk_adjust_start a little bit
Darrick J. Wong [Thu, 22 Feb 2024 20:43:36 +0000 (12:43 -0800)]
xfs: launder in-memory btree buffers before transaction commit
As we've noted in various places, all current users of in-memory btrees
are online fsck. Online fsck only stages a btree long enough to rebuild
an ondisk data structure, which means that the in-memory btree is
ephemeral. Furthermore, if we encounter /any/ errors while updating an
in-memory btree, all we do is tear down all the staged data and return
an errno to userspace. In-memory btrees need not be transactional, so
their buffers should not be committed to the ondisk log, nor should they
be checkpointed by the AIL. That's just as well since the ephemeral
nature of the btree means that the buftarg and the buffers may disappear
quickly anyway.
Therefore, we need a way to launder the btree buffers that get attached
to the transaction by the generic btree code. Because the buffers are
directly mapped to backing file pages, there's no need to bwrite them
back to the tmpfs file. All we need to do is clean enough of the buffer
log item state so that the bli can be detached from the buffer, remove
the bli from the transaction's log item list, and reset the transaction
dirty state as if the laundered items had never been there.
For simplicity, create xfbtree transaction commit and cancel helpers
that launder the in-memory btree buffers for callers. Once laundered,
call the write verifier on non-stale buffers to avoid integrity issues,
or punch a hole in the backing file for stale buffers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:43:35 +0000 (12:43 -0800)]
xfs: support in-memory btrees
Adapt the generic btree cursor code to be able to create a btree whose
buffers come from a (presumably in-memory) buftarg with a header block
that's specific to in-memory btrees. We'll connect this to other parts
of online scrub in the next patches.
Note that in-memory btrees always have a block size matching the system
memory page size for efficiency reasons. There are also a few things we
need to do to finalize a btree update; that's covered in the next patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:43:21 +0000 (12:43 -0800)]
xfs: support in-memory buffer cache targets
Allow the buffer cache to target in-memory files by making it possible
to have a buftarg that maps pages from private shmem files. As the
prevous patch alludes, the in-memory buftarg contains its own cache,
points to a shmem file, and does not point to a block_device.
The next few patches will make it possible to construct an xfs_btree in
pageable memory by using this buftarg.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:42:58 +0000 (12:42 -0800)]
xfs: teach buftargs to maintain their own buffer hashtable
Currently, cached buffers are indexed by per-AG hashtables. This works
great for the data device, but won't work for in-memory btrees. To
handle that use case, buftargs will need to be able to index buffers
independently of other data structures.
We accomplish this by hoisting the rhashtable and its lock into a
separate xfs_buf_cache structure, make the buftarg point to the
_buf_cache structure, and rework various functions to use it. This
will enable the in-memory buftarg to come up with its own _buf_cache.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: move setting bt_logical_sectorsize out of xfs_setsize_buftarg
bt_logical_sectorsize and the associated mask is set based on the
constant logical block size in the block_device structure and thus
doesn't need to be updated in xfs_setsize_buftarg. Move it into
xfs_alloc_buftarg so that it is only done once per buftarg.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: split xfs_buf_rele for cached vs uncached buffers
xfs_buf_rele is a bit confusing because it mixes up handling of normal
cached and the special uncached buffers without much explanation.
Split the handling into two different helpers, and use a clearly named
helper that checks the hash key to distinguish the two cases instead
of checking the pag pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Despite its name, xfs_btree_read_bufl doesn't contain any btree-related
functionaliy and isn't used by the btree code. Move it to xfs_bmap.c,
hard code the refval and ops arguments and rename it to
xfs_bmap_read_buf.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs_btree_reada_bufl just wraps xfs_btree_readahead and a agblock
to daddr conversion. Just open code it's three callsites in the
two callers (One of which isn't even btree related).
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: rename btree helpers that depends on the block number representation
All these helpers hardcode fsblocks or agblocks and not just the pointer
size. Rename them so that the names are still fitting when we add the
long format in-memory blocks and adjust the checks when calling them to
check the btree types and not just pointer length.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Add a __xfs_btree_check_block helper that can be called by the scrub code
to validate a btree block of any form, and move the duplicate error
handling code from xfs_btree_check_sblock and xfs_btree_check_lblock into
xfs_btree_check_block and thus remove these two helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Merge xfs_btree_check_sptr and xfs_btree_check_lptr into a single
__xfs_btree_check_ptr that can be shared between xfs_btree_check_ptr
and the scrub code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Stop using xfs_btree_check_lptr in xfs_btree_check_lblock_siblings,
as it only duplicates the xfs_verify_fsbno call in the other leg of
if / else besides adding a tautological level check.
With this the cur and level arguments can be removed as they are
now unused.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Stop using xfs_btree_check_sptr in xfs_btree_check_sblock_siblings,
as it only duplicates the xfs_verify_agbno call in the other leg of
if / else besides adding a tautological level check.
With this the cur and level arguments can be removed as they are
now unused.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The last checks for bc_btnum can be replaced with helpers that check
the btree ops. This allows adding new btrees to XFS without having
to update a global enum.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: complete the ops predicates] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Split xfs_inobt_init_cursor into separate routines for the inobt and
finobt to prepare for the removal of the xfs_btnum global enumeration
of btree types.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The which variable that holds a btree number is passed to two functions
that ignore it and used in a single check that can check the sm_type
as well. Remove it to unclutter the code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Split xfs_allocbt_init_cursor into separate routines for the by-bno
and by-cnt btrees to prepare for the removal of the xfs_btnum global
enumeration of btree types.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
The btnum in struct xfs_btree_ops is often used for printing a symbolic
name for the btree. Add a name field to the ops structure and use that
directly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor
Make the levels initialization in xfs_bmbt_init_cursor conditional
and merge the two helpers.
This requires the fakeroot case to now pass a -1 whichfork directly
into xfs_bmbt_init_cursor, and some special casing for that, but
at least this scheme to deal with the fake btree root is handled and
documented in once place now.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: tidy up a multline ternary] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 22 Feb 2024 20:39:43 +0000 (12:39 -0800)]
xfs: make staging file forks explicit
Don't open-code "-1" for whichfork when we're creating a staging btree
for a repair; let's define an actual symbol to make grepping and
understanding easier.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor
Remove the duplicate cur->bc_nlevels assignment in xfs_bmbt_stage_cursor,
and move the cur->bc_ino.forksize assignment into
xfs_btree_stage_ifakeroot as it is part of setting up the fake btree
root.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Inode-rooted btrees don't need to initialize the root pointer in the
->init_ptr_from_cur method as the root is found by the
xfs_btree_get_iroot method later. Make ->init_ptr_from_cur option
for inode rooted btrees by providing a helper that does the right
thing for the given btree type and also documents the semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 22 Feb 2024 20:37:24 +0000 (12:37 -0800)]
xfs: create predicate to determine if cursor is at inode root level
Create a predicate to decide if the given cursor and level point to the
root block in the inode immediate area instead of a disk block, and get
rid of the open-coded logic everywhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
xfs: split the per-btree union in struct xfs_btree_cur
Split up the union that encodes btree-specific fields in struct
xfs_btree_cur. Most fields in there are specific to the btree type
encoded in xfs_btree_ops.type, and we can use the obviously named union
for that. But one field is specific to the bmapbt and two are shared by
the refcount and rtrefcountbt. Move those to a separate union to make
the usage clear and not need a separate struct for the refcount-related
fields.
This will also make unnecessary some very awkward btree cursor
refc/rtrefc switching logic in the rtrefcount patchset.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
xfs: split out a btree type from the btree ops geometry flags
Two of the btree cursor flags are always used together and encode
the fundamental btree type. There currently are two such types:
1) an on-disk AG-rooted btree with 32-bit pointers
2) an on-disk inode-rooted btree with 64-bit pointers
and we're about to add:
3) an in-memory btree with 64-bit pointers
Introduce a new enum and a new type field in struct xfs_btree_geom
to encode this type directly instead of using flags and change most
code to switch on this enum.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: make the pointer lengths explicit] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 22 Feb 2024 20:35:36 +0000 (12:35 -0800)]
xfs: store the btree pointer length in struct xfs_btree_ops
Make the pointer length an explicit field in the btree operations
structure so that the next patch (which introduces an explicit btree
type enum) doesn't have to play a bunch of awkward games with inferring
the pointer length from the enumeration.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:35:22 +0000 (12:35 -0800)]
xfs: factor out a xfs_btree_owner helper
Split out a helper to calculate the owner for a given btree instead of
duplicating the logic in two places. While we're at it, make the
bc_ag/bc_ino switch logic depend on the correct geometry flag.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: break this up into two patches for the owner check] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Thu, 22 Feb 2024 20:35:20 +0000 (12:35 -0800)]
xfs: move lru refs to the btree ops structure
Move the btree buffer LRU refcount to the btree ops structure so that we
can eliminate the last bc_btnum switch in the generic btree code. We're
about to create repair-specific btree types, and we don't want that
stuff cluttering up libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:35:17 +0000 (12:35 -0800)]
xfs: rename btree block/buffer init functions
Rename xfs_btree_init_block_int to xfs_btree_init_block, and
xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the
type that caller are supposed to pass in.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:35:16 +0000 (12:35 -0800)]
xfs: initialize btree blocks using btree_ops structure
Notice now that the btree ops structure encodes btree geometry flags and
the magic number through the buffer ops. Refactor the btree block
initialization functions to use the btree ops so that we no longer have
to open code all that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:34:29 +0000 (12:34 -0800)]
xfs: encode the btree geometry flags in the btree ops structure
Certain btree flags never change for the life of a btree cursor because
they describe the geometry of the btree itself. Encode these in the
btree ops structure and reduce the amount of code required in each btree
type's init_cursor functions. This also frees up most of the bits in
bc_flags.
A previous version of this patch also converted the open-coded flags
logic to helpers. This was removed due to the pending refactoring (that
follows this patch) to eliminate most of the state flags.
Darrick J. Wong [Thu, 22 Feb 2024 20:34:13 +0000 (12:34 -0800)]
xfs: fix imprecise logic in xchk_btree_check_block_owner
A reviewer was confused by the init_sa logic in this function. Upon
checking the logic, I discovered that the code is imprecise. What we
want to do here is check that there is an ownership record in the rmap
btree for the AG that contains a btree block.
For an inode-rooted btree (e.g. the bmbt) the per-AG btree cursors have
not been initialized because inode btrees can span multiple AGs.
Therefore, we must initialize the per-AG btree cursors in sc->sa before
proceeding. That is what init_sa controls, and hence the logic should
be gated on XFS_BTREE_ROOT_IN_INODE, not XFS_BTREE_LONG_PTRS.
In practice, ROOT_IN_INODE and LONG_PTRS are coincident so this hasn't
mattered. However, we're about to refactor both of those flags into
separate btree_ops fields so we want this the logic to make sense
afterwards.
Fixes: 858333dcf021a ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:34:12 +0000 (12:34 -0800)]
xfs: drop XFS_BTREE_CRC_BLOCKS
All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a
V5 filesystem. All currently proposed btree types are V5 only and use
the richer XFS_BTREE_CRC_BLOCKS format. Therefore, we can drop this
flag and change the conditional to xfs_has_crc.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:33:05 +0000 (12:33 -0800)]
xfs: repair summary counters
Use the same summary counter calculation infrastructure to generate new
values for the in-core summary counters. The difference between the
scrubber and the repairer is that the repairer will freeze the fs during
setup, which means that the values should match exactly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:33:04 +0000 (12:33 -0800)]
xfs: update health status if we get a clean bill of health
If scrub finds that everything is ok with the filesystem, we need a way
to tell the health tracking that it can let go of indirect health flags,
since indirect flags only mean that at some point in the past we lost
some context.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:33:03 +0000 (12:33 -0800)]
xfs: add secondary and indirect classes to the health tracking system
Establish two more classes of health tracking bits:
* Indirect problems, which suggest problems in other health domains
that we weren't able to preserve.
* Secondary problems, which track state that's related to primary
evidence of health problems; and
The first class we'll use in an upcoming patch to record in the AG
health status the fact that we ran out of memory and had to inactivate
an inode with defective metadata. The second class we use to indicate
that repair knows that an inode is bad and we need to fix it later.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Thu, 22 Feb 2024 20:31:03 +0000 (12:31 -0800)]
xfs: report ag header corruption errors to the health tracking system
Whenever we encounter a corrupt AG header, we should report that to the
health monitoring system for later reporting. Buffer readers that don't
respond to corruption events with a _mark_sick call can be detected with
the following script:
Darrick J. Wong [Thu, 22 Feb 2024 20:31:02 +0000 (12:31 -0800)]
xfs: report fs corruption errors to the health tracking system
Whenever we encounter corrupt fs metadata, we should report that to the
health monitoring system for later reporting. A convenient program for
identifying places to insert xfs_*_mark_sick calls is as follows:
Darrick J. Wong [Thu, 22 Feb 2024 20:31:01 +0000 (12:31 -0800)]
xfs: separate the marking of sick and checked metadata
Split the setting of the sick and checked masks into separate functions
as part of preparing to add the ability for regular runtime fs code
(i.e. not scrub) to mark metadata structures sick when corruptions are
found. Improve the documentation of libxfs' requirements for helper
behavior.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>