Finish off the series by moving the intent item recovery function
pointer to the xfs_defer_op_type struct, since this is really a deferred
work function now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Get rid of the open-coded calls to xfs_defer_finish_one. This also
means that the recovery transaction takes care of cleaning up the dfp,
and we have solved (I hope) all the ownership issues in recovery.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
One thing I never quite got around to doing is porting the log intent
item recovery code to reconstruct the deferred pending work state. As a
result, each intent item open codes xfs_defer_finish_one in its recovery
method, because that's what the EFI code did before xfs_defer.c even
existed.
This is a gross thing to have left unfixed -- if an EFI cannot proceed
due to busy extents, we end up creating separate new EFIs for each
unfinished work item, which is a change in behavior from what runtime
would have done.
Worse yet, Long Li pointed out that there's a UAF in the recovery code.
The ->commit_pass2 function adds the intent item to the AIL and drops
the refcount. The one remaining refcount is now owned by the recovery
mechanism (aka the log intent items in the AIL) with the intent of
giving the refcount to the intent done item in the ->iop_recover
function.
However, if something fails later in recovery, xlog_recover_finish will
walk the recovered intent items in the AIL and release them. If the CIL
hasn't been pushed before that point (which is possible since we don't
force the log until later) then the intent done release will try to free
its associated intent, which has already been freed.
This patch starts to address this mess by having the ->commit_pass2
functions recreate the xfs_defer_pending state. The next few patches
will fix the recovery functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)]
xfs_{db,repair}: use m_blockwsize instead of sb_blocksize for rt blocks
In preparation to add block headers to rt bitmap and summary blocks,
convert all the relevant calculations in the userspace tools to use the
per-block word count instead of the raw blocksize. This is key to
adding this support outside of libxfs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)]
xfs_{db,repair}: use accessor functions for summary info words
Port xfs_db and xfs_repair to use get and set functions for rtsummary
words so that we can redefine the ondisk format with a specific
endianness. Note that this requires the definition of a distinct type
for ondisk summary info words so that the compiler can perform proper
typechecking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:30 +0000 (16:07 -0700)]
xfs_{db,repair}: use accessor functions for bitmap words
Port xfs_db and xfs_repair to use get and set functions for rtbitmap
words so that we can redefine the ondisk format with a specific
endianness. Note that this requires the definition of a distinct type
for ondisk rtbitmap words so that the compiler can perform proper
typechecking as we go back and forth.
In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:29 +0000 (16:07 -0700)]
xfs_{db,repair}: convert open-coded xfs_rtword_t pointer accesses to helper
There are a bunch of places in xfs_db and xfs_repair where we use
open-coded logic to find a pointer to an xfs_rtword_t within a rt bitmap
buffer. Convert all that to helper functions for better type safety.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)]
xfs_repair: fix confusing rt space units in the duplicate detection code
Christoph Hellwig stumbled over the crosslinked file data detection code
in xfs_repair. While trying to make sense of his fixpatch, I realized
that the variable names and unit types are very misleading.
The rt dup tree builder inserts records in units of realtime extents.
One query of the rt dup tree passes in a realtime extent number, but one
of them does not. Confusingly, all the variable names have "block" even
though they really mean "extent". This makes a real difference for
rextsize > 1 filesystems, though given the lack of complaints I'm
guessing there aren't many users.
Clean up this whole mess by fixing the variable names of the duplicates
tree and the state array to reflect the units that are stored in the
data structure, and fix the buggy query code. Later on in this patchset
we'll fix the variable types too.
This seems to have been broken since before the start of the git repo.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Notice that mkfs thinks it should format a 32768-fsblock external log,
but the log device itself is 32767 fsblocks. Hence the write goes off
the end of the device and we get ENOSPC.
I tracked this behavior down to align_log_size in mkfs, which first
tries to round the log size up to a stripe boundary, then tries to round
it down. Unfortunately, in the case of an external log we call the
function with XFS_MAX_LOG_BLOCKS without accounting for the possibility
that the log device might be smaller.
Correct the callsite and clean up the open-coded rounding.
Fixes: 8d1bff2be336 ("mkfs: reduce internal log size when log stripe units are in play") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:28 +0000 (16:07 -0700)]
libxfs: fix incorrect porting to 6.7
Userspace libxfs is supposed to match the kernel libxfs except for the
preprocessor include directives. Fix a few discrepancies that came up
for whatever reason.
To fix the build errors resulting from CONFIG_XFS_RT not being defined,
add it to libxfs.h and alter the Makefile to track xfs_rtbitmap.h.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>
Darrick J. Wong [Mon, 15 Apr 2024 23:07:27 +0000 (16:07 -0700)]
debian: fix package configuration after removing platform_defs.h.in
In commit 0fa9dcb61b4f, we made platform_defs.h a static header file
instead of generating it from platform_defs.h.in. Unfortunately, it
turns out that the debian packaging rules use "make
include/platform_defs.h" to run configure with the build options
set via LOCAL_CONFIGURE_OPTIONS.
Since platform_defs.h is no longer generated, the make command in
debian/rules does nothing, which means that the binaries don't get built
the way the packaging scripts specify. This breaks multiarch for
libhandle.so, as well as libeditline and libblkid support for
xfs_db/io/spaceman.
Fix this by correcting debian/rules to make include/builddefs, which
will start ./configure with the desired options.
Fixes: 0fa9dcb61b4f ("include: stop generating platform_defs.h") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
fls should never be provided by system headers. It seems like on MacOS
it did, but as we're not supporting MacOS anymore there is no need to
check for it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
HAVE_SEEK_DATA is never defined, so the code in xfs_io just
unconditionally redefines SEEK_DATA and SEEK_HOLE. Switch to the
system version instead, which has been around since Linux 3.1 and
glibc of similar vintage.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Now that the sizeof checks are gone, we can stop generating platform_defs.h.
The only caveat is that we need to stop undefining ENABLE_GETTEXT, which the
generation process had removed before. The actual ENABLE_GETTEXT will be
passd on the compiler command line, just like other ENABLE or HAVE values
from autoconf.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
SIZEOF_LONG together with the unused SIZEOF_CHAR_P is the last thing that
really needs a generated configuration header. Switch to just using
sizeof(long) so that we can stop generating platform_defs.h.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Thu, 22 Feb 2024 22:04:31 +0000 (14:04 -0800)]
xfs_db: don't hardcode 'type data' size at 512b
On a disk with 4096-byte LBAs, the xfs_db 'type data' subcommand doesn't
work:
# xfs_io -c 'sb' -c 'type data' /dev/sda
xfs_db: read failed: Invalid argument
no current object
The cause of this is the hardcoded initialization of bb_count when we're
setting type data -- it should be the filesystem sector size, not just 1.
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Sam James [Mon, 5 Feb 2024 23:23:21 +0000 (23:23 +0000)]
build: Request 64-bit time_t where possible
Suggested by Darrick during LFS review. We take the same approach as in 5c0599b721d1d232d2e400f357abdf2736f24a97 ('Fix building xfsprogs on 32-bit platforms')
to avoid autoconf hell - just take the tried & tested approach which is working
fine for us with LFS already.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Sam James [Mon, 5 Feb 2024 23:23:20 +0000 (23:23 +0000)]
io: Adapt to >= 64-bit time_t
We now require (at least) 64-bit time_t, so we need to adjust some printf
specifiers accordingly.
Unfortunately, we've stumbled upon a ridiculous C mmoment whereby there's
no neat format specifier (not even one of the inttypes ones) for time_t, so
we cast to intmax_t and use %jd.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Violet Purcell [Mon, 5 Feb 2024 23:23:19 +0000 (23:23 +0000)]
Remove use of LFS64 interfaces
LFS64 interfaces are non-standard and are being removed in the upcoming musl
1.2.5. Setting _FILE_OFFSET_BITS=64 (which is currently being done) makes all
interfaces on glibc 64-bit by default, so using the LFS64 interfaces is
redundant. This commit replaces all occurences of off64_t with off_t,
stat64 with stat, and fstat64 with fstat.
Link: https://bugs.gentoo.org/907039 Cc: Felix Janda <felix.janda@posteo.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Violet Purcell <vimproved@inventati.org> Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Discovered when trying to track down a weird recovery corruption
issue that wasn't detected at recovery time.
The specific corruption was a zero extent count field when big
extent counts are in use, and it turns out the dinode verifier
doesn't detect that specific corruption case, either. So fix it too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280
...
Call Trace:
xfs_corruption_error+0x94/0xa0
xfs_btree_insert+0x221/0x280
xfs_alloc_fixup_trees+0x104/0x3e0
xfs_alloc_ag_vextent_size+0x667/0x820
xfs_alloc_fix_freelist+0x5d9/0x750
xfs_free_extent_fix_freelist+0x65/0xa0
__xfs_free_extent+0x57/0x180
...
This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when
xfs_btree_insrec() fails.
After converting this into a panic and dissecting the core dump, I found
that xfs_btree_insrec() is failing because it's trying to split a leaf
node in the cntbt when the AG free list is empty. In particular, it's
failing to get a block from the AGFL _while trying to refill the AGFL_.
If a single operation splits every level of the bnobt and the cntbt (and
the rmapbt if it is enabled) at once, the free list will be empty. Then,
when the next operation tries to refill the free list, it allocates
space. If the allocation does not use a full extent, it will need to
insert records for the remaining space in the bnobt and cntbt. And if
those new records go in full leaves, the leaves (and potentially more
nodes up to the old root) need to be split.
Fix it by accounting for the additional splits that may be required to
refill the free list in the calculation for the minimum free list size.
P.S. As far as I can tell, this bug has existed for a long time -- maybe
back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros
...") in April 1994! It requires a very unlucky sequence of events, and
in fact we didn't hit it until a particular sparse mmap workload updated
from 5.12 to 5.19. But this bug existed in 5.12, so it must've been
exposed by some other change in allocation or writeback patterns. It's
also much less likely to be hit with the rmapbt enabled, since that
increases the minimum free list size and is unlikely to split at the
same time as the bnobt and cntbt.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
When recovering intents, we capture newly created intent items as part of
point, we forget to remove those newly created intent items from the AIL
and hang:
When newly created intent items fail to commit via transaction, intent
recovery hasn't created done items for these newly created intent items,
so the capture structure is the sole owner of the captured intent items.
We must release them explicitly or else they leak:
Fix the problem above by abort intent items that don't have a done item
when recovery intents fail.
Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(), which
not use transaction parameter, so it can be used after the transaction
life cycle.
Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
In commit 355e3532132b ("xfs: cache minimum realtime summary level"), I
added a cache of the minimum level of the realtime summary that has any
free extents. However, it turns out that the _maximum_ level is more
useful for upcoming optimizations, and basically equivalent for the
existing usage. So, let's change the meaning of the cache to be the
maximum level + 1, or 0 if there are no free extents.
For example, if the cache contains:
{0, 4}
then there are no free extents starting in realtime bitmap block 0, and
there are no free extents larger than or equal to 2^4 blocks starting in
realtime bitmap block 1. The cache is a loose upper bound, so there may
or may not be free extents smaller than 2^4 blocks in realtime bitmap
block 1.
Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Now that xfs_rtalloc_args holds references to the last-read bitmap and
summary blocks, we don't need to pass the buffer pointer out of
xfs_rtbuf_get.
Callers no longer have to xfs_trans_brelse on their own, though they are
required to call xfs_rtbuf_cache_relse before the xfs_rtalloc_args goes
out of scope.
While we're at it, create some trivial helpers so that we don't have to
remember if "0" means "bitmap" and "1" means "summary".
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Profiling a workload on a highly fragmented realtime device showed a ton
of CPU cycles being spent in xfs_trans_read_buf() called by
xfs_rtbuf_get(). Further tracing showed that much of that was repeated
calls to xfs_rtbuf_get() for the same block of the realtime bitmap.
These come from xfs_rtallocate_extent_block(): as it walks through
ranges of free bits in the bitmap, each call to xfs_rtcheck_range() and
xfs_rtfind_{forw,back}() gets the same bitmap block. If the bitmap block
is very fragmented, then this is _a lot_ of buffer lookups.
The realtime allocator already passes around a cache of the last used
realtime summary block to avoid repeated reads (the parameters rbpp and
rsb). We can do the same for the realtime bitmap.
This replaces rbpp and rsb with a struct xfs_rtbuf_cache, which caches
the most recently used block for both the realtime bitmap and summary.
xfs_rtbuf_get() now handles the caching instead of the callers, which
requires plumbing xfs_rtbuf_cache to more functions but also makes sure
we don't miss anything.
Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Consolidate the arguments passed around the rt allocator into a
struct xfs_rtalloc_arg similar to how the btree allocator arguments
are consolidated in a struct xfs_alloc_arg....
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Create get and set functions for rtsummary words so that we can redefine
the ondisk format with a specific endianness. Note that this requires
the definition of a distinct type for ondisk summary info words so that
the compiler can perform proper typechecking.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Create get and set functions for rtbitmap words so that we can redefine
the ondisk format with a specific endianness. Note that this requires
the definition of a distinct type for ondisk rtbitmap words so that the
compiler can perform proper typechecking as we go back and forth.
In the upcoming rtgroups feature, we're going to fix the problem that
rtwords are written in host endian order, which means we'll need the
distinct rtword/rtword_raw types.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Create an explicit helper function to log parts of rt bitmap and summary
blocks. While we're at it, fix an off-by-one error in two of the
rtbitmap logging calls that led to unnecessarily large log items but was
otherwise benign.
Note that the upcoming rtgroups patchset will add block headers to the
rtbitmap and rtsummary files. The helpers in this and the next few
patches take a less than direct route through xfs_rbmblock_wordptr and
xfs_rsumblock_infoptr to avoid helper churn in that patchset.
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
There are a bunch of places where we use open-coded logic to find a
pointer to an xfs_rtword_t within a rt bitmap buffer. Convert all that
to helper functions for better type safety.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Replace these macros with typechecked helper functions. Eventually
we're going to add more logic to the helpers and it'll be easier if we
don't have to macro it up.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Convert these calls to use the helpers, and clean up all these places
where the same variable can have different units depending on where it
is in the function.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Create helpers to do unit conversions of rt block numbers to rt extent
numbers. There are three variations -- one to compute the rt extent
number from an rt block number; one to compute the offset of an rt block
within an rt extent; and one to extract both.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Further disambiguate the xfs_rtblock_t uses by creating a new type,
xfs_rtxnum_t, to store the position of an extent within the realtime
section, in units of rtextents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
This helper function validates that a range of *blocks* in the
realtime section is completely contained within the realtime section.
It does /not/ validate ranges of *rtextents*. Rename the function to
avoid suggesting that it does, and change the type of the @len parameter
since xfs_rtblock_t is a position unit, not a length unit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
XFS uses xfs_rtblock_t for many different uses, which makes it much more
difficult to perform a unit analysis on the codebase. One of these
(ab)uses is when we need to store the length of a free space extent as
stored in the realtime bitmap. Because there can be up to 2^64 realtime
extents in a filesystem, we need a new type that is larger than
xfs_rtxlen_t for callers that are querying the bitmap directly. This
means scrub and growfs.
Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx
lengths. 'b' stands for 'bitmap' or 'big'; reader's choice.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
In most of the filesystem, we use xfs_extlen_t to store the length of a
file (or AG) space mapping in units of fs blocks. Unfortunately, the
realtime allocator also uses it to store the length of a rt space
mapping in units of rt extents. This is confusing, since one rt extent
can consist of many fs blocks.
Separate the two by introducing a new type (xfs_rtxlen_t) to store the
length of a space mapping (in units of realtime extents) that would be
found in a file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The unit conversions in this function do not make sense. First we
convert a block count to bytes, then divide that bytes value by
rextsize, which is in blocks, to get an rt extent count. You can't
divide bytes by blocks to get a (possibly multiblock) extent value.
Fortunately nobody uses delalloc on the rt volume so this hasn't
mattered.
Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function. Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.
This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The latest version of the fs geometry structure is v5. Bump this
constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill
out all the fields.
IOWs, this commit is a no-op for the kernel, but will be useful for
userspace reporting in later changes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Tue, 23 Jan 2024 10:31:17 +0000 (11:31 +0100)]
Merge tag 'scruball-service-fixes-6.6_2024-01-11' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub_all: fixes for systemd services [v28.3 5/6]
This patchset ties up some problems in the xfs_scrub_all program and
service, which are essential for finding mounted filesystems to scrub
and creating the background service instances that do the scrub.
First, we need to fix various errors in pathname escaping, because
systemd does /not/ like slashes in service names. Then, teach
xfs_scrub_all to deal with systemd restarts causing it to think that a
scrub has finished before the service actually finishes. Finally,
implement a signal handler so that SIGINT (console ^C) and SIGTERM
(systemd stopping the service) shut down the xfs_scrub@ services
correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Tue, 23 Jan 2024 10:30:03 +0000 (11:30 +0100)]
Merge tag 'scrub-service-fixes-6.6_2024-01-11' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fixes for systemd services [v28.3 4/6]
This series fixes deficiencies in the systemd services that were created
to manage background scans. First, improve the debian packaging so that
services get installed at package install time. Next, fix copyright and
spdx header omissions.
Finally, fix bugs in the mailer scripts so that scrub failures are
reported effectively. Finally, fix xfs_scrub_all to deal with systemd
restarts causing it to think that a scrub has finished before the
service actually finishes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Tue, 23 Jan 2024 10:28:50 +0000 (11:28 +0100)]
Merge tag 'scrub-repair-fixes-6.6_2024-01-11' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fixes to the repair code [v28.3 3/6]
Now that we've landed the new kernel code, it's time to reorganize the
xfs_scrub code that handles repairs. Clean up various naming warts and
misleading error messages. Move the repair code to scrub/repair.c as
the first step. Then, fix various issues in the repair code before we
start reorganizing things.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Carlos Maiolino [Tue, 23 Jan 2024 10:27:23 +0000 (11:27 +0100)]
Merge tag 'scrub-fix-legalese-6.6_2024-01-11' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev into for-next
xfs_scrub: fix licensing and copyright notices [v28.3 2/6]
Fix various attribution problems in the xfs_scrub source code, such as
the author's contact information, out of date SPDX tags, and a rough
estimate of when the feature was under heavy development. The most
egregious parts are the files that are missing license information
completely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Fri, 12 Jan 2024 02:07:07 +0000 (18:07 -0800)]
xfs_scrub_all: fix termination signal handling
Currently, xfs_scrub_all does not handle termination signals well.
SIGTERM and SIGINT are left to their default handlers, which are
immediate termination of the process group in the case of SIGTERM and
raising KeyboardInterrupt in the case of SIGINT.
Terminating the process group is fine when the xfs_scrub processes are
direct children, but this completely doesn't work if we're farming the
work out to systemd services since we don't terminate the child service.
Instead, they keep going.
Raising KeyboardInterrupt doesn't work because once the main thread
calls sys.exit at the bottom of main(), it blocks in the python runtime
waiting for child threads to terminate. There's no longer any context
to handle an exception, so the signal is ignored and no child processes
are killed.
In other words, if you try to kill a running xfs_scrub_all, chances are
good it won't kill the child xfs_scrub processes. This is undesirable
and egregious since we actually have the ability to track and kill all
the subprocesses that we create.
Solve the subproblem of getting stuck in the python runtime by calling
it repeatedly until we no longer have subprocesses. This means that the
main thread loops until all threads have exited.
Solve the subproblem of the signals doing the wrong thing by setting up
our own signal handler that can wake up the main thread and initiate
subprocess shutdown, no matter whether the subprocesses are systemd
services or directly fork/exec'd.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 12 Jan 2024 02:07:06 +0000 (18:07 -0800)]
xfs_scrub_all.cron: move to package data directory
cron jobs don't belong in /usr/lib. Since the cron job is also
secondary to the systemd timer, it's really only provided as a courtesy
for distributions that don't use systemd. Move it to @datadir@, aka
/usr/share/xfsprogs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 12 Jan 2024 02:07:06 +0000 (18:07 -0800)]
xfs_scrub_fail: move executable script to /usr/libexec
Per FHS 3.0, non-PATH executable binaries are supposed to live under
/usr/libexec, not /usr/lib. xfs_scrub_fail is an executable script,
so move it to libexec in case some distro some day tries to mount
/usr/lib as noexec or something.
Darrick J. Wong [Fri, 12 Jan 2024 02:07:06 +0000 (18:07 -0800)]
xfs_scrub_all: survive systemd restarts when waiting for services
If xfs_scrub_all detects a running systemd, it will use it to invoke
xfs_scrub subprocesses in a sandboxed and resource-controlled
environment. Unfortunately, if you happen to restart dbus or systemd
while it's running, you get this:
systemd[1]: Reexecuting.
xfs_scrub_all[9958]: Warning! D-Bus connection terminated.
xfs_scrub_all[9956]: Warning! D-Bus connection terminated.
xfs_scrub_all[9956]: Failed to wait for response: Connection reset by peer
xfs_scrub_all[9958]: Failed to wait for response: Connection reset by peer
xfs_scrub_all[9930]: Scrubbing / done, (err=1)
xfs_scrub_all[9930]: Scrubbing /storage done, (err=1)
The xfs_scrub units themselves are still running, it's just that the
`systemctl start' command that xfs_scrub_all uses to start and wait for
the unit lost its connection to dbus and hence is no longer monitoring
sub-services.
When this happens, we don't have great options -- systemctl doesn't have
a command to wait on an activating (aka running) unit. Emulate the
functionality we normally get by polling the failed/active statuses.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 12 Jan 2024 02:07:06 +0000 (18:07 -0800)]
xfs_scrub_all: fix argument passing when invoking xfs_scrub manually
Currently, xfs_scrub_all will try to invoke xfs_scrub with argv[1] being
"-n -x". This of course is recognized by C getopt as a weird looking
string, not two individual arguments, and causes the child process to
exit with complaints about CLI usage.
What we really want is to split the string into a proper array and then
add them to the xfs_scrub command line. The code here isn't strictly
correct, but as @scrub_args@ is controlled by us in the Makefile, it'll
do for now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>