Darrick J. Wong [Wed, 20 Dec 2023 16:53:46 +0000 (08:53 -0800)]
xfs_scrub: don't retry unsupported optimizations
If the kernel says it doesn't support optimizing a data structure, we
should mark it done and move on. This is much better than requeuing the
repair, in which case it will likely keep failing. Eventually these
requeued repairs end up in the single-threaded last resort at the end of
phase 4, which makes things /very/ slow.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:46 +0000 (08:53 -0800)]
xfs_scrub: handle spurious wakeups in scan_fs_tree
Coverity reminded me that the pthread_cond_wait can wake up and return
without the predicate variable (sft.nr_dirs > 0) actually changing.
Therefore, one has to retest the condition after each wakeup.
Coverity-id: 1554280 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:46 +0000 (08:53 -0800)]
xfs_io: extract control number parsing routines
Break out the parts of parse_args that extract control numbers from the
CLI arguments, so that the function isn't as long. This isn't all that
exciting now, but the scrub vectorization speedups will introduce a new
ioctl. For the new command that comes with that, we'll want the control
number parsing helpers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:45 +0000 (08:53 -0800)]
xfs_io: collapse trivial helpers
Simply the call chain by having parse_args set the scrub ioctl
parameters in the caller's object. The parse_args callers can then
invoke the ioctl directly, eliminating one function and one indirect
call.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:45 +0000 (08:53 -0800)]
xfs_mdrestore: refactor progress printing and sb fixup code
Now that we've fixed the dissimilarities between the two progress
printing callsites, refactor them into helpers. Do the same for the
duplicate code that clears sb_inprogress from the primary superblock
after the copy succeeds.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:45 +0000 (08:53 -0800)]
xfs_mdrestore: fix missed progress reporting
Currently, the progress reporting only triggers when the number of bytes
read is exactly a multiple of a megabyte. This isn't always guaranteed,
since AG headers can be 512 bytes in size. Fix the algorithm by
recording the number of megabytes we've reported as being read, and emit
a new report any time the bytes_read count, once converted to megabytes,
doesn't match.
Fix the v2 code to emit one final status message in case the last
extent restored is more than a megabyte.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:44 +0000 (08:53 -0800)]
xfs_mdrestore: fix uninitialized variables in mdrestore main
Coverity complained about the "is fd a file?" flags being uninitialized.
Clean this up.
Coverity-id: 1554270 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:43 +0000 (08:53 -0800)]
xfs_copy: actually do directio writes to block devices
Not sure why block device targets don't get O_DIRECT in !buffered mode,
but it's misleading when the copy completes instantly only to stall
forever due to fsync-on-close. Adjust the "write last sector" code to
allocate a properly aligned buffer.
In removing the onstack buffer for EOD writes, this also corrects the
buffer being larger than necessary -- the old code declared an array of
32768 pointers, whereas all we really need is an aligned 32768-byte
buffer.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 20 Dec 2023 16:53:43 +0000 (08:53 -0800)]
libxfs: don't UAF a requeued EFI
In the kernel, commit 8ebbf262d4684 ("xfs: don't block in busy flushing
when freeing extents") changed the allocator behavior such that AGFL
fixing can return -EAGAIN in response to detection of a deadlock with
the transaction busy extent list. If this happens, we're supposed to
requeue the EFI so that we can roll the transaction and try the item
again.
If a requeue happens, we should not free the xefi pointer in
xfs_extent_free_finish_item or else the retry will walk off a dangling
pointer. There is no extent busy list in userspace so this should
never happen, but let's fix the logic bomb anyway.
We should have ported kernel commit 0853b5de42b47 ("xfs: allow extent
free intents to be retried") to userspace, but neither Carlos nor I
noticed this fine detail. :(
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanbabu@kernel.org>
libxfs: split out a libxfs_dev structure from struct libxfs_init
Most of the content of libxfs_init is members duplicated for each of the
data, log and RT devices. Split those members into a separate
libxfs_dev structure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: stash away the device fd in struct xfs_buftarg
Cache the open file descriptor for each device in the buftarg
structure and remove the now unused dev_map infrastructure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_repair: remove various libxfs_device_to_fd calls
A few places in xfs_repair call libxfs_device_to_fd to get the data
device fd from the data device dev_t stored in the libxfs_init
structure. Just use the file descriptor stored right there directly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
No need to do a dev_t to fd lookup when the caller already has the fd.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: return the opened fd from libxfs_device_open
So that the caller can stash it away without having to call
xfs_device_to_fd.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs_device_open and libxfs_device_close are only used in init.c.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: remove dead size < 0 checks in libxfs_init
libxfs_init initializes the device size to 0 at the start of the function
and libxfs_open_device never sets the size to a negativ value. Remove
these checks as they are dead code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libfrog: make platform_set_blocksize exit on fatal failure
platform_set_blocksize has a fatal argument that is currently only
used to change the printed message. Make it actually fatal similar to
other libfrog platform helpers to simplify the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: remove the setblksize == 1 case in libxfs_device_open
All callers of libxfs_init always pass an actual sector size or zero in
the setblksize member. Remove the unreachable setblksize == 1 case.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: making passing flags to libxfs_init less confusing
The libxfs_xinit stucture has four different ways to pass flags to
libxfs_init:
- the isreadonly argument despite it's name contains various LIBXFS_
flags that go beyond just the readonly flag
- the isdirect flag contains a single LIBXFS_ flag from the same name
- the usebuflock is an integer used as bool
- the bcache_flags member is used to pass flags directly to cache_init()
for the buffer cache
While there is good arguments for keeping the last one separate, all the
others are rather confusing. Consolidate them into a single flags member
using flags in the LIBXFS_* namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: merge the file vs device cases in libxfs_init
The only special handling for an XFS device on a regular file is that
we skip the checks in check_open. Simplify perform those conditionally
instead of duplicating the entire sequence.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: pass a struct libxfs_init to libxfs_alloc_buftarg
Pass a libxfs_init structure to libxfs_alloc_buftarg instead of three
separate dev_t values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Pass a libxfs_init structure to libxfs_mount instead of three separate
dev_t values.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Make the struct name more usual, and remove the libxfs_init_t typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxlog: remove the global libxfs_xinit x structure
There is no need to export a libxfs_xinit with the somewhat unsuitable
name x from libxlog. Move it into the tools linking against libxlog
that actually need it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxlog: don't require a libxfs_xinit structure for xlog_init
xlog_init currently requires a libxfs_args structure to be passed in,
and then clobbers various log-related arguments to it. There is no
good reason for that as all the required information can be calculated
without it.
Remove the x argument to xlog_init and xlog_is_dirty and the now unused
logBBstart member in struct libxfs_xinit.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxlog: add a helper to initialize a xlog without clobbering the x structure
xfsprogs has three copies of a code sequence to initialize an xlog
structure from a libxfs_init structure. Factor the code into a helper.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxlog: remove the verbose argument to xlog_is_dirty
No caller passes a non-zero verbose argument to xlog_is_dirty.
Remove the argument the code keyed off by it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
xfs_logprint: move all code to set up the fake xlog into logstat()
Isolate the code that sets up the fake xlog into the logstat() helper to
prepare for upcoming changes.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
IRIX has the concept of a volume that has data/log/rt subvolumes (that's
where the subvolume name in Linux comes from), but in the current
Linux-only xfsprogs version trying to pretend we do anything with that
it is just utterly confusing. The volname is basically just a very
obsfucated second way to pass the data device name, so get rid of it
in the libxfs and progs internals.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Stop pretending we try to distinguish between the legacy Unix raw and
block devices nodes. Linux as the only currently support platform never
had them, but other modern Unix variants like FreeBSD also got rid of
this distinction years ago.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: remove the dead {d,log,rt}path variables in libxfs_init
These variables are only initialized, and then unlink is called if they
were changed from the initial value, which can't happen. Remove the
variables and the conditional unlink calls.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
libxfs: remove the unused icache_flags member from struct libxfs_xinit
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Eric Biggers [Fri, 13 Oct 2023 06:26:39 +0000 (23:26 -0700)]
xfs_io/encrypt: support specifying crypto data unit size
Add an '-s' option to the 'set_encpolicy' command of xfs_io to allow
exercising the log2_data_unit_size field that is being added to struct
fscrypt_policy_v2 (kernel patch:
https://lore.kernel.org/linux-fscrypt/20230925055451.59499-6-ebiggers@kernel.org).
The xfs_io support is needed for xfstests
(https://lore.kernel.org/fstests/20231013061403.138425-1-ebiggers@kernel.org),
which currently relies on xfs_io to access the encryption ioctls.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:54 +0000 (18:40 +0530)]
mdrestore: Add support for passing log device as an argument
metadump v2 format allows dumping metadata from external log devices. This
commit allows passing the device file to which log data must be restored from
the corresponding metadump file.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:53 +0000 (18:40 +0530)]
mdrestore: Define mdrestore ops for v2 format
This commit adds functionality to restore metadump stored in v2 format.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:52 +0000 (18:40 +0530)]
mdrestore: Extract target device size verification into a function
A future commit will need to perform the device size verification on an
external log device. In preparation for this, this commit extracts the
relevant portions into a new function. No functional changes have been
introduced.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:51 +0000 (18:40 +0530)]
mdrestore: Introduce mdrestore v1 operations
In order to indicate the version of metadump files that they can work with,
this commit renames read_header(), show_info() and restore() functions to
read_header_v1(), show_info_v1() and restore_v1() respectively.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:50 +0000 (18:40 +0530)]
mdrestore: Replace metadump header pointer argument with a union pointer
We will need two variants of read_header(), show_info() and restore() helper
functions to support two versions of metadump formats. To this end, A future
commit will introduce a vector of function pointers to work with the two
metadump formats. To have a common function signature for the function
pointers, this commit replaces the first argument of the previously listed
function pointers from "struct xfs_metablock *" with "union
mdrestore_headers *".
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:49 +0000 (18:40 +0530)]
mdrestore: Add open_device(), read_header() and show_info() functions
This commit moves functionality associated with opening the target device,
reading metadump header information and printing information about the
metadump into their respective functions. There are no functional changes made
by this commit.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:48 +0000 (18:40 +0530)]
mdrestore: Detect metadump v1 magic before reading the header
In order to support both v1 and v2 versions of metadump, mdrestore will have
to detect the format in which the metadump file has been stored on the disk
and then read the ondisk structures accordingly. In a step in that direction,
this commit splits the work of reading the metadump header from disk into two
parts,
1. Read the first 4 bytes containing the metadump magic code.
2. Read the remaining part of the header.
A future commit will take appropriate action based on the value of the magic
code.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:47 +0000 (18:40 +0530)]
mdrestore: Define and use struct mdrestore
This commit collects all state tracking variables in a new "struct mdrestore"
structure. This is done to collect all the global variables in one place
rather than having them spread across the file. A new structure member of type
"struct mdrestore_ops *" will be added by a future commit to support the two
versions of metadump.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:46 +0000 (18:40 +0530)]
mdrestore: Declare boolean variables with bool type
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:45 +0000 (18:40 +0530)]
xfs_db: Add support to read from external log device
This commit introduces a new function set_log_cur() allowing xfs_db to read
from an external log device. This is required by a future commit which will
add the ability to dump metadata from external log devices.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:44 +0000 (18:40 +0530)]
metadump: Define metadump ops for v2 format
This commit adds functionality to dump metadata from an XFS filesystem in
newly introduced v2 format.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The "struct xfs_metadump_header" is followed by alternating series of "struct
xfs_meta_extent" and the extent itself.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:42 +0000 (18:40 +0530)]
metadump: Rename XFS_MD_MAGIC to XFS_MD_MAGIC_V1
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:41 +0000 (18:40 +0530)]
metadump: Introduce metadump v1 operations
This commit moves functionality associated with writing metadump to disk into
a new function. It also renames metadump initialization, write and release
functions to reflect the fact that they work with v1 metadump files.
The metadump initialization, write and release functions are now invoked via
metadump_ops->init(), metadump_ops->write() and metadump_ops->release()
respectively.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:40 +0000 (18:40 +0530)]
metadump: Introduce struct metadump_ops
We will need two sets of functions to implement two versions of metadump. This
commit adds the definition for 'struct metadump_ops' to hold pointers to
version specific metadump functions.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:39 +0000 (18:40 +0530)]
metadump: Postpone invocation of init_metadump()
The metadump v2 initialization function (introduced in a later commit) writes
the header structure into the metadump file. This will require the program to
open the metadump file before the initialization function has been invoked.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:38 +0000 (18:40 +0530)]
metadump: Add initialization and release functions
Move metadump initialization and release functionality into corresponding
functions. There are no functional changes made in this commit.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:37 +0000 (18:40 +0530)]
metadump: Define and use struct metadump
This commit collects all state tracking variables in a new "struct metadump"
structure. This is done to collect all the global variables in one place
rather than having them spread across the file. A new structure member of type
"struct metadump_ops *" will be added by a future commit to support the two
versions of metadump.
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:36 +0000 (18:40 +0530)]
metadump: Declare boolean variables with bool type
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:35 +0000 (18:40 +0530)]
mdrestore: Fix logic used to check if target device is large enough
The device size verification code should be writing XFS_MAX_SECTORSIZE bytes
to the end of the device rather than "sizeof(char *) * XFS_MAX_SECTORSIZE"
bytes.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Chandan Babu R [Mon, 6 Nov 2023 13:10:34 +0000 (18:40 +0530)]
metadump: Use boolean values true/false instead of 1/0
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
repair: fix the call to search_rt_dup_extent in scan_bmapbt
search_rt_dup_extent expects an RT extent number and not a fsbno.
Convert the units before the call. Without this we are unlikely
to ever found a legit duplicate extent on the RT subvolume because
the search will always be off the end.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Pavel Reichl [Wed, 11 Oct 2023 20:50:54 +0000 (22:50 +0200)]
xfs_quota: fix missing mount point warning
When user have mounted an XFS volume, and defined project in
/etc/projects file that points to a directory on a different volume,
then:
`xfs_quota -xc "report -a" $path_to_mounted_volume'
complains with:
"xfs_quota: cannot find mount point for path \
`directory_from_projects': Invalid argument"
unlike `xfs_quota -xc "report -a"' which works as expected and no
warning is printed.
This is happening because in the 1st call we pass to xfs_quota command
the $path_to_mounted_volume argument which says to xfs_quota not to
look for all mounted volumes on the system, but use only those passed
to the command and ignore all others (This behavior is intended as an
optimization for systems with huge number of mounted volumes). After
that, while projects are initialized, the project's directories on
other volumes are obviously not in searched subset of volumes and
warning is printed.
I propose to fix this behavior by conditioning the printing of warning
only if all mounted volumes are searched.
Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
If we reduce the number of blocks in an AG, we must update the incore
geometry values as well.
Fixes: 0800169e3e2c9 ("xfs: Pre-calculate per-AG agbno geometry") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Users reported regressions due to enabling multi-grained timestamps
unconditionally. As no clear consensus on a solution has come up and the
discussion has gone back to the drawing board revert the infrastructure
changes for. If it isn't code that's here to stay, make it go away.
Message-ID: <20230920-keine-eile-c9755b5825db@brauner> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Log recovery has always run on read only mounts, even where the primary
superblock advertises unknown rocompat bits. Due to a misunderstanding
between Eric and Darrick back in 2018, we accidentally changed the
superblock write verifier to shutdown the fs over that exact scenario.
As a result, the log cleaning that occurs at the end of the mounting
process fails if there are unknown rocompat bits set.
As we now allow writing of the superblock if there are unknown rocompat
bits set on a RO mount, we no longer want to turn off RO state to allow
log recovery to succeed on a RO mount. Hence we also remove all the
(now unnecessary) RO state toggling from the log recovery path.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier" Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.
Also, anytime the mtime changes, the ctime must also change, and those
are now the only two options for xfs_trans_ichgtime. Have that function
unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
always set.
Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230807-mgctime-v7-11-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Add a new (superuser-only) flag to the online metadata repair ioctl to
force it to rebuild structures, even if they're not broken. We will use
this to move metadata structures out of the way during a free space
defragmentation operation.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-80-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
CRC selftests running on a build host were useful long time ago, when
CRC support was added to the on-disk support. Now it's purpose is
replaced by fstests. Also mkfs.xfs and xfs_repair have their own
selftests.
On top of that, it adds a dependency on liburcu on the build host for
no reason - liburcu is not used by the crc32 selftest.
Signed-off-by: Krzesimir Nowak <knowak@microsoft.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Tue, 12 Sep 2023 19:47:51 +0000 (12:47 -0700)]
libxfs: fix atomic64_t detection on x86 32-bit architectures
xfsprogs during compilation tries to detect if liburcu supports atomic
64-bit ops on the platform it is being compiled on, and if not it falls
back to using pthread mutex locks.
The detection logic for that fallback relies on _uatomic_link_error()
which is a link-time trick used by liburcu that will cause compilation
errors on archs that lack the required support. That only works for the
generic liburcu code though, and it is not implemented for the
x86-specific code.
In practice this means that when xfsprogs is compiled on 32-bit x86
archs will successfully link to liburcu for atomic ops, but liburcu does
not support atomic64_t on those archs. It indicates this during runtime
by generating an illegal instruction that aborts execution, and thus
causes various xfsprogs utils to be segfaulting.
Fix this by requiring that unsigned longs are at least 64 bits in size,
which /usually/ means that 64-bit atomic counters are supported. We
can't simply execute the liburcu atomic64_t detection code during
configure instead of only relying on the linker error because that
doesn't work for cross-compiled packages.
Fixes: 7448af588a2e ("libxfs: fix atomic64_t poorly for 32-bit architectures") Reported-by: Anthony Iliopoulos <ailiop@suse.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Tue, 12 Sep 2023 19:40:04 +0000 (12:40 -0700)]
xfs_repair: set aformat and anextents correctly when clearing the attr fork
Ever since commit b42db0860e130 ("xfs: enhance dinode verifier"), we've
required that inodes with zero di_forkoff must also have di_aformat ==
EXTENTS and di_naextents == 0. clear_dinode_attr actually does this,
but then both callers inexplicably set di_format = LOCAL. That in turn
causes a verifier failure the next time the xattrs of that file are
read by the kernel. Get rid of the bogus field write.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Tue, 12 Sep 2023 19:39:52 +0000 (12:39 -0700)]
libxfs: use XFS_IGET_CREATE when creating new files
Use this flag to check that newly allocated inodes are, in fact,
unallocated. This matches the kernel, and prevents userspace programs
from making latent corruptions worse by unintentionally crosslinking
files.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Tue, 12 Sep 2023 19:39:41 +0000 (12:39 -0700)]
libfrog: fix overly sleep workqueues
I discovered the following bad behavior in the workqueue code when I
noticed that xfs_scrub was running single-threaded despite having 4
virtual CPUs allocated to the VM. I observed this sequence:
workqueue_add
next_item != NULL
<do not pthread_cond_signal>
<receives wakeup>
<run first item>
workqueue_add
next_item != NULL
<do not pthread_cond_signal>
<run second item>
<run third item>
pthread_cond_wait
workqueue_terminate
pthread_cond_broadcast
<receives wakeup>
<nothing to do, exits>
<wakes up again>
<nothing to do, exits>
Notice how threads WQ2...N are completely idle while WQ1 ends up doing
all the work! That wasn't the point of a worker pool! Observe that
thread 1 manages to queue two work items before WQ1 pulls the first item
off the queue. When thread 1 queues the third item, it sees that
next_item is not NULL, so it doesn't wake a worker. If thread 1 queues
all the N work that it has before WQ1 empties the queue, then none of
the other thread get woken up.
Fix this by maintaining a count of the number of active threads, and
using that to wake either the sole idle thread, or all the threads if
there are many that are idle. This dramatically improves startup
behavior of the workqueue and eliminates the collapse case.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 25 Sep 2023 21:59:16 +0000 (14:59 -0700)]
xfs_db: use directio for device access
XFS and tools (mkfs, copy, repair) don't generally rely on the block
device page cache, preferring instead to use directio. For whatever
reason, the debugger was never made to do this, but let's do that now.
This should eliminate the weird fstests failures resulting from
udev/blkid pinning a cache page while the unmounting filesystem writes
to the superblock such that xfs_db finds the stale pagecache instead of
the post-unmount superblock.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 25 Sep 2023 21:59:10 +0000 (14:59 -0700)]
libxfs: make platform_set_blocksize optional with directio
If we're accessing the block device with directio (and hence bypassing
the page cache), then don't fail on BLKBSZSET not working. We don't
care what happens to the pagecache bufferheads.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 25 Sep 2023 21:59:25 +0000 (14:59 -0700)]
mkfs: enable large extent counts by default
Format filesystems with the large extent counter feature turned on.
We shall now support 64-bit extent counts for the data fork and 32-bit
extent counts for the attr fork.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 25 Sep 2023 21:59:51 +0000 (14:59 -0700)]
xfs_db: create unlinked inodes
Create an expert-mode debugger command to create unlinked inodes.
This will hopefully aid in simulation of leaked unlinked inode handling
in the kernel and elsewhere.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Darrick J. Wong [Mon, 25 Sep 2023 21:59:45 +0000 (14:59 -0700)]
xfs_db: dump unlinked buckets
Create a new command to dump the resource usage of files in the unlinked
buckets.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
As of 6.5-rc1, UBSAN trips over the ondisk extended attribute shortform
definitions using an array length of 1 to pretend to be a flex array.
Kernel compilers have to support unbounded array declarations, so let's
correct this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
As of 6.5-rc1, UBSAN trips over the ondisk extended attribute leaf block
definitions using an array length of 1 to pretend to be a flex array.
Kernel compilers have to support unbounded array declarations, so let's
correct this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
As of 6.5-rc1, UBSAN trips over the attrlist ioctl definitions using an
array length of 1 to pretend to be a flex array. Kernel compilers have
to support unbounded array declarations, so let's correct this. This
may cause friction with userspace header declarations, but suck is life.
The kernel and xfsprogs code that uses these structures will not have
problems, but the long tail of external user programs might.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Similar to the recent patch strengthening the AGF agf_length
verification, the AGI verifier does not check that the AGI length field
is within known good bounds. This isn't currently checked by runtime
kernel code, yet we assume in many places that it is correct and verify
other metadata against it.
Add length verification to the AGI verifier. Just like the AGF length
checking, the length of the AGI must be equal to the size of the AG
specified in the superblock, unless it is the last AG in the filesystem.
In that case, it must be less than or equal to sb->sb_agblocks and
greater than XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs
operation will allow to exist.
There's only one place in the filesystem that actually uses agi_length,
but let's not leave it vulnerable to the same weird nonsense that
generates syzbot bugs, eh?
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Use struct initializers to ensure that the xfs_btree_irecs passed into
the query_range function are completely initialized. No functional
changes, just closing some sloppy hygiene.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Need to happen before we allocate and then leak the xefi. Found by
coverity via an xfsprogs libxfs scan.
[djwong: This also fixes the type of the @agbno argument.]
Fixes: 7dfee17b13e5 ("xfs: validate block number being freed before adding to xefi") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The AGF verifier does not check that the AGF length field is within
known good bounds. This has never been checked by runtime kernel
code (i.e. the lack of verification goes back to 1993) yet we assume
in many places that it is correct and verify other metdata against
it.
Add length verification to the AGF verifier. The length of the AGF
must be equal to the size of the AG specified in the superblock,
unless it is the last AG in the filesystem. In that case, it must be
less than or equal to sb->sb_agblocks and greater than
XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs operation will
allow to exist.
This requires a bit of rework of the verifier function. We want to
verify metadata before we use it to verify other metadata. Hence
we need to verify the AGF sequence numbers before using them to
verify the length of the AGF. Then we can verify the AGF length
before we verify AGFL fields. Then we can verifier other fields that
are bounds limited by the AGF length.
And, finally, by calculating agf_length only once into a local
variable, we can collapse repeated "if (xfs_has_foo() &&"
conditionaly checks into single checks. This makes the code much
easier to follow as all the checks for a given feature are obviously
in the same place.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
If the journal geometry results in a sector or log stripe unit
validation problem, it indicates that we cannot set the log up to
safely write to the the journal. In these cases, we must abort the
mount because the corruption needs external intervention to resolve.
Similarly, a journal that is too large cannot be written to safely,
either, so we shouldn't allow those geometries to mount, either.
If the log is too small, we risk having transaction reservations
overruning the available log space and the system hanging waiting
for space it can never provide. This is purely a runtime hang issue,
not a corruption issue as per the first cases listed above. We abort
mounts of the log is too small for V5 filesystems, but we must allow
v4 filesystems to mount because, historically, there was no log size
validity checking and so some systems may still be out there with
undersized logs.
The problem is that on V4 filesystems, when we discover a log
geometry problem, we skip all the remaining checks and then allow
the log to continue mounting. This mean that if one of the log size
checks fails, we skip the log stripe unit check. i.e. we allow the
mount because a "non-fatal" geometry is violated, and then fail to
check the hard fail geometries that should fail the mount.
Move all these fatal checks to the superblock verifier, and add a
new check for the two log sector size geometry variables having the
same values. This will prevent any attempt to mount a log that has
invalid or inconsistent geometries long before we attempt to mount
the log.
However, for the minimum log size checks, we can only do that once
we've setup up the log and calculated all the iclog sizes and
roundoffs. Hence this needs to remain in the log mount code after
the log has been initialised. It is also the only case where we
should allow a v4 filesystem to continue running, so leave that
handling in place, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
If the current transaction holds a busy extent and we are trying to
allocate a new extent to fix up the free list, we can deadlock if
the AG is entirely empty except for the busy extent held by the
transaction.
This can occur at runtime processing an XEFI with multiple extents
in this path:
To avoid this deadlock, we should not block in
xfs_extent_busy_flush() if we hold a busy extent in the current
transaction.
Now that the EFI processing code can handle requeuing a partially
completed EFI, we can detect this situation in
xfs_extent_busy_flush() and return -EAGAIN rather than going to
sleep forever. The -EAGAIN get propagated back out to the
xfs_trans_free_extent() context, where the EFD is populated and the
transaction is rolled, thereby moving the busy extents into the CIL.
At this point, we can retry the extent free operation again with a
clean transaction. If we hit the same "all free extents are busy"
situation when trying to fix up the free list, we can safely call
xfs_extent_busy_flush() and wait for the busy extents to resolve
and wake us. At this point, the allocation search can make progress
again and we can fix up the free list.
This deadlock was first reported by Chandan in mid-2021, but I
couldn't make myself understood during review, and didn't have time
to fix it myself.
It was reported again in March 2023, and again I have found myself
unable to explain the complexities of the solution needed during
review.
As such, I don't have hours more time to waste trying to get the
fix written the way it needs to be written, so I'm just doing it
myself. This patchset is largely based on Wengang Wang's last patch,
but with all the unnecessary stuff removed, split up into multiple
patches and cleaned up somewhat.
Reported-by: Chandan Babu R <chandanrlinux@gmail.com> Reported-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
To avoid blocking in xfs_extent_busy_flush() when freeing extents
and the only busy extents are held by the current transaction, we
need to pass the XFS_ALLOC_FLAG_FREEING flag context all the way
into xfs_extent_busy_flush().
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Btrees that aren't freespace management trees use the normal extent
allocation and freeing routines for their blocks. Hence when a btree
block is freed, a direct call to xfs_free_extent() is made and the
extent is immediately freed. This puts the entire free space
management btrees under this path, so we are stacking btrees on
btrees in the call stack. The inobt, finobt and refcount btrees
all do this.
However, the bmap btree does not do this - it calls
xfs_free_extent_later() to defer the extent free operation via an
XEFI and hence it gets processed in deferred operation processing
during the commit of the primary transaction (i.e. via intent
chaining).
We need to change xfs_free_extent() to behave in a non-blocking
manner so that we can avoid deadlocks with busy extents near ENOSPC
in transactions that free multiple extents. Inserting or removing a
record from a btree can cause a multi-level tree merge operation and
that will free multiple blocks from the btree in a single
transaction. i.e. we can call xfs_free_extent() multiple times, and
hence the btree manipulation transaction is vulnerable to this busy
extent deadlock vector.
To fix this, convert all the remaining callers of xfs_free_extent()
to use xfs_free_extent_later() to queue XEFIs and hence defer
processing of the extent frees to a context that can be safely
restarted if a deadlock condition is detected.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Pointers drop_leaf and save_leaf are initialized with values that are never
read, they are being re-assigned later on just before they are used. Remove
the redundant early initializations and keep the later assignments at the
point where they are used. Cleans up two clang scan build warnings:
fs/xfs/libxfs/xfs_attr_leaf.c:2288:29: warning: Value stored to 'drop_leaf'
during its initialization is never read [deadcode.DeadStores]
fs/xfs/libxfs/xfs_attr_leaf.c:2289:29: warning: Value stored to 'save_leaf'
during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
The root cause is that during growfs, user space passed in a large value
of newblcoks to xfs_growfs_data_private(), due to current sb_agblocks is
too small, new AG count will exceed UINT_MAX. Because of AG number type
is unsigned int and it would overflow, that caused nagcount much smaller
than the actual value. During AG extent space, delta blocks in
xfs_resizefs_init_new_ags() will much larger than the actual value due to
incorrect nagcount, even exceed UINT_MAX. This will cause corruption and
be detected in __xfs_free_extent. Fix it by growing the filesystem to up
to the maximally allowed AGs and not return EINVAL when new AG count
overflow.
Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>