Currently on-disk feature checks require decoding the superblock
fileds and so can be non-trivial. We have almost 400 hundred
individual feature checks in the XFS code, so this is a significant
amount of code. To reduce runtime check overhead, pre-process all
the version flags into a features field in the xfs_mount at mount
time so we can convert all the feature checks to a simple flag
check.
There is also a need to convert the dynamic feature flags to update
the m_features field. This is required for attr, attr2 and quota
features. New xfs_mount based wrappers are added for this.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The attr2 feature is somewhat unique in that it has both a superblock
feature bit to enable it and mount options to enable and disable it.
Back when it was first introduced in 2005, attr2 was disabled unless
either the attr2 superblock feature bit was set, or the attr2 mount
option was set. If the superblock feature bit was not set but the
mount option was set, then when the first attr2 format inode fork
was created, it would set the superblock feature bit. This is as it
should be - the superblock feature bit indicated the presence of the
attr2 on disk format.
The noattr2 mount option, however, did not affect the superblock
feature bit. If noattr2 was specified, the on-disk superblock
feature bit was ignored and the code always just created attr1
format inode forks. If neither of the attr2 or noattr2 mounts
option were specified, then the behaviour was determined by the
superblock feature bit.
This was all pretty sane.
Fast foward 3 years, and we are dealing with fallout from the
botched sb_features2 addition and having to deal with feature
mismatches between the sb_features2 and sb_bad_features2 fields. The
attr2 feature bit was one of these flags. The reconciliation was
done well after mount option parsing and, unfortunately, the feature
reconciliation had a bug where it ignored the noattr2 mount option.
For reasons lost to the mists of time, it was decided that resolving
this issue in commit 7c12f296500e ("[XFS] Fix up noattr2 so that it
will properly update the versionnum and features2 fields.") required
noattr2 to clear the superblock attr2 feature bit. This greatly
complicated the attr2 behaviour and broke rules about feature bits
needing to be set when those specific features are present in the
filesystem.
By complicated, I mean that it introduced problems due to feature
bit interactions with log recovery. All of the superblock feature
bit checks are done prior to log recovery, but if we crash after
removing a feature bit, then on the next mount we see the feature
bit in the unrecovered superblock, only to have it go away after the
log has been replayed. This means our mount time feature processing
could be all wrong.
Hence you can mount with noattr2, crash shortly afterwards, and
mount again without attr2 or noattr2 and still have attr2 enabled
because the second mount sees attr2 still enabled in the superblock
before recovery runs and removes the feature bit. It's just a mess.
Further, this is all legacy code as the v5 format requires attr2 to
be enabled at all times and it cannot be disabled. i.e. the noattr2
mount option returns an error when used on v5 format filesystems.
To straighten this all out, this patch reverts the attr2/noattr2
mount option behaviour back to the original behaviour. There is no
reason for disabling attr2 these days, so we will only do this when
the noattr2 mount option is set. This will not remove the superblock
feature bit. The superblock bit will provide the default behaviour
and only track whether attr2 is present on disk or not. The attr2
mount option will enable the creation of attr2 format inode forks,
and if the superblock feature bit is not set it will be added when
the first attr2 inode fork is created.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_has_attr() is poorly named. It has global scope as it is defined
in a header file, but it has no namespace scope that tells us what
it is checking has attributes. It's not even clear what "has_attr"
means, because what it is actually doing is an attribute fork lookup
to see if the attribute exists.
Upcoming patches use this "xfs_has_<foo>" namespace for global
filesystem features, which conflicts with this function.
Rename xfs_has_attr() to xfs_attr_lookup() and make it a static
function, freeing up the "xfs_has_" namespace for global scope
usage.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The verifier checks explicitly for bp->b_bn == XFS_SB_DADDR to match
the primary superblock buffer, but the primary superblock is an
uncached buffer and so bp->b_bn is always -1ULL. Hence this never
matches and the CRC error reporting is wholly dependent on the
mount superblock already being populated so CRC feature checks pass
and allow CRC errors to be reported.
Fix this so that the primary superblock CRC error reporting is not
dependent on already having read the superblock into memory.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Emit whichfork values as text strings in the ftrace output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Constify the rest of the btree functions that take structure and union
pointers and are not supposed to modify them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This btree function is called when updating a record in the rightmost
block of a btree so that we can update the AGF's longest free extent
length field. Neither parameter is supposed to be updated, so mark them
both const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The @start pointer passed to each per-AG btree type's ->alloc_block
function isn't supposed to be modified, since it's a hint about the
location of the btree block being split that is to be fed to the
allocator, so mark the parameter const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The pointer passed to each per-AG btree type's ->set_root function isn't
supposed to be modified (that function sets an external pointer to the
root block) so mark them const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xchk_btree calls a user-supplied function to validate each btree record
that it finds. Those functions are not supposed to change the record
data, so mark the parameter const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The inorder functions are simple predicates, which means that they don't
modify the parameters. Mark them all const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
These functions initialize a key from a record, but they aren't supposed
to modify the record. Mark it const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The query_range functions are supposed to call a caller-supplied
function on each record found in the dataset. These functions don't
own the memory storing the record, so don't let them change the record.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Range query functions are not supposed to modify the query keys that are
being passed in, so mark them all const.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The btree key comparison functions are not allowed to change the keys
that are passed in, so mark them const. We'll need this for the next
patch, which adds const to the btree range query functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In commit 8ad560d2565e, we changed xfs_rtalloc_query_range to constrain
the range of bits in the realtime bitmap file that would actually be
searched. In commit a3a374bf1889, we changed the range again
(incorrectly), leading to the fix in commit d88850bd5516, which finally
corrected the range check code. Unfortunately, the author never noticed
that the function modifies its input parameters, which is a totaly no-no
since none of the other range query functions change their input
parameters.
So, fix this function yet again to stash the upper end of the query
range (i.e. the high key) in a local variable and hope this is the last
time I have to fix my own function. While we're at it, mark the key
inputs const so nobody makes this mistake again. :(
Fixes: 8ad560d2565e ("xfs: strengthen rtalloc query range checks") Not-fixed-by: a3a374bf1889 ("xfs: fix off-by-one error in xfs_rtalloc_query_range") Not-fixed-by: d88850bd5516 ("xfs: fix high key handling in the rt allocator's query_range function") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Now that xfs_attr_rmtval_remove is gone, rename __xfs_attr_rmtval_remove
to xfs_attr_rmtval_remove
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This is a quick patch to add a new xfs_attr_*_return tracepoints. We
use these to track when ever a new state is set or -EAGAIN is returned
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Log incompat feature flags in the superblock exist for one purpose: to
protect the contents of a dirty log from replay on a kernel that isn't
prepared to handle those dirty contents. This means that they can be
cleared if (a) we know the log is clean and (b) we know that there
aren't any other threads in the system that might be setting or relying
upon a log incompat flag.
Therefore, clear the log incompat flags when we've finished recovering
the log, when we're unmounting cleanly, remounting read-only, or
freezing; and provide a function so that subsequent patches can start
using this.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
There is no reason for this wrapper existing anymore. All the places
that use KM_NOFS allocation are within transaction contexts and
hence covered by memalloc_nofs_save/restore contexts. Hence we don't
need any special handling of vmalloc for large IOs anymore and
so special casing this code isn't necessary.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
These only made a difference when quotaoff supported disabling quota
accounting on a mounted file system, so we can switch everyone to use
a single set of flags and helpers now. Note that the *QUOTA_ON naming
for the helpers is kept as it was the much more commonly used one.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Disabling quota accounting is hairy, racy code with all kinds of pitfalls.
And it has a very strange mind set, as quota accounting (unlike
enforcement) really is a propery of the on-disk format. There is no good
use case for supporting this.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Mon, 31 Jan 2022 20:25:41 +0000 (15:25 -0500)]
xfs_{copy,db,logprint,repair}: pass xfs_mount pointers instead of xfs_sb pointers
Where possible, convert these four programs to pass a pointer to a
struct xfs_mount instead of the struct xfs_sb inside the mount. This
will make it easier to convert some of the code to the new
xfs_has_FEATURE predicates later on.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Theodore Ts'o [Mon, 31 Jan 2022 20:24:54 +0000 (15:24 -0500)]
xfsprogs: fix static build problems caused by liburcu
The liburcu library has a dependency on pthreads. Hence, in order for
static builds of xfsprogs to work, $(LIBPTHREAD) needs to appear
*after* $(LUBURCU) in LLDLIBS. Otherwise, static links of xfs_* will
fail due to undefined references of pthread_create, pthread_exit,
et. al.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Mon, 6 Dec 2021 19:26:04 +0000 (14:26 -0500)]
libxfs: hide the drainbamaged fallthrough macro from xfslibs
Back in mid-2021, Kees and Gustavo rammed into the kernel a bunch of
static checker "improvements" that redefined '/* fallthrough */'
comments for switch statements as a macro that virtualizes either that
same comment, a do-while loop, or a compiler __attribute__. This was
necessary to work around the poor decision-making of the clang, gcc, and
C language standard authors, who collectively came up with four mutually
incompatible ways to document a lack of branching in a code flow.
Having received ZERO HELP porting this to userspace, Eric and I
foolishly dumped that crap into linux.h, which was a poor decision
because we keep forgetting that linux.h is exported as a userspace
header. This has now caused downstream regressions in Debian[1] and
will probably cause more problems in the other distros.
Move it to platform_defs.h since that's not shipped publicly and leave a
warning to anyone else who dare modify linux.h.
Fixes: df9c7d8d ("xfs: Fix fall-through warnings for Clang") Cc: 1000974@bugs.debian.org, gustavoars@kernel.org, keescook@chromium.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
[sandeen: add comment to top of linux.h per Dave's suggestion] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 2 Dec 2021 20:24:33 +0000 (15:24 -0500)]
libxfs: fix atomic64_t poorly for 32-bit architectures
In commit de555f66, we converted the atomic64_t implementation to use
the liburcu uatomic_* functions. Regrettably, nobody tried to build
xfsprogs on a 32-bit architecture (hint: maintainers don't scale well
anymore) so nobody noticed that the build fails due to the unknown
symbol _uatomic_link_error. This is what happens when liburcu doesn't
know how to perform atomic updates to a variable of a certain size, due
to some horrid macro magic in urcu.h.
Rather than a strict revert to non-atomic updates for these platforms or
(which would introduce a landmine) or roll everything back for the sake
of older platforms, I went with providing a custom atomic64_t
implementation that uses a single pthread mutex. This enables us to
work around the fact that the kernel atomic64_t API doesn't require a
special initializer function, and is probably good enough since there
are only a handful of atomic64_t counters in the kernel.
Clean up the type declarations of a couple of variables in libxlog to
match the kernel usage, though that's probably overkill.
Eventually we'll want to decide if we're deprecating 32-bit, but this
fixes them in the mean time.
[sandeen: make the custom atomic64_add() take an int64_t]
Fixes: de555f66 ("atomic: convert to uatomic") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 2 Dec 2021 20:24:32 +0000 (15:24 -0500)]
libfrog: fix crc32c self test code on cross builds
Helmut Grohne reported that the crc32c self test program fails to cross
build on 5.14.0 if the build host doesn't have liburcu installed. We
don't need userspace RCU functionality to test crc32 on the build host,
so twiddle the header files to include only the two header files that we
actually need.
Note: Build-time testing of crc32c is useful for upstream developers so
that we can check that we haven't broken the checksum code, but we
really ought to be testing this in mkfs and repair on the user's system
so that they don't end up with garbage filesystems. A future patch will
introduce that.
Reported-by: Helmut Grohne <helmut@subdivi.de> Cc: Bastian Germann <bage@debian.org> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Bastian Germann [Fri, 19 Nov 2021 03:05:11 +0000 (22:05 -0500)]
debian: Pass --build and --host to configure
xfsprogs fails to cross build because it fails to pass --host to configure.
Thus it selects the build architecture as host architecture and fails
configure, because the requested libraries are only installed for the host
architecture.
Link: https://bugs.debian.org/794158 Reported-by: Helmut Grohne <helmut@subdivi.de> Signed-off-by: Bastian Germann <bage@debian.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 12 Nov 2021 20:27:59 +0000 (15:27 -0500)]
mkfs: warn about V4 deprecation when creating new V4 filesystems
The V4 filesystem format is deprecated in the upstream Linux kernel. In
September 2025 it will be turned off by default in the kernel and five
years after that, support will be removed entirely. Warn people
formatting new filesystems with the old format, particularly since V4 is
not the default.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The xfs_perag structure and initialization is unused in userspace,
so #ifdef it out with __KERNEL__ to facilitate the xfsprogs sync
and build.
Signed-off-by: Eric Sandeen <esandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
to format the log inode. It writes the LSN from the inode item into
the log inode, and if recovery decides the inode item needs to be
replayed, it recovers the log inode LSN field and writes it into the
on disk inode LSN field.
Now this might seem like a reasonable thing to do, but it is wrong
on multiple levels. Firstly, if the item is not yet in the AIL,
item->li_lsn is zero. i.e. the first time the inode it is logged and
formatted, the LSN we write into the log inode will be zero. If we
only log it once, recovery will run and can write this zero LSN into
the inode.
This means that the next time the inode is logged and log recovery
runs, it will *always* replay changes to the inode regardless of
whether the inode is newer on disk than the version in the log and
that violates the entire purpose of recording the LSN in the inode
at writeback time (i.e. to stop it going backwards in time on disk
during recovery).
Secondly, if we commit the CIL to the journal so the inode item
moves to the AIL, and then relog the inode, the LSN that gets
stamped into the log inode will be the LSN of the inode's current
location in the AIL, not it's age on disk. And it's not the LSN that
will be associated with the current change. That means when log
recovery replays this inode item, the LSN that ends up on disk is
the LSN for the previous changes in the log, not the current
changes being replayed. IOWs, after recovery the LSN on disk is not
in sync with the LSN of the modifications that were replayed into
the inode. This, again, violates the recovery ordering semantics
that on-disk writeback LSNs provide.
Hence the inode LSN in the log dinode is -always- invalid.
Thirdly, recovery actually has the LSN of the log transaction it is
replaying right at hand - it uses it to determine if it should
replay the inode by comparing it to the on-disk inode's LSN. But it
doesn't use that LSN to stamp the LSN into the inode which will be
written back when the transaction is fully replayed. It uses the one
in the log dinode, which we know is always going to be incorrect.
Looking back at the change history, the inode logging was broken by
back in 2016 by a stupid idiot who thought he knew how this code
worked. i.e. me. That commit replaced an in memory di_lsn field that
was updated only at inode writeback time from the inode item.li_lsn
value - and hence always contained the same LSN that appeared in the
on-disk inode - with a read of the inode item LSN at inode format
time. CLearly these are not the same thing.
Before 93f958f9c41f, the log recovery behaviour was irrelevant,
because the LSN in the log inode always matched the on-disk LSN at
the time the inode was logged, hence recovery of the transaction
would never make the on-disk LSN in the inode go backwards or get
out of sync.
A symptom of the problem is this, caught from a failure of
generic/482. Before log recovery, the inode has been allocated but
never used:
You can see that the LSN of the on-disk inode is 0, even though it
clearly has been written to disk. I point out this inode, because
the generic/482 failure occurred because several adjacent inodes in
this specific inode cluster were not replayed correctly and still
appeared to be zero on disk when all the other metadata (inobt,
finobt, directories, etc) indicated they should be allocated and
written back.
The fix for this is two-fold. The first is that we need to either
revert the LSN changes in 93f958f9c41f or stop logging the inode LSN
altogether. If we do the former, log recovery does not need to
change but we add 8 bytes of memory per inode to store what is
largely a write-only inode field. If we do the latter, log recovery
needs to stamp the on-disk inode in the same manner that inode
writeback does.
I prefer the latter, because we shouldn't really be trying to log
and replay changes to the on disk LSN as the on-disk value is the
canonical source of the on-disk version of the inode. It also
matches the way we recover buffer items - we create a buf_log_item
that carries the current recovery transaction LSN that gets stamped
into the buffer by the write verifier when it gets written back
when the transaction is fully recovered.
However, this might break log recovery on older kernels even more,
so I'm going to simply ignore the logged value in recovery and stamp
the on-disk inode with the LSN of the transaction being recovered
that will trigger writeback on transaction recovery completion. This
will ensure that the on-disk inode LSN always reflects the LSN of
the last change that was written to disk, regardless of whether it
comes from log recovery or runtime writeback.
Fixes: 93f958f9c41f ("xfs: cull unnecessary icdinode fields") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
While auditing the realtime growfs code, I realized that the GROWFSRT
ioctl (and by extension xfs_growfs) has always allowed sysadmins to
change the realtime extent size when adding a realtime section to the
filesystem. Since we also have always allowed sysadmins to set
RTINHERIT and EXTSZINHERIT on directories even if there is no realtime
device, this invalidates the premise laid out in the comments added in
In other words, this is not a case of inadequate metadata validation.
This is a case of nearly forgotten (and apparently untested) but
supported functionality. Update the comments to reflect what we've
learned, and remove the log message about correcting the misalignment.
Fixes: 603f000b15f2 ("xfs: validate extsz hints against rt extent size when rtinherit is set") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
While running xfs/168, I noticed occasional write verifier shutdowns
involving inodes at the very end of the filesystem. Existing inode
btree validation code checks that all inode clusters are fully contained
within the filesystem.
However, due to inadequate checking in the fs shrink code, it's possible
that there could be a sparse inode cluster at the end of the filesystem
where the upper inodes of the cluster are marked as holes and the
corresponding blocks are free. In this case, the last blocks in the AG
are listed in the bnobt. This enables the shrink to proceed but results
in a filesystem that trips the inode verifiers. Fix this by disallowing
the shrink.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In preparation to enable -Wimplicit-fallthrough for Clang, fix
the following warnings by replacing /* fallthrough */ comments,
and its variants, with the new pseudo-keyword macro fallthrough:
fs/xfs/libxfs/xfs_attr.c:487:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:500:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:532:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:594:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:607:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1410:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1445:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_attr.c:1473:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
Notice that Clang doesn't recognize /* fallthrough */ comments as
implicit fall-through markings, so in order to globally enable
-Wimplicit-fallthrough for Clang, these comments need to be
replaced with fallthrough; in the whole codebase.
Link: https://github.com/KSPP/linux/issues/115 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
A recent bug report generated a warning that a code path in
xfs_attr_remove_iter could potentially return error uninitialized in the
case of XFS_DAS_RM_SHRINK state. Fix this by initializing error.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <bodonnel@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The AGI buffer is in big-endian format, so we must convert the
endianness to CPU format to do any comparisons.
Fixes: 46141dc891f7 ("xfs: introduce xfs_ag_shrink_space()") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In doing an investigation into AIL push stalls, I was looking at the
log force code to see if an async CIL push could be done instead.
This lead me to xfs_log_force_lsn() and looking at how it works.
xfs_log_force_lsn() is only called from inode synchronisation
contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn
value as the LSN to sync the log to. This gets passed to
xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the
journal, and then used by xfs_log_force_lsn() to flush the iclogs to
the journal.
The problem is that ip->i_itemp->ili_last_lsn does not store a
log sequence number. What it stores is passed to it from the
->iop_committing method, which is called by xfs_log_commit_cil().
The value this passes to the iop_committing method is the CIL
context sequence number that the item was committed to.
As it turns out, xlog_cil_force_lsn() converts the sequence to an
actual commit LSN for the related context and returns that to
xfs_log_force_lsn(). xfs_log_force_lsn() overwrites it's "lsn"
variable that contained a sequence with an actual LSN and then uses
that to sync the iclogs.
This caused me some confusion for a while, even though I originally
wrote all this code a decade ago. ->iop_committing is only used by
a couple of log item types, and only inode items use the sequence
number it is passed.
Let's clean up the API, CIL structures and inode log item to call it
a sequence number, and make it clear that the high level code is
using CIL sequence numbers and not on-disk LSNs for integrity
synchronisation purposes.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We don't need to look at the xfs_mount and superblock every time we
need to do an iclog roundoff calculation. The property is fixed for
the life of the log, so store the roundoff in the log at mount time
and use that everywhere.
On a debug build:
$ size fs/xfs/xfs_log.o.*
text data bss dec hex filename
27360 560 8 27928 6d18 fs/xfs/xfs_log.o.orig
27219 560 8 27787 6c8b fs/xfs/xfs_log.o.patched
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The patch 7b13c5155182: "xfs: use perag for ialloc btree cursors"
from Jun 2, 2021, leads to the following Smatch complaint:
fs/xfs/libxfs/xfs_ialloc.c:2403 xfs_imap()
error: we previously assumed 'pag' could be null (see line 2294)
And it's right. Fix it.
Fixes: 7b13c5155182 ("xfs: use perag for ialloc btree cursors") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch renames the following functions to make the nameing scheme more consistent:
xfs_attr_shortform_remove -> xfs_attr_sf_removename
xfs_attr_node_remove_name -> xfs_attr_node_removename
xfs_attr_set_fmt -> xfs_attr_sf_addname
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This ASSERT checks for the state value of RM_SHRINK in the set path
which should never happen. Change to ASSERT(0);
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Stephen Rothwell reported this compiler warning from linux-next:
fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt':
fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi' [-Wunused-variable]
2032 | struct xfs_agi *agi = agbp->b_addr;
Which is fallout from agno -> perag conversions that were done in
this function. xfs_check_agi_freecount() is the only user of "agi"
in xfs_difree_finobt() now, and it only uses the agi to get the
current free inode count. We hold that in the perag structure, so
there's not need to directly reference the raw AGI to get this
information.
The btree cursor being passed to xfs_check_agi_freecount() has a
reference to the perag being operated on, so use that directly in
xfs_check_agi_freecount() rather than passing an AGI.
Fixes: 7b13c5155182 ("xfs: use perag for ialloc btree cursors") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Radix tree tags are supposed to be unsigned ints, so fix the callers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_bmap_set_attrforkoff is only used inside of xfs_bmap.c, so mark it
static.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Variable busy is set to false, but this value is never read as it is
overwritten or not used later on, hence it is a redundant assignment
and can be removed.
Clean up the following clang-analyzer warning:
fs/xfs/libxfs/xfs_alloc.c:1679:2: warning: Value stored to 'busy' is
never read [clang-analyzer-deadcode.DeadStores].
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Variable 'xfs_agf_buf_ops', 'xfs_agi_buf_ops', 'xfs_dquot_buf_ops' and
'xfs_symlink_buf_ops' are declared twice, so sort these variables
alphabetically and remove the repeated declaration.
Cc: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Unlinked lists are held in the perag, and freeing of inodes needs to
be passed a perag, too, so look up the perag early in the unlink
processing and use it throughout.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Now that we've internalised the two-phase inode allocation, we can
now easily make the AG selection and allocation atomic from the
perspective of a single perag context. This will ensure AGs going
offline/away cannot occur between the selection and allocation
steps.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This is just a simple wrapper around the per-ag inode allocation
that doesn't need to exist. The internal mechanism to select and
allocate within an AG does not need to be exposed outside
xfs_ialloc.c, and it being exposed simply makes it harder to follow
the code and simplify it.
This is simplified by internalising xf_dialloc_select_ag() and
xfs_dialloc_ag() into a single xfs_dialloc() function and then
xfs_dir_ialloc() can go away.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
xfs_dialloc_select_ag() does a lot of repetitive work. It first
calls xfs_ialloc_ag_select() to select the AG to start allocation
attempts in, which can do up to two entire loops across the perags
that inodes can be allocated in. This is simply checking if there is
spce available to allocate inodes in an AG, and it returns when it
finds the first candidate AG.
xfs_dialloc_select_ag() then does it's own iterative walk across
all the perags locking the AGIs and trying to allocate inodes from
the locked AG. It also doesn't limit the search to mp->m_maxagi,
so it will walk all AGs whether they can allocate inodes or not.
Hence if we are really low on inodes, we could do almost 3 entire
walks across the whole perag range before we find an allocation
group we can allocate inodes in or report ENOSPC.
Because xfs_ialloc_ag_select() returns on the first candidate AG it
finds, we can simply do these checks directly in
xfs_dialloc_select_ag() before we lock and try to allocate inodes.
This reduces the inode allocation pass down to 2 perag sweeps at
most - one for aligned inode cluster allocation and if we can't
allocate full, aligned inode clusters anywhere we'll do another pass
trying to do sparse inode cluster allocation.
This also removes a big chunk of duplicate code.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
The only caller of xfs_dialloc_select_ag() will always return
-ENOSPC to it's caller if the agbp returned from
xfs_dialloc_select_ag() is NULL. IOWs, failure to find a candidate
AGI we can allocate inodes from is always an ENOSPC condition, so
move this logic up into xfs_dialloc_select_ag() so we can simplify
the return logic in this function.
xfs_dialloc_select_ag() now only ever returns 0 with a locked
agbp, or an error with no agbp.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Now that everything passes a perag, the agno is not needed anymore.
Convert all the users to use pag->pag_agno instead and remove the
agno from the cursor. This was largely done as an automated search
and replace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Which will eventually completely replace the agno in it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Needs a [from, to] ranged AG walk, and the perag to be stuffed into
the info structure for callouts to use.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We currently pass an agno from the AG reservation functions to the
individual feature accounting functions, which in future may have to
do perag lookups to access per-AG state. Instead, pre-emptively
plumb the perag through from the highest AG reservation layer to the
feature callouts so they won't have to look it up again.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
All of the callers of the busy extent API either have perag
references available to use so we can pass a perag to the busy
extent functions rather than having them have to do unnecessary
lookups.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Rather than manually walking the ags and passing agnunbers around,
pass the perag for the AG we are currently working on around in the
iwalk structure.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Convert the raw walks to an iterator, pulling the current AG out of
pag->pag_agno instead of the loop iterator variable.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
for_each_perag_tag() is defined in xfs_icache.c for local use.
Promote this to xfs_ag.h and define equivalent iteration functions
so that we can use them to iterate AGs instead to replace open coded
perag walks and perag lookups.
We also convert as many of the straight forward open coded AG walks
to use these iterators as possible. Anything that is not a direct
conversion to an iterator is ignored and will be updated in future
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Move the xfs_perag infrastructure to the libxfs files that contain
all the per AG infrastructure. This helps set up for passing perags
around all the code instead of bare agnos with minimal extra
includes for existing files.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
They are AG functions, not superblock functions, so move them to the
appropriate location.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Replace some open-coded fs block unit conversions with the standard
conversion macro.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
We can use the helper function xfs_attr_node_remove_name to reduce
duplicate code in this function
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This function is no longer used, so it is safe to remove
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch modifies the attr set routines to be delay ready. This means
they no longer roll or commit transactions, but instead return -EAGAIN
to have the calling routine roll and refresh the transaction. In this
series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a
state machine like switch to keep track of where it was when EAGAIN was
returned. See xfs_attr.h for a more detailed diagram of the states.
Two new helper functions have been added: xfs_attr_rmtval_find_space and
xfs_attr_rmtval_set_blk. They provide a subset of logic similar to
xfs_attr_rmtval_set, but they store the current block in the delay attr
context to allow the caller to roll the transaction between allocations.
This helps to simplify and consolidate code used by
xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has
now become a simple loop to refresh the transaction until the operation
is completed. Lastly, xfs_attr_rmtval_remove is no longer used, and is
removed.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch modifies the attr remove routines to be delay ready. This
means they no longer roll or commit transactions, but instead return
-EAGAIN to have the calling routine roll and refresh the transaction. In
this series, xfs_attr_remove_args is merged with
xfs_attr_node_removename become a new function, xfs_attr_remove_iter.
This new version uses a sort of state machine like switch to keep track
of where it was when EAGAIN was returned. A new version of
xfs_attr_remove_args consists of a simple loop to refresh the
transaction until the operation is completed. A new XFS_DAC_DEFER_FINISH
flag is used to finish the transaction where ever the existing code used
to.
Calls to xfs_attr_rmtval_remove are replaced with the delay ready
version __xfs_attr_rmtval_remove. We will rename
__xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
done.
xfs_attr_rmtval_remove itself is still in use by the set routines (used
during a rename). For reasons of preserving existing function, we
modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
set. Similar to how xfs_attr_remove_args does here. Once we transition
the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
used and will be removed.
This patch also adds a new struct xfs_delattr_context, which we will use
to keep track of the current state of an attribute operation. The new
xfs_delattr_state enum is used to track various operations that are in
progress so that we know not to repeat them, and resume where we left
off before EAGAIN was returned to cycle out the transaction. Other
members take the place of local variables that need to retain their
values across multiple function calls. See xfs_attr.h for a more
detailed diagram of the states.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch basically hoists the node transaction handling around the
leaf code we just hoisted. This will helps setup this area for the
state machine since the goto is easily replaced with a state since it
ends with a transaction roll.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch hoists xfs_attr_leaf_addname into the calling function. The
goal being to get all the code that will require state management into
the same scope. This isn't particularly aesthetic right away, but it is a
preliminary step to merging in the state machine code.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch hoists the later half of xfs_attr_node_addname into
the calling function. We do this because it is this area that
will need the most state management, and we want to keep such
code in the same scope as much as possible
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch separates the first half of xfs_attr_node_addname into a
helper function xfs_attr_node_addname_find_attr. It also replaces the
restart goto with an EAGAIN return code driven by a loop in the calling
function. This looks odd now, but will clean up nicly once we introduce
the state machine. It will also enable hoisting the last state out of
xfs_attr_node_addname with out having to plumb in a "done" parameter to
know if we need to move to the next state or not.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch separate xfs_attr_node_addname into two functions. This will
help to make it easier to hoist parts of xfs_attr_node_addname that need
state management
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch is actually the combination of patches from the previous
version (v18). Initially patch 3 hoisted xfs_attr_set_shortform, and
the next added the helper xfs_attr_set_fmt. xfs_attr_set_fmt is similar
the old xfs_attr_set_shortform. It returns 0 when the attr has been set
and no further action is needed. It returns -EAGAIN when shortform has
been transformed to leaf, and the calling function should proceed the
set the attr in leaf form.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This patch pulls a new helper function xfs_attr_node_remove_name out
of xfs_attr_node_remove_step. This helps to modularize
xfs_attr_node_remove_step which will help make the delayed attribute
code easier to follow
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Originally we added this patch to help modularize the attr code in
preparation for delayed attributes and the state machine it requires.
However, later reviews found that this slightly alters the transaction
handling as the helper function is ambiguous as to whether the
transaction is diry or clean. This may cause a dirty transaction to be
included in the next roll, where previously it had not. To preserve the
existing code flow, we reverse apply this commit.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
This is because libxfs_init() registers the main thread with liburcu,
but doesn't unregister the thread if libxfs library initialization
fails. When repair sets more dangerous options and tries again, the
second initialization attempt causes liburcu to abort. Fix this by
unregistering the thread with liburcu if libxfs initialization fails.
Observed by running xfs/284.
Fixes: e4da1b16 ("xfsprogs: introduce liburcu support") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Use the newly factored out page allocation code. This adds
automatic buffer zeroing for non-read uncached buffers.
This also allows us to greatly simply the error handling in
xfs_buf_get_uncached(). Because xfs_buf_alloc_pages() cleans up
partial allocation failure, we can just call xfs_buf_free() in all
error cases now to clean up after failures.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Thu, 14 Oct 2021 16:35:43 +0000 (12:35 -0400)]
libxfs: fix call_rcu crash when unmounting the fake mount in mkfs
In commit a6fb6abe, we simplified the process by which mkfs.xfs computes
the minimum log size calculation by creating a dummy xfs_mount with the
draft superblock image, using the dummy to compute the log geometry, and
then unmounting the dummy.
Note that creating a dummy mount with no data device is supported by
libxfs, though with the caveat that we don't set up any perag structures
at all. Up until this point this has worked perfectly well since free()
(and hence kmem_free()) are perfectly happy to ignore NULL pointers.
Unfortunately, this will cause problems with the upcoming patch to shift
per-AG setup and teardown to libxfs because call_rcu in the liburcu
library actually tries to access the rcu_head of the passed-in perag
structure, but they're all NULL in the dummy mount case. IOWs,
xfs_free_perag requires that every AG have a per-AG structure, and it's
too late to change the 5.14 kernel libxfs now, so work around this by
altering libxfs_mount to remember when it has initialized the perag
structures and libxfs_umount to skip freeing them when the flag isn't
set.
Just to be clear: This fault has no user-visible consequences right now;
it's a fixup to avoid problems in the libxfs sync series for 5.14.
Fixes: a6fb6abe ("mkfs: simplify minimum log size calculation") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
In preparation to enable -Wimplicit-fallthrough for Clang, fix
the following warnings by replacing /* fall through */ comments,
and its variants, with the new pseudo-keyword macro fallthrough:
fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
Notice that Clang doesn't recognize /* fall through */ comments as
implicit fall-through markings, so in order to globally enable
-Wimplicit-fallthrough for Clang, these comments need to be
replaced with fallthrough; in the whole codebase.
Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Fri, 1 Oct 2021 18:00:34 +0000 (14:00 -0400)]
libxfs: port xfs_set_inode_alloc from the kernel
To prepare to perag initialization code move to libxfs, port the
xfs_set_inode_alloc function from the kernel and make
libxfs_initialize_perag use it. The code isn't 1:1 identical, but
AFAICT it behaves the same way. In a future kernel release we'll
move the function into xfs_ag.c and update xfsprogs.
(sandeen: Note that this is in effect simply syncing up several
kernel commits in this code which was not shared via libxfs.)
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 29 Sep 2021 20:57:10 +0000 (16:57 -0400)]
libfrog: move topology.[ch] to libxfs
The topology code depends on a few libxfs structures and is only needed
by mkfs and xfs_repair. Move this code to libxfs to reduce the size of
libfrog and to avoid build failures caused by "xfs: move perag structure
and setup to libxfs/xfs_ag.[ch]".
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Darrick J. Wong [Wed, 29 Sep 2021 20:57:05 +0000 (16:57 -0400)]
mkfs: move mkfs/proto.c declarations to mkfs/proto.h
These functions are only used by mkfs, so move them to a separate header
file that isn't in an internal library.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
[sandeen: cosmetic tidyups as suggested by Christoph] Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Dave Chinner [Wed, 29 Sep 2021 20:57:02 +0000 (16:57 -0400)]
atomic: convert to uatomic
Now we have liburcu, we can make use of it's atomic variable
implementation. It is almost identical to the kernel API - it's just
got a "uatomic" prefix. liburcu also provides all the same aomtic
variable memory barriers as the kernel, so if we pull memory barrier
dependent kernel code across, it will just work with the right
barrier wrappers.
This is preparation the addition of more extensive atomic operations
the that kernel buffer cache requires to function correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[chandan.babu@oracle.com: Swap order of arguments provided to atomic[64]_[add|sub]()] Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
[sandeen: fix whitespace]
[sandeen: make #defines a little more consistent about (a,v)] Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Eric Sandeen <sandeen@sandeen.net>