Jan Tulak [Tue, 18 Aug 2015 07:53:17 +0000 (17:53 +1000)]
xfsprogs: undefined variable fix
Typo fix, which wasn't caught earlier due to #ifdef branching. The
'rmnttomname' does not exist anywhere and looks like a hybrid between
rmntfromname and rmntonname. And because the previous if has
'fromname' on both arguments of realpath, I chose the same approach
when fixing it.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Mon, 3 Aug 2015 22:36:44 +0000 (08:36 +1000)]
repair: use sb_meta_uuid for checking of metadata headers
Now that we can change the uuid on v5 filesystems, we always need to
verify the metadata uuid against sb_meta_uuid, not sb_uuid. This
fixes quite a few xfstests failures when UUIDs are changed before
executing tests.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Eric Sandeen [Mon, 3 Aug 2015 22:36:41 +0000 (08:36 +1000)]
xfs_repair: Fix malloc size of rt_ext_tree_ptr
rt_ext_tree_ptr points to an avl64tree_desc_t, but we malloc memory
according to the size of avltree_desc_t. Oddly, the latter happens
to be larger, so we're ok, but may as well make it correct.
Addresses-Coverity-Id: 1297533 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
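A minimal sketch of one way to keep such an allocation tied to the pointer's type. The struct and function names here are hypothetical stand-ins, not the real avl64tree_desc_t or the code in xfs_repair:

```c
#include <stdlib.h>

/* Hypothetical stand-in; the real avl64tree_desc_t has different members. */
struct tree_desc_sketch {
	void	*root;
	void	*firstino;
};

static struct tree_desc_sketch *alloc_tree_desc_sketch(void)
{
	/*
	 * sizeof(*desc) follows the pointer's own type, so the allocation
	 * size cannot drift away from the declaration.
	 */
	struct tree_desc_sketch *desc = malloc(sizeof(*desc));

	if (desc) {
		desc->root = NULL;
		desc->firstino = NULL;
	}
	return desc;
}
```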
Eric Sandeen [Mon, 3 Aug 2015 00:45:00 +0000 (10:45 +1000)]
xfsprogs: Add new sb_meta_uuid field, update userspace tools to manipulate it
This adds a new superblock field, sb_meta_uuid. This allows us to
change the user-visible UUID on crc-enabled filesystems from userspace
if desired, by copying the existing UUID to the new location for
metadata comparisons. If this is done, an incompat flag must be
set to prevent older kernels from mounting the filesystem, but
the original UUID can be restored, and the incompat flag removed,
with a new xfs_db / xfs_admin UUID command, "restore."
Much of this patch mirrors the kernel patch in simply renaming
the field used for metadata uuid comparison; other bits:
* Teach xfs_db to print the new meta_uuid field
* Allow xfs_db to generate a new UUID for CRC-enabled filesystems
* Allow xfs_db to revert to the original UUID and clear the flag
* Fix up xfs_copy to work with CRC-enabled filesystems
* Update the xfs_admin manpage to show the UUID "restore" command
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
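The set/restore behaviour described above can be sketched roughly as follows. The struct layout, the flag value and all function names are assumptions made for illustration; they are not the xfsprogs definitions:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; the real incompat flag value is not given above. */
#define FEAT_INCOMPAT_META_UUID_SKETCH	(1u << 0)

/* Hypothetical, heavily reduced superblock. */
struct sb_sketch {
	uint8_t		uuid[16];	/* user-visible UUID */
	uint8_t		meta_uuid[16];	/* UUID stamped into metadata */
	uint32_t	features_incompat;
};

/* Change the user-visible UUID, preserving the original for metadata checks. */
static void sb_set_user_uuid_sketch(struct sb_sketch *sb, const uint8_t *new_uuid)
{
	if (!(sb->features_incompat & FEAT_INCOMPAT_META_UUID_SKETCH)) {
		memcpy(sb->meta_uuid, sb->uuid, 16);
		sb->features_incompat |= FEAT_INCOMPAT_META_UUID_SKETCH;
	}
	memcpy(sb->uuid, new_uuid, 16);
}

/* "restore": put the original UUID back and drop the incompat flag. */
static void sb_restore_uuid_sketch(struct sb_sketch *sb)
{
	if (sb->features_incompat & FEAT_INCOMPAT_META_UUID_SKETCH) {
		memcpy(sb->uuid, sb->meta_uuid, 16);
		sb->features_incompat &= ~FEAT_INCOMPAT_META_UUID_SKETCH;
	}
}

/* UUID that metadata headers should be compared against. */
static const uint8_t *sb_metadata_uuid_sketch(const struct sb_sketch *sb)
{
	if (sb->features_incompat & FEAT_INCOMPAT_META_UUID_SKETCH)
		return sb->meta_uuid;
	return sb->uuid;
}
```

Changing the UUID a second time leaves meta_uuid untouched in this sketch, so the value stamped into metadata stays the original one.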
Theodore Ts'o [Mon, 3 Aug 2015 00:17:02 +0000 (10:17 +1000)]
xfsprogs: use "unsigned short" instead of ushort
Android's bionic libc doesn't define ushort. There isn't a real
benefit (other than perhaps conciseness) to use ushort over "unsigned
short", and it's only used in a handful of files in xfsprogs. So
change over to using unsigned short everywhere.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Theodore Ts'o [Mon, 3 Aug 2015 00:16:48 +0000 (10:16 +1000)]
xfsprogs: avoid use of si_tid in struct xlog_split_item
In Android's bionic libc (as well as the Linux kernel's
include/uapi/asm-generic/siginfo.h), si_tid is a #define to provide
backwards compatibility for the timerid in the siginfo structure.
This breaks the compile of logprint/log_misc.c. Change this to be
si_xtid in order to avoid a namespace collision
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Theodore Ts'o [Mon, 3 Aug 2015 00:16:21 +0000 (10:16 +1000)]
xfsprogs: define and use BUILD_CC in configure.ac for cross compilation
In order to support cross-compilation, we need to build gen_crc32table
using the C compiler targeted for the build platform, since it is run
as part of the build process.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Theodore Ts'o [Mon, 3 Aug 2015 00:16:01 +0000 (10:16 +1000)]
xfsprogs: pull in libgen.h to get prototype for basename()
The function prototype for basename() is in <libgen.h>, per POSIX.
Without the function prototype, the build will throw errors due to
the missing prototype.
On glibc, using libgen.h will force the use of POSIX's basename(),
instead of glibc's basename() with GNU extensions. However, xfsprogs
doesn't depend on any of the GNU extensions, so this is fine.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
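A small sketch of calling the POSIX basename() from <libgen.h>. Because the POSIX version is allowed to modify its argument, the helper below (a hypothetical name, not an xfsprogs function) works on a private copy:

```c
#include <libgen.h>
#include <string.h>

/*
 * Copy the last component of 'path' into 'out'.
 * Returns 0 on success, -1 if a buffer is too small.
 */
static int last_component_sketch(const char *path, char *out, size_t outlen)
{
	char		tmp[4096];
	const char	*base;

	if (strlen(path) >= sizeof(tmp))
		return -1;
	strcpy(tmp, path);	/* POSIX basename() may modify its argument */
	base = basename(tmp);
	if (strlen(base) >= outlen)
		return -1;
	strcpy(out, base);
	return 0;
}
```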
Theodore Ts'o [Mon, 3 Aug 2015 00:15:45 +0000 (10:15 +1000)]
xfsprogs: define NBBY if not defined by the system header files
Android's bionic libc doesn't define NBBY; this isn't a standard
define, and since all modern/sane platforms have 8 bits per byte, use
this as a default.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
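The fallback described above amounts to a guarded define. A minimal sketch, with a hypothetical helper showing a typical use of NBBY:

```c
#include <stddef.h>

#ifndef NBBY
#define NBBY	8	/* number of bits per byte */
#endif

/* Hypothetical helper: bytes needed to hold a bitmap of 'nbits' bits. */
static size_t bitmap_bytes_sketch(size_t nbits)
{
	return (nbits + NBBY - 1) / NBBY;
}
```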
Jan Tulak [Mon, 3 Aug 2015 00:06:21 +0000 (10:06 +1000)]
xfsprogs: Don't Make .po files with gettext disabled
"po" target is added only if gettext binary is found.
Without this patch, Make tried to build the target even
with --enable-gettext=no configure option, which led
to a failing build.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Mon, 3 Aug 2015 00:05:35 +0000 (10:05 +1000)]
xfsprogs: Search path for utilities unified
Currently, when autoconf is checking for a utility, every utility has
its own paths defined independently. Unify it in a single variable
used for (almost) all utilities.
Also, add /opt/local/bin to the path.
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Jan Tulak [Mon, 3 Aug 2015 00:05:08 +0000 (10:05 +1000)]
xfsprogs: blkid is now mandatory
Because blkid has been around for a long time, I hereby propose a
patch removing support for NOT having blkid. The current support
through a set of #ifdefs is prone to errors like making a patch in
just one of the branches, and according to a recent talk between
Christoph and Eric, it is not necessary to keep it supported.
Remove code for checking ENABLE_BLKID, and the code when
ENABLE_BLKID is not defined. The only use of libdisk was in the
removed code, so remove libdisk too. It makes blkid required for
compilation.
[dchinner: also remove include/volume.h and include/dvh.h as
suggested by Christoph during review. ]
Signed-off-by: Jan Tulak <jtulak@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
platform_defs.h is a generated header file, which causes all kinds
of problems when installed on multiarch systems, and requires
workarounds in distribution packages. Instead move the small parts
of it needed in the installed xfs.h into xfs.h and keep
platform_defs.h private to xfsprogs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
xfsprogs: move __u*/__s* typedefs to per-port headers
Currently we have to install the autoconf-generated platform_defs.h
to get the definitions for these. But they are clearly a feature
of Linux vs non-Linux platforms so move them to the per-port headers
instead.
Note: in the long run it might be a good idea to just use the standard
uint*_t/int*_t types instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
We don't need the xfs/ prefix for local includes if we just add the
libxfs directory to the include path. Once that is done we only
need to link the installed headers into include/xfs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
xfsprogs: only install *format.h headers in install-qa
Now that we've properly split up the headers we don't need to install all
the libxfs-internal headers for xfstests. Just install the three headers
defining the on-disk format and xfs_arch.h which is required to compile
them instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Fri, 31 Jul 2015 04:44:52 +0000 (14:44 +1000)]
build: create include/xfs before installing headers
Currently the install-headers rule from include/Makefile creates
include/xfs, but there is no guarantee that it will be the first
directory that make executes that rule in. Hence other directories
can race with the creation on include/xfs and fail.
Move the creation of include/xfs to occur before running the
install_headers rules on the subdirectories to avoid any possible
races with creation.
Brian Foster [Fri, 31 Jul 2015 01:12:44 +0000 (11:12 +1000)]
xfs: check min blks for random debug mode sparse allocations
The inode allocator enables random sparse inode chunk allocations in
DEBUG mode to facilitate testing. Sparse inode allocations are not
always possible, however, depending on the fs geometry. For example,
there is no possibility for a sparse inode allocation on filesystems
where the block size is large enough to fit one or more inode chunks
within a single block.
Fix up the DEBUG mode sparse inode allocation logic to trigger random
sparse allocations only when the geometry of the fs allows it.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
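A rough sketch of the geometry condition in the example above, assuming 64 inodes per chunk. The function name and the direct use of inode and block sizes are simplifications for illustration; the real code works from precomputed mount geometry:

```c
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK_SKETCH	64

/*
 * A sparse (partial) chunk allocation only makes sense when a full chunk
 * spans more than one filesystem block.
 */
static bool sparse_alloc_possible_sketch(uint32_t inode_size, uint32_t block_size)
{
	uint64_t chunk_bytes = (uint64_t)INODES_PER_CHUNK_SKETCH * inode_size;

	return chunk_bytes > block_size;
}
```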
George Wang [Fri, 31 Jul 2015 01:12:44 +0000 (11:12 +1000)]
xfs: use percpu_counter_read_positive for mp->m_icount
Function percpu_counter_read just returns the current counter, which can be
negative. This can cause the check "allocated inode
counts <= m_maxicount" to give a false positive. Using
percpu_counter_read_positive solves this problem, and is consistent with the
purpose of introducing the percpu mechanism to xfs.
Signed-off-by: George Wang <xuw2015@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
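The difference between the two reads can be illustrated in plain userspace C. This is only a model with hypothetical names, not the kernel's percpu_counter implementation:

```c
#include <stdint.h>

/* Model of an approximate counter read: may transiently be negative. */
static int64_t counter_read_sketch(const int64_t *count)
{
	return *count;
}

/* Model of the "positive" variant: negative readings are clamped to zero. */
static int64_t counter_read_positive_sketch(const int64_t *count)
{
	int64_t v = *count;

	return v >= 0 ? v : 0;
}
```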
Dave Chinner [Fri, 31 Jul 2015 01:11:56 +0000 (11:11 +1000)]
xfs: clean up XFS_MIN_FREELIST macros
We no longer calculate the minimum freelist size from the on-disk
AGF, so we don't need the macros used for this. That means the
nested macros can be cleaned up and turned into an actual
function so the logic is clear and concise. This will make it much
easier to add support for the rmap btree when the time comes.
This also gets rid of the XFS_AG_MAXLEVELS macro used by these
freelist macros as it is simply a wrapper around a single variable.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Fri, 31 Jul 2015 01:10:56 +0000 (11:10 +1000)]
xfs: sanitise error handling in xfs_alloc_fix_freelist
The error handling is currently an inconsistent mess as every error
condition handles return values and releasing buffers individually.
Clean this up by using gotos and a sane error label stack.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
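For readers unfamiliar with the pattern, a generic sketch of a goto-based error label stack follows. It is not the xfs_alloc_fix_freelist() code; the function, its 'fail_at' parameter and the resources are invented for illustration:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Acquire two resources in order; on any failure, jump to the label that
 * releases exactly what has been acquired so far. 'fail_at' simulates a
 * failure at step 1 or 2.
 */
static int label_stack_sketch(int fail_at)
{
	char	*a;
	char	*b;
	int	error = 0;

	a = malloc(16);
	if (!a)
		return -ENOMEM;
	if (fail_at == 1) {
		error = -EIO;
		goto out_free_a;
	}

	b = malloc(16);
	if (!b) {
		error = -ENOMEM;
		goto out_free_a;
	}
	if (fail_at == 2) {
		error = -EINVAL;
		goto out_free_b;
	}

	/* success falls through the same release path */
out_free_b:
	free(b);
out_free_a:
	free(a);
	return error;
}
```

Each label releases one resource and falls through to the labels below it, so every exit path runs the same cleanup in reverse order of acquisition.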
Dave Chinner [Fri, 31 Jul 2015 01:09:56 +0000 (11:09 +1000)]
xfs: factor out free space extent length check
The longest extent length checks in xfs_alloc_fix_freelist() are now
essentially identical. Factor them out into a helper function, so we
know they are checking exactly the same thing before and after we
lock the AGF.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Fri, 31 Jul 2015 01:08:56 +0000 (11:08 +1000)]
xfs: xfs_alloc_fix_freelist() can use incore perag structures
At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
access and agf buffer based access to freelist and space usage
information. However, once the AGF buffer is locked inside this
function, it is guaranteed that both the in-memory and on-disk
values are identical. xfs_alloc_fix_freelist() doesn't modify the
values in the structures directly, so it is a read-only user of the
information, and hence can use the per-ag structure exclusively for
determining what it should do.
This opens up an avenue for cleaning up a lot of duplicated logic
whose only difference is the structure it gets the data from, and in
doing so removes a lot of needless byte swapping overhead when
fixing up the free list.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Fri, 31 Jul 2015 01:07:56 +0000 (11:07 +1000)]
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.
This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.
To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of its state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.
Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.
cc: <stable@vger.kernel.org> # 3.12 to 4.0 Reported-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Fri, 31 Jul 2015 01:06:56 +0000 (11:06 +1000)]
xfs: always log the inode on unwritten extent conversion
The fsync() requirements for crash consistency on XFS are to flush file
data and force any in-core inode updates to the log. We currently check
whether the inode is pinned to identify whether the log needs to be
forced, since a non-zero pin count generally represents an inode that
has transactions awaiting a flush to the on-disk log.
This is not sufficient in all cases, however. Reports of xfstests test
generic/311 failures on ppc64/s390x hosts have identified failures to
fsync outstanding inode modifications due to the inode not being pinned
at the time of the fsync. This occurs because certain bmap updates can
complete by logging bmapbt buffers but without ever dirtying (and thus
pinning) the core inode. The following is a specific incarnation of this
problem:
In short, the unwritten extent conversion for the last write is lost
despite the fact that an fsync executed before the filesystem was
shut down. Note that this is impossible to reproduce on v5 supers due to
unconditional time callbacks for di_changecount and highly difficult to
reproduce on CONFIG_HZ=1000 kernels due to those same callbacks
frequently updating cmtime prior to the bmap update. CONFIG_HZ=100
reduces timer granularity enough to increase the odds that time updates
are skipped and allows this to reproduce within a handful of attempts.
To deal with this problem, unconditionally log the core in the unwritten
extent conversion path. Fix up logflags after the extent conversion to
keep the extent update code consistent with the other extent update
helpers. This fixup is not necessary for the other (hole, delay) extent
helpers because they execute in the block allocation codepath, which
already logs the inode for other reasons (e.g., for di_nblocks).
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Clearly indicating that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:
We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in its allocation
sizes and so needs help here.
The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.
Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. Hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact, be the underlying bug that
occasionally causes transaction overruns in delayed allocation
extent conversion, so now we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.
Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
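The partial-alignment idea can be sketched as below. This is an illustration under stated assumptions, not the xfs_bmap_extsize_align() implementation: the function name is hypothetical, offsets and lengths are plain block counts, and 'extsz' is assumed to be non-zero and no larger than MAXEXTLEN:

```c
#include <stdint.h>

#define MAXEXTLEN_SKETCH	2097151ULL	/* value quoted above */

/*
 * Round [*off, *off + *len) outwards to 'extsz' boundaries. If the aligned
 * length would exceed the maximum extent length, trim it to the largest
 * aligned length that still fits; the caller's allocation loop is expected
 * to come back for the remainder.
 */
static void extsize_align_sketch(uint64_t extsz, uint64_t *off, uint64_t *len)
{
	uint64_t start = *off - (*off % extsz);
	uint64_t end = *off + *len;
	uint64_t alen;

	if (end % extsz)
		end += extsz - (end % extsz);
	alen = end - start;

	if (alen > MAXEXTLEN_SKETCH)
		alen = MAXEXTLEN_SKETCH - (MAXEXTLEN_SKETCH % extsz);

	*off = start;
	*len = alen;
}
```

With a 16-block hint and a request of 2097151 blocks, full alignment would give 2097152 blocks, one past the limit, so the sketch trims the result to 2097136.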
Brian Foster [Fri, 31 Jul 2015 01:03:56 +0000 (11:03 +1000)]
repair: helper to transition inode blocks to inode state
The state of each block in an inode chunk transitions from free state to
inode state as we process physical inodes on disk. We take care to
detect invalid transitions and warn the user if multiply claimed blocks
are detected.
This block of code is a largish switch statement that is executed twice
due to the implementation details of the inode processing loop. Factor
it into a new helper.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Fri, 31 Jul 2015 01:03:53 +0000 (11:03 +1000)]
repair: helper to import on-disk inobt records to in-core trees
In the common case, the in-core inode state from the on-disk inobt
records is imported from the inobt and validated against the finobt (if
one exists). When both trees exist along with some form of corruption,
it's possible to find inodes in the finobt not tracked by the inobt.
While this is unexpected, we attempt to repair by importing the inodes
from the finobt.
The associated code in the finobt scan function mirrors the associated
code in the inobt scan function. Factor this into a separate helper that
can be called by either tree scan.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Fri, 31 Jul 2015 01:03:52 +0000 (11:03 +1000)]
repair: helper for inode chunk alignment and start/end ino number verification
The inobt scan code executes different routines for processing inobt
records and finobt records. While some verification differs between the
trees, much of it is the same. One such example of this is the inode
record alignment and start/end inode number verification. The only
difference between the inobt and finobt verification is the error
message that is generated as a result of failure.
Factor out these alignment checks into a new helper that takes an enum
parameter that identifies which tree is undergoing the scan. Use a new
string array for this function and subsequent common inobt scan helpers
to convert the enum to the name of the tree for the purposes of
including in any resulting warning messages.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Fri, 31 Jul 2015 01:03:51 +0000 (11:03 +1000)]
repair: access helpers for on-disk inobt record freecount
The on-disk inobt record has two formats depending on whether sparse
inode support is enabled or not. If so, the freecount field is a single
byte and does not require byte-conversion. Otherwise, it is a 4-byte
field and does.
Create the inorec_[get|set]_freecount() helpers to abstract this detail
away from the core repair code.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
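A sketch of what such accessors might look like. The byte layout used here (a big-endian 32-bit freecount in the full format; holemask, count and a one-byte freecount in the sparse format, with freecount in the last byte) is an assumption for illustration, as are the struct and the '_sketch' function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* The four on-disk bytes that hold freecount in either format (assumed). */
struct inorec_sketch {
	uint8_t	raw[4];
};

static uint32_t inorec_get_freecount_sketch(const struct inorec_sketch *r,
					    bool sparse)
{
	if (sparse)
		return r->raw[3];	/* single byte, no conversion needed */
	return ((uint32_t)r->raw[0] << 24) | ((uint32_t)r->raw[1] << 16) |
	       ((uint32_t)r->raw[2] << 8) | (uint32_t)r->raw[3];
}

static void inorec_set_freecount_sketch(struct inorec_sketch *r, bool sparse,
					uint32_t freecount)
{
	if (sparse) {
		r->raw[3] = (uint8_t)freecount;	/* leave the other bytes alone */
		return;
	}
	r->raw[0] = (uint8_t)(freecount >> 24);
	r->raw[1] = (uint8_t)(freecount >> 16);
	r->raw[2] = (uint8_t)(freecount >> 8);
	r->raw[3] = (uint8_t)freecount;
}
```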
Brian Foster [Fri, 31 Jul 2015 01:03:49 +0000 (11:03 +1000)]
metadump: support sparse inode records
xfs_metadump currently uses mp->m_ialloc_blks sized buffers to copy
inode chunks. If a filesystem supports sparse inodes, some clusters
within inode chunks can point to arbitrary data. If the buffer used to
read inodes includes these sparse clusters, inode read verification
fails and prints filesystem corruption warnings.
Update copy_inode_chunks() to support using a cluster sized buffer to
read a full inode chunk in multiple iterations if sparse inodes is
enabled. For each cluster read, check whether the first inode in the
cluster is sparse and skip the cluster if so. This is safe because
sparse records are allocated at cluster granularity.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
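The per-cluster skip can be modelled as follows. This is a sketch with hypothetical names, assuming a 64-inode chunk, a 64-bit mask with one bit per sparse inode, and a cluster size that divides 64; it only counts the reads rather than performing them:

```c
#include <stdint.h>

/*
 * Walk a 64-inode chunk one cluster at a time and count the clusters that
 * would actually be read. A cluster is skipped when its first inode is
 * marked sparse.
 */
static unsigned clusters_to_read_sketch(uint64_t sparse_mask,
					unsigned inodes_per_cluster)
{
	unsigned ino;
	unsigned nread = 0;

	if (inodes_per_cluster == 0)
		return 0;

	for (ino = 0; ino < 64; ino += inodes_per_cluster) {
		if (sparse_mask & (1ULL << ino))
			continue;	/* sparse cluster: nothing to read */
		nread++;
	}
	return nread;
}
```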
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
metadump: reorder inode record sanity checks and inode buffer read
In preparation to support sparse inode records, refactor
copy_inode_chunk() to perform all record sanity checks before the cursor
is set to the inode chunk and the inode buffer is read.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: handle sparse inode alignment
Sparse inode support requires inode alignment to match inode chunk size.
xfs_repair currently expects inode alignment to match the default
cluster size or a scaled factor thereof.
Update sb_validate_ino_align() to consider the superblock valid if
sparse inode support is enabled and alignment matches the chunk size.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: do not prefetch holes in sparse inode chunks
The repair prefetch mechanism reads all inode chunks in advance of
repair processing to improve performance. Inode buffer verification and
processing can occur within the prefetch mechanism such as when
directories are being processed. Prefetch currently assumes fully
populated inode chunks which leads to corruption errors attempting to
verify inode buffers that do not contain inodes.
Update prefetch to check the previously scanned sparse inode bits and
skip inode buffer reads of clusters that are sparse. We check sparse
state per-inode cluster because the cluster size is the min. allowable
inode chunk hole granularity.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: reconstruct sparse inode records correctly on disk
Phase 5 traverses all of the in-core inode records and regenerates the
inode btrees a record at a time. The record insertion code doesn't
account for sparse inodes which means the ir_holemask and ir_count
fields are not set on-disk and ir_freecount is set with an invalid value
for sparse inode records.
Update build_ino_tree() to handle sparse inode records correctly. We
must account real, allocated inodes only into the ir_freecount field.
The 64-bit in-core sparse inode bitmask must be converted to compressed
16-bit ir_holemask format. Finally, the ir_count field must be set to the
total (non-sparse) inode count of the record.
If the fs does not support sparse inodes, both the ir_holemask and
ir_count field are initialized to zero to preserve backwards
compatibility. These bytes historically landed in the high order bytes
of ir_freecount and must be 0 to be interpreted correctly by older XFS
implementations without sparse inode support.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
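The 64-bit to 16-bit conversion mentioned above can be sketched like this. It assumes each holemask bit covers 4 consecutive inodes (64 / 16) and that all inodes in such a group share the same sparse state, so sampling the first one is enough; the names are hypothetical and this is not the build_ino_tree() code:

```c
#include <stdint.h>

#define HOLEMASK_BITS_SKETCH		16
#define INODES_PER_HOLEMASK_BIT_SKETCH	(64 / HOLEMASK_BITS_SKETCH)

/* Compress a per-inode sparse bitmask into a per-group holemask. */
static uint16_t compress_holemask_sketch(uint64_t sparse_mask)
{
	uint16_t	holemask = 0;
	int		i;

	for (i = 0; i < HOLEMASK_BITS_SKETCH; i++) {
		/* sample the first inode of each 4-inode group */
		if (sparse_mask & (1ULL << (i * INODES_PER_HOLEMASK_BIT_SKETCH)))
			holemask |= (uint16_t)(1u << i);
	}
	return holemask;
}
```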
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: do not account sparse inodes in phase 5 cursor init.
The inode btrees are reconstructed in phase 5 of xfs_repair. The btree
cursor initialization counts the allocated and free inodes in the
in-core records and calculates the expected geometry of the resulting
btree. The free and total inode counts for each AG are also ultimately
aggregated to update the associated superblock counts.
Update init_ino_cursor() to not assume 64 inode records and not account
sparse inodes into the total or free inode count for each AG.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
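A sketch of the per-record accounting this implies, assuming 64-bit in-core free and sparse bitmasks with one bit per inode and sparse inodes also marked free. All names are hypothetical; this is not the init_ino_cursor() code:

```c
#include <stdint.h>

struct ino_counts_sketch {
	unsigned	total;	/* physically allocated inodes in the record */
	unsigned	free;	/* allocated inodes that are free */
};

static unsigned count_bits64_sketch(uint64_t v)
{
	unsigned n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Sparse inodes count neither towards the total nor towards the free count. */
static struct ino_counts_sketch count_record_sketch(uint64_t free_mask,
						    uint64_t sparse_mask)
{
	struct ino_counts_sketch c;

	c.total = 64 - count_bits64_sketch(sparse_mask);
	c.free = count_bits64_sketch(free_mask & ~sparse_mask);
	return c;
}
```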
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: factor out sparse inodes from finobt reconstruction
Phase 5 of xfs_repair recreates the on-disk btrees. The free inode btree
(finobt) contains inode records that contain one or more free inodes.
Sparse inodes are marked as free and therefore sparse inode records can
be incorrectly included in the finobt even when no real free inodes are
available in the record.
Update the finobt in-core record traversal helpers to factor out sparse
inodes and only consider inode records with allocated, free inodes for
finobt insertion.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: process sparse inode records correctly
The inode processing phases of xfs_repair (3 and 4) validate the actual
inodes referred to by the previously scanned inode btrees. The physical
inodes are read from disk and internally validated in various ways. The
inode block state is also verified and corrected if necessary.
Sparse inodes are not physically allocated and the associated blocks may
be allocated to any other area of the fs (file data, internal use,
etc.). Attempts to validate these blocks as inode blocks produce noisy
corruption errors.
Update the inode processing mechanism to handle sparse inode records
correctly. Since sparse inodes do not exist, the general approach here
is to simply skip validation of sparse inodes. Update
process_inode_chunk() to skip reads of sparse clusters and set the buf
pointer of associated clusters to NULL. Update the rest of the function
to only verify non-NULL cluster buffers. Also, skip the inode block
state checks for blocks in sparse inode clusters.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: validate ir_count field for sparse format records
Sparse format inobt records contain an additional count field that
records the number of physical inodes tracked by the record. Verify the
count is internally consistent according to the holemask, similar to how
freecount is validated against the free mask.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
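One way to express that consistency check, assuming a 16-bit holemask in which each set bit stands for 4 missing inodes out of 64. The function name is hypothetical and the real verification in xfs_repair may differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* ir_count should equal 64 minus the inodes covered by holemask bits. */
static bool ir_count_consistent_sketch(uint16_t holemask, uint8_t ir_count)
{
	unsigned	holes = 0;
	int		i;

	for (i = 0; i < 16; i++) {
		if (holemask & (1u << i))
			holes++;
	}
	return ir_count == 64 - holes * 4;
}
```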
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: scan sparse finobt records correctly
The finobt scan performs checks similar to the inobt scan, including
internal record consistency checks, consistency with inobt records,
inode block state, etc. Various parts of this mechanism also assume
fully allocated inode records and thus lead to false errors with sparse
records.
Update the finobt scan to detect and handle sparse inode records
correctly. As for the inobt, do not assume that blocks associated with
sparse regions are allocated for inodes and do not account sparse inodes
against the freecount. Additionally, verify that sparse state is
consistent with the in-core record and set up any new in-core records
that might have been missing from the inobt correctly.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: scan and track sparse inode chunks correctly
Phase 2 of xfs_repair scans the on-disk inobt and creates in-core
records for all inodes in the fs. This also involves marking
free/allocated state of all inodes, internal record verification and
block state management for the inode chunks tracked by inode records.
Various parts of the inobt scan mechanism assume fully allocated inode
records and thus lead to spurious errors when sparse inode records are
encountered.
Update the inobt scan to detect and handle sparse inode records
correctly. Do not set the allocation state of blocks in sparse inode
regions as these blocks do not belong to the record. Do not account
sparse inodes against the ir_freecount as these inodes do not exist and
are not available for allocation by the fs. Finally, track the sparse
status of each individual inode in the in-core inode records for future
reference.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: use ir_count for filesystems with sparse inode support
Repair currently assumes each inobt record covers 64 inodes and uses
this value to validate inode counts in the AGI headers and superblock.
This is not always the case with sparse inode support.
Update scan_inobt() to check for sparse inode support and use the new
ir_count field for inode accounting. ir_count contains the total number
of inodes tracked by the record.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:22 +0000 (09:18 +1000)]
repair: remove duplicate field from aghdr_cnts
The agicount and icount fields are used in separate parts of the AG scan
but both fields track the same data. agicount is used to compare with
the AGI header and icount is used to calculate the total inode count to
compare with sb_icount.
Use agicount rather than icount in scan_ags() and remove the icount
field.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:21 +0000 (09:18 +1000)]
repair: handle sparse format inobt record freecount correctly
The sparse inode chunk feature introduces a new inobt record format that
converts ir_freecount from 4 bytes to 1 byte. ir_freecount references
throughout repair currently assume the 'full' format and endian-convert
from the 32-bit value.
Update the xfs_repair inobt scan and tree rebuild codepaths to use the
correct record format for ir_freecount when sparse inodes are enabled.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:18:07 +0000 (09:18 +1000)]
db: show sparse inodes feature state in version command output
The xfs_db version command prints a string for each of the various
features supported by a filesystem. Include 'SPARSE_INODES' in the
version string when sparse inode chunk allocation is supported by the
fs.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 30 Jul 2015 23:17:07 +0000 (09:17 +1000)]
db: support sparse inode chunk inobt record and sb fields
The sparse inode feature uses a different on-disk inobt record format.
Define the new record format in the xfs_db type infrastructure and use
this definition for fs' that support sparse inodes.
Also update the superblock type structure with the sb_spino_align field.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Replace them with the standards-conforming intptr_t and uintptr_t. Note
that many uses look like rather questionable cargo-cult avoidance of
pointer arithmetic, but let's leave that for now and look at it separately.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
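As a small illustration of a legitimate use of uintptr_t, an alignment
check needs the pointer value as an integer. The types being replaced are
not named in this excerpt, and the helper below is hypothetical rather than
code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return nonzero if ptr is aligned to 'align' bytes, where 'align' is a
 * power of two. uintptr_t is the standard unsigned integer type that can
 * hold a converted object pointer.
 */
static int
ptr_is_aligned(const void *ptr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)ptr & (align - 1)) == 0;
}
```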
Brian Foster [Thu, 30 Jul 2015 23:16:07 +0000 (09:16 +1000)]
mkfs: sparse inode chunk support
Allow format of sparse inode chunk enabled filesystems via the '-i
sparse' flag. Note that sparse inode chunk support requires a v5
superblock (-m crc=1).
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Mike Grant [Thu, 30 Jul 2015 23:14:58 +0000 (09:14 +1000)]
xfs_repair: include any realloc'ed buffers in final putbuf
The realloc code included in commit 95dff16b1 potentially introduces
extra buffers to bplist. These should be dealt with at the end of the
longform_dir2_entry_check function. This replaces the originally estimated
number of entries (freetab->naents) with the actual number finally allocated
(num_bps).
Signed-off-by: Mike Grant <mggr@pml.ac.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 28 May 2015 23:26:03 +0000 (09:26 +1000)]
xfs: skip unallocated regions of inode chunks in xfs_ifree_cluster()
xfs_ifree_cluster() is called to mark all in-memory inodes and inode
buffers as stale. This occurs after we've removed the inobt records and
dropped any references of inobt data. xfs_ifree_cluster() uses the
starting inode number to walk the namespace of inodes expected for a
single chunk, a cluster buffer at a time. The cluster buffer disk
addresses are calculated by decoding the sequential inode numbers
expected from the chunk.
The problem with this approach is that if the inode chunk being removed
is a sparse chunk, not all of the buffer addresses that are calculated
as part of this sequence may be inode clusters. Attempting to acquire
the buffer based on expected inode characteristics (i.e., cluster length)
can lead to errors and is generally incorrect.
We already use a couple variables to carry requisite state from
xfs_difree() to xfs_ifree_cluster(). Rather than add a third, define a
new internal structure to carry the existing parameters through these
functions. Add an alloc field that represents the physical allocation
bitmap of inodes in the chunk being removed. Modify xfs_ifree_cluster()
to check each inode against the bitmap and skip the clusters that were
never allocated as real inodes on disk.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
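A minimal sketch of the skip logic described above, assuming a 64-bit
allocation bitmap with one bit per inode (set when the inode physically
exists on disk). The function name and the inodes_per_cluster parameter are
hypothetical. The real xfs_ifree_cluster() works on buffers and in-memory
inodes, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

/*
 * Walk a chunk one cluster at a time and count the clusters that would
 * be processed. A cluster whose first inode is not set in 'alloc' lies
 * in a sparse region and is skipped.
 */
static int
count_clusters_to_stale(uint64_t alloc, int inodes_per_cluster)
{
	int nclusters = 0;

	assert(inodes_per_cluster > 0 &&
	       INODES_PER_CHUNK % inodes_per_cluster == 0);
	for (int ino = 0; ino < INODES_PER_CHUNK; ino += inodes_per_cluster) {
		if (!(alloc & (UINT64_C(1) << ino)))
			continue;	/* sparse region: no cluster buffer here */
		nclusters++;		/* real code would mark the buffer stale */
	}
	return nclusters;
}
```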
Brian Foster [Thu, 28 May 2015 23:22:52 +0000 (09:22 +1000)]
xfs: only free allocated regions of inode chunks
An inode chunk is currently added to the transaction free list based on
a simple fsb conversion and hardcoded chunk length. The nature of sparse
chunks is such that the physical chunk of inodes on disk may consist of
one or more discontiguous parts. Blocks that reside in the holes of the
inode chunk are not inodes and could be allocated to any other use or
not allocated at all.
Refactor the existing xfs_bmap_add_free() call into the
xfs_difree_inode_chunk() helper. The new helper uses the existing
calculation if a chunk is not sparse. Otherwise, use the inobt record
holemask to free the contiguous regions of the chunk.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
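The holemask walk described above might look roughly like this. It is a
sketch under stated assumptions: a 16-bit holemask where each bit covers 4
inodes and a set bit marks a hole, with bit 0 covering the lowest inodes.
The names are hypothetical, and ranges are reported in inodes; the
conversion to blocks and the xfs_bmap_add_free() call are left out.

```c
#include <assert.h>
#include <stdint.h>

#define HOLEMASK_BITS		16
#define INODES_PER_HOLEMASK_BIT	4	/* 64 inodes / 16 holemask bits */
#define MAX_RANGES		8	/* worst case: alternating bits */

struct ino_range {
	int	start;	/* first inode offset within the chunk */
	int	count;	/* number of inodes in the range */
};

/*
 * Collect each contiguous run of allocated (non-hole) regions.
 * Returns the number of ranges written to 'out'.
 */
static int
allocated_ranges(uint16_t holemask, struct ino_range out[MAX_RANGES])
{
	int n = 0;
	int bit = 0;

	assert(out != 0);
	while (bit < HOLEMASK_BITS) {
		if (holemask & (1u << bit)) {
			bit++;		/* hole: nothing to free here */
			continue;
		}
		int first = bit;
		while (bit < HOLEMASK_BITS && !(holemask & (1u << bit)))
			bit++;
		out[n].start = first * INODES_PER_HOLEMASK_BIT;
		out[n].count = (bit - first) * INODES_PER_HOLEMASK_BIT;
		n++;
	}
	return n;
}
```

A non-sparse chunk (holemask of zero) yields a single range covering all
64 inodes, which matches the existing full-chunk calculation.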
Brian Foster [Thu, 28 May 2015 23:20:10 +0000 (09:20 +1000)]
xfs: filter out sparse regions from individual inode allocation
Inode allocation from an existing record with free inodes traditionally
selects the first inode available according to the ir_free mask. With
sparse inode chunks, the ir_free mask could refer to an unallocated
region. We must mask the unallocated regions out of ir_free before using
it to select a free inode in the chunk.
Update the xfs_inobt_first_free_inode() helper to find the first free
inode available in the allocated regions of the inode chunk.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
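The masking step can be illustrated as below. This sketch assumes that a
set bit in ir_free means "free" and that the allocation bitmap has a set
bit for every inode that physically exists. The function name is
hypothetical and is not the xfs_inobt_first_free_inode() implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index (0-63) of the first inode that is both free and
 * physically allocated, or -1 if the record has no usable free inode.
 */
static int
first_real_free_inode(uint64_t ir_free, uint64_t allocbitmap)
{
	/* drop "free" bits that fall in unallocated (sparse) regions */
	uint64_t realfree = ir_free & allocbitmap;
	int idx = 0;

	if (realfree == 0)
		return -1;
	while (!(realfree & 1)) {
		realfree >>= 1;
		idx++;
	}
	return idx;
}
```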
Brian Foster [Thu, 28 May 2015 23:19:29 +0000 (09:19 +1000)]
xfs: randomly do sparse inode allocations in DEBUG mode
Sparse inode allocations generally only occur when full inode chunk
allocation fails. This requires some level of filesystem space usage and
fragmentation.
For filesystems formatted with sparse inode chunks enabled, do random
sparse inode chunk allocs when compiled in DEBUG mode to increase test
coverage.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
The commit message attached to each patch is being filtered out
during the process of preparing it for import via guilt. This
results in commits without the corresponding commit message.
Massage the patch format to ensure that guilt imports the commit
message along with the code changes.
Also, older versions of filterdiff do not handle git diff metadata
correctly, so add a version check on filterdiff to make sure we use
a working version.
Brian Foster [Thu, 28 May 2015 23:18:32 +0000 (09:18 +1000)]
xfs: allocate sparse inode chunks on full chunk allocation failure
xfs_ialloc_ag_alloc() makes several attempts to allocate a full inode
chunk. If all else fails, reduce the allocation to the sparse length and
alignment and attempt to allocate a sparse inode chunk.
If sparse chunk allocation succeeds, check whether an inobt record
already exists that can track the chunk. If so, inherit and update the
existing record. Otherwise, insert a new record for the sparse chunk.
Create helpers to align sparse chunk inode records and insert or update
existing records in the inode btrees. The xfs_inobt_insert_sprec()
helper implements the merge or update semantics required for sparse
inode records with respect to both the inobt and finobt. To update the
inobt, either insert a new record or merge with an existing record. To
update the finobt, use the updated inobt record to either insert or
replace an existing record.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
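The message above does not spell out the merge rules, so the following is
only one plausible sketch of how two sparse records for the same chunk
could be combined. All names are hypothetical. It assumes a set holemask
bit marks a hole and that inodes in sparse regions are marked free (bit
set) in each record's free mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64

struct irec {
	uint32_t	startino;
	uint16_t	holemask;	/* 1 = hole, 0 = allocated region */
	uint8_t		count;		/* inodes physically present */
	uint8_t		freecount;	/* free inodes among those present */
	uint64_t	free;		/* 1 = free (assumed set for holes too) */
};

/* Two records can merge only if their allocated regions do not overlap. */
static bool
irec_can_merge(const struct irec *a, const struct irec *b)
{
	if (a->startino != b->startino)
		return false;
	if (a->holemask == 0 || b->holemask == 0)
		return false;	/* a full record leaves nothing to merge */
	if (a->count + b->count > INODES_PER_CHUNK)
		return false;
	if ((uint16_t)~a->holemask & (uint16_t)~b->holemask)
		return false;	/* same region allocated in both */
	return true;
}

/* Fold src into dst; the caller has already checked irec_can_merge(). */
static void
irec_merge(struct irec *dst, const struct irec *src)
{
	assert(irec_can_merge(dst, src));
	dst->holemask &= src->holemask;	/* a hole remains only if in both */
	dst->count += src->count;
	dst->freecount += src->freecount;
	dst->free &= src->free;		/* free only if free in both */
}
```

The merged record would then be used to update the inobt, and to insert or
replace the matching finobt record.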
Dave Chinner [Thu, 30 Jul 2015 23:09:58 +0000 (09:09 +1000)]
libxfs-apply: auto-name patches for guilt
When applying a series of commits, having to write a name for each
of them is time consuming. Instead, just use the same method
'guilt import-commit' uses and use the commit header to generate the
filename automatically.
Dave Chinner [Thu, 30 Jul 2015 23:09:58 +0000 (09:09 +1000)]
progs: clean up all remaining xfs*h includes
Convert all the various ways of including xfs*.h to one standard
way across all libraries and utilities:
#include "xfs/xfs*.h"
This means we are consistently including the local
include/xfs/xfs*.h header files in parts of the build.
The exception to this is the libxfs/ directory, which does not use
the "xfs/" prefix in files shared with the kernel to maintain as
much similarity to the kernel code as possible. However, libxfs/ is
the source that include/xfs points back to, so the local include
rule ensures this works just fine.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 28 May 2015 23:09:05 +0000 (09:09 +1000)]
xfs: helper to convert holemask to inode alloc. bitmap
The inobt record holemask field is a condensed data type designed to fit
into the existing on-disk record and is zero based (allocated regions
are set to 0, sparse regions are set to 1) to provide backwards
compatibility. This makes the type somewhat complex for use in higher
level inode manipulations such as individual inode allocation, etc.
Rather than foist the complexity of dealing with this field to every bit
of logic that requires inode granular information, create a helper to
convert the holemask to an inode allocation bitmap. The inode allocation
bitmap is inode granularity similar to the inobt record free mask and
indicates which inodes of the chunk are physically allocated on disk,
irrespective of whether the inode is considered allocated or free by the
filesystem.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
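The conversion described above can be sketched as follows. This is a
minimal illustration, not the kernel helper: the function name is
hypothetical, and it assumes a 16-bit holemask in which bit N covers
inodes 4N through 4N+3 of the 64-inode chunk.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64
#define HOLEMASK_BITS		16
#define INODES_PER_HOLEMASK_BIT	(INODES_PER_CHUNK / HOLEMASK_BITS)

/*
 * Expand a holemask (1 = sparse region, 0 = allocated region) into an
 * inode-granular bitmap where a set bit means the inode is physically
 * allocated on disk.
 */
static uint64_t
holemask_to_allocbitmap(uint16_t holemask)
{
	const uint64_t region = (UINT64_C(1) << INODES_PER_HOLEMASK_BIT) - 1;
	uint64_t bitmap = 0;

	for (int bit = 0; bit < HOLEMASK_BITS; bit++) {
		if (holemask & (1u << bit))
			continue;	/* hole: leave these inode bits clear */
		bitmap |= region << (bit * INODES_PER_HOLEMASK_BIT);
	}
	return bitmap;
}
```

Because the holemask is zero based, a record written without sparse
support (holemask of zero) expands to a fully allocated bitmap.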
Dave Chinner [Thu, 30 Jul 2015 23:08:58 +0000 (09:08 +1000)]
libxfs-apply: reduce output verbosity
When applying a series of patches, there is lots of verbosity that
hides the actual operations being done. Hide all that verbosity
behind a --verbose CLI option. For patch based execution, turn the
verbosity on by default, otherwise leave it off.
Brian Foster [Thu, 28 May 2015 23:05:49 +0000 (09:05 +1000)]
xfs: pass inode count through ordered icreate log item
v5 superblocks use an ordered log item for logging the initialization of
inode chunks. The icreate log item is currently hardcoded to an inode
count of 64 inodes.
The agbno and extent length are used to initialize the inode chunk from
log recovery. While an incorrect inode count does not lead to bad inode
chunk initialization, we should pass the correct inode count such that log
recovery has enough data to perform meaningful validity checks on the
chunk.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 30 Jul 2015 23:07:58 +0000 (09:07 +1000)]
libxfs-apply: allow commit range specification
Rather than having to manually run the script for every commit in a
series that needs to be applied, allow the commit ID to specify a
range of commits in the form of a git refspec.
Change the internal code to pull a commit at a time from the source
repository and apply it to the current repository. If guilt is in
use, this will result in a patch per commit; if guilt is not in use,
it will aggregate all the changes into a single commit.
Also, fix the xfsprogs libxfs file filter match to exclude files
from the fs/xfs directory correctly.
Note: this pulls in a function from guilt to handle commit ids in a
sane manner.
Dave Chinner [Thu, 30 Jul 2015 23:07:58 +0000 (09:07 +1000)]
progs: clean up libxfs.h includes
Convert all the various ways of including libxfs.h to one standard
way across all libraries and utilities:
#include "xfs/libxfs.h"
This means we are consistently including the local
include/xfs/libxfs.h header file in parts of the build, and means we
need to ensure that the include directory is correctly populated
before we do any other part of the build.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Brian Foster [Thu, 28 May 2015 23:03:04 +0000 (09:03 +1000)]
xfs: introduce inode record hole mask for sparse inode chunks
The inode btrees track 64 inodes per record regardless of inode size.
Thus, inode chunks on disk vary in size depending on the size of the
inodes. This creates a contiguous allocation requirement for new inode
chunks that can be difficult to satisfy on an aged and fragmented (free
space) filesystem.
The inode record freecount currently uses 4 bytes on disk to track the
free inode count. With a maximum freecount value of 64, only one byte is
required. Convert the freecount field to a single byte and use two of
the remaining 3 higher order bytes left for the hole mask field. Use the
final leftover byte for the total count field.
The hole mask field tracks holes in the chunks of physical space that
the inode record refers to. This facilitates the sparse allocation of
inode chunks when contiguous chunks are not available and allows the
inode btrees to identify what portions of the chunk contain valid
inodes. The total count field contains the total number of valid inodes
referred to by the record. This can also be deduced from the hole mask.
The count field provides clarity and redundancy for internal record
verification.
Note that neither of the new fields can be written to disk on fs'
without sparse inode support. Doing so writes to the high-order bytes of
freecount and causes corruption from the perspective of older kernels.
The on-disk inobt record data structure is updated with a union to
distinguish between the original, "full" format and the new, "sparse"
format. The conversion routines to get, insert and update records are
updated to translate to and from the on-disk record accordingly such
that freecount remains a 4-byte value on non-supported fs, yet the new
fields of the in-core record are always valid with respect to the
record. This means that higher level code can refer to the current
in-core record format unconditionally and lower level code ensures that
records are translated to/from disk according to the capabilities of the
fs.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
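The translation between the two on-disk formats and a single in-core
record might be sketched like this. The struct, function names and exact
byte positions are assumptions made for illustration: the sketch assumes a
big-endian 16-byte record with the holemask in bytes 4-5, the count in
byte 6 and the one-byte freecount in byte 7, i.e. the low-order byte of
the old 4-byte freecount.

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK	64
#define INOBT_REC_SIZE		16	/* startino + freecount word + free */

/* Hypothetical in-core record; all fields are valid for both formats. */
struct irec_incore {
	uint32_t	startino;
	uint16_t	holemask;
	uint8_t		count;
	uint8_t		freecount;	/* a valid freecount never exceeds 64 */
	uint64_t	free;
};

/* Read an nbytes-wide big-endian integer. */
static uint64_t
get_be(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode an on-disk record according to the capabilities of the fs. */
static void
decode_inobt_rec(const uint8_t raw[INOBT_REC_SIZE], int has_sparse,
		 struct irec_incore *out)
{
	assert(raw != 0 && out != 0);
	out->startino = (uint32_t)get_be(raw, 4);
	if (has_sparse) {
		out->holemask = (uint16_t)get_be(raw + 4, 2);
		out->count = raw[6];
		out->freecount = raw[7];
	} else {
		/* full format: no holes, full chunk, 4-byte freecount */
		out->holemask = 0;
		out->count = INODES_PER_CHUNK;
		out->freecount = (uint8_t)get_be(raw + 4, 4);
	}
	out->free = get_be(raw + 8, 8);
}
```

Higher level code can then read holemask and count unconditionally, while
only this layer knows which format is on disk.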
And assumes that it is run in the source repo. It makes
more sense to run it in the destination repository and specify
both the source repo and the commit ids to take from it.
Change the CLI argument parsing to use options rather than trying to
guess the intention from the number of CLI options presented. Also
add checking to ensure the options are presented correctly, add more
meaningful error messages and check that the source/destination
repositories are recognised as either kernel or libxfs repositories.
The resulting usage is:
$ libxfs-apply
Need to specify both source repo and commit id
Dave Chinner [Thu, 30 Jul 2015 23:06:58 +0000 (09:06 +1000)]
build: populate include/xfs before building
To avoid conflicts between dependency generation and header
installation, we need to separate the header installation out into a
separate step in the build that needs to be done before the actual
build.
Add a "HEADER_SUBDIRS" list to indicate which directories we need
to build headers in, and iterate it before the SUBDIRS build target.
Replace the implicit header installation rules in makefiles with
explicit header targets so this will work.