Dave Chinner [Wed, 4 Sep 2013 22:05:50 +0000 (22:05 +0000)]
xfs: Add xfs_log_rlimit.c
Add source files for xfs_log_rlimit.c The new file is used for log
size calculations and validation shared with userspace.
[dchinner: xfs_log_calc_max_attrsetm_res() does not modify the
tr_attrsetm reservation, just calculates the maximum. ]
[dchinner: rework loop in xfs_log_get_max_trans_res() ]
[dchinner: implement xfs_log_calc_unit_res() in util.c to give mkfs
a worse case calculation of the log size needed. ]
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:49 +0000 (22:05 +0000)]
xfs: Get rid of all XFS_XXX_LOG_RES() macro
Get rid of all XFS_XXX_LOG_RES() macros since they are obsoleted now.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:48 +0000 (22:05 +0000)]
xfs: refactor xfs_trans_reserve() interface
With the new xfs_trans_res structure has been introduced, the log
reservation size, log count as well as log flags are pre-initialized
at mount time. So it's time to refine xfs_trans_reserve() interface
to be more neat.
Also, introduce a new helper M_RES() to return a pointer to the
mp->m_resv structure to simplify the input.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:47 +0000 (22:05 +0000)]
xfs: Make writeid transaction use tr_writeid
tr_writeid is defined at mp->m_resv structure, however, it does not
really being used when it should be..
This patch changes it to tr_writeid to fetch the correct log
reservation size.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:46 +0000 (22:05 +0000)]
xfs: Introduce tr_fsyncts to m_reservation
A preparation step.
For now fsync_ts transaction use the pre-calculated log reservation
size of tr_swrite.
This patch introduce a new item tr_fsyncts to mp->m_reservations
structure so that we can fetch the log reservation value for it
in a same manner to others.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:45 +0000 (22:05 +0000)]
xfs: Introduce a new structure to hold transaction reservation items
Introduce a new structure xfs_trans_res to hold transaction
reservation item info per log ticket.
We also need to improve xfs_trans_resv_calc() by initializing the
log count as well as log flags for permanent log reservation.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:44 +0000 (22:05 +0000)]
xfs: make struct xfs_perag kernel only
The struct xfs_perag has many kernel-only definitions in it,
requiring a __KERNEL__ guard so userspace can use it to. Move it to
xfs_mount.h so that it it kernel-only, and let userspace redefine
it's own version of the structure containing only what it needs.
This gets rid of another __KERNEL__ check in the XFS header files.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:43 +0000 (22:05 +0000)]
xfs: move kernel specific type definitions to xfs.h
xfs_types.h is shared with userspace, so having kernel specific
types defined in it is problematic. Move all the kernel specific
defines to xfs_linux.h so we can remove the __KERNEL__ guards from
xfs_types.h
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:42 +0000 (22:05 +0000)]
xfs: remove __KERNEL__ check from xfs_dir2_leaf.c
It's actually an ifndef section, which means it is only included in
userspace. however, it's deep within the libxfs code, so it's
unlikely that the condition checked in userspace can actually occur
(search an empty leaf) through the libxfs interfaces. i.e. if it can
happen in usrspace, it can happen in the kernel, so remove it from
userspace too....
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:41 +0000 (22:05 +0000)]
xfs: remove __KERNEL__ from debug code
There is no reason the remaining kernel-only debug code needs to
remain kernel-only. Kill the __KERNEL__ part of the defines, and let
userspace handle the debug code appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:40 +0000 (22:05 +0000)]
xfs: kill __KERNEL__ check for debug code in allocation code
Userspace running debug builds is relatively rare, so there's need
to special case the allocation algorithm code coverage debug switch.
As it is, userspace defines random numbers to 0, so invert the
logic of the switch so it is effectively a no-op in userspace.
This kills another couple of __KERNEL__ users.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:39 +0000 (22:05 +0000)]
xfs: move swap extent code to xfs_extent_ops
Swapping extents is clearly an extent operaiton, and it is not
shared with userspace. Move the code to xfs_extent_ops.[ch], and
the userspace ioctl structure definition to xfs_fs.h where most of
the other ioctl structure definitions are. The means xfs_dfrag.h is
no longer needed in userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:38 +0000 (22:05 +0000)]
xfs: don't special case shared superblock mounts
Neither kernel or userspace support shared read-only mounts, so
don't beother special casing the support check to be different
between kernel and userspace. The same check canbe used as neither
like it...
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:37 +0000 (22:05 +0000)]
xfsprogs: sync minor kernel header differences
There are lots of little differences between kernel and userspace
headers noticable now that the files are largely the same. Clean up
all the formatting, whitespace and other minor differences in the
userspace headers.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:36 +0000 (22:05 +0000)]
xfs: create xfs_bmap_util.[ch]
There is a bunch of code in xfs_bmap.c that is kernel specific and
not shared with userspace. to minimise the difference between the
kernel and userspace code, shift this unshared code to
xfs_bmap_util.c, and the declarations to xfs_bmap_util.h.
The biggest issue here is xfs_bmap_finish() - userspce has it's own
definition of this function, and so we need to move it out of
xfs_bmap.[ch]. This means several other files need to include
xfs_bmap_util.c as well.
It also introduces and interesting dance for the stack switching
code in xfs_bmapi_allocate(). The stack switching/workqueue code is
actually moved to xfs_bmap_util.c, so that userspace can simply use
a #define in a header file to connect the dots without needing to
know about the stack switch code at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Tue, 10 Sep 2013 21:32:49 +0000 (21:32 +0000)]
libxfs: switch over to xfs_sb.c and remove xfs_mount.c
Now that the kernel code has split the superblock specific code out
of xfs_mount.c, we don't need xfs_mount.c anymore. Copy in xfs_sb.c
and remove xfs_mount.c
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:34 +0000 (22:05 +0000)]
xfs: split out the remote symlink handling
The remote symlink format definition and manipulation needs to be
shared with userspace, but the in-kernel interfaces do not. Split
the remote symlink format handling out into xfs_symlink_remote.[ch]
fo it can easily be shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:33 +0000 (22:05 +0000)]
xfs: introduce xfs_inode_buf.c for inode buffer operations
The only thing remaining in xfs_inode.[ch] are the operations that
read, write or verify physical inodes in their underlying buffers.
Move all this code to xfs_inode_buf.[ch] and so we can stop sharing
xfs_inode.[ch] with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:31 +0000 (22:05 +0000)]
xfs: move inode fork definitions to a new header file
The inode fork definitions are a combination of on-disk format
definition and in-memory tracking and manipulation. They are both
shared with userspace, so move them all into their own file so
sharing is easy to do and track. This removes all inode fork
related information from xfs_inode.h.
Do the same for the all the C code that currently resides in
xfs_inode.c for the same reason.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:30 +0000 (22:05 +0000)]
libxfs: move transaction code to trans.c
There is very little code left in xfs_trans.c. So little it is not
worthtrying to share this file with kernel space any more. Move the
code to libxfs/trans.c, and remove libxfs/xfs_trans.c.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:29 +0000 (22:05 +0000)]
libxfs: introduce xfs_trans_resv.c
The log space reservation calculation code has been separated from
the core transaction code in kernelspace. THi smeans we can add it
here in preparation for removing xfs_trans.c to further reduce the
differences between kernel and usrspace files.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:28 +0000 (22:05 +0000)]
xfs: introduce xfs_quota_defs.h
There are a lot of quota flag definitions that are shared by user
and kernel space. Move them all to xfs_quota_defs.h so we can
unshare xfs_quota.h and remove the __KERNEL__ regions from it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:27 +0000 (22:05 +0000)]
xfs: introduce xfs_rtalloc_defs.h
There are quite a few realtime device definitions shared with
userspace. Move them from xfs_rtalloc.h to xfs_rt_alloc_defs.h
so we don't need to share xfs_rtalloc.h with userspace anymore.
This removes the final __KERNEL__ region from the XFS kernel
codebase. Yay!
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:26 +0000 (22:05 +0000)]
xfs: split out on-disk transaction definitions
There's a bunch of definitions in xfs_trans.h that define on-disk
formats - transaction headers taht get written into the log, log
item type definitions, etc. Split out everything into a separate
file so that all which remains in xfs_trans.h are kernel only
definitions.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:25 +0000 (22:05 +0000)]
xfs: separate icreate log format definitions from xfs_icreate_item.h
The on disk log format definitions for the icreate log item are
intertwined with the kernel-only in-memory log item definitions.
Separate the log format definitions out into their own header file
so they can easily be shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:24 +0000 (22:05 +0000)]
xfs: separate dquot on disk format definitions out of xfs_quota.h
The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:23 +0000 (22:05 +0000)]
xfs: split out inode log item format definition
The EFI/EFD item format definitions are shared with userspace. Split
the out of header files that contain kernel only defintions to make
it simple to shared them.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:21 +0000 (22:05 +0000)]
xfs: split out inode log item format definition
The log item format definitions are shared with userspace. split the
out of header files that contain kernel only defintions to make it
simple to shared them.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:20 +0000 (22:05 +0000)]
xfs: separate out log format definitions
The on-disk format definitions for the log are spread randoms
through a couple of header files. Consolidate it all in a single
file that can be shared easily with userspace. This means that
xfs_log.h and xfs_log_priv.h no longer need to be shared with
userspace.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:19 +0000 (22:05 +0000)]
libxfs: local to remote format support of remote symlinks
This conversion was overlooked earlier on. Now that the differences
between userspace and kernel space are getting smaller this bug is
obvious. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:18 +0000 (22:05 +0000)]
xfs: remove local fork format handling from xfs_bmapi_write()
The conversion from local format to extent format requires
interpretation of the data in the fork being converted, so it cannot
be done in a generic way. It is up to the caller to convert the fork
format to extent format before calling into xfs_bmapi_write() so
format conversion can be done correctly.
The code in xfs_bmapi_write() to convert the format is used
implicitly by the attribute and directory code, but they
specifically zero the fork size so that the conversion does not do
any allocation or manipulation. Move this conversion into the
shortform to leaf functions for the dir/attr code so the conversions
are explicitly controlled by all callers.
Now we can remove the conversion code in xfs_bmapi_write.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:17 +0000 (22:05 +0000)]
libxfs: fix compile warnings
Some of the code shared with userspace causes compilation warnings
from things turned off in the kernel code, such as differences in
variable signedness. Fix those issues.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:15 +0000 (22:05 +0000)]
libxfs: sync xfs_ialloc.c to the kernel code
include the missing xfs_difree() function. it's not used by
userspace, but it makes no sense to have just this one arbitrary
difference between the kernel and userspace files.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:10 +0000 (22:05 +0000)]
libxfs: fix byte swapping on constants
The kernel code uses cpu_to_beXX() on constants in switch()
statements for magic numbers in the btree code. The byte swapping
infratructure isn't hooked up to the proper byte swap macros to make
this work, so fix it and then swap all the generic btree code over
to match the kernel code.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
For CRC enabled filesystems, the BMBT is rooted in an inode, so it
passes through a difference code path on root splits to the
freespace and inode btrees. The inode based btree root has a
corruption problem on split - it's the same problem we saw in the
directory/attr code where headers are memcpy()d from one block to
another without updating the self describing metadata.
Simple fix - when copying the header out of the root block, make
sure the block number is updated correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Dave Chinner [Wed, 4 Sep 2013 22:05:06 +0000 (22:05 +0000)]
xfsprogs: port inode create transaction changes
Bring across the relevant parts of the new inode create transaction
sufficient to keep kernel/user code in sync and implement the
infrastructure needed to make it work in xfsprogs.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Review-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
Eric Sandeen [Thu, 15 Aug 2013 02:26:40 +0000 (02:26 +0000)]
xfs_repair: zero out unused parts of superblocks
Prior to: 1375cb65 xfs: growfs: don't read garbage for new secondary superblocks
we ran the risk of allowing garbage in secondary superblocks
beyond the in-use sb fields. With kernels 3.10 and beyond, the
verifiers will kick these out as invalid, but xfs_repair does
not detect or repair this condition.
There is superblock stale-data zeroing code, but it is under a
narrow conditional - the bug addressed in the above commit did not
meet that conditional. So change this to check unconditionally.
Further, the checking code was looking at the in-memory
superblock buffer, which was zeroed prior to population, and
would therefore never possibly show any stale data beyond the
last up-rev superblock field.
So instead, check the disk buffer for this garbage condition.
If we detect garbage, we must zero out both the in-memory sb
and the disk buffer; the former may contain unused data
in up-rev sb fields which will be written back out; the latter
may contain garbage beyond all fields, which won't be updated
when we translate the in-memory sb back to disk.
The V4 superblock case was zeroing out the sb_bad_features2
field; we also fix that to leave that field alone.
Lastly, use offsetof() instead of the tortured (__psint_t)
casts & pointer math.
Reported-by: Michael Maier <m1278468@allmail.net> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Rich Johnston <rjohnston@sgi.com> Signed-off-by: Rich Johnston <rjohnston@sgi.com>
CID 1061528 (#1 of 1): Out-of-bounds access (OVERRUN)53. overrun-buffer-arg: Overrunning array "dinoc->di_pad" of 6 bytes by passing it to a function which accesses it at byte offset 15 using argument "16UL".
188 memset(dinoc->di_pad, 0, 16);
It seems that di_pad here should be di_pad2, as sekharan pointed out.
Mark Tinguely [Fri, 16 Aug 2013 18:12:43 +0000 (18:12 +0000)]
xfsprogs: fix inode crash in xfs_repair
Adding the lost+found in phase 6 could allocate an inode from
a new inode chunk. Since this chunk was not around in phase 3
when the inode chunks are verificated and added to the avl tree,
the avl tree look up will return a NULL pointer. This results
in a NULL defererence and segmentation fault.
Add the newly created inode chunk as if found in the chunk
verification phase.
Brian Foster [Fri, 9 Aug 2013 12:32:26 +0000 (12:32 +0000)]
xfsprogs/io: add readdir command
readdir reads the directory entries from an open directory from
the provided offset (or 0 if not specified). On completion,
readdir prints summary information regarding the number of
operations and bytes transferred. Options are available to specify
the starting offset, length and verbose mode to dump directory
entry information.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
/* account for newly allocated blocks in reserved blocks total */
args->total -= dp->i_d.di_nblocks - nblks;
but every call into this path from proto file parsing started
reserved / args->total as only "1" as passed tro newdirent() -
so if we allocate a block, args->total hits 0, and then in
xfs_dir2_node_addname():
/*
* Add the new leaf entry.
*/
rval = xfs_dir2_leafn_add(blk->bp, args, blk->index);
if (rval == 0) {
...
} else {
/*
* It didn't work, we need to split the leaf block.
*/
if (args->total == 0) {
ASSERT(rval == ENOSPC);
goto done;
}
/*
* Split the leaf block and insert the new entry.
*/
we hit the args->total == 0 special case, and don't do the next
split, and ENOSPC gets returned all the way up, and we fail.
So rather than calling newdirent with a total of "1" in every case,
which doesn't account for possible tree splits, we should call it
with a more appropriate value: XFS_DIRENTER_SPACE_RES(mp, name->len),
which will handle the maximum nr of block allocations that might be
needed during a directory entry insert.
Since the reservation required doesn't depend on entry type,
just push this down a level, into newdirent() itself.
Reported-by: Boris Ranto <branto@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:25:01 +0000 (22:25 +1000)]
xfs: don't emit v5 superblock warnings on write
We write the superblock every 30s or so which results in the
verifier being called. Right now that results in this output
every 30s:
XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
Use of these features in this kernel is at your own risk!
And spamming the logs.
We don't need to check for whether we support v5 superblocks or
whether there are feature bits we don't support set as these are
only relevant when we first mount the filesytem. i.e. on superblock
read. Hence for the write verification we can just skip all the
checks (and hence verbose output) altogether.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:25:00 +0000 (22:25 +1000)]
xfs: rework remote attr CRCs
Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet reached release yet. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.
The fundamental change is to move from a header per extent in the
attribute to a header per filesytem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.
This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the atribute on disk short of
walking and mapping buffers.
The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.
I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.
It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.
Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.
Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manupulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:59 +0000 (22:24 +1000)]
xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
the entries in a leaf. It copies the the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer. However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.
Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
is back to the buffer before starting to move entries around.
While debugging this, the _d/_s prefixes weren't sufficient to
remind me what buffer was what, so rename then all _src/_dst.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:58 +0000 (22:24 +1000)]
xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining
the entries in two leaves when the destination leaf requires
compaction. The temporary buffer ends up being copied back over the
original destination buffer, so the header in the temporary buffer
needs to contain all the information that is in the destination
buffer.
To make sure the temporary buffer is fully initialised, once we've
set up the temporary incore header appropriately, write is back to
the temporary buffer before starting to move entries around.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:57 +0000 (22:24 +1000)]
xfs: correctly map remote attr buffers during removal
If we don't map the buffers correctly (same as for get/set
operations) then the incore buffer lookup will fail. If a block
number matches but a length is wrong, then debug kernels will ASSERT
fail in _xfs_buf_find() due to the length mismatch. Ensure that we
map the buffers correctly by basing the length of the buffer on the
attribute data length rather than the remote block count.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:56 +0000 (22:24 +1000)]
xfs: remote attribute tail zeroing does too much
When an attribute data does not fill then entire remote block, we
zero the remaining part of the buffer. This, however, needs to take
into account that the buffer has a header, and so the offset where
zeroing starts and the length of zeroing need to take this into
account. Otherwise we end up with zeros over the end of the
attribute value when CRCs are enabled.
While there, make sure we only ask to map an extent that covers the
remaining range of the attribute, rather than asking every time for
the full length of remote data. If the remote attribute blocks are
contiguous with other parts of the attribute tree, it will map those
blocks as well and we can potentially zero them incorrectly. We can
also get buffer size mistmatches when trying to read or remove the
remote attribute, and this can lead to not finding the correct
buffer when looking it up in cache.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:55 +0000 (22:24 +1000)]
xfs: remote attribute read too short
Reading a maximally size remote attribute fails when CRCs are
enabled with this verification error:
XFS (vdb): remote attribute header does not match required off/len/owner)
There are two reasons for this, the first being that the
length of the buffer being read is determined from the
args->rmtblkcnt which doesn't take into account CRC headers. Hence
the mapped length ends up being too short and so we need to
calculate it directly from the value length.
The second is that the byte count of valid data within a buffer is
capped by the length of the data and so doesn't take into account
that the buffer might be longer due to headers. Hence we need to
calculate the data space in the buffer first before calculating the
actual byte count of data.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:54 +0000 (22:24 +1000)]
xfs: remote attribute allocation may be contiguous
When CRCs are enabled, there may be multiple allocations made if the
headers cause a length overflow. This, however, does not mean that
the number of headers required increases, as the second and
subsequent extents may be contiguous with the previous extent. Hence
when we map the extents to write the attribute data, we may end up
with less extents than allocations made. Hence the assertion that we
consume th enumber of headers we calculated in the allocation loop
is incorrect and needs to be removed.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:53 +0000 (22:24 +1000)]
xfs: remote attribute lookups require the value length
When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 12:24:51 +0000 (22:24 +1000)]
xfs: Remote attr validation fixes and optimisations
- optimise the calcuation for the number of blocks in a remote xattr.
- check attribute length against MAX_XATTR_SIZE, not MAXPATHLEN
- whitespace fixes
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:10 +0000 (10:26 +1000)]
xfs_repair: support CRC enabled remote symlinks
Add support for verifying the contents of remote symlinks with CRCs.
Factor the remote symlink checking code out of the symlink function
so that it is clear what it is checking. This also reduces the
indentation and makes the code clearer.
Then add support for the CRC format by modelling the checking
function directly on the code that is used in the kernel for reading
and checking both remote symlink formats.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:09 +0000 (10:26 +1000)]
libxfs: fix dir3 freespace block corruption
When the directory freespace index grows to a second block (2017
4k data blocks in the directory), the initialisation of the second
new block header goes wrong. The write verifier fires a corruption
error indicating that the block number in the header is zero. This
was being tripped by xfs/110.
The problem is that the initialisation of the new block is done just
fine in xfs_dir3_free_get_buf(), but the caller then users a dirv2
structure to zero on-disk header fields that xfs_dir3_free_get_buf()
has already zeroed. These lined up with the block number in the dir
v3 header format.
While looking at this, I noticed that the struct xfs_dir3_free_hdr()
had 4 bytes of padding in it that wasn't defined as padding or being
zeroed by the initialisation. Add a pad field declaration and fully
zero the on disk and in-core headers in xfs_dir3_free_get_buf() so
that this is never an issue in the future. Note that this doesn't
change the on-disk layout, just makes the 32 bits of padding in the
layout explicit.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:08 +0000 (10:26 +1000)]
xfs_repair: fix btree block magic number mapping
The magic numbers for generic btree blocks were modified some time
ago (before the kernel code was committed) but the xfs_repair
mapping code was not updated to match. It's no longer a simple
mapping, so just make the code a dense array and use the magic
number as the search key.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:04 +0000 (10:26 +1000)]
xfs_mdrestore: recalculate sb CRC before writing
xfs_mdrestore writes the superblock after modifying it, and so the
CRC is not necessarily correct. Make sure the CRC is correct
before we write the superblock back.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:26:02 +0000 (10:26 +1000)]
mkfs.xfs: validate options for CRCs up front.
With CRC enabled filesystems, certain options are now not optional
and so are always enabled. Validate these options up front and
abort if options are specified that cannot be set.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:59 +0000 (10:25 +1000)]
xfs_repair: make directory freespace table CRC format aware.
We fail to take into account the format of the directory block when
reading the best free space form a directory data block for free
space block verification. This causes occasionaly failures in
xfstests.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:58 +0000 (10:25 +1000)]
xfs_repair: convert directory parsing to use libxfs structure
It turns out that xfs_repair copies xfs_db in rolling it's own
opaque directory types for the different block formats. It has a
little comment about how they are "shared" with xfs_db. Shared by
copy and pasting, rather than a common header, it would appear.
Anyway, same problems, need to use format aware definitions and
abstractions from libxfs so that everything is parsed properly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:57 +0000 (10:25 +1000)]
xfs_db: update field printing for dir crc format changes.
Note that this also requires changing the type parsing to only
allow dir3 data block parsing on CRC enabled filesystems. This is
slighly more complex than it needs to be because of the way the
type table is walked and the assumption that all the entries are in
type number order.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:55 +0000 (10:25 +1000)]
xfs_db: convert directory parsing to use libxfs structure
xfs_db rolls it's own "opaque" directory types for the different
block formats. All it cares about is where the headers end and the
data starts, and none of the other details in the structures. Rather
than duplicate this for the dir3 format, we already have perfectly
good headers and abstraction functions for finding this information
in libxfs. Using these means that the dir2 code used for printing
fields, metadump and check need to be modified to use libxfs
definitions.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:51 +0000 (10:25 +1000)]
libxfs: determine inode size from version number, not struct xfs_dinode
xfs_db does not use the same structure types as libxfs when checking
inodes, and so cannot determine the size of the inode core by
passing a struct xfs_dinode to a function. We do, however, know the
raw version number, so we can pass that instead. Convert the code to
passing the inode version rather than a structure.
Note that this should probably be converted in the kernel code as
well.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:50 +0000 (10:25 +1000)]
xfs_db: disable modification for CRC enabled filessytems.
xfs_db does not have the IO infrastructure to calculate metadata
CRCs after modifying metadata. Hence xfs_db can only run in
read-only mode on filesystems with version 5 superblocks.
To fix this, xfs_db needs to have it's IO engine converted to use
the buffer based IO provided by libxfs rather than rolling it's own
IO routines. That is future work, so until this conversion is done,
only allow xfs_db to run in read-only mode on v5 filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:49 +0000 (10:25 +1000)]
xfsprogs: disable xfs_check for CRC enabled filesystems
Until xfs_db has full metadata CRC support, xfs_check will not be
able to fully verify filesystems in this format. Don't even
bother trying right now, and to make it simple to test full xfsprogs
installs with xfstests, just silently succeed.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Fri, 7 Jun 2013 00:25:47 +0000 (10:25 +1000)]
xfsprogs: add crc format support to repair
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
[minor whitespace, spelling, and coarse language cleanup -bpm]
Dave Chinner [Fri, 7 Jun 2013 00:25:45 +0000 (10:25 +1000)]
xfsprogs: Add verifiers to libxfs buffer interfaces.
Verifiers need to be used everywhere to enable calculation of CRCs
during writeback of modified metadata. Add then to the libxfs buffer
interfaces conver the internal use of devices to be buftarg aware.
Verifiers also require that the buffer has a back pointer to the
struct xfs_mount. To make this source level comaptible between
kernel and userspace, convert userspace to pass struct xfs_buftargs
around rather than a "device".
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>