Eric Sandeen [Thu, 8 Mar 2012 22:45:07 +0000 (16:45 -0600)]
xfs_io: allow -F in open args, remove from help
Now that -F ("foreign") is automagic, we should no longer list
it in the help output for open, but we should still accept
it for compatibility; esp. as it is still in the case statement.
Oops.
Remove the -F option from the manpage open section as well.
Reported-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Link counts are unsigned and need to be printed as such. Also only
print the warning about upgrading the inode version if the inode was
version 1 before.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
It looks like we currently never grow the variable-width nlink array
if only the on-disk nlink size overflows 8 bits. This leads to a major
mess in nlink counting, and eventually an assert in phase7.
Replace the indirection mess with a union that allows doing proper
array arithmetic while we're at it.
Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: fix incorrect use of thread local data in dir and attr code
The attribute and dirv1 code use pthread thread local data incorrectly in
a few places, which will make them fail in horrible ways when using the
ag_stride option.
Replace the use of thread local data with simple local allocations, given
that there is no need to micro-optimize these allocations as much
as e.g. the extent map. The added benefit is that we have to allocate
less memory, and can free it quickly.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reported-by: Tom Crane <T.Crane@rhul.ac.uk> Tested-by: Tom Crane <T.Crane@rhul.ac.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
Make sure we do not reject an XFS root mount just because /dev/root is also
listed in /proc/mounts. The root cause of this was the awkward getmntany
function, which is replaced with a broader find_mountpoint function that
absorbs getmntany and the surrounding code from the main routine in
a structured way. This changes the flow from finding a mounted filesystem
matching the argument and then checking that it is XFS, to finding a mounted
XFS filesystem matching the argument, and thus fixes the bug.
Based on analysis and an earlier patch from
Carlos Maiolino <cmaiolino@redhat.com>.
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Fri, 2 Mar 2012 08:34:58 +0000 (08:34 +0000)]
xfs_io: fix fiemap loop continuation
When the fiemap command needs to retrieve more extents from the
kernel via a subsequent IO, it calculates the next logical block to
retrieve in filesystem block units. However, fiemap needs the start
offset in bytes, not filesystem blocks. Hence the fiemap command
can loop forever retrieving the same blocks if the logical offset
of the next block in filesystem block units is smaller than
the number of bytes in a filesystem block, i.e. it will just loop
retrieving the first 32 extents from block offset zero.
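A minimal sketch of the corrected continuation logic, using the generic
FIEMAP structures from linux/fiemap.h (this is an illustration, not the
actual xfs_io code):

	#include <linux/fiemap.h>

	/*
	 * Continue the next FIEMAP call from the end of the last extent
	 * returned, keeping everything in bytes rather than converting to
	 * filesystem blocks.  Assumes fm_mapped_extents > 0.
	 */
	static __u64
	next_fiemap_start(const struct fiemap *fm)
	{
		const struct fiemap_extent *last =
			&fm->fm_extents[fm->fm_mapped_extents - 1];

		return last->fe_logical + last->fe_length;	/* bytes */
	}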
Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Eric Sandeen [Fri, 2 Mar 2012 04:46:35 +0000 (22:46 -0600)]
mkfs.xfs: properly handle physical sector size
This splits the fs_topology structure "sectorsize" into
logical & physical, and gets both via blkid_get_topology().
This primarily allows us to default to using the physical
sectorsize for mkfs's "sector size" value, the fundamental
size of any IOs the filesystem will perform.
We reduce mkfs.xfs's "sector size" to the logical sector size if
a block size smaller than the physical sector size is specified.
This is suboptimal, but permissible.
For block size < sector size, differentiate the error
message based on whether the sector size was manually
specified, or deduced.
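As a rough sketch of how the two sizes can be obtained from libblkid's
topology API (the real fs_topology plumbing in mkfs.xfs is not shown and
the error handling is simplified):

	#include <blkid/blkid.h>

	/* Fill *lsectsz/*psectsz with the device's logical and physical
	 * sector sizes; returns 0 on success, -1 on probe failure. */
	static int
	get_sector_sizes(const char *device, unsigned long *lsectsz,
			 unsigned long *psectsz)
	{
		blkid_probe	pr = blkid_new_probe_from_filename(device);
		blkid_topology	tp;

		if (!pr)
			return -1;
		tp = blkid_probe_get_topology(pr);
		if (tp) {
			*lsectsz = blkid_topology_get_logical_sector_size(tp);
			*psectsz = blkid_topology_get_physical_sector_size(tp);
		}
		blkid_free_probe(pr);
		return tp ? 0 : -1;
	}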
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Eric Sandeen [Fri, 27 Jan 2012 19:26:19 +0000 (13:26 -0600)]
xfs_quota: check for size parsing errors
Doing something like
# xfs_quota -x -c 'limit -u bhard=1.2g ...
will cause cvtnum to fail and return a value of -1LL (because it
cannot parse the decimal), but the quota caller doesn't check
for this error value, casts it to U64, shifts right, and we end
up with an answer of 16 petabytes rather than erroring out.
Fix this.
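A sketch of the kind of check the fix adds; the cvtnum() prototype shown
here is an assumption, and the shift to 512-byte basic blocks is
illustrative:

	#include <stdio.h>
	#include <stdlib.h>

	/* Assumed prototype of xfsprogs' size parser; it returns -1LL when
	 * it cannot parse the string (e.g. "1.2g"). */
	long long cvtnum(unsigned int blocksize, unsigned int sectorsize, char *s);

	static unsigned long long
	parse_bhard_or_die(unsigned int bsize, unsigned int ssize, char *arg)
	{
		long long value = cvtnum(bsize, ssize, arg);

		if (value < 0) {	/* check before casting to unsigned */
			fprintf(stderr, "Invalid size: %s\n", arg);
			exit(1);
		}
		return (unsigned long long)value >> 9;	/* bytes -> 512-byte blocks */
	}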
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reported-by: James Lawrie <james@jdlawrie.co.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: update extent count after zapping duplicate blocks
When we find a duplicate extent in an extent format inode we do not zap
the whole inode, but just truncate it to the point where the duplicate
extent was found. But the current code only updates di_nblocks for the
new size, not di_nextents/di_anextents. In most cases this isn't noticed,
but when moving such an inode to the lost+found directory the consistency
check in xfs_iformat trips over it. Fix this by updating the on-disk
extent count as part of the inode repair.
Note that we zap btree format inodes with duplicate blocks completely
at this point, so this fix doesn't apply to them.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de>
I see "use xfs_repair instead of xfs_check" hint on xfs@irc, mailing
lists and other places but the first source of information (xfs_check
man page) doesn't mention this. Improve that.
Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove indirections in the inode record bit manipulation macros and flatten
them to a single level of inlines. Also use a common IREC_MASK define
instead of duplicating it for every bitmask.
Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Nathan Scott [Tue, 31 Jan 2012 23:11:16 +0000 (10:11 +1100)]
xfsprogs: extend fiemap configure check
Make the fiemap configure check consistent with the other
libc interface checks - perform a compile and link with a
complete set of symbols, macros and interfaces needed, as
opposed to a build with just the headers.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Nathan Scott <nathans@debian.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
Add a build dependency on linux-libc-dev, to ensure we build
packages with have_fiemap set to true if the headers support
it. Noticed by Dave, some package builds didn't enable this
when they should have.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Nathan Scott <nathans@debian.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: handle filesystems with the log in allocation group 0
Sindre Skogen reported that repair chokes on a very small filesystem created
by mkfs.xfs from xfsprogs 2.9.4. It turned out that for some reason this
filesystem had the log in allocation group 0 and thus repair's validation
of the root inode number was off. Fix this by adding the log blocks if
the log is allocated in allocation group 0.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reported-by: Sindre Skogen <sindre@workzone.no> Signed-off-by: Christoph Hellwig <hch@lst.de>
The inode prefetching code has a fixed limit on the number of inodes that
may be submitted at a time. Unfortunately the buffers for them get locked
once the prefetching starts. That way the threads processing the inodes
might get stuck on buffers that are locked, but not yet submitted for reading.
Fix this by kicking the queue as soon as we would have to wait on the
ra_count semaphore.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de>
On a sufficiently corrupt filesystem walking the btree nodes might hit the
same node again, which currently will deadlock. Use a recursion
counter to avoid the direct deadlock and let the normal loop detection
(two bad nodes and out) do its work. This is how repair behaved before
we added the lock when implementing buffer prefetching.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: allocate and free extent records individually
Instead of allocating extent records in chunks and keeping a freelist of them
which gets released to the system memory allocator in one go, use plain malloc
and free for them. The freelist just means adding a global lock instead
of relying on malloc and free, which could be implemented lockless. In
addition smart allocators like tcmalloc have far less overhead than our
chunk and linked list scheme.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: allocate and free inode records individually
Instead of allocating inode records in chunks and keeping a freelist of them
which never gets released to the system memory allocator, use plain malloc
and free for them. The freelist just means adding a global lock instead
of relying on malloc and free, which could be implemented lockless, and the
freelist is almost completely worthless as we are done allocating new
inode records once we start freeing them in major quantities.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Stefan Pfetzing reported a bug where xfs_repair got stuck eating 100% CPU in
phase3. We tracked it down to a loop in the unlinked inode list, apparently
caused by memory corruption on an iSCSI target.
I looked into tracking if we already saw a given unlinked inode, but given
that we keep walking even for inodes where we can't find an allocation btree
record that seems infeasible. On the other hand these inodes had their
final unlink and thus were dead even before the system went down. There
really is no point in adding them to the uncertain list and looking for
references to them later.
So the simplest fix seems to be to simply remove the unlinked inode list
walk and just clear it - when we rebuild the inode allocation btrees these
will simply be marked free.
Reported-by: Stefan Pfetzing <stefan.pfetzing@1und1.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Carlos Maiolino [Tue, 6 Dec 2011 16:45:18 +0000 (14:45 -0200)]
mkfs: refuse to initialize a misaligned device if not forced using libblkid
This is a new version of a patch to fix the problem with the usage of 4k
sector devices when the device is not properly aligned. It makes mkfs
refuse to initialize an XFS filesystem if the -f option is not passed on the
command line, and forces a 512b sector size if the user chooses to force
the device initialization.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
repair: avoid ABBA deadlocks on prefetched buffers
Both the prefetch threads and the actual repair processing threads can have
multiple buffers locked at a time, but they do not use a common locking
order, which can lead to ABBA deadlocks while trying to lock the buffers.
Switch the prefetch code to do a trylock and skip buffers that have
already been locked to avoid this deadlock.
Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Nathan Scott [Thu, 17 Nov 2011 22:47:26 +0000 (16:47 -0600)]
Work around the Debian build dependency handling for libreadline5.
Evidently the build daemons process dependencies differently than local builds,
and expect the first of optional dependencies to be resolved. Flip the
ordering to match this dependency.
Signed-off-by: Nathan Scott <nathans@debian.org> Signed-off-by: Alex Elder <aelder@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
Carlos Maiolino [Wed, 9 Nov 2011 16:54:07 +0000 (14:54 -0200)]
repair: properly mark lost+found inode as used
This patch makes mk_orphanage() properly set the inode link count of
the newly allocated inode in the AVL tree, preventing the lost+found
directory from bypassing the link count check in phase7 and possibly being
left with a wrong link count.
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Carlos Maiolino [Wed, 9 Nov 2011 16:54:06 +0000 (14:54 -0200)]
repair: add inline function to get ino tree node
Add a get_inode_offset() inline function, which will return the offset
of a specific node in the AVL tree, avoiding the need to calculate the
offset each time it needs to be used.
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Thu, 13 Oct 2011 18:50:14 +0000 (18:50 +0000)]
libxfs: Don't forget to initialize the radix tree subsystem
The libxfs code uses radix tree routines to manage a mount
point's m_perag_tree. But the radix tree routines assume
that radix_tree_init() has been called to initialize the
height_to_maxindex[] global array, and this was not being
done.
This showed up when running mkfs.xfs on an ia64 system. Since
it wasn't initialized, the array was filled with zeroes. The
first time radix_tree_extend() got called (with index 0), the
height would be set to 1 and all would seem fine.
The *second* time it got called (with index 1) a problem would
arise--though we were apparently "lucky" enough for it not to
matter. The following loop would simply reference invalid slots
beyond the end of the array until it happened upon one that was
non-zero. (I've expanded the function radix_tree_maxindex() here.)
	/* Figure out what the height should be. */
	height = root->height + 1;
	while (index > height_to_maxindex[height])
		height++;
As an example, this looped 1937 times before it found a non-zero
value that would cause it to break out of the loop.
Even that *seemed* to be OK. But at the end of mkfs.xfs, when
it calls libxfs_umount(), non-initialized "slots" are dereferenced
and we hit a fault.
Wow.
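A minimal sketch of the class of fix: call radix_tree_init() once before
any radix tree use so that height_to_maxindex[] is populated. The prototype
and call site shown here are assumptions, not the actual patch:

	/* Provided by libxfs' radix tree code. */
	void radix_tree_init(void);

	static void
	libxfs_example_startup(void)
	{
		radix_tree_init();	/* fills height_to_maxindex[] */
		/* ... mount, m_perag_tree insertions, etc. ... */
	}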
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Dave Chinner [Mon, 10 Oct 2011 01:08:35 +0000 (01:08 +0000)]
repair: prevent blkmap extent count overflows
Fix a bunch of invalid read/write errors due to excessive blkmap
allocations when inode forks are corrupted. These show up some time
after making a blkmap allocation for 536870913 extents on i686,
which is followed some time later by a crash caused by memory
corruption.
This blkmap allocation size overflows 32 bits in such a
way that it results in a 32 byte allocation and so access to the
second extent results in access beyond the allocated memory and
corrupts random memory.
==5419== Invalid write of size 4
==5419== at 0x80507DA: blkmap_set_ext (bmap.c:260)
==5419== by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
==5419== by 0x8056206: process_bmbt_reclist (dinode.c:813)
==5419== by 0x80579DA: process_exinode (dinode.c:1324)
==5419== by 0x8059B77: process_dinode_int (dinode.c:2036)
==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419== by 0xFFF: ???
==5419== Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
==5419== at 0x48E1102: realloc (in
/usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==5419== by 0x80501F3: blkmap_alloc (bmap.c:56)
==5419== by 0x80599F5: process_dinode_int (dinode.c:2027)
==5419== by 0x805ABE6: process_dinode (dinode.c:2823)
==5419== by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
==5419== by 0x8054012: process_aginodes (dino_chunks.c:1024)
==5419== by 0xFFF: ???
Add overflow detection code into the blkmap allocation code to avoid
this problem.
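A sketch of the kind of guard the commit describes; the structure name and
the error code are illustrative, not the actual xfs_repair code:

	#include <errno.h>
	#include <stdint.h>
	#include <stdlib.h>

	struct blkmap_sketch {
		int	nexts;
		/* flexible array of extent records would follow */
	};

	/*
	 * Refuse allocation sizes whose byte count would overflow, instead
	 * of letting a huge extent count wrap to a tiny allocation.
	 */
	static void *
	blkmap_alloc_checked(size_t nex, size_t ext_size)
	{
		if (ext_size == 0 ||
		    nex > (SIZE_MAX - sizeof(struct blkmap_sketch)) / ext_size) {
			errno = EFBIG;
			return NULL;
		}
		return malloc(sizeof(struct blkmap_sketch) + nex * ext_size);
	}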
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 10 Oct 2011 01:08:34 +0000 (01:08 +0000)]
repair: don't cache large blkmap allocations
We currently use thread local storage for storing blkmap allocations
from one inode to another as a way of reducing the number of short
term allocations we do. However, the stored allocations can only
ever grow, so once we've done a large allocation we never free that
memory even if we never need that much memory again. This can occur
if we have corrupted extent counts in inodes, and can greatly
increase the memory footprint of the repair process.
Hence if the cached blkmap array is greater than a reasonable number
of extents (say 100,000), then don't store the blkmap in TLS and
instead free it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 10 Oct 2011 01:08:33 +0000 (01:08 +0000)]
repair: handle memory allocation failure from blkmap_grow
If blkmap_grow fails to allocate a new chunk of memory, it returns
with a null blkmap. The sole caller of blkmap_grow does not check
for this failure, and so will segfault if this error ever occurs.
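A sketch of the caller-side check this implies; blkmap_grow()'s real
signature and the surrounding code in xfs_repair may differ:

	struct blkmap;
	struct blkmap *blkmap_grow(struct blkmap *blkmap);	/* assumed prototype */

	static int
	blkmap_grow_checked(struct blkmap **blkmapp)
	{
		struct blkmap *new = blkmap_grow(*blkmapp);

		if (!new)
			return -1;	/* propagate instead of segfaulting */
		*blkmapp = new;
		return 0;
	}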
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Mon, 10 Oct 2011 01:08:31 +0000 (01:08 +0000)]
repair: handle repair of image files on large sector size filesystems
Because repair uses direct IO, it cannot do IO smaller than a sector
on the underlying device. When repairing a filesystem image, the
filesystem hosting the image may have a sector size larger than the
sector size of the image, and so single image sector reads and
writes will fail.
To avoid this, when checking a file and there is a sector size
mismatch like this, turn off direct IO. While there, fix a compile
bug in the IO_DEBUG option for libxfs which was found during triage.
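A minimal sketch of turning off direct IO on an already-open image file;
the detection of the sector size mismatch is assumed to have happened
elsewhere:

	#define _GNU_SOURCE
	#include <fcntl.h>

	static int
	turn_off_direct_io(int fd)
	{
		int	flags = fcntl(fd, F_GETFL);

		if (flags < 0)
			return -1;
		return fcntl(fd, F_SETFL, flags & ~O_DIRECT);
	}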
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Alex Elder [Mon, 3 Oct 2011 12:49:20 +0000 (12:49 +0000)]
xfsprogs: libxcmd: ignore errors when initializing fs_table
When initializing fs_table, the full set of mounted filesystems and
the full set of defined projects are (or may be) examined. If an
error ever occurs looking at one of these entries, the processing
loop just quits, skipping all remaining mounts or projects.
One mount or project being problematic is no reason to give
up entirely. It may be that it is completely unrelated to
the mount point or project that the user wants to operate on.
So instead of quitting when an error occurs while adding
something to fs_table, proceed until all entries are added.
Meanwhile, the two affected functions are used for either
installing one entry in the table or for initializing the
table based on the full set of mounts or projects. In
the former case, once the entry matching that was requested
has been found there is no need to continue searching for
other entries, so break out of the loop immediately in
that case.
It so happens that these two changes affect the exact
same portion of the code...
Alex Elder [Mon, 3 Oct 2011 12:49:19 +0000 (12:49 +0000)]
xfsprogs: libxcmd: avoid exiting when an error occurs
In a number of spots while setting up fs_table, libxcmd simply
prints a message and exits if an error occurs. There should be no
real need to exit in these cases. Notifying the user that something
went wrong is appropriate but this should not preclude continued
operation. In a few cases the contents of fs_table built up so far
are discarded as well, and this too can be avoided.
Make it so errors do not lead to exits, nor do they result in
destroying fs_table. Doing this requires returning a value from
fs_extract_mount_options() so its caller can skip other processing
in this case. But in most cases we simply no longer exit, and no
longer destroy the fs_table. This means there is no more use for
fs_table_destroy(), so it can be removed.
There is a sort of short-circuit exit in fs_table_insert_project()
that is unnecessary as well, so get rid of it.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Mon, 3 Oct 2011 12:49:18 +0000 (12:49 +0000)]
xfsprogs: libxcmd: isolate strdup() calls to fs_table_insert()
Calls to fs_table_insert() are made in four places, and in all four
the mount directory and device name arguments passed are the result
of calls to strdup(). Rather than have all the callers handle
allocating and freeing of these strings, consolidate that into
fs_table_insert().
Only one place passes non-null values for the fslog and fsrt
arguments, and in that case it's easier to keep the allocation of
duplicate strings where they are in the caller. Add a comment in
fs_table_insert() to ensure that's understood.
Note also that fs_table_insert() is always called with both its
dir and fsname arguments non-null, so drop a check for that at
the top of the function.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Change fs_table_initialise() so it takes an array of mount points
and an array of project identifiers as arguments (along with their
respective sizes).
Change the quota code to provide fs_table_initialise() these arrays
rather than doing the individual mount point and project insertion
by itself. Other users just pass 0 counts, which results in filling
fs_table with entries for all mounted filesystems and all defined
projects.
This allows a few fs_table functions to be given private scope.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Mon, 3 Oct 2011 12:49:16 +0000 (12:49 +0000)]
xfsprogs: libxcmd: avoid using strtok()
The strtok() library routine overwrites delimiting bytes in the
string it is supplied. It is also not length-constrained.
Since we're making a duplicate of the string anyway, and since we
are only finding the end of a single token, we can do both without
the need to modify the passed-in mount entry structure.
Add checking for memory allocation failures, and if one occurs just
exit (as is the practice elsewhere in this file).
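A sketch of a strtok()-free way to copy the first whitespace-delimited
token without modifying the source string; the delimiters and helper name
are illustrative, not the actual libxcmd code:

	#define _POSIX_C_SOURCE 200809L	/* for strndup() */
	#include <stdlib.h>
	#include <string.h>

	static char *
	copy_first_token(const char *s)
	{
		size_t	len = strcspn(s, " \t");

		return strndup(s, len);	/* NULL on allocation failure */
	}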
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 11:44:34 +0000 (11:44 +0000)]
xfsprogs: xfs_quota: kill local variable "type" from free_f()
Only one value is ever really used for the "type" variable in
free_f(), and it indicates that either type of entry in fs_table
is wanted. Just get rid of the variable and make use of the
ability to provide 0 to fs_cursor_initialise() to indicate that.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 11:44:33 +0000 (11:44 +0000)]
xfsprogs: libxcmd: allow 0 as a wildcard fs_table entry type selector
In libxcmd a table is used to represent filesystems and directories
that could be subject to quota operations. A cursor mechanism is
used to search that table, and it includes a flag that indicates
whether the type of entry desired represents a directory (for project
quotas) or a mount point (otherwise). It also allows a search for
either type.
There is only one call to fs_cursor_initialise() where both mount points
and project paths are requested--all others just request one or
the other.
Change it so when searching fs_table (in fs_table_lookup() and
fs_cursor_next_entry()), a zero "flags" value is interpreted as a
wildcard, matching either type of entry.
Also add some commentary explaining the use of 0 as a wildcard, and
simplify fs_cursor_next_entry() a bit in the process.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 11:15:37 +0000 (11:15 +0000)]
xfsprogs: xfs_quota: don't print invalid quota file inode number
When the state of quota files is dumped, xfs_quota blindly shows
whatever inode number is returned by the kernel. If one of the
quota types is not enabled or enforced, the inode number provided
is an invalid value ((__u64) -1). Rather than print a meaningless
large integer, print "N/A" in its place to make interpreting the
result a little more obvious.
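A sketch of the output change; the helper is illustrative, not the actual
xfs_quota code:

	#include <stdint.h>
	#include <stdio.h>

	/* The kernel reports an unused quota inode as (__u64)-1. */
	static void
	print_quota_inode(uint64_t ino)
	{
		if (ino == UINT64_MAX)
			printf("N/A\n");
		else
			printf("%llu\n", (unsigned long long)ino);
	}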
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 10:57:12 +0000 (10:57 +0000)]
xfsprogs: libxcmd: use fs_device_number() consistently
The libxcmd code builds up a table that records information about
all filesystems that might be subject to quotas, as well as a set
of directories that are the roots of project quota trees.
When building the table, the device number for each affected
filesystem is determined (in fs_device_number()) using a call to
stat64(). It turns out that in all cases when doing this, a
directory path (and *not* a device special file path) is specified,
in which case the appropriate filesystem device id is found in the
st_dev field produce by the call to stat64() (i.e., the device id
for the mounted filesystem containing the path). Accordingly,
fs_device_number() always returns the st_dev field.
Another routine, fs_table_lookup(), looks up an entry in this table
based on the path name provided. However this function allows a
path to a device special file to be provided. In that case the right
device id to use is found in the st_rdev field returned by stat64().
I found this to be confusing, and it took a while to convince
myself that this wasn't actually a bug. (It wasn't initially clear
that device special files were never passed to fs_device_number().)
In order to prevent myself and others from ever wasting time like
this again, use fs_device_number() every time a device number is
needed, and in doing so determine it consistently in all cases (that
is--use st_rdev for device special files and st_dev otherwise).
In the process, change fs_device_number() to return zero on
success (or an errno) rather than its first argument (or NULL).
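A sketch of the consistent lookup described above (st_rdev for device
special files, st_dev otherwise, returning 0 or an errno); the actual
fs_device_number() uses stat64() and differs in detail:

	#include <errno.h>
	#include <sys/stat.h>
	#include <sys/types.h>

	static int
	fs_device_number_sketch(const char *path, dev_t *devp)
	{
		struct stat	st;

		if (stat(path, &st) < 0)
			return errno;
		*devp = (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) ?
			st.st_rdev : st.st_dev;
		return 0;
	}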
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 10:57:10 +0000 (10:57 +0000)]
xfsprogs: libxcmd: kill "search" arg in fs_device_number()
The function fs_device_number() in libxcmd allows the caller to
optionally "search" in /dev for a given device path in order to look
up the dev_t that represents that device path.
If set, all that function does is prepend "/dev/" to the path to see
if that produces a device path that works. So it appears this might
have been to support providing just the basename of a device as a
shorthand for its full path.
In practice, the paths passed to this function with "search" set are
those used in the mount options for a mounted XFS filesystem for the
optional log and real-time device paths. When such paths are used
in the XFS mount path, they will have been subject to an AT_FDCWD
path lookup, so unless the process mounting the filesystem was
sitting in /dev no relative path would ever be specified as just the
basename.
Even though the "mounting with CWD=/dev" is a conceivable scenario,
I think it is not likely enough to warrant the special handling to
cover that case in fs_device_number().
So delete the code that retries with a "/dev" prepended, eliminate
the "search" argument that enables it, and fix the callers
accordingly.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 10:57:08 +0000 (10:57 +0000)]
xfsprogs: libxcmd: don't clobber fs_table on realloc()
In fs_table_insert(), realloc() is called to resize the global
fs_table. If it fails, it overwrites a previously valid fs_table
pointer with NULL.
Instead, assign the return value to a local temporary and overwrite
fs_table only if the realloc() call succeeds. The only defined
errno value for a realloc() failure is ENOMEM, so return that
explicitly in the event it fails.
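A sketch of the realloc() pattern the commit describes: grow the table
through a temporary so the old pointer survives a failure. The fs_path
type and the count handling are placeholders:

	#include <errno.h>
	#include <stdlib.h>

	struct fs_path;
	extern struct fs_path *fs_table;

	static int
	grow_fs_table(size_t new_count, size_t entry_size)
	{
		struct fs_path *tmp = realloc(fs_table, new_count * entry_size);

		if (!tmp)
			return ENOMEM;	/* fs_table is still valid */
		fs_table = tmp;
		return 0;
	}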
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a b_error field to struct xfs_buf so that we can return the
exact error from libxfs_readbuf. An explicit error return would be
nice, but this requires large changes to common code that should be
done on the kernel side first.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
libxfs: handle read errors in libxfs_trans_read_buf
Libxfs_readbuf may return a NULL buffer to indicate that an
error happened during the read, but we currently ignore that
if libxfs_trans_read_buf is called with a NULL transaction
pointer. Fix this by copying the relevant code from the
kernel version of the routine, and also tidy the code up a
bit by using a common exit label.
This fixes a regression that was introduced in xfsprogs 3.0.0 by
commit:
"Implement buffer and inode caching in libxfs, groundwork
for a parallel version of xfs_repair."
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
xfs_repair: add printf format checking and fix the fallout
Add the gcc printf like attribute to the xfs_repair-internal logging
helpers, and fix the massive fallout. A large part of it is dealing
with the correct format for fixed size 64-bit types, but there were
a lot of real bugs in there, including some that led to crashes when
repairing certain corrupted filesystems on ARM based systems.
[Moved in a few more warning fixes from the next patch. -Alex]
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Alex Elder <aelder@sgi.com>
Eric Sandeen [Mon, 19 Sep 2011 21:45:06 +0000 (21:45 +0000)]
mkfs.xfs: don't increase agblocks past maximum
RH QA discovered this bug:
Steps to Reproduce:
1. Create 4 TB - 1 B partition
dd if=/dev/zero of=x.img bs=1 count=0 seek=4398046511103
2. Create xfs fs with 512 B block size on the partition
mkfs.xfs -b size=512 xfs.img
Actual results:
Agsize is computed incorrectly, resulting in fs creation failure:
agsize (2147483648b) too big, maximum is 2147483647 blocks
This is due to the "rounding up" at the very end of the calculations;
there may be other places to alleviate the problem, but it seems
most obvious to simply skip the rounding up if it would create too
many blocks in the AG. Worst case, we lose 1 block per AG.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
We require the argument to suboptions to not only exist, but also contain
a non-empty string, as cvtnum can't handle empty strings properly. Also add
the argument check to the -l agnum suboption, which was lacking it.
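A sketch of the kind of check this implies; strtoll() stands in for
cvtnum() and the message text is illustrative:

	#include <stdio.h>
	#include <stdlib.h>

	static long long
	parse_size_suboption(const char *name, const char *value)
	{
		if (!value || !*value) {	/* reject missing or empty values */
			fprintf(stderr, "%s requires a non-empty value\n", name);
			exit(1);
		}
		return strtoll(value, NULL, 10);
	}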
Reported-by: Chris Pearson <kermit4@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 24 Aug 2011 21:53:43 +0000 (21:53 +0000)]
xfsprogs: xfs_quota: improve calculation for percentage display
The pct_to_string() function determines the percentage it produces
in a strange way. Simplify the function, and make it return the
simple rounded percentage value. Handle the case of an error
return from snprintf() as well.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
In projects_free_space_data() all of the block counts returned are
doubled. This was probably a mistaken attempt to convert to or from
512-byte basic block units. The caller expects the value returned
to be in 512-byte units, which is exactly what the fs_disk_quota
structure holds, so there should be no doubling.
The effect of this bug is that the disk space used by the "df"
xfs_quota command shows block counts twice what they should be.
Alex Elder [Wed, 24 Aug 2011 21:53:41 +0000 (21:53 +0000)]
xfsprogs: xfs_quota: return real-time used data as intended
In projects_free_space_data() the real-time used space consumption
is never set. Instead, that value is returned in the field that
should hold the quota limit.
Found by inspection. Never seen/noticed because we currently don't
support quotas when a filesystem has a realtime volume.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Theodore Ts'o [Mon, 1 Aug 2011 21:58:24 +0000 (17:58 -0400)]
build using CFLAGS passed in at configure
In order to build xfsprogs in a hermetic build, we need to be able to
pass in -I and -L flags to the compiler and linker, respectively.
This needs to be used by the configure script, but we also need to
make sure these flags are used by the Makefiles as well.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Mon, 25 Jul 2011 20:46:18 +0000 (06:46 +1000)]
libxfs: sync files with 2.6.38 kernel code
Bring the libxfs headers and code into sync with the 2.6.38 kernel code.
Update the rest of xfsprogs to work with the new code.
Note: this does not convert xfsprogs to the kernel xfs_trans_ijoin/ijoin_ref
interface, it maintains the older ijoin/ihold interface because of the
different way the inode reference counting works in libxfs. More work will be
needed to change it over to a manner compatible with the current kernel API.
Note: log sector size handling needs to be sorted out. Specifically,
initialising l_sectbb_log/l_sectBBsize correctly and removing the hacks in
xlog_bread and friends (libxlog/xfs_log_recover.c) to work around the fact they
are not initialised correctly. (FWIW, I don't think xfsprogs handles large log
sector size correctly as a result, and especially not if the log device sector
size is different to the data device sector size).
Testing:
Currently passes xfstests on x86_64 w/ 4k block sizes and 512 byte block/2k
directory block filesystems. No obvious regressions are occurring during
xfstests runs.
Dave Chinner [Mon, 25 Jul 2011 20:45:18 +0000 (06:45 +1000)]
libxlog: sync up with 2.6.38 kernel code
Update libxlog with the current 2.6.38 kernel code as well as
updating the necessary parts of libxfs and various header files to
ensure that it compiles correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Mon, 25 Jul 2011 20:44:18 +0000 (06:44 +1000)]
libxfs: reintroduce old xfs_repair radix-tree code
The current kernel code uses radix trees more widely than the
previous code, so for the next sync we need radix tree support in
libxfs. Pull the old radix tree code out of the xfs_repair git history
and move it into libxfs to simplify the kernel code sync.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Lukas Czerner [Tue, 28 Jun 2011 14:26:04 +0000 (14:26 +0000)]
xfs_repair: Check if agno is inside the filesystem
When getting an inode tree pointer from the inode_tree_ptrs array, we
should check that agno, which is used as an index into the array, lies
within the filesystem, because if it does not, we can end up touching
uninitialized memory. This may happen if we have a corrupted directory
entry.
This commit fixes it by passing xfs_mount to affected functions and
checking if agno really is inside the file system.
This solves Red Hat bug #694706
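A sketch of the bounds check described above; the structure here is a
stand-in for the real xfs_mount, which carries the AG count in its
superblock copy:

	#include <stdbool.h>

	struct mount_sketch {
		unsigned int	sb_agcount;	/* number of AGs in the fs */
	};

	static bool
	agno_is_valid(const struct mount_sketch *mp, unsigned int agno)
	{
		return agno < mp->sb_agcount;
	}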
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
The order in which libraries are searched matters if you are using
static libraries. Since libblkid uses some functions from libuuid, it
needs to come before libuuid in the link line.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
configure.in: declare a requirement for at least autoconf 2.50
On Debian/Ubuntu systems, if autoconf version 2.13 is installed,
autoconf will try to automatically figure out whether autoconf 2.13 or
something more modern is required (since the autoconf maintainers,
curses be upon them, didn't bother to maintain compatibility between
autoconf 2.13 and 2.50). Unfortunately, the heuristics aren't
perfect, and although the configure.in file looks superficially like
it will be compatible with autoconf 2.13, it isn't. You will end up
with a number of very subtle compilation failures if you use autoconf
2.13.
So declare a requirement for autoconf 2.50 using AC_PREREQ(2.50).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Sat, 16 Jul 2011 17:41:19 +0000 (03:41 +1000)]
xfsprogs: don't hard code the shell binary
Recent changes to debian unstable user space have caused the
xfsprogs build to break as certain shell functionality is being
assumed by libtool and friends. The configure scripts test and
select the correct shell, but the input files ignore this and hard
code the shell to use and hence now break.
Fix this by using the shell that the configure scripts decide is the
right one to use.
Bill Kendall [Fri, 6 May 2011 16:42:57 +0000 (16:42 +0000)]
xfsprogs: fix open_by_handle memory leak
open_by_handle() calls handle_to_fshandle() which
allocates an fshandle on the heap, which is never
freed by open_by_handle(). There is no need to
call handle_to_fshandle() though, just pass the
fhandle (rather than fshandle) to handle_to_fsfd(),
like the other *_by_handle() functions do.
Signed-off-by: Bill Kendall <wkendall@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Nathan Scott [Thu, 28 Apr 2011 23:15:30 +0000 (23:15 +0000)]
xfsprogs: resolve Debian readline build issue
Address the recently reported build issue with libreadline5/6, via
the gplv2 route. Since this appears to be a relatively recent pkg,
I made its use conditional so the deb build continues to work for
everyone not running a bleeding edge distro. Works For Me (tm).
This addresses Debian bug 553875: libreadline5-dev removal pending
Signed-off-by: Nathan Scott <nathans@debian.org> Signed-off-by: Alex Elder <aelder@sgi.com>
Command names should never be translated. Currently there is
'xfs_quota -x -c "project"...' in one locale (C) while
'xfs_quota -x -c "projekt"...' in another (pl_PL).
Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Alex Elder <aelder@sgi.com>
Ajeet Yadav [Wed, 4 May 2011 09:17:29 +0000 (11:17 +0200)]
xfs_repair: update the current key cache correctly in btree_update_key
xfs_repair can hang in phase 4 (the hang is not easily reproducible)
because of corruption in a btree that xfs_repair uses internally.
Scenario: the problem was in the for loop at phase4.c:phase4(), line 232,
which never completes. The reason is that in a very rare scenario the
btree gets corrupted so that the key in the current node is greater than
the key in the next node.
For example: current key = 2894, next key = 2880, and evaluate the for loop when j=2894:

	for (j = ag_hdr_block; j < ag_end; j += blen) {
		bstate = get_bmap_ext(i, j, ag_end, &blen);
	}

get_bmap_ext() with j=2894 will return blen=-14, so j += blen -> j=2880.
get_bmap_ext() with j=2880 will return blen=14, so j += blen -> j=2894.
The result is endless toggling of j.
Solution: for fast performance the btree caches the last accessed node at each
level in struct btree_cursor during btree_search(); it re-searches the btree
for a new key only if the given condition fails.
Now consider the case: 2684 3552 3554
A> cur_key=3552 and prev_key=2684.
B> In the btree, key 3552 is updated to 2880 with btree_update_key(), but the
   cache is not invalidated, so cur_key is still 3552.
C> A new key 2894 is inserted with btree_insert(). btree_insert() first calls
   btree_search() to find the correct node to insert the new key 2894, but
   since the above condition is still true it does not re-search the btree and
   inserts the new key node between 2684 and 3552, giving 2684 2894 3552 3554.
   In reality cur_key=3552 is pointing at key 2880, which is less than 2894,
   so the btree gets corrupted to 2684 2894 2880 3554.
D> One solution would be to invalidate the cache after updating the old key
   3552 to the new key 2880, so that btree_search() re-searches; in that case
   2894 would be inserted after 2880, i.e. 2684 2880 2894 3554.
or
E> Update the cache so cur_key points at the new key; this is better in terms
   of performance as it prevents re-searching the btree during the next
   btree_search().
F> The btree was corrupted in phase 3, but the hang showed up in phase 4.
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 30 Mar 2011 19:37:00 +0000 (19:37 +0000)]
xfsprogs: update CHANGES file for release
Update the CHANGES file, in preparation for releasing xfsprogs
3.1.5. Updated to modify debian/changelog, and to give appropriate
credit to contributors.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Nathan Scott <nathans@debian.org>
Alex Elder [Wed, 30 Mar 2011 17:52:39 +0000 (17:52 +0000)]
xfsprogs: avoid dot-directories when configuring
The "find" command used in the configure script to find localized
files searches through directories (including .git and .pc) that
really should be ignored. Change it so it skips over these
directories.
I think it's reasonable to assume any such "dot directory" should be
ignored, so this change skips any directory at the top level whose
name begins with ".".
Note that I found an odd anomaly in "find". If you do not supply
the "-print" argument, the pruned directory names show up in the
output. Supplying "-print" does not include them (and that's what
we want).
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a fiemap command that works almost exactly like bmap, but works on all
filesystems supporting the FIEMAP ioctl. It is formatted similarly and
takes similar flags; the only thing that's different is obviously that it
doesn't spit out AG info and it doesn't make finding prealloc space optional.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Alex Elder [Fri, 18 Feb 2011 21:21:02 +0000 (21:21 +0000)]
xfsprogs: metadump: use printable characters for obfuscated names
There is probably not much need for an extreme amount of randomness
in the obfuscated names produced in metadumps. Limit the character
set used for (most of) these names to printable characters rather
than every permissible byte. The result makes metadumps a bit more
natural to work with.
I chose the set of all upper- and lower-case letters, digits, and
the dash and underscore for the alphabet. It could easily be
expanded to include others (or reduced for that matter).
This change also avoids ever having to retry after picking an
unusable character.
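A sketch of the alphabet described above; the selection and collision-retry
logic of the real metadump code is not reproduced here:

	#include <stdlib.h>

	static const char obfuscation_alphabet[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789-_";

	static char
	random_name_char(void)
	{
		/* sizeof - 1 excludes the terminating NUL */
		return obfuscation_alphabet[random() %
					    (sizeof(obfuscation_alphabet) - 1)];
	}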
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>