Alex Elder [Wed, 28 Sep 2011 10:57:12 +0000 (10:57 +0000)]
xfsprogs: libxcmd: use fs_device_number() consistently
The libxcmd code builds up a table that records information about
all filesystems that might be subject to quotas, as well as a set
of directories that are the roots of project quota trees.
When building the table, the device number for each affected
filesystem is determined (in fs_device_number()) using a call to
stat64(). It turns out that in all cases when doing this, a
directory path (and *not* a device special file path) is specified,
in which case the appropriate filesystem device id is found in the
st_dev field produced by the call to stat64() (i.e., the device id
for the mounted filesystem containing the path). Accordingly,
fs_device_number() always returns the st_dev field.
Another routine, fs_table_lookup(), looks up an entry in this table
based on the path name provided. However, this function allows a
path to a device special file to be provided. In that case the right
device id to use is found in the st_rdev field returned by stat64().
I found this to be confusing, and it took a while to convince
myself that this wasn't actually a bug. (It wasn't initially clear
that device special files were never passed to fs_device_number().)
In order to prevent myself and others from ever wasting time like
this again, use fs_device_number() every time a device number is
needed, and in doing so determine it consistently in all cases (that
is--use st_rdev for device special files and st_dev otherwise).
In the process, change fs_device_number() to return zero on
success (or an errno on failure) rather than its first argument (or NULL).
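A minimal sketch of the consistent device-number logic described above; the names and error handling are simplified stand-ins for the real libxcmd code (which uses stat64() and its fs_path table):

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Sketch of the consistent logic: use st_rdev for a device special
 * file, st_dev otherwise.  Returns zero on success or an errno. */
static int
fs_device_number(const char *name, dev_t *devp)
{
	struct stat st;

	if (stat(name, &st) < 0)
		return errno;
	/* A device special file identifies itself via st_rdev;
	 * any other path identifies its containing filesystem
	 * via st_dev. */
	*devp = (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) ?
		st.st_rdev : st.st_dev;
	return 0;
}
```

With this shape, every caller gets the same device id regardless of whether it passes a directory or a device node.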
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 10:57:10 +0000 (10:57 +0000)]
xfsprogs: libxcmd: kill "search" arg in fs_device_number()
The function fs_device_number() in libxcmd allows the caller to
optionally "search" in /dev for a given device path in order to look
up the dev_t that represents that device path.
If set, all that function does is prepend "/dev/" to the path to see
if that produces a device path that works. So it appears this might
have been to support providing just the basename of a device as a
shorthand for its full path.
In practice, the paths passed to this function with "search" set are
those used in the mount options for a mounted XFS filesystem for the
optional log and real-time device paths. When such paths are used
in the XFS mount path, they will have been subject to an AT_FDCWD
path lookup, so unless the process mounting the filesystem was
sitting in /dev no relative path would ever be specified as just the
basename.
Even though mounting with CWD=/dev is a conceivable scenario,
I think it is not likely enough to warrant the special handling to
cover that case in fs_device_number().
So delete the code that retries with a "/dev" prepended, eliminate
the "search" argument that enables it, and fix the callers
accordingly.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 28 Sep 2011 10:57:08 +0000 (10:57 +0000)]
xfsprogs: libxcmd: don't clobber fs_table on realloc()
In fs_table_insert(), realloc() is called to resize the global
fs_table. If it fails, it overwrites a previously valid fs_table
pointer with NULL.
Instead, assign the return value to a local temporary and overwrite
fs_table only if the realloc() call succeeds. The only defined
errno value for a realloc() failure is ENOMEM, so return that
explicitly in the event it fails.
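The safe pattern described above can be sketched as follows (fs_path_t and the table variables are simplified stand-ins for the real libxcmd globals):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-ins for the libxcmd table. */
typedef struct { const char *fs_dir; } fs_path_t;

static fs_path_t *fs_table;
static unsigned int fs_count;

static int
fs_table_grow(void)
{
	fs_path_t *tmp;

	/* Assign to a temporary so a failed realloc() cannot
	 * clobber the still-valid fs_table pointer. */
	tmp = realloc(fs_table, (fs_count + 1) * sizeof(fs_path_t));
	if (!tmp)
		return ENOMEM;	/* the only defined realloc() errno */
	fs_table = tmp;
	fs_count++;
	return 0;
}
```

On failure the old table is left intact and usable; only on success is the global pointer updated.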
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a b_error field to struct xfs_buf so that we can return the
exact error from libxfs_readbuf. An explicit error return would be
nice, but this requires large changes to common code that should be
done on the kernel side first.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
libxfs: handle read errors in libxfs_trans_read_buf
Libxfs_readbuf may return a NULL buffer to indicate that an
error happened during the read, but we currently ignore that
if libxfs_trans_read_buf is called with a NULL transaction
pointer. Fix this by copying the relevant code from the
kernel version of the routine, and also tidy the code up a
bit by using a common exit label.
This fixes a regression that was introduced in xfsprogs 3.0.0 by
commit:
"Implement buffer and inode caching in libxfs, groundwork
for a parallel version of xfs_repair."
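The pattern described above can be shown with toy stand-ins (these types and functions are simplified for illustration, not the real libxfs interfaces):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct xfs_buf with the new b_error field. */
struct xfs_buf { int b_error; };

/* Stands in for libxfs_readbuf(): returns NULL on a failed read. */
static struct xfs_buf *
toy_readbuf(int fail)
{
	static struct xfs_buf buf;

	if (fail)
		return NULL;
	buf.b_error = 0;
	return &buf;
}

/* The fix: even with a NULL transaction pointer, a NULL buffer from
 * the read path must be turned into an error return, exiting through
 * a common label rather than being silently ignored. */
static int
toy_trans_read_buf(int fail, struct xfs_buf **bpp)
{
	struct xfs_buf *bp;
	int error = 0;

	bp = toy_readbuf(fail);
	if (bp == NULL) {
		error = 5;	/* EIO-style error */
		goto out;
	}
	*bpp = bp;
out:
	return error;
}
```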
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
xfs_repair: add printf format checking and fix the fallout
Add the gcc printf-like attribute to the xfs_repair-internal logging
helpers, and fix the massive fallout. A large part of it is dealing
with the correct format for fixed size 64-bit types, but there were
a lot of real bugs in there, including some that led to crashes when
repairing certain corrupted filesystems on ARM-based systems.
[Moved in a few more warning fixes from the next patch. -Alex]
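The idea can be sketched as below; do_warn() here is a simplified stand-in for the xfs_repair logging helpers, formatting into a buffer so the result can be inspected:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char warn_buf[256];

/* The format attribute makes gcc check the arguments against the
 * format string at compile time, which is how the fallout described
 * above was found. */
static void do_warn(const char *fmt, ...)
	__attribute__((format(printf, 1, 2)));

static void
do_warn(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(warn_buf, sizeof(warn_buf), fmt, ap);
	va_end(ap);
}

/* Fixed-size 64-bit types need the <inttypes.h> PRI macros so the
 * format is correct on both 32- and 64-bit targets; passing a 64-bit
 * value to "%ld" on a 32-bit system (like the ARM case above) is a
 * real bug. */
static void
warn_bad_inode(uint64_t ino)
{
	do_warn("bad inode %" PRIu64 "\n", ino);
}
```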
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Alex Elder <aelder@sgi.com>
Eric Sandeen [Mon, 19 Sep 2011 21:45:06 +0000 (21:45 +0000)]
mkfs.xfs: don't increase agblocks past maximum
RH QA discovered this bug:
Steps to Reproduce:
1. Create 4 TB - 1 B partition
dd if=/dev/zero of=x.img bs=1 count=0 seek=4398046511103
2. Create xfs fs with 512 B block size on the partition
mkfs.xfs -b size=512 xfs.img
Actual results:
Agsize is computed incorrectly resulting in fs creation fail:
agsize (2147483648b) too big, maximum is 2147483647 blocks
This is due to the "rounding up" at the very end of the calculations;
there may be other places to alleviate the problem, but it seems
most obvious to simply skip the rounding up if it would create too
many blocks in the AG. Worst case, we lose 1 block per AG.
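The fix can be sketched with simplified names (the real mkfs code rounds within its agsize calculation; the constant and helper here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Per-AG maximum: 2^31 - 1 blocks, per the mkfs error message above. */
#define AG_MAX_BLOCKS	((uint64_t)0x7fffffff)

/* Round agsize up to a multiple (e.g. a stripe unit) only when the
 * result stays within the per-AG limit; otherwise skip the rounding.
 * Worst case, we lose 1 block per AG. */
static uint64_t
round_agsize(uint64_t agsize, uint64_t multiple)
{
	uint64_t rounded = ((agsize + multiple - 1) / multiple) * multiple;

	return rounded > AG_MAX_BLOCKS ? agsize : rounded;
}
```

In the reported case, rounding pushed agsize from 2147483647 to 2147483648 blocks, one past the limit; skipping the round-up keeps the filesystem creatable.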
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
We require the argument to suboptions not only to exist, but also to
contain a non-empty string, as cvtnum can't handle empty strings
properly. Also add the missing argument check to the -l agnum
suboption, which was lacking it.
Reported-by: Chris Pearson <kermit4@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 24 Aug 2011 21:53:43 +0000 (21:53 +0000)]
xfsprogs: xfs_quota: improve calculation for percentage display
The pct_to_string() function determines the percentage it produces
in a strange way. Simplify the function, and make it return the
simple rounded percentage value. Handle the case of an error
return from snprintf() as well.
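The simplified calculation amounts to an ordinary rounded percentage; the helper below is a sketch with names simplified from the real pct_to_string() (which also formats the value into a caller-supplied buffer):

```c
#include <assert.h>
#include <stdint.h>

/* Return the percentage of 'whole' that 'portion' represents,
 * rounded to the nearest integer.  A zero total yields 0 rather
 * than dividing by zero. */
static int
percent(uint64_t portion, uint64_t whole)
{
	if (whole == 0)
		return 0;
	return (int)((portion * 100 + whole / 2) / whole);
}
```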
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
In projects_free_space_data() all of the block counts returned are
doubled. This was probably a mistaken attempt to convert to or from
512-byte basic block units. The caller expects the value returned
to be in 512-byte units, which is exactly what the fs_disk_quota
structure holds, so there should be no doubling.
The effect of this bug is that the disk space used by the "df"
xfs_quota command shows block counts twice what they should be.
Alex Elder [Wed, 24 Aug 2011 21:53:41 +0000 (21:53 +0000)]
xfsprogs: xfs_quota: return real-time used data as intended
In projects_free_space_data() the real-time used space consumption
is never set. Instead, that value is returned in the field that
should hold the quota limit.
Found by inspection. Never seen/noticed because we currently don't
support quotas when a filesystem has a realtime volume.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Theodore Ts'o [Mon, 1 Aug 2011 21:58:24 +0000 (17:58 -0400)]
build using CFLAGS passed in at configure
In order to build xfsprogs in a hermetic build, we need be able to
pass in -I and -L flags to the compiler and linker, respectively.
This needs to be used by the configure script, but we also need to
make sure these flags are used by the Makefiles as well.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Mon, 25 Jul 2011 20:46:18 +0000 (06:46 +1000)]
libxfs: sync files with 2.6.38 kernel code
Bring the libxfs headers and code into sync with the 2.6.38 kernel code.
Update the rest of xfsprogs to work with the new code.
Note: this does not convert xfsprogs to the kernel xfs_trans_ijoin/ijoin_ref
interface; it maintains the older ijoin/ihold interface because of the
different way the inode reference counting works in libxfs. More work will be
needed to change it over to a manner compatible with the current kernel API.
Note: log sector size handling needs to be sorted out. Specifically,
initialising l_sectbb_log/l_sectBBsize correctly and removing the hacks in
xlog_bread and friends (libxlog/xfs_log_recover.c) to work around the fact they
are not initialised correctly. (FWIW, I don't think xfsprogs handles large log
sector size correctly as a result, and especially not if the log device sector
size is different to the data device sector size).
Testing:
Currently passes xfstests on x86_64 w/ 4k block sizes and 512 byte block/2k
directory block filesystems. No obvious regressions are occurring during
xfstests runs.
Dave Chinner [Mon, 25 Jul 2011 20:45:18 +0000 (06:45 +1000)]
libxlog: sync up with 2.6.38 kernel code
Update libxlog with the current 2.6.38 kernel code, as well as
updating the necessary parts of libxfs and various header files to
ensure that it compiles correctly.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Mon, 25 Jul 2011 20:44:18 +0000 (06:44 +1000)]
libxfs: reintroduce old xfs_repair radix-tree code
The current kernel code uses radix trees more widely than the
previous code, so for the next sync we need radix tree support in
libxfs. Pull the old radix tree code out of the xfs_repair git history
and move it into libxfs to simplify the kernel code sync.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Lukas Czerner [Tue, 28 Jun 2011 14:26:04 +0000 (14:26 +0000)]
xfs_repair: Check if agno is inside the filesystem
When getting an inode tree pointer from the inode_tree_ptrs array,
we should check that agno, which is used as an index into the array,
lies within the file system, because if it does not, we can end up
touching uninitialized memory. This may happen if we have a corrupted
directory entry.
This commit fixes it by passing xfs_mount to affected functions and
checking if agno really is inside the file system.
This solves Red Hat bug #694706
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
The order in which libraries are searched matters if you are using
static libraries. Since libblkid uses some functions from libuuid, it
needs to come before libuuid in the link line.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
configure.in: declare a requirement for at least autoconf 2.50
On Debian/Ubuntu systems, if autoconf version 2.13 is installed,
autoconf will try to automatically figure out whether autoconf 2.13 or
something more modern is required (since the autoconf maintainers,
curses be upon them, didn't bother to maintain compatibility between
autoconf 2.13 and 2.50). Unfortunately, the heuristics aren't
perfect, and although the configure.in file looks superficially like
it will be compatible with autoconf 2.13, it isn't. You will end up
with a number of very subtle compilation failures if you use autoconf
2.13.
So declare a requirement for autoconf 2.50 using AC_PREREQ(2.50).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Sat, 16 Jul 2011 17:41:19 +0000 (03:41 +1000)]
xfsprogs: don't hard code the shell binary
Recent changes to debian unstable user space have caused the
xfsprogs build to break as certain shell functionality is being
assumed by libtool and friends. The configure scripts test and
select the correct shell, but the input files ignore this and hard
code the shell to use and hence now break.
Fix this by using the shell that the configure scripts decide is the
right one to use.
Bill Kendall [Fri, 6 May 2011 16:42:57 +0000 (16:42 +0000)]
xfsprogs: fix open_by_handle memory leak
open_by_handle() calls handle_to_fshandle() which
allocates an fshandle on the heap, which is never
freed by open_by_handle(). There is no need to
call handle_to_fshandle() though, just pass the
fhandle (rather than fshandle) to handle_to_fsfd(),
like the other *_by_handle() functions do.
Signed-off-by: Bill Kendall <wkendall@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Nathan Scott [Thu, 28 Apr 2011 23:15:30 +0000 (23:15 +0000)]
xfsprogs: resolve Debian readline build issue
Address the recently reported build issue with libreadline5/6, via
the gplv2 route. Since this appears to be a relatively recent pkg,
I made its use conditional so the deb build continues to work for
everyone not running a bleeding edge distro. Works For Me (tm).
This addresses Debian bug 553875: libreadline5-dev removal pending
Signed-off-by: Nathan Scott <nathans@debian.org> Signed-off-by: Alex Elder <aelder@sgi.com>
Command names should never be translated. Currently there is
'xfs_quota -x -c "project"...' in one locale (C) while
'xfs_quota -x -c "projekt"...' in another (pl_PL).
Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Alex Elder <aelder@sgi.com>
Ajeet Yadav [Wed, 4 May 2011 09:17:29 +0000 (11:17 +0200)]
xfs_repair: update the current key cache correctly in btree_update_key
Fix a hang in phase 4 of xfs_repair (not easily reproducible) that
occurs because of corruption in the btree that xfs_repair uses.
Scenario: the for loop at phase4.c:phase4():line 232 never completes.
The reason is that, in a very rare scenario, the btree gets corrupted
so that the key in the current node is greater than the key in the
next node.
Example: current key = 2894, next key = 2880. Evaluate the for loop
with j=2894:
    for (j = ag_hdr_block; j < ag_end; j += blen) {
            bstate = get_bmap_ext(i, j, ag_end, &blen);
    }
get_bmap_ext() with j=2894 returns blen=-14, so j += blen gives j=2880;
get_bmap_ext() with j=2880 returns blen=14, so j += blen gives j=2894.
j toggles endlessly between the two values.
Solution: for fast performance, the btree caches the last accessed
node at each level in struct btree_cursor during btree_search(); it
re-searches the btree for a key only if a check against the cached
key fails.
Now consider the case: 2684 3552 3554
A> cur_key=3552 and prev_key=2684.
B> In the btree, key 3552 is updated to 2880 with btree_update_key(),
   but the cache is not invalidated, so cur_key is still 3552.
C> A new key, 2894, is inserted with btree_insert(). btree_insert()
   first calls btree_search() to find the correct node for the new
   key, but since the cached-key condition still holds, it does not
   re-search the btree and inserts the new key between 2684 and 3552,
   giving 2684 2894 3552 3554. In reality, cur_key=3552 now points to
   key 2880, which is less than 2894, so the btree becomes corrupted:
   2684 2894 2880 3554.
D> One solution would be to invalidate the cache after updating the
   old key 3552 to the new key 2880, so that btree_search() re-searches
   the tree; in that case 2894 would be inserted after 2880, i.e.
   2684 2880 2894 3554.
or
E> Update the cached cur_key to the new key. This is better in terms
   of performance, as it avoids re-searching the btree during the next
   btree_search().
F> The btree was corrupted in phase 3, but the hang appeared in phase 4.
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Alex Elder [Wed, 30 Mar 2011 19:37:00 +0000 (19:37 +0000)]
xfsprogs: update CHANGES file for release
Update the CHANGES file, in preparation for releasing xfsprogs
3.1.5. Updated to modify debian/changelog, and to give appropriate
credit to contributors.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Nathan Scott <nathans@debian.org>
Alex Elder [Wed, 30 Mar 2011 17:52:39 +0000 (17:52 +0000)]
xfsprogs: avoid dot-directories when configuring
The "find" command used in the configure script to find localized
files searches through directories (including .git and .pc) that
really should be ignored. Change it so it skips over these
directories.
I think it's reasonable to assume any such "dot directory" should be
ignored, so this change skips any directory at the top level whose
name begins with ".".
Note that I found an odd anomaly in "find". If you do not supply
the "-print" argument, the pruned directory names show up in the
output. Supplying "-print" does not include them (and that's what
we want).
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a fiemap command that works almost exactly like bmap, but works on all
filesystems supporting the FIEMAP ioctl. It is formatted similarly and
takes similar flags; the only differences are that it obviously doesn't
print out AG info, and it doesn't make finding prealloc space optional.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
Alex Elder [Fri, 18 Feb 2011 21:21:02 +0000 (21:21 +0000)]
xfsprogs: metadump: use printable characters for obfuscated names
There is probably not much need for an extreme amount of randomness
in the obfuscated names produced in metadumps. Limit the character
set used for (most of) these names to printable characters rather
than every permittable byte. The result makes metadumps a bit more
natural to work with.
I chose the set of all upper- and lower-case letters, digits, and
the dash and underscore for the alphabet. It could easily be
expanded to include others (or reduced for that matter).
This change also avoids ever having to retry after picking an
unusable character.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Mon, 7 Mar 2011 17:39:18 +0000 (17:39 +0000)]
xfsprogs: metadump: fix duplicate handling once and for all
This is a case where I think I've solved a problem to death.
The metadump code now stops rather than spinning forever in the face
of finding no obfuscated name that hasn't already been seen.
Instead, it simply gives up and passes the original name back to use
without obfuscation.
Unfortunately, as a result it actually creates entries with
duplicate names in a directory (or inode attribute fork). And at
least in the case of directories, xfs_mdrestore(8) will populate the
directory it restores with duplicate entries. That even seems to
work, but xfs_repair(8) does identify this as a problem and fixes it
(by moving duplicates to "lost+found").
This might have been OK, given that it was a rare occurrence. But
it's possible, with short (5-character) names, for the obfuscation
algorithm to come up with only a single possible alternate name,
and I felt that was just not acceptable.
This patch fixes all that by creating a way to generate alternate
names directly from existing names by carefully flipping pairs of
bits in the characters making up the name.
The first change is that a name is only ever obfuscated once.
If the obfuscated name can't be used, an alternate is computed
based on that name rather than re-starting the obfuscation
process. (Names shorter than 5 characters are still not
obfuscated.)
Second, once a name is selected for use (obfuscated or not), it is
checked for duplicates. The name table is consulted to see if it
has already been seen, and if it has, an alternate for that name is
created (a different name of the same length that has the same hash
value). That name is checked in the name table, and if it too is
already there the process repeats until an unused one is found.
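The loop just described can be sketched with a toy name table. The helpers below are simplified stand-ins for the metadump code, not the real implementations; in particular, find_alternate() here merely bumps the last character instead of producing a hash-preserving alternate of the same length:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NAMES	64

static char name_table[MAX_NAMES][32];
static int nnames;

/* Has this name already been seen in the current directory? */
static bool
nametable_find(const char *name)
{
	for (int i = 0; i < nnames; i++)
		if (strcmp(name_table[i], name) == 0)
			return true;
	return false;
}

static void
nametable_add(const char *name)
{
	strcpy(name_table[nnames++], name);
}

/* Toy alternate generator: in the real code this produces a different
 * name of the same length with the same hash value. */
static void
find_alternate(char *name)
{
	name[strlen(name) - 1]++;
}

/* The structure of the duplicate check: keep generating alternates
 * until an unused name is found, then record it in the table. */
static void
choose_name(char *name)
{
	while (nametable_find(name))
		find_alternate(name);
	nametable_add(name);
}
```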
Third, alternates are generated methodically rather than by
repeatedly trying to come up with new random names. A sequence
number uniquely defines a particular alternate name, given an
existing name. (Note that some of those alternates aren't valid
because they contain at least one unallowed character.)
Finally, because all names are now maintained in the name table,
and because of the way alternates are generated, it's actually
possible for short names to get modified in order to avoid
duplicates.
The algorithm for doing all of this is pretty well explained in
the comments in the code itself, so I'll avoid duplicating any
more of that here.
Updates since last posting:
- Definition of ARRAY_SIZE() macro moved to "include/libxfs.h"
- Added some more background commentary:
- About the details of operation in flip_bit().
Specifically, that the table can be expanded as needed,
but that it is already way bigger than practically
necessary (and why it is that way).
- About the number of alternates available as the length
of a name increases.
- That the key cases we're interested in are names that are
around 5 characters in length. Less than that it's not
very important because we don't obfuscate the name, and
greater than that the odds of the result conflicting
with an existing name are small.
- Basically, the density of meaning in this code is kind of
high, so it warrants a lot more comments to help make what
it's doing more apparent. So I fleshed this out, as requested
by Dave.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 25 Feb 2011 18:13:48 +0000 (18:13 +0000)]
xfsprogs: metadump: move duplicate name handling into its own function
Move the handling of duplicate names into its own function. As a
result, all names other than "lost+found" files (not just those that
get obfuscated) will be checked to avoid duplication.
This makes the local buffer newname[] in generate_obfuscated_name()
unnecessary, so just drop it and use the passed-in name.
Updates:
- A comment about handling of a leading '/' character is now modified
to match the updated code, rather than being deleted altogether.
- Renamed handle_duplicates() to be handle_duplicate_name().
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:02 +0000 (21:21 +0000)]
xfsprogs: metadump: no need for local copy of name when obfuscating
The local "newname" buffer in obfuscate_name() is used to hold an
obfuscated name as it gets generated. But it is always copied back
into the passed-in name buffer, so we might as well just use the
name buffer passed directly and avoid the copy.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:02 +0000 (21:21 +0000)]
xfsprogs: metadump: move obfuscation algorithm into its own function
Pull the name obfuscation algorithm into a separate function.
This separates it from the checking for duplicates and recording
of names that are found to be acceptable.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: encapsulate the nametable code
The name table used to find duplicates in metadump.c is allocated
dynamically, but there's no real need to do so. Just make it a BSS
global array and drop the initialization call.
Meanwhile, encapsulate the code that adds entries to and looks up
entries in the table into their own functions. Use the lookup
function to detect a duplicate name in a case not previously
checked.
Change the naming scheme to use "nametable" as a prefix rather than
a suffix.
Finally, now that it's easy to do so, issue a warning if we find
that we're falling back to not obfuscating the name, but that name
has already been used in the current directory. (This can happen if
a name obfuscated earlier happens to match a subsequently found "real"
name.)
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 25 Feb 2011 18:13:44 +0000 (18:13 +0000)]
xfsprogs: metadump: don't loop on too many dups
Don't just loop indefinitely when an obfuscated name comes up as a
duplicate. Count the number of times we've found a duplicate and,
if it gets excessive despite choosing names at random, just give up
and use the original name without obfuscation.
Technically, a typical 5-character name has 255 other names that can
have the same hash value. But the algorithm doesn't hit all
possible names (far from it) so duplicates are still possible.
Updates (v4):
- Rearranged things a bit so that if too many duplicates are
encountered, a warning gets emitted.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: ensure dup table always has entry for obfuscated name
We need to ensure the nametable has a copy of all the names in a
directory (or attribute fork) in order to avoid creating duplicate
entries when obfuscating names. Currently there is an (unlikely)
case where the name is passed back without such an entry being
created. Reorder things so that won't happen.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: use pointers in generate_obfuscated_name()
Switch from using array references to using pointers to refer to the
pathname characters as they get generated. Also limit the scope of
a few automatic variables.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: adjust rather than start over when invalid byte found
The last 5 bytes of a name generated by generate_obfuscated_name()
can be selected such that they (along with all of the preceding
characters in the name) produce any desired value when hashed.
They are selected based on how their value affects the outcome of
the hash calculation for the obfuscated name. Each byte is XOR'd
into the hash at a certain position. The portion of the hash
affected by each of these last five bytes (where "last-3" means the
byte 3 positions before the last byte in the name) is:
    last:    bits 0-7
    last-1:  bits 7-14
    last-2:  bits 14-21
    last-3:  bits 21-28
    last-4:  bits 28-35 (wrapping around past bit 31)
(Note that byte (last-4) wraps around. The previous patch in this
series eliminated the effect of the upper 4 bits of that byte by
forcing them all to be 0 bits.)
Using the (XOR) difference between the hash computed for the
beginning of the obfuscated name and the hash from the original
name, we directly determine the required final five bytes to make
the hashes for the two complete names match. The lower byte (bits
0-7) of that difference is used for the last character in the
obfuscated name, bits 7-14 for the second-to-last, and so on.
So we start with the difference between the hash from the complete
original name and the hash (so far) for a random string constituting
the first part of the obfuscated name. We extract five sets of 8
bits from the result at the positions indicated above, and those
8-bit values will become the final five bytes of the obfuscated
name. By assuming (or forcing) the top bit of each of these
extracted values to be 0 (by masking off the top bit), we can ignore
the overlapping portions when determining the bytes to use.
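For reference, the hash these final bytes must reproduce is the standard XFS directory-entry name hash (xfs_da_hashname() in libxfs), shown here for illustration. Each input byte is XOR'd in 7 bits above its successor, which is why adjacent bytes overlap by one bit in the positions listed above, and why the very last byte lands cleanly in bits 0-7:

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t
rol32(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* The XFS directory/attribute name hash: each byte is XOR'd in 7 bits
 * above the next one, with the accumulated hash rotated left to make
 * room as each group of bytes is folded in. */
uint32_t
xfs_da_hashname(const uint8_t *name, int namelen)
{
	uint32_t hash;

	for (hash = 0; namelen >= 4; namelen -= 4, name += 4)
		hash = (name[0] << 21) ^ (name[1] << 14) ^
		       (name[2] << 7) ^ (name[3] << 0) ^
		       rol32(hash, 7 * 4);

	/* Now do the rest of the name. */
	switch (namelen) {
	case 3:
		return (name[0] << 14) ^ (name[1] << 7) ^
		       (name[2] << 0) ^ rol32(hash, 7 * 3);
	case 2:
		return (name[0] << 7) ^ (name[1] << 0) ^ rol32(hash, 7 * 2);
	case 1:
		return (name[0] << 0) ^ rol32(hash, 7 * 1);
	default:	/* case 0: */
		return hash;
	}
}
```

Because the last character is XOR'd in at bit 0 with nothing folded in after it, flipping its low bit flips exactly bit 0 of the final hash, which is the property the byte-selection scheme above exploits.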
It's possible for this process to produce characters ('\0' and '/')
that are not allowed in valid names. If this occurs, the existing
code abandons the current obfuscated name and starts again from the
beginning. But there exist cases where this can lead to a
never-ending loop.
Arkadiusz Miśkiewicz encountered just such a name, "R\323\257NE".
That name produces hash value 0x3a4be740, which requires that the
obfuscated name uses '/' at position last-2. The current algorithm
starts over, but since there are no random characters in this
length-5 obfuscated name, no other possibility will be found, and
the process repeats forever.
This change modifies the algorithm used so that if an unallowed
character arises, we flip a bit in that character, along with
another "matching" bit in another (overlapping) character such that
the resulting hash is unchanged. The two unallowed characters in a
name are '\0' (0x00) and '/' (0x2f), and flipping any one bit in
either of those characters results in an allowed character.
So, starting with the first of these last 5 bytes (last-4), if its
"normal" value is one of the unallowed characters, we flip its low
bit and arrange to flip the high bit of its successor byte. The
remaining bytes are treated similarly.
The very last byte has a little different treatment. We can flip
its low bit, but it has no successor byte per se. Its effect on
the hash does, however, overlap the upper four bits from byte
(last-4). We can therefore flip the corresponding bit in that (at
position 0x10).
There is one more case to consider. It's possible in that last
case that by flipping a bit in byte (last-4), we have converted
that byte to one that's not allowed. It turns out this won't ever
happen, because we know that byte was initially assigned a value
with its upper four bits clear. Flipping the bit at 0x10 cannot
therefore produce either 0x00 or 0x2f, so we don't need to treat
this case.
With these changes to the name generation algorithm, we avoid
any of the cases in which no alternate name can be found without
using an illegal character. We also avoid all looping due to bad
characters.
Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: drop unneeded use of a random character
With the exception of the last five bytes, an obfuscated name is
simply a random string of (acceptable) characters. The last five
bytes are chosen, based on the random portion before them, such that
the resulting obfuscated name has the same hash value as the
original name.
This is done by essentially working backwards from the difference
between the original hash and the hash value computed for the
obfuscated name so far, picking final bytes based on how that
difference gets manipulated by completing the hash computation.
Of those last 5 bytes, all but the upper half of the first one are
completely determined by this process. The upper part of the first
one is currently computed as four random bits, just like all the
earlier bytes in the obfuscated name.
It is not actually necessary to randomize these four upper bits,
and we can simply make them 0.
Here's why:
- The final bytes are pulled directly from the hash difference
mentioned above, with the lowest-order byte of the hash
determining the last character used in the name.
- The upper nibble of the 5th-to-last byte in a name will affect the
lowest 4 bits of the hash value and therefore the last byte of the
name. Those four bits are combined with the hash computed from
the random characters generated earlier.
- Because those earlier bytes were random, their hash value will
also be random, and in particular, the lowest-order four bits of
the hash will be random.
- So it doesn't matter whether we choose all 0 bits or some other
random value for that upper nibble of the byte at offset
(namelen - 5). When it's combined with the hash, the last byte of
the name will be random either way.
Therefore we will choose to use all 0's for that upper nibble.
Doing this simplifies the generation of two of the final five
characters, and makes all five of them get computed in a consistent
way. We'll still get some small bit of obfuscation for even
5-character names, since the upper bits of the first character will
generally be cleared and likely different from the original.
Add the use of a mask in the one case where it wasn't used, to be
even more consistent.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 18 Feb 2011 21:21:01 +0000 (21:21 +0000)]
xfsprogs: metadump: simplify '/' handling
In generate_obfuscated_name(), the incoming name is allowed to start
with a '/' character, in which case it is copied over to the new
name and ignored for the remainder of the hash calculation. A '/'
character is needlessly included at the beginning of each name
stashed in the duplicates table (regardless of whether one was
present in the name provided).
Simplify the affected code by processing the '/' right away, and
using a pointer thereafter for the start of the new name. Stop
including a leading '/' in the name used for duplicate detection.
Note: It is not clear a leading '/' character is ever even present
in a name presented for obfuscation. I have not investigated this
question; this change merely adjusts the code while preserving its
original functionality.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Alex Elder [Fri, 25 Feb 2011 18:13:37 +0000 (18:13 +0000)]
xfsprogs: metadump: some names aren't all that special
Move the check for short names out of is_special_dirent() and into
generate_obfuscated_name(). That way the check is more directly
associated with the algorithm that requires it.
Similarly, move the check for inode == 0, since that case has to do
with storing extended attributes (not files) in the name table.
As a result, is_special_dirent() is really only focused on whether a
given file is in the lost+found directory. Rename the function to
reflect its more specific purpose.
Updates (v3):
- The previous version did not properly skip the "lost+found"
directory itself; this one does.
- Created a new definition representing the name of the orphanage
directory. Encapsulate recognizing that directory into a new
macro, is_lost_found().
- Removed casts that eliminate a compile warning in calls to
libxfs_da_hashname(); will do them separately later if needed.
Updates (v4):
- Renamed is_lost_found() to be is_orphanage_dir(), and turned
it into an inline static function.
- Added parentheses around targets of the sizeof operation.
- Added a small bit of clarifying commentary in spots where
it was suggested.
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Dave Chinner [Wed, 23 Feb 2011 22:28:26 +0000 (09:28 +1100)]
xfs_repair: inode flags check should use flags
The RT bitmap inode format flag check should use the flag, not the
bit definition. As a result, it is incorrectly detecting inodes with
the prealloc flag set as having an invalid bit set.
Eric Sandeen [Wed, 23 Feb 2011 16:28:55 +0000 (10:28 -0600)]
xfs_repair: Don't ever try to set the device blocksize
On 4k devices, we get this warning from repair:
# xfs_repair /dev/sdc2
xfs_repair: warning - cannot set blocksize 512 on block device /dev/sdc2: Invalid argument
Phase 1 - find and verify superblock...
...
but things proceed without trouble after that.
I'm unable to find any history or reason for setting the
device blocksize at the beginning of repair, and in any case,
things clearly work without doing so. So, let's just remove it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Tue, 22 Feb 2011 21:47:50 +0000 (08:47 +1100)]
xfs_repair: validate inode di_flags field
xfs_repair is not validating the di_flags field in the inode for
sanity. Block fuzzing indicates that we are not picking up situations
like the RT bit being set on filesystems without realtime devices.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
Ajeet Yadav [Thu, 3 Feb 2011 06:17:24 +0000 (06:17 +0000)]
xfsprogs: unhandled error check in libxfs_trans_read_buf
libxfs_trans_read_buf() is used in both mkfs.xfs & xfs_repair.
During stability testing we occasionally hit a page fault in
mkfs.xfs; code inspection shows that if libxfs_readbuf() fails, a
page fault occurs in xfs_buf_item_init() as called from
libxfs_trans_read_buf().
mkfs.xfs: unhandled page fault (11) at 0x00000070, code 0x017
Added NULL check and errno handling.
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Signed-off-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Thu, 6 Jan 2011 10:20:25 +0000 (21:20 +1100)]
xfs_repair: multithread phase 2
Running some recent repair tests on a broken filesystem meant running
phase 1 and 2 repeatedly to reproduce an issue at the start of phase
3. Phase 2 was taking approximately 10 minutes to run as it
processes each AG serially.
Phase 2 can be trivially parallelised - it is simply scanning the
per AG trees to calculate free block counts and free and used inodes
counts. This can be done safely in parallel by giving each AG its
own structure to aggregate counts into, then once the AG scan is
complete adding them all together.
This patch uses 32-way threading which results in no noticeable
slowdown on single SATA drives with NCQ, but results in ~10x
reduction in runtime on a 12 disk RAID-0 array.
Dave Chinner [Thu, 6 Jan 2011 06:24:00 +0000 (17:24 +1100)]
repair: warn if running in low memory mode
When checking large filesystems, xfs_repair makes an estimate of how
much RAM it will need to execute effectively. If the amount of RAM
is less than this, it reduces the bhash size and turns off
prefetching, which will substantially slow down the repair process.
Add a warning that indicates this is happening, along with a
recommendation of how much RAM repair calculates it needs to run
with prefetching enabled.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Ajeet Yadav [Tue, 1 Feb 2011 21:33:32 +0000 (14:33 -0700)]
xfs_repair: handle negative errors from read
xfs_repair does not handle read() errors while searching for secondary
superblocks. This problem is identified with a simple test case:
(1) delete primary superblock of xfs partition with
#dd if=/dev/zero of=/dev/sda1 bs=512 count=1
#sync
(2) run xfs_repair, and remove the storage while it is searching for
secondary superblocks
xfs_repair will loop forever, printing ............
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Ajeet Yadav [Tue, 1 Feb 2011 21:28:40 +0000 (14:28 -0700)]
xfs_repair: fix pagefault due to unhandled NULL check in da_read_buf()
xfs_repair does not correctly handle bplist[i] in error situations in
da_read_buf(). If libxfs_readbuf() fails then bplist[i] = NULL, but
the error handling code calls libxfs_putbuf(bplist[i]) for all indexes
of i without first checking whether it is NULL. This results in a page
fault in the libpthread library during pthread_mutex_unlock().
This problem is identified when we remove the storage while xfs_repair
is running on it.
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Prefer /proc/mounts, if it exists, over /etc/mtab to get a correct
picture of the kernel's mount table for this process. This works
around userspace software like pam_mount polluting /etc/mtab with
incorrect entries.
Also remove the "mtab" global variable and instead pass it explicitly
to fsrallfs, like we already do for other functions.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Alex Elder [Fri, 30 Jul 2010 21:45:45 +0000 (21:45 +0000)]
xfsprogs: fix depend targets
There's no need to re-make the dependency files all the time. Make
it so the "depend" target rebuilds the ".dep" file only if necessary.
Also change the name of the dependency file created for "ltdepend"
to be ".ltdep".
Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
We need to substitute root_sbindir and root_libdir even for the case
where we don't have a prefix different from the default; otherwise
xfsprogs won't build for that case due to rpath errors, and wouldn't
install correctly either.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Christian Kujau <lists@nerdbynature.de>
Peter Watkins [Fri, 9 Jul 2010 16:17:10 +0000 (09:17 -0700)]
xfs_db: validate btree block magic in the freesp command
Occasionally I've hit a SEGV while querying free space in xfs_db on a
mounted file system. In scanfunc_bno, block->bb_numrecs has crazy values.
And bb_magic is not XFS_ABTB_MAGIC.
Check for the correct magic number first, and return otherwise.
Signed-off-by: Peter Watkins <treestem@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Thu, 8 Jul 2010 00:20:22 +0000 (10:20 +1000)]
xfs_db: check for valid inode data pointer before dereferencing
When processing an inode, the code checks various flags to determine
whether to output messages or not. When checking the CLI provided
inode numbers to be verbose about, we fail to check if the inode
data structure returned is valid or not before dereferencing it.
Hence running xfs_check with the "serious errors only" flag, xfs_db
will crash. Fix up the "should we output" logic to be safe.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Tue, 6 Apr 2010 09:19:01 +0000 (19:19 +1000)]
xfs_fsr: Improve handling of attribute forks V2
If the file being defragmented has attributes, then fsr puts a dummy
attribute on the temporary file to try to ensure that the inode
attribute fork offset is set correctly. This works perfectly well
for the old style of attributes that use a fixed fork offset - the
presence of any attribute of any size or shape will result in fsr
doing the correct thing.
However, for attr2 filesystems, the attribute fork offset is
dependent on the size and shape of both the data and attribute
forks. Hence setting a small attribute on the file does not
guarantee that the two inodes have the same fork offset and are
therefore compatible for a data fork swap.
This patch improves the attribute fork handling of fsr. It checks
the filesystem version to see if the old style attributes are in
use, and if so uses the current method.
If attr2 is in use, fsr uses bulkstat output to determine what the
fork offset is. If the attribute fork offsets differ then fsr will
try to create attributes that will result in the correct offset. If
that fails, or the attribute fork is too large, it will give up and just
attempt the swap.
This fork offset value in bulkstat is new functionality in the
kernel, so if there are attributes but a zero fork offset, the
kernel does not support this feature and we simply fall back to the
existing, less effective code.
Version 2:
- simplify the attribute creation to use a small fixed size attribute
- handle the fork offset not changing as attributes are added - it can take a
few attributes to move it from one offset to another
- comment the code better
- passes test 226 and reduces the number of unswappable inode pairs passed to
the (fixed) kernel to zero
Wengang Wang [Mon, 26 Apr 2010 17:49:41 +0000 (12:49 -0500)]
xfsprogs: mkfs manpage fix for -nsize/log
There are two limitations for the mkfs.xfs -nsize/log option:
1) directory block size must be a power of 2.
2) it can't be less than a file system block size.
The current man page doesn't include the above information. Users
could be confused by errors such as "Illegal value xxx for -n size
option", and can't find out the cause by checking the man page.
The patch adds the two limitations to the manpage.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Petr Salinger [Wed, 24 Mar 2010 03:21:15 +0000 (14:21 +1100)]
Resolve build issues on Debian GNU/kFreeBSD port.
Additional platform target added to build system, with similar
build options to Linux but ultimately making BSD syscalls (and
hence leveraging the existing FreeBSD port in places too).
Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Nathan Scott <nathans@debian.org> Signed-off-by: Nathan Scott <nathans@debian.org>
Dave Chinner [Sun, 14 Mar 2010 22:52:08 +0000 (09:52 +1100)]
xfsprogs: duplicate extent btrees in xfs_repair need locking
The per-ag duplicate extent btrees can be searched concurrently from multiple
threads. This occurs when inode extent lists are being processed and inodes
with extents in the same AG are checked concurrently. The btrees have an
internal traversal cursor, so doing concurrent searches can result in the
cursor being corrupted for both searches.
Add an external lock for each duplicate extent tree and use it for searches,
inserts and deletes to ensure that we don't trash the state of any operation.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Dave Chinner [Wed, 17 Feb 2010 05:19:17 +0000 (16:19 +1100)]
xfsprogs: clean up make install build V2
The install targets did not get the silent treatment like the
normal build targets. Shut them up.
Also, remove the top level install target dependency on the default
target. Each sub-directory already defines the correct dependencies
for the install targets and so all the rebuilds can be done in one
traversal of the subdirectories via the install rules.
Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
We currently fail to detect that a device contains no signature (in
which case we are fine to proceed with it) due to mishandling the
return value of blkid_do_fullprobe. Fix that up and add some better
diagnostics of the blkid detection.
from RH bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=561870
# dd if=/dev/zero of=k bs=1MB count=2 seek=20; mkfs.xfs k
# mkfs.xfs: probe of k failed, cannot detect existing filesystem.
# mkfs.xfs: Use the -f option to force overwrite
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Only negative return values from open mean we failed to open the device.
Without this check we do not print the usage message when no device is
specified. This leads to a weird failure in xfstests 122.
Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Christoph Hellwig <hch@lst.de>