Jim Meyering [Fri, 28 Jan 2011 20:19:50 +0000 (21:19 +0100)]
copy: make extent_copy use sparse_copy, rather than its own code
* src/copy.c (extent_copy): Before this change, extent_copy would fail
to create holes, thus breaking --sparse=auto and --sparse=always.
I.e., copying a large enough file of all zeros, cp --sparse=always
should introduce a hole, but with extent_copy, it would not.
Jim Meyering [Thu, 27 Jan 2011 19:57:17 +0000 (20:57 +0100)]
copy: factor sparse-copying code into its own function, because
we're going to have to use it from within extent_copy, too.
* src/copy.c (sparse_copy): New function, factored out of...
(copy_reg): ...here.
Remove now-unused locals.
Jim Meyering [Thu, 27 Jan 2011 16:30:08 +0000 (17:30 +0100)]
fiemap copy: avoid a performance hit due to very small buffer
* src/copy.c (extent_copy): Don't let what should have been a
temporary reduction of buf_size (to handle a short ext_len) become
permanent and thus impact the performance of all further iterations.
Jim Meyering [Sat, 22 Jan 2011 12:09:08 +0000 (13:09 +0100)]
copy: don't allocate a separate buffer just for extent-based copy
* src/copy.c (copy_reg): Move use of extent_scan to just *after*
we allocate the main copying buffer, so we can...
(extent_scan): Take a new parameter, BUF, and use that rather
than allocating a private buffer. Update caller.
Jim Meyering [Sat, 22 Jan 2011 11:55:58 +0000 (12:55 +0100)]
copy: tweak variable name; improve a comment
* src/copy.c (copy_reg): Rename a variable to make more sense from
caller's perspective: s/require_normal_copy/normal_copy_required/.
This is an output-only variable, and the original name could make
it look like an input (or i&o) variable.
Jim Meyering [Sat, 22 Jan 2011 11:36:03 +0000 (12:36 +0100)]
copy: call extent_copy also when make_holes is false, ...
so that we benefit from using extents also when reading a sparse
input file with --sparse=never.
* src/copy.c (copy_reg): Remove erroneous test of "make_holes"
so that we call extent_copy also when make_holes is false.
Otherwise, what's the point of that parameter?
Jim Meyering [Mon, 11 Oct 2010 09:19:02 +0000 (11:19 +0200)]
rename extent_scan member
* extent-scan.h [struct extent_scan]: Rename member:
s/hit_last_extent/hit_final_extent/. "final" is clearer,
since "last" can be interpreted as "preceding".
Jim Meyering [Mon, 11 Oct 2010 08:39:50 +0000 (10:39 +0200)]
fiemap copy: don't let write failure go unreported; adjust style, etc.
* src/copy.c (write_zeros): Add comments.
(extent_copy): Move decls of "ok" and "i" down to scope where used.
Adjust comments.
Rename local: s/holes_len/hole_size/
Print a diagnostic upon failure to write zeros.
bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Jim Meyering wrote:
> jeff.liu wrote:
>> Sorry for the delay.
>>
>> This is the new patch to isolate the stuff regarding to extents reading to a new module. and teach
>> cp(1) to make use of it.
>
> Jeff,
>
> I applied your patch to my rebased fiemap-copy branch.
> My first step was to run the usual
>
> ./bootstrap && ./configure && make && make check
>
> "make check" failed on due to a double free in your new code:
> (x86_64, Fedora 13, ext4 working directory)
>
> To get details, I made this temporary modification:
Hi Jim,
I am sorry for the fault, it fixed at the patch below.
Would you please revie at your convenience?
Changes:
========
1. fix write_zeros() as Jim's comments, thanks for pointing this out.
2. remove char const *fname from struct extent_scan.
3. change the signature of open_extent_scan() from "void open_extent_scan(struct extent_scan
**scan)" to "void open_extent_scan(struct extent_scan *scan)"; the reason is I'd like to reduce once
memory allocation for the extent_scan variable, instead, using stack to save it.
4. remove close_extent_scan() from a function defined at extent-scan.c to extent-scan.h as a Macro
definination, but it does nothing for now, since initial extent scan defined at stack.
5. add a macro "free_extents_info()" defined at extent-scan.h to release the memory allocated to
extent info which should be called combine with get_extents_info(), it just one line, so IMHO,
define it as macro should be ok.
I have done the memory check via `valgrind`, no issue found.
make test against cp/sparse-fiemap failed at the extent compare stage, but the file content is
identical to each other by comparing those two files "j1/j2" manually.
Is it make sense if we verify them through diff(1) since the testing file is in small size?
or we have to merge the contig extents from the output of `filefrag', I admit I have not dig into
the filefrag-extent-compare at the moment, I need to recall the perl language syntax. :-P.
>From 50a3338db06442fa2d789fd65175172d140cc96e Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu@oracle.com>
Date: Wed, 29 Sep 2010 15:35:43 +0800
Subject: [PATCH 1/1] cp: add a new module for scanning extents
* src/extent-scan.c: Source code for scanning extents.
Call open_extent_scan() to initialize extent scan.
Call get_extents_info() to get a number of extents for each iteration.
* src/extent-scan.h: Header file of extent-scan.c.
Wrap free_extent_info() as macro define to release the space allocated extent_info per extent scan.
Wrap close_extent_scan() as macro define but do nothing at the moment.
* src/Makefile.am: Reference it and link it to copy_source.
* src/copy.c: Make use of the new module, replace fiemap_copy() with extent_copy().
Paul Eggert [Wed, 9 Jun 2010 06:15:07 +0000 (08:15 +0200)]
copy.c: ensure proper alignment of fiemap buffer
* src/copy.c (fiemap_copy): Ensure that our fiemap buffer
is large enough and well-aligned.
Replace "0LL" with equivalent "0" as 3rd argument to lseek.
Jim Meyering [Sat, 5 Jun 2010 08:17:48 +0000 (10:17 +0200)]
copy.c: adjust comments, tweak semantics
* src/copy.c (fiemap_copy): Rename from fiemap_copy_ok.
Add/improve comments.
Remove local, "fail".
(fiemap_copy): Do not require caller to set
"normal_copy_required" before calling fiemap_copy.
Report ioctl failure if it's the 2nd or subsequent call.
Jie Liu [Thu, 13 May 2010 14:17:53 +0000 (22:17 +0800)]
tests: add a new test for FIEMAP-copy
* tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
loopbacked ext4 partition.
* tests/Makefile.am (sparse-fiemap): Reference the new test.
Jie Liu [Thu, 13 May 2010 14:09:30 +0000 (22:09 +0800)]
cp: Add FIEMAP support for efficient sparse file copy
* src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
Copied from linux's include/linux/fiemap.h, with minor formatting changes.
* src/copy.c (copy_reg): Now, when `cp' invoked with --sparse=[WHEN] option, we
will try to do FIEMAP-copy if the underlaying file system support it, fall back
to a normal copy if it fails.
Paul Eggert [Fri, 21 Jan 2011 18:59:32 +0000 (10:59 -0800)]
manual: document floating point better
* doc/coreutils.texi (Floating point): New section.
(od invocation, tail invocation, sort invocation, printf invocation):
(sleep invocation, seq invocation): Refer and defer to it. See
<http://lists.gnu.org/archive/html/bug-coreutils/2011-01/msg00031.html>.
Jim Meyering [Thu, 20 Jan 2011 09:50:11 +0000 (10:50 +0100)]
build: update gnulib submodule to latest
The previous gnulib submodule reference was *still* to a
non-public commit. My submodule had a stray commit, so
the reference was always to a local merge commit.
Reported by Rob Vermaas.
Jim Meyering [Mon, 17 Jan 2011 11:36:58 +0000 (12:36 +0100)]
uniq: replace a wasteful loop with simple calculation
* src/uniq.c (find_field): Remove the byte-skipping loop altogether.
Instead, perform the simple calculation. This results in a 10%
performance improvement for large byte offsets.
Sami Kerola [Sun, 16 Jan 2011 23:27:06 +0000 (00:27 +0100)]
uniq: don't continue field processing after end of line
* NEWS (Bug fixes): Mention it.
* src/uniq.c (find_field): Stop processing loop when end of line
is reached. Before this fix, 'uniq -f 10000000000 /etc/passwd'
would run for a very long time.
Ondřej Vašík [Fri, 14 Jan 2011 15:38:57 +0000 (16:38 +0100)]
doc: specify how tr, echo, printf treat octal numbers
* doc/coreutils.texi (tr's Character sets): Document how a 9-bit
octal value is interpreted. tr does not ignore the ninth bit.
(echo invocation, printf invocation): Document that any ninth
bit in \OOO is ignored. (http://debbugs.gnu.org/7574)
Pádraig Brady [Tue, 11 Jan 2011 19:30:28 +0000 (19:30 +0000)]
maint: refactor to use read-file from gnulib
* bootstrap.conf: Add the read-file module
* src/ptx.c: Replace the original code which would
needlessly read SIZE_MAX bytes of files larger than this.
* src/shuf.c: Replace the original code.
Pádraig Brady [Thu, 13 Jan 2011 09:36:38 +0000 (09:36 +0000)]
maint: trivial system header file cleanups
* src/system.h: Note where it should be included, and
make ordering check portable to GLIBC > 2
* src/copy.c: Move <sys/ioctl.h> along with other system headers
as is done elsewhere.
* src/install.c: Move <sys/wait.h> along with other system headers
as is done elsewhere.
* src/ptx.c: Include <regex.h> rather than "regex.h" as
is done elsewhere. Note <regex.h> is kept after "system.h"
as per commit dba300a0.
Jim Meyering [Wed, 12 Jan 2011 20:20:34 +0000 (21:20 +0100)]
doc: clean up HACKING guidelines
* HACKING: Remove mention of "indent-tabs-mode: nil", since
we've remove all of those directives. No longer needed.
Remove dated (pre-emacs-23) reference regarding WhiteSpace mode.
Paul Eggert [Wed, 12 Jan 2011 02:07:15 +0000 (18:07 -0800)]
gnulib: Also use dtoastr and ldtoastr modules.
This adjusts to the recent splitting of the ftoastr module into 3
<http://lists.gnu.org/archive/html/bug-gnulib/2011-01/msg00199.html>.
* bootstrap.conf (gnulib_modules): Add dtoastr, ldtoastr,
as coreutils needs all 3 modules now.
Jim Meyering [Sat, 8 Jan 2011 16:44:55 +0000 (17:44 +0100)]
du: don't abort when a subdir is renamed during traversal
* NEWS (Bug fixes): Mention it.
* src/du.c (prev_level): Move declaration "up" to file-scope global.
(du_files): Reset prev_level to 0 upon abnormal fts_read termination.
Reported by Johathan Nieder in http://bugs.debian.org/609049
Also, improve a diagnostic.
* tests/du/move-dir-while-traversing: Test for the above.
* tests/Makefile.am (TESTS): Add it.
Jim Meyering [Mon, 17 Nov 2008 11:05:27 +0000 (12:05 +0100)]
maint: generate much of the THANKS file
Before this change, we had a tendency to manually list each
contributor's name in THANKS. Now, each commit "Author" is
included in the generated THANKS file automatically, and most
of the old THANKS file is now a template, THANKS.in.
We'll still have to manually list the names of people who report
problems without a usable patch.
* THANKS.in: New file, derived from THANKS, but removing names of
those who are listed as git log 'Author:'s.
* THANKS: Remove file.
* thanks-gen: New file.
* Makefile.am (THANKS): New rule.
(EXTRA_DIST): Add .mailmap, THANKS.in and thanks-gen.
* .gitignore: Add THANKS and THANKS-to-translators.
* .mailmap: Unify on single address and name-spelling per contributor.
Eric Blake [Thu, 30 Dec 2010 20:08:32 +0000 (13:08 -0700)]
maint: allow gettext 0.17 again
Commit 041c9c47 traded the 'gettext' module for the lighter 'gettext-h'
module, so as to not require the latest gettext release (we only need
the latest release if we ship gettext as a dependent library, but
coreutils has long preferred to use it as an external library).
But that commit overlooked two places necessary to allow the use of
gettext 0.17.
This does not force you to downgrade (using gettext 0.18.1.1 is still
just fine), nor does it affect tarballs (once a tarball is built
with a given gettext version, it can be built on other machines
regardless of what gettext version is present).
Pádraig Brady [Thu, 30 Dec 2010 01:36:59 +0000 (01:36 +0000)]
split: fix the suffix length calculation
* src/split.c (set_suffix_length): Only auto-calculate
the suffix length when the number of files is specified.
* tests/misc/split-a: Add a case to trigger the bug,
and exercise the suffix length auto-calculation.
* NEWS: Mention the fix.
Reported by Dmitry V. Levin and Sergey Vlasov at
https://bugzilla.altlinux.org/show_bug.cgi?id=24841
Paul Eggert [Thu, 23 Dec 2010 08:07:35 +0000 (00:07 -0800)]
csplit: diagnose file counter wraparound
* src/csplit.c (create_output_file): Detect overflow when the
file counter wraps around, and exit with a diagnostic. Formerly
the code silently wrapped around and wrote to the wrong file,
losing output data.
Paul Eggert [Thu, 23 Dec 2010 07:11:05 +0000 (23:11 -0800)]
getlimits: port to hosts with very wide int, or non-ASCII
* src/getlimits.c (decimal_ascii_add): Remove, replacing with ...
(decimal_absval_add_one): New function, with different signature,
which does not assume ASCII. All callers changed.
(print_int): Remove assumptions that integers fit in 206 bits, and
that characters are ASCII. These assumptions are portable in
practice but are easy to remove here.
Jim Meyering [Wed, 22 Dec 2010 10:49:25 +0000 (11:49 +0100)]
maint: correct test-related comments
* tests/mv/i-3: Adjust comment to match just-changed code.
Spotted by Pádraig Brady.
* tests/init.cfg (retry_delay_): Correct spelling of function name
in usage example.
Jim Meyering [Wed, 22 Dec 2010 10:10:23 +0000 (11:10 +0100)]
tests: adjust preceding change to handle general WERROR_CFLAGS values
* gnulib-tests/Makefile.am (test_xvasprintf_CFLAGS):
(test_lock_CFLAGS, test_tls_CFLAGS): Avoid a syntax error when
$(WERROR_CFLAGS) expands to more than one token.
Paul Eggert [Wed, 22 Dec 2010 09:48:27 +0000 (01:48 -0800)]
tests: do not assume compiler knows -Wxxx flags
* gnulib-tests/Makefile.am (test_xvasprintf_CFLAGS):
(test_lock_CFLAGS, test_tls_CFLAGS): Do not append GCC-specific
flags like -Wno-format-security unless the GCC-specific flag
-Werror is also specified. This avoids a "make check" failure on
Solaris when using Sun C 5.8.
Paul Eggert [Tue, 21 Dec 2010 01:40:31 +0000 (17:40 -0800)]
who: omit useless definitions of MAXHOSTNAMELEN
This prevents a compilation failure on Solaris 8, GCC 4.4.2, with
"configure --enable-gcc-warnings".
* src/who.c (MAXHOSTNAMELEN): Remove; no longer needed.
* src/pinky.c: Likewise.
Paul Eggert [Sun, 19 Dec 2010 04:02:45 +0000 (20:02 -0800)]
tests: sync init.sh from gnulib
* tests/init.sh (setup_): Initialize fail=0 before invoking mktempd_.
Ensure that IFS is defined initially.
(mktempd_): Remove fail=0 initialization; no longer needed.
Pádraig Brady [Sat, 18 Dec 2010 02:50:33 +0000 (02:50 +0000)]
cp: ensure backups are created when -T specified
* src/cp.c (do_copy): When -T is specified, initialize
the NEW_DST and SB variables, which are checked when
running: cp -T --force --backup file file
* tests/cp/backup-1: Add the -T case
Pádraig Brady [Sat, 18 Dec 2010 05:27:46 +0000 (05:27 +0000)]
sort: use at most 8 threads by default
* src/sort.c (main): If --parallel isn't specified,
restrict the number of threads to 8 by default.
If the --parallel option is specified, then
allow any number of threads to be set, independent
of the number of processors on the system.
* doc/coreutils.texi (sort invocation): Document the changes
to determining the number of threads to use.
Mention the memory overhead when using multiple threads.
* tests/misc/sort-spinlock-abuse: Allow single core
systems that support pthreads.
* tests/misc/sort-stale-thread-mem: Likewise.
* tests/misc/sort-unique-segv: Likewise.
* NEWS: Mention the change in behaviour.
Paul Eggert [Sat, 18 Dec 2010 06:39:47 +0000 (22:39 -0800)]
tests: set fail=0 by default
* tests/init.sh (setup_): Set fail=0. This was the intent as per
<http://lists.gnu.org/archive/html/bug-coreutils/2010-12/msg00058.html>
but the assignment in mktempd_ is ineffective, since mktempd_
is used inside `` and its assignments are in a subshell.
Paul Eggert [Fri, 17 Dec 2010 06:31:56 +0000 (22:31 -0800)]
sort: do not generate thousands of subprocesses for 16-way merge
Without this change, tests/misc/sort-compress-hang would consume
more than 10,000 process slots on my RHEL 5.5 x86-64 server,
making it likely for other applications to fail due to lack of
process slots. With this change, the same benchmark causes 'sort'
to consume at most 19 process slots. The change also improved
wall-clock time by 2% and user+system time by 14% on that benchmark.
* NEWS: Document this.
* src/sort.c (MAX_PROCS_BEFORE_REAP): Remove.
(reap_exited): Renamed from reap_some; this is a more accurate name,
since "some" incorrectly implies that it reaps at least one process.
All uses changed.
(reap_some): New function: it *does* reap at least one process.
(pipe_fork): Do not allow more than NMERGE + 2 subprocesses.
(mergefps, sort): Omit check for exited processes: no longer needed,
and anyway the code consumed too much CPU per line when 2 < nprocs.
Paul Eggert [Thu, 16 Dec 2010 21:55:13 +0000 (13:55 -0800)]
sort: fix hang with sort --compress
* NEWS: Document this.
* src/sort.c (UNCOMPRESSED, UNREAPED, REAPED): New constants.
(struct tempnode): New member 'state', to hold these constants.
The pid member is now undefined if state == UNCOMPRESSED.
(struct sortfile): Replace member 'pid' with member 'temp'.
(uintptr): Remove.
(proctab_hasher, proctab_comparator, register_proc, delete_proc):
Proctab entries are now struct tempnode *, not pid_t, to handle
the case where multiple tempnode objects correspond to the same
pid. This avoids a race condition that can cause a hang.
(register_proc): Arg is now struct tempnode *, not pid_t. All
callers changed.
(delete_proc): Set tempnode state to REAPED.
(create_temp_file): No need to set pid member here; it's now
done when the pid is known.
(maybe_create_temp, create_temp): Remove PPID arg. Return struct
tempnode *, not char *. All callers changed.
(maybe_create_temp): Set node state to UNCOMPRESSED or UNREAPED.
No need to set node->pid to 0.
(open_temp): Replace NAME and PID args with a single TEMP arg.
All callers changed. Wait only for unreaped children.
(zaptemp): Wait for decompressor to finish before removing its
temporary-file input. This avoids .nfsXXXX hassles with NFS
and fixes a race (leading to a hang) regardless of NFS.
(open_input_files): Adjust to new way of dealing with temp files
and their subprocesses.
* tests/Makefile.am (TESTS): Add misc/sort-compress-hang.
* tests/misc/sort-compress-hang: New file.
Paul Eggert [Thu, 16 Dec 2010 08:03:29 +0000 (00:03 -0800)]
sort: don't dump core when merging from input twice
* NEWS: Document this.
* src/sort.c (avoid_trashing_input): The previous fix to this
function didn't fix all the problems with this code. Replace it
with something simpler: just copy the input file. This doesn't
change the number of files, so return void instead of the updated
file count. Caller changed.
* tests/misc/sort-merge-fdlimit: Test for the bug.
Jim Meyering [Tue, 14 Dec 2010 08:08:37 +0000 (09:08 +0100)]
doc: tail: semi-deprecate --sleep-interval and --max-unchanged-stats
Those options are useful only on systems that lack inotify support
and in the unusual event that a system with inotify support must
resort to polling.
* src/tail.c (usage): Note that the --max-unchanged-stats=N and
--sleep-interval=N options are rarely useful on systems with
inotify support.
* doc/coreutils.texi (tail invocation): Likewise.
Paul Eggert [Tue, 14 Dec 2010 19:09:32 +0000 (11:09 -0800)]
sort: fix very-unlikely buffer overrun when merging to input file
* src/sort.c (avoid_trashing_input): Fix a typo that could cause a
buffer overrun in theory. In practice this is extremely unlikely,
as it requires running out of file descriptors in a small merge,
presumably because some other process is hogging all the OS's file
descriptors.
Paul Eggert [Tue, 14 Dec 2010 18:07:36 +0000 (10:07 -0800)]
tests: default to /tmp as the temporary directory
* tests/check.mk (TESTS_ENVIRONMENT): Default TMPDIR to /tmp,
rather than to the working directory; this is more common in
practice, which makes the tests more real-worldish; and it is
often faster. Also, it avoids some problems with NFS cleanups.
* tests/misc/sort-compress: Remove unnecessary code setting TMPDIR.
* tests/misc/sort-compress-proc: Likewise. Do the final sleep
only if TMPDIR is relative, which should be rarely given the
change to TESTS_ENVIRONMENT.
Paul Eggert [Tue, 14 Dec 2010 07:23:17 +0000 (23:23 -0800)]
sort: fix some --compress reaper bugs
* src/sort.c (uintptr): New type.
(enum procstate, struct procnode, update_proc): Remove.
(proctab_hasher, proctab_comparator, register_proc, wait_proc):
(reap_some): The proctab is now simply a hash of process-IDs
rather than of pointers to objects with reference counts and
states; this is smaller and faster and easier to understand.
(nprocs): Now pid_t, not size_t, since one cannot have more than
PID_MAX children.
(reap): If the argument is -1, wait; if 0 (a new value), do not.
Delete pid from proctab as needed. Ignore children that are not
in proctab, as they are from the program that exec'ed us and are
irrelevant to our success or failure.
(delete_proc, reap_all): New functions.
(open_temp): Register the child.
(sort): Clean up all children afterwards; without this patch,
'sort' sometimes missed failures in children due to race conditions.
* tests/Makefile.am (TESTS): Add misc/sort-compress-proc.
* tests/misc/sort-compress-proc: New file, to test for the
bugs fixed above.