Joel Rosdahl [Thu, 1 Aug 2019 22:04:24 +0000 (00:04 +0200)]
Describe new code style and switch from Uncrustify to clang-format
Changing the source code style to what I prefer is something that I have
wanted to do for quite some time, but I never got around to it. Now
feels like a good time since much code will be rewritten anyway as part
of the C++-ification.
Joel Rosdahl [Wed, 24 Jul 2019 11:18:17 +0000 (13:18 +0200)]
C++-ify source code
The ccache source code will be converted to C++, targeting C++11. This
commit only arranges the existing C-style code to be built as C++ code.
This makes it possible to call new C++ code from old C-style code.
Gradual conversion to C++ functionality and idioms will follow in a slow
and controlled fashion – no big bang rewrites.
The alternative would be to convert code in a top-down fashion, i.e.
only calling legacy C code from new C++ code, not the other way around.
That approach is however not a good idea since the code that will
benefit most from being written in proper C++ is code deep down in the
call chains.
Except for renaming source code files to .cpp and .hpp, this commit
makes minimal changes to make the code base buildable again, for
example:
- Instructs configure.ac to look for a mandatory C++11-compliant
compiler.
- Adds Makefile rules for building C++ code.
- Sets up Travis-CI to pass C++ compiler flags and similar to the build.
- Adds new casts where needed.
- Adds const keywords where needed.
- Renames variables to something other than C++ keywords (e.g.
“template”).
- Rearranges some code to avoid complaints about goto jumps that cross
variable lifetimes.
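As an illustration of the kinds of mechanical changes listed above, here
is a small sketch. It is not taken from the ccache sources: the function
append_suffix and the parameter name tmpl are made up for the example.
It shows an explicit cast of malloc's return value, added const
qualifiers, a variable renamed away from the keyword "template", and
declarations moved ahead of the first goto so that no jump crosses an
initialization.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from ccache): returns a malloc'ed copy of `tmpl`
// with `suffix` appended, or nullptr on failure. The caller frees the result.
// In C the first parameter could have been called `template`; that is a
// keyword in C++, hence the rename.
char*
append_suffix(const char* tmpl, const char* suffix)
{
  // Declared and initialized before the first goto: C++ rejects a jump that
  // bypasses an initialization of a variable still in scope at the label.
  char* result = nullptr;
  std::size_t tmpl_len = 0;
  std::size_t suffix_len = 0;

  if (!tmpl || !suffix) {
    goto out;
  }
  tmpl_len = std::strlen(tmpl);
  suffix_len = std::strlen(suffix);

  // C converts void* implicitly; C++ requires an explicit cast.
  result = static_cast<char*>(std::malloc(tmpl_len + suffix_len + 1));
  if (!result) {
    goto out;
  }
  std::memcpy(result, tmpl, tmpl_len);
  std::memcpy(result + tmpl_len, suffix, suffix_len + 1); // includes the NUL

out:
  return result;
}
```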
Joel Rosdahl [Fri, 19 Jul 2019 07:22:04 +0000 (09:22 +0200)]
Don’t pass -Werror and compilation-only options to the preprocessor
Clang emits warnings when it sees unused options, so when ccache runs
the Clang preprocessor separately, options that are not used by the
preprocessor will produce warnings. This means that the user may get
warnings which would not be present when not using ccache. And if
-Werror is present then the preprocessing step fails, which needless to
say is not optimal.
To work around this:
* Options known to have the above-mentioned problem are not passed to
the preprocessor.
* In addition, -Werror is also not passed to the preprocessor so that
options not properly marked as “compiler only” will only trigger
warnings, not errors.
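The filtering described above could be sketched roughly as follows. This
is an assumption-laden illustration, not ccache's actual code: the
function name preprocessor_args is made up, and the set of
"compiler only" options is supplied by the caller because this message
does not list them.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: return the arguments that should be passed to the
// separate preprocessor run. The real compilation still gets all of `args`.
std::vector<std::string>
preprocessor_args(const std::vector<std::string>& args,
                  const std::set<std::string>& compiler_only)
{
  std::vector<std::string> result;
  for (const auto& arg : args) {
    // -Werror is always dropped so that an option missing from
    // `compiler_only` produces a warning instead of a failed preprocessing.
    if (arg == "-Werror" || compiler_only.count(arg) > 0) {
      continue;
    }
    result.push_back(arg);
  }
  return result;
}
```

The order of the remaining arguments is preserved, which matters for
options such as include paths.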
Joel Rosdahl [Wed, 17 Jul 2019 08:39:30 +0000 (10:39 +0200)]
Improve -x/--show-compression
- Ignore *.tmp.* files.
- Mention on-disk size (adjusted for disk block size) to make it match
the cache size reported by “ccache --show-stats”.
- Introduced “space savings” and “of original” percentages.
- Calculate compression ratio only for compressed files.
- Include “incompressible files” size, i.e. total size of .raw files and
files produced by previous ccache versions.
- Removed file counts since I don’t think that they are of much
interest.
- Handle unparsable manifest files from previous ccache versions
gracefully.
Joel Rosdahl [Mon, 15 Jul 2019 12:10:28 +0000 (14:10 +0200)]
Implement support for file cloning on Linux (Btrfs/XFS)
- Added a new file_clone (CCACHE_FILECLONE) configuration setting. If
set, ccache uses the FICLONE ioctl if available to clone files to/from
the cache. If file cloning is not supported by the file system, ccache
will silently fall back to copying (or hard linking if hard_link is
enabled).
- Compression will be disabled if file_clone is enabled, just like for
hard_link.
- file_clone has priority over hard_link.
- Tested on Btrfs and XFS on Linux 5.0.0.
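A minimal sketch of the clone-with-fallback idea, assuming Linux and the
FICLONE ioctl from <linux/fs.h>. The function clone_or_copy and the
CopyMethod enum are hypothetical names for this example, the fallback
shown is a plain copy only (no hard linking), and error handling is
reduced to the bare minimum.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#ifdef __linux__
#  include <linux/fs.h> // defines FICLONE on kernels that support it
#endif

enum class CopyMethod { cloned, copied, failed };

// Hypothetical helper (not ccache's code): copy `src` to `dest`, cloning when
// the file system supports it and silently copying the bytes otherwise.
CopyMethod
clone_or_copy(const char* src, const char* dest)
{
  assert(src && dest);
  int src_fd = open(src, O_RDONLY);
  if (src_fd == -1) {
    return CopyMethod::failed;
  }
  int dest_fd = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (dest_fd == -1) {
    close(src_fd);
    return CopyMethod::failed;
  }

  CopyMethod method = CopyMethod::copied;
#ifdef FICLONE
  // Ask the file system to let dest share src's data blocks.
  if (ioctl(dest_fd, FICLONE, src_fd) == 0) {
    method = CopyMethod::cloned;
  }
#endif

  if (method == CopyMethod::copied) {
    // Fallback: ordinary read/write copy.
    char buffer[65536];
    ssize_t n;
    while ((n = read(src_fd, buffer, sizeof(buffer))) > 0) {
      ssize_t done = 0;
      while (done < n) {
        ssize_t written = write(dest_fd, buffer + done, n - done);
        if (written <= 0) {
          method = CopyMethod::failed;
          break;
        }
        done += written;
      }
      if (method == CopyMethod::failed) {
        break;
      }
    }
    if (n == -1) {
      method = CopyMethod::failed;
    }
  }

  close(src_fd);
  if (close(dest_fd) == -1) {
    method = CopyMethod::failed;
  }
  return method;
}
```

On a file system without clone support the ioctl simply fails and the
function returns CopyMethod::copied, which mirrors the silent fallback
described above.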
Anders Björklund [Mon, 15 Jul 2019 13:28:26 +0000 (15:28 +0200)]
Add command to show compression statistics (#440)
This will only show information about the files that it knows about
(right magic bytes). So the file count might differ from what is shown
with the regular statistics (which shows all files, including old ones).
The terminology used here is a bit confusing: the compression ratio is
supposed to grow as compression improves, but the same information is
sometimes presented as "space savings" instead. List both values (ratio
and savings) to make the output more obvious.
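To make the two values concrete, here is a sketch using the usual
definitions (ratio = original size / compressed size, space savings =
1 - compressed size / original size). The struct and function names are
made up for the example, and the formulas are the conventional ones
rather than a copy of what the command computes.

```cpp
#include <cassert>
#include <cstdint>

struct CompressionSummary
{
  double ratio;           // original / compressed, grows with better compression
  double savings_percent; // share of the original size that was saved
};

// Hypothetical helper: derive both figures from two byte counts.
CompressionSummary
summarize_compression(uint64_t original_size, uint64_t compressed_size)
{
  assert(original_size > 0 && compressed_size > 0);
  const double original = static_cast<double>(original_size);
  const double compressed = static_cast<double>(compressed_size);
  return {original / compressed, 100.0 * (1.0 - compressed / original)};
}
```

For example, 1000 bytes compressed to 250 bytes gives a ratio of 4.0
and space savings of 75 %.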
Joel Rosdahl [Fri, 5 Jul 2019 19:43:07 +0000 (21:43 +0200)]
Reimplement the hard link mode
- Files stored by hard linking are saved as _N.raw files next to their
.result file, where N is the 0-based index of the entry in the .result
file.
- The .result file stores expected file sizes for the .raw files and the
code verifies that they are correct before retrieving the files from
the cache.
- The manual has been updated to mention the new file size check and
also some other caveats.
The hard link mode was previously removed for these reasons:
1. Hard links are error prone.
2. Compression will make hard links obsolete as a means of saving cache
space.
3. A future backend storage API will be easier to write.
Point 1 is still true, but since the result file now stores expected
file sizes, many inadvertent modifications of files will be detected.
Point 2 is also still true, but you might want to trade cache size for
speed in cases where increased speed actually is measurable, like with
very large object files.
Point 3 does not quite hold after thinking some more about future APIs.
I think that it will be relatively straight-forward to add operations
like supports_raw_files, get_raw_file and put_raw_file to the API.
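The _N.raw naming and the size check from the list at the top of this
message could look roughly like this. The exact path composition
(stripping ".result" and appending "_N.raw") is an assumption based on
the description, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Assumed naming: "<prefix>.result" -> "<prefix>_<N>.raw", where N is the
// 0-based index of the entry in the .result file.
std::string
raw_file_path(const std::string& result_path, unsigned index)
{
  const std::string suffix = ".result";
  assert(result_path.size() >= suffix.size());
  assert(result_path.compare(
           result_path.size() - suffix.size(), suffix.size(), suffix)
         == 0);
  return result_path.substr(0, result_path.size() - suffix.size()) + "_"
         + std::to_string(index) + ".raw";
}

// Returns true only if `path` exists and has exactly `expected_size` bytes.
// A mismatch suggests that the hard-linked file was modified after storing.
bool
raw_file_has_expected_size(const std::string& path, uint64_t expected_size)
{
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file) {
    return false;
  }
  const std::streamoff size = file.tellg();
  return size >= 0 && static_cast<uint64_t>(size) == expected_size;
}
```

A size check like this catches truncation and appends but not in-place
edits that keep the length, which is why it detects many, not all,
inadvertent modifications.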
Joel Rosdahl [Tue, 2 Jul 2019 11:57:11 +0000 (13:57 +0200)]
Probe whether the compiler produces a .dwo
GCC and Clang behave differently when given e.g. “-gsplit-dwarf -g1”:
GCC produces a .dwo file but Clang doesn’t. Trying to guess how the
different options behave for each compiler is complex and error prone.
Instead, Ccache now probes whether the compiler produced a .dwo and only
stores it if it was produced. On a cache hit, the .dwo is restored if it
exists in the previous result – if it doesn’t exist in the result, it
means that the compilation didn’t produce a .dwo.
Joel Rosdahl [Sun, 30 Jun 2019 12:01:38 +0000 (14:01 +0200)]
Add checksumming of cached content
Both compressed and uncompressed content are checksummed and verified.
The chosen checksum algorithm is XXH64, which is the same that the zstd
frame format uses (but ccache stores all 64 bits instead of only 32,
because why not?).
Joel Rosdahl [Sat, 29 Jun 2019 20:35:50 +0000 (22:35 +0200)]
Require libzstd and remove zlib support
* zlib has been removed. Good riddance!
* libzstd is now required for building ccache. However, it’s not bundled
like zlib was.
* To make it easier to build ccache on systems that lack an easily
installable libzstd, the configure script now offers a
--with-libzstd-from-internet option, which downloads a zstd source
release archive, unpacks it in the tree and sets up the Makefile to
build the library and link ccache (statically) with it.
* Enabled compression by default.
* Made compression level 0 mean “use a default level suitable for the
current compression algorithm”. For zstd, that’s initially level -1,
but that could change in the future. The reason for using 0 as a
special marker is that a future alternative compression algorithm
could have another reasonable default than zstd. (Let’s hope that
future algorithms don’t use level 0 for something.)
* Changed default compression level to 0.
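The "0 means default" rule can be sketched in a few lines. The constant
and function names are hypothetical; the value -1 is the zstd default
stated above and, as noted, could change in the future.

```cpp
// Assumption from the text above: the current default for zstd is level -1.
const int k_zstd_default_level = -1;

// Hypothetical helper: map the configured level to the level actually used.
// 0 is a marker for "let ccache pick a suitable default", not a real level.
int
effective_zstd_level(int configured_level)
{
  return configured_level == 0 ? k_zstd_default_level : configured_level;
}
```

Keeping the marker separate from the algorithm's own defaults means a
future algorithm only needs its own constant, not a new configuration
value.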
Joel Rosdahl [Sat, 29 Jun 2019 18:39:00 +0000 (20:39 +0200)]
Restructure Travis configuration
In preparation for switching from zlib to zstd. I find it easier to use
a flat job list instead of a matrix and state settings explicitly for
the different jobs.
Joel Rosdahl [Sat, 29 Jun 2019 18:35:48 +0000 (20:35 +0200)]
Replace murmurhashneutral2 with xxHash (XXH64)
XXH64 is significantly faster than murmurhashneutral2 (on 64-bit
systems, which one can assume ccache is almost always running on these
days). This of course doesn’t matter for keys in hash tables, but it
opens up the possibility of using it as a checksumming algorithm for
cached data as well.
Joel Rosdahl [Sat, 29 Jun 2019 18:25:31 +0000 (20:25 +0200)]
Don’t try a higher zstd level than supported
If the user tries a higher level than supported by libzstd,
initialization will fail. Instead, let’s clamp the level to the highest
supported value.
Regarding negative levels: They are supported from libzstd 1.3.4, but
the query function ZSTD_minCLevel is only supported from 1.4.0 (from
1.3.6 with ZSTD_STATIC_LINKING_ONLY), so let’s not use it for
verification of the level. In libzstd 1.3.3 and older, negative levels
are silently converted to zstd’s default level (3), so there’s no
major harm done if a user uses a negative level with older libzstd
versions.
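A sketch of the clamping, kept free of a libzstd dependency by taking
the maximum as a parameter. In a real build the caller would presumably
pass the value returned by ZSTD_maxCLevel() from <zstd.h>; the function
name clamp_zstd_level is made up for this example.

```cpp
#include <cassert>

// Hypothetical helper: limit `requested` to the highest level the linked
// libzstd supports. Only the upper bound is enforced; negative levels pass
// through unchanged, since no lower bound is queried.
int
clamp_zstd_level(int requested, int max_supported)
{
  assert(max_supported >= 1);
  return requested > max_supported ? max_supported : requested;
}
```

With a maximum of 22, for instance, a requested level of 30 becomes 22
while 5 and -7 are left alone.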
Joel Rosdahl [Sat, 22 Jun 2019 20:51:42 +0000 (22:51 +0200)]
Use the compression API for reading and writing manifests
* Manifest and result files now share the same common header (sans the
magic bytes) and will be compressed using the common compression
settings.
* Removed the legacy “hash size” and “reserved” fields.
Joel Rosdahl [Sun, 16 Jun 2019 14:42:42 +0000 (16:42 +0200)]
Add a content size field to the result file header
The content size field says how much uncompressed data is stored in the
file. This can be used to relatively quickly determine the compression
rate for the whole cache by only inspecting each file’s header instead of
having to read and decompress all files.
Since the content size needs to be calculated before actually adding the
content to the result file, I’ve gone back to letting the format use a
“number of entries” field instead of an EOF marker (similar to Anders
Björklund’s original work in 0399be2d) since the number of files now has
to be known beforehand.
Another subtle change is that the compression level field now is int8_t
instead of uint8_t to make it possible to record negative levels.
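Why the signedness matters can be shown with a single header byte. This
sketch only demonstrates the int8_t/uint8_t interpretation of one byte.
It says nothing about the real header layout, and the function names
are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: store a signed level in one header byte and read it
// back. memcpy copies the bit pattern without any value conversion.
uint8_t
encode_level(int8_t level)
{
  uint8_t byte;
  std::memcpy(&byte, &level, 1);
  return byte;
}

int8_t
decode_level(uint8_t byte)
{
  int8_t level;
  std::memcpy(&level, &byte, 1);
  return level;
}
```

Level -1 is stored as the byte 0xFF. Read back as int8_t it is -1
again, whereas a uint8_t field would report 255.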
Igor Pylypiv [Mon, 10 Jun 2019 20:52:35 +0000 (13:52 -0700)]
Fix possible NULL pointer dereference (#433)
cppcheck:
[src/manifest.c:270] -> [src/manifest.c:269]: (warning)
Either the condition '!errmsg' is redundant or there is possible null pointer
dereference: errmsg.
Joel Rosdahl [Sat, 8 Jun 2019 18:49:39 +0000 (20:49 +0200)]
Improve naming of things
Some words are being used to mean several things in the code base:
* “object” can mean both “the object file (.o) produced by the compiler”
and “the result stored in the cache, including e.g. the .o file and .d
file”.
* “hash” can mean “the state that the hash_* functions operate on”,
“the output of a hash function” and “the key used to index results
and manifests in the cache”.
This commit tries to make the naming more consistent:
* “object” means “the object file (.o) produced by the compiler”
* “result” means “the result stored in the cache, including e.g. the .o
file and .d file”.
* “struct hash” means “the state that the hash_* functions operate on”.
* “digest” means “the output of a hash function”. However, “hash” is
still used in documentation and command line output since I think that
“hash” is easier to understand for most people, especially since
that’s the term used by Git.
* “name” means “the key used to index results and manifests in the
cache”.