Igor Pylypiv [Mon, 10 Jun 2019 20:52:35 +0000 (13:52 -0700)]
Fix possible NULL pointer dereference (#433)
cppcheck:
[src/manifest.c:270] -> [src/manifest.c:269]: (warning)
Either the condition '!errmsg' is redundant or there is possible null pointer
dereference: errmsg.
Joel Rosdahl [Sat, 8 Jun 2019 18:49:39 +0000 (20:49 +0200)]
Improve naming of things
Some words are being used to mean several things in the code base:
* “object” can mean both “the object file (.o) produced by the compiler”
and “the result stored in the cache, including e.g. the .o file and .d
file”.
* “hash” can mean both “the state that the hash_* functions operate on”,
“the output of a hash function” and “the key used to index results
and manifests in the cache”.
This commit tries to make the naming more consistent:
* “object” means “the object file (.o) produced by the compiler”
* “result” means “the result stored in the cache, including e.g. the .o
file and .d file”.
* “struct hash” means “the state that the hash_* functions operate on”.
* “digest” means “the output of a hash function”. However, “hash” is
still used in documentation and command line output since I think that
“hash” is easier to understand for most people, especially since
that’s the term used by Git.
* “name” means “the key used to index results and manifests in the
cache”.
Joel Rosdahl [Sat, 8 Jun 2019 11:25:49 +0000 (13:25 +0200)]
Improve how <MD4, number of hashed bytes> is represented
Internally, the tuple <MD4 hash, number of hashed bytes>, which is the
key used for cached results and manifests, was represented as 16 bytes +
1 uint32_t. Externally, i.e. in file names, it was represented as
<MD4>-<size>, with <MD4> being 32 hex digits and <size> being the number
of hashed bytes in human-readable form.
This commit changes the internal representation to 20 bytes, where the
last 4 bytes are the number of hashed bytes in big-endian order. The
external representation has been changed to match this, i.e. to be 40
hex digits. This makes the code slightly less complex and more
consistent. Also, the code that converts the key into string form has
been rewritten to not allocate on the heap but to just write the output
into a buffer owned by the caller.
struct file_hash (16 bytes + 1 uint32_t) has been renamed to struct
digest (20 bytes) in order to emphasize that it represents the output of
a hash algorithm that does not necessarily get file content as its input.
The documentation of the manifest format has been updated to reflect the
logical change of keys, even though the actual serialized content of
manifest files hasn’t changed. While at it, reading of the obsolete
“hash_size” and “reserved” fields has been removed. (Future changes in
the manifest format will be handled by just stepping the version.)
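
As a rough illustration of the 20-byte representation described above, here
is a minimal sketch. The helper names digest_from_md4 and digest_as_string,
the buffer-size macro and the choice of lowercase hex digits are assumptions
made for this example and are not taken from the commit itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_SIZE 20
// 40 hex digits plus the terminating NUL (hypothetical macro name).
#define DIGEST_STRING_BUFFER_SIZE (2 * DIGEST_SIZE + 1)

struct digest {
    uint8_t bytes[DIGEST_SIZE];
};

// Combine a 16-byte MD4 value and the number of hashed bytes into one
// 20-byte digest. The counter occupies the last 4 bytes, big-endian.
static void
digest_from_md4(struct digest *d, const uint8_t md4[16], uint32_t hashed_bytes)
{
    assert(d && md4);
    memcpy(d->bytes, md4, 16);
    d->bytes[16] = (uint8_t)(hashed_bytes >> 24);
    d->bytes[17] = (uint8_t)(hashed_bytes >> 16);
    d->bytes[18] = (uint8_t)(hashed_bytes >> 8);
    d->bytes[19] = (uint8_t)hashed_bytes;
}

// Write the digest as 40 hex digits into a buffer owned by the caller.
// Nothing is allocated on the heap.
static void
digest_as_string(const struct digest *d, char buffer[DIGEST_STRING_BUFFER_SIZE])
{
    static const char hex[] = "0123456789abcdef";
    assert(d && buffer);
    for (size_t i = 0; i < DIGEST_SIZE; i++) {
        buffer[2 * i] = hex[d->bytes[i] >> 4];
        buffer[2 * i + 1] = hex[d->bytes[i] & 0xf];
    }
    buffer[2 * DIGEST_SIZE] = '\0';
}
```

A caller would typically declare a char array of DIGEST_STRING_BUFFER_SIZE
on the stack and pass it in, so the string form never outlives its scope.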
Joel Rosdahl [Thu, 6 Jun 2019 18:10:10 +0000 (20:10 +0200)]
Remove the hard link mode
Rationale:
* The hard link feature is prone to errors: a) changes to files outside
the cache will corrupt the cache, and b) the mtime field in the file's
i-node is used for different purposes by ccache and build tools like
make.
* The upcoming enabling of LZ4 compression by default will make the hard
link mode obsolete as a means of saving cache space.
* Not supporting hard links will make a future backend storage API
simpler.
Joel Rosdahl [Thu, 6 Jun 2019 11:44:16 +0000 (13:44 +0200)]
Improve error handling of (de)compressors
Previously, some kinds of corruption were not detected by the zlib
decompressor since it didn’t check that it had reached the end of the
stream and therefore didn’t verify the Adler-32 checksum.
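
For context, a zlib stream ends with the Adler-32 checksum of the
uncompressed data, stored as 4 big-endian bytes. The sketch below is not the
ccache code and does not use zlib; it only shows, with the hypothetical
helpers adler32_of and trailer_matches, the kind of comparison that is
skipped when a reader stops before the end of the stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Plain Adler-32 over a byte buffer (illustrative, not optimized).
static uint32_t
adler32_of(const uint8_t *data, size_t len)
{
    assert(data || len == 0);
    uint32_t a = 1;
    uint32_t b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

// Compare the checksum of the decompressed data with the 4-byte big-endian
// trailer found at the end of the stream.
static bool
trailer_matches(const uint8_t *data, size_t len, const uint8_t trailer[4])
{
    assert(trailer);
    uint32_t stored = ((uint32_t)trailer[0] << 24) | ((uint32_t)trailer[1] << 16)
                      | ((uint32_t)trailer[2] << 8) | (uint32_t)trailer[3];
    return adler32_of(data, len) == stored;
}
```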
Joel Rosdahl [Tue, 4 Jun 2019 19:49:52 +0000 (21:49 +0200)]
Use the compression API for results
It didn’t feel right to use zlib’s gzip format for the embedded content,
especially since other compression libraries don’t support a similar
interface. Therefore, use the standard low-level zlib API instead.
Joel Rosdahl [Thu, 30 May 2019 18:37:12 +0000 (20:37 +0200)]
Revise disk format for results
* Removed unused hash_size and reserved fields. Since there are no
hashes stored in the result metadata, hash size is superfluous. The
reserved bits field is also unnecessary; if we need to change the
format, we can just step RESULT_VERSION and be done with it.
* Instead of storing file count in the header, store an EOF marker after
the file entries. The main reason for this is that files then can be
appended to the result file without having to precalculate how many
files the result will contain.
* Don’t include trailing NUL in suffix strings since the length is known.
* Instead of potentially compressing the whole file, added an
uncompressed header telling how/if the rest of the file is
compressed (which algorithm and level). This makes it possible to more
efficiently recompress files in a batch job since it’s possible to
reasonably efficiently check if a cached file should be repacked. The
reason for not having compression info in each subfile
header (supporting different compression algorithms/levels per
subfile) is to make the repacking scenario simpler.
* Prepared for adding support for “reference entries”, which refer to
other results. There are two potential use cases for reference
entries: a) deduplication and b) storing partial results with a
different compression algorithm/level. It’s probably only the
deduplication use case that is interesting, though. It can be done
either at cache miss time or later as a batch job. If we really want
to, we can in the future add similar “raw reference entries” that
refer to files stored verbatim in the storage, thus re-enabling hard
link functionality.
* Changed to cCrS as the magic bytes for result files. This is analogous
to the magic bytes used for manifest files.
* Added documentation of the format.
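
To illustrate the EOF marker idea from the list above, here is a small
sketch of a reader that walks entries until it meets the marker. Only the
cCrS magic comes from the commit message. The field widths, the marker
values, the 7-byte header and the function name count_entries are
assumptions for this example, and the payload is treated as uncompressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical layout: "cCrS", version, compression type, compression level,
// then entries of <marker><suffix_len><suffix><data_len (4 bytes BE)><data>,
// terminated by a single EOF marker byte.
enum { EOF_MARKER = 0, ENTRY_MARKER = 1 };
#define HEADER_SIZE 7

// Return the number of entries, or -1 if the buffer is malformed or the
// EOF marker is missing (e.g. a truncated file).
static int
count_entries(const uint8_t *buf, size_t len)
{
    assert(buf);
    if (len < HEADER_SIZE || memcmp(buf, "cCrS", 4) != 0) {
        return -1;
    }
    size_t pos = HEADER_SIZE;
    int count = 0;
    while (pos < len) {
        uint8_t marker = buf[pos++];
        if (marker == EOF_MARKER) {
            return count;
        }
        if (marker != ENTRY_MARKER || pos >= len) {
            return -1;
        }
        size_t suffix_len = buf[pos++];
        if (len - pos < suffix_len + 4) {
            return -1;
        }
        pos += suffix_len; // the suffix is stored without a trailing NUL
        uint32_t data_len = ((uint32_t)buf[pos] << 24)
                            | ((uint32_t)buf[pos + 1] << 16)
                            | ((uint32_t)buf[pos + 2] << 8)
                            | (uint32_t)buf[pos + 3];
        pos += 4;
        if (len - pos < data_len) {
            return -1;
        }
        pos += data_len;
        count++;
    }
    return -1;
}
```

Because the reader never needs a file count up front, a writer following
this scheme can keep emitting entries and write the marker last.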
Luboš Luňák [Mon, 20 May 2019 19:18:16 +0000 (21:18 +0200)]
Fix PCH detection in depend mode (+test improvements) (#427)
* do not refer to Clang's PTH in tests
The PTH feature has been removed (https://reviews.llvm.org/D54547)
and according to the commit it has never really been used. Maybe this
made sense at some point in the past, but now those .pth files must be PCHs
internally. This commit actually just changes the .pth extensions
to .pch to avoid confusion, technically nothing should change
except for filenames.
* try to share PCH tests between GCC and Clang
Clang is supposed to be a drop-in for GCC, so in general it should
be able to handle everything GCC can. That's not completely true
in practice, there are differences, but it doesn't make sense
to completely duplicate a testcase just because there are some
differences. So start creating a shared common base for the PCH
tests and do separately only tests that act differently.
* more sharing of PCH tests between GCC and Clang
There's e.g. no need to do all kinds of complex tests with both
.gch and .pch with Clang, except for checking that Clang finds
one of them if none is specified explicitly.
* log also when pch usage is detected from pragma pch_preprocess
* try harder to verify in tests that ccache detects PCH changes
Some of the tests did that, e.g. those 'file changed', but e.g. the cached
.gch creation did not. So try to intentionally change the .gch/.h and test
that it leads to a cache miss. Otherwise there might be a hit simply
because ccache failed to detect PCH usage and ignores the .gch completely.
* clean up #include vs -fpch-preprocess in pch tests
As the manpage says, -fpch-preprocess is needed only with the #include
form, otherwise it's pointless.
* do not mention sloppiness in pch tests, only no sloppiness
Sloppiness is normally required, so there is no point in stating the obvious.
* test also -include-pch with clang
* hash also pch introduced only using -include
GCC does not output the pch in the .d dependencies file, so without
this there would be false cache hits.
* be consistent about sloppiness in pch tests
create pch -> pch_defines
use pch -> time_macros
* test CCACHE_PCH_EXTSUM more thoroughly and also with -include
* pch test for .gch file being in an extra directory
* doc corrections for how to use PCH with ccache
- ccache will fail to properly detect that -include a.h means using
a.h.gch if it requires using path from -I (they are not searched)
- -fpch-preprocess does nothing with Clang, it doesn't output
pragma GCC pch_preprocess and so #include form for PCHs doesn't work
* explain better problems of -MD/-MMD in depend mode
Pavol Sakac [Sun, 5 May 2019 19:04:30 +0000 (21:04 +0200)]
Fix object size verification + bump to 64 bit file sizes in manifest (#407)
Changed manifest format to save the actual file size along with hashed content size.
File size field in manifest updated to 64bits.
Manifest version set to 2.
Joel Rosdahl [Wed, 1 May 2019 12:51:45 +0000 (14:51 +0200)]
Improve fix in #400 to handle more cases
The dependency file name can come from e.g. DEPENDENCIES_OUTPUT as well,
so hash information about a /dev/null .d file after the argument
processing loop instead.
Joel Rosdahl [Wed, 1 May 2019 11:58:18 +0000 (13:58 +0200)]
Bail out on “-MF /dev/null”
This is an alternative fix for #397, based on the observation/assumption
that “-MF /dev/null” is only ever used as part of a compiler probe
call in combination with “-c /dev/null -o /dev/null”, so there is little
reason to cache the result. The advantage of just bailing out is to
reduce the number of special cases we have to handle.
this is useful for determining the length of the generated argument string
* correctly handle @file syntax on Windows
the @file syntax means that the process reads command arguments from the
specified file. this is commonly used in order to shorten commands which
would otherwise be longer than the maximum length limit: many build systems
do this in all cases to avoid hitting this limit.
when a command exceeds 8192 characters on Windows, ccache now writes
the parsed/modified arguments to a tmpfile and then runs the command using
that tmpfile with @tmpfile in order to preserve this mechanism and avoid hitting
the length limit
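
The length check described above could look roughly like the sketch below.
The 8192 threshold is taken from the text. The names joined_length and
needs_response_file, and the assumption that arguments are joined with
single spaces and no quoting, are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMMAND_LENGTH 8192 // threshold mentioned in the commit message

// Length of the command line if argv (NULL-terminated) is joined with
// single spaces. Quoting is ignored here, which is a simplification.
static size_t
joined_length(const char *const *argv)
{
    assert(argv);
    size_t length = 0;
    for (size_t i = 0; argv[i]; i++) {
        if (i > 0) {
            length++; // separating space
        }
        length += strlen(argv[i]);
    }
    return length;
}

// True if the arguments should be written to a temporary file and passed
// to the compiler as @tmpfile instead of on the command line.
static bool
needs_response_file(const char *const *argv)
{
    return joined_length(argv) > MAX_COMMAND_LENGTH;
}
```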
Joel Rosdahl [Mon, 22 Apr 2019 13:22:07 +0000 (15:22 +0200)]
Fix minitrace.c compilation error with GCC 7.3
The error/warning looks like this:
src/minitrace.c: In function ‘mtr_flush’:
src/minitrace.c:256:54: error: ‘%.*s’ directive output may be truncated writing up to 700 bytes into a region of size 252 [-Werror=format-truncation=]
snprintf(arg_buf, ARRAY_SIZE(arg_buf), "\"%s\":\"%.*s\"", raw->arg_name, 700, raw->a_str);
^~~~
In file included from /usr/include/stdio.h:862:0,
from src/minitrace.c:9:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’ output 6 or more bytes (assuming 706) into a destination of size 256
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Joel Rosdahl [Mon, 15 Apr 2019 19:35:38 +0000 (21:35 +0200)]
Touch up NEWS.adoc
* Use “-” for bullets consistently.
* Use “curly quotation marks” instead of ``asciidoctor'' quotation style
for readability, and similar for apostrophes.
Joel Rosdahl [Sat, 13 Apr 2019 20:52:23 +0000 (22:52 +0200)]
Improve handling of debug levels
Fixes #368.
* Remember if we have seen any option on level 3.
* Let “-g0”, “-ggdb0” and similar cancel out any previously seen “-g”
options except “-gsplit-dwarf”. This is based on observations on how
GCC 7.3 behaves.
* Delay acting on seen debug options until after we have processed all
arguments. This way we can avoid e.g. hashing the current directory if
we get “-g3 -g0”.
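
The rules above can be sketched as a single pass over the arguments that
only records state, leaving any action until the loop has finished. This is
a simplified illustration, not the ccache implementation: struct
debug_state and scan_debug_options are hypothetical names, the level is
read from the last character of the option, and only -gsplit-dwarf and
-gdwarf* are treated specially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct debug_state {
    bool debug_info;  // some -g option is still in effect
    bool level_3;     // a level 3 option is still in effect
    bool split_dwarf; // -gsplit-dwarf was seen; never cancelled by -g0
};

// Scan a NULL-terminated argument list and return the final debug state.
static struct debug_state
scan_debug_options(const char *const *argv)
{
    assert(argv);
    struct debug_state state = {false, false, false};
    for (size_t i = 0; argv[i]; i++) {
        const char *arg = argv[i];
        if (strncmp(arg, "-g", 2) != 0) {
            continue;
        }
        if (strcmp(arg, "-gsplit-dwarf") == 0) {
            state.split_dwarf = true;
            continue;
        }
        if (strncmp(arg, "-gdwarf", 7) == 0) {
            // The trailing digit is a DWARF version, not a debug level.
            state.debug_info = true;
            continue;
        }
        char last = arg[strlen(arg) - 1];
        if (last == '0') {
            // -g0, -ggdb0 and similar cancel earlier -g options.
            state.debug_info = false;
            state.level_3 = false;
        } else {
            state.debug_info = true;
            if (last == '3') {
                state.level_3 = true;
            }
        }
    }
    return state;
}
```

With this shape, a caller decides what to hash only after the scan, so an
argument list such as “-g3 -g0” ends up with no debug options in effect.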