Timm Bäder [Thu, 18 Mar 2021 09:25:24 +0000 (10:25 +0100)]
readelf: Pull advance_pc() in file scope
Make advance_pc() a static function so we can get rid of another nested
function. Rename it to run_advance_pc() and use a local advance_pc()
macro to pass all the local variables. This is similar to what the
equivalent code in libdw/dwarf_getsrclines.c is doing.
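The pattern described above can be sketched roughly as follows. This is an illustration only, not the readelf code itself: the parameter list, the `unsigned long long` address type, and the `demo_advance()` caller are assumptions made for the example.

```c
/* File-scope helper standing in for the former nested function.  All the
   caller's state is passed in explicitly (parameter list is assumed).  */
static inline void
run_advance_pc (unsigned int op_advance, unsigned int minimum_instr_len,
                unsigned int max_ops_per_instr,
                unsigned int *op_addr_advance,
                unsigned long long *address, unsigned int *op_index)
{
  const unsigned int advanced_op_index = *op_index + op_advance;

  *op_addr_advance = minimum_instr_len
                     * (advanced_op_index / max_ops_per_instr);
  *address += *op_addr_advance;
  *op_index = advanced_op_index % max_ops_per_instr;
}

/* Hypothetical caller: the local macro forwards the local variables, so
   call sites keep the short advance_pc (x) spelling.  */
static unsigned long long
demo_advance (unsigned int op_advance)
{
  unsigned int minimum_instr_len = 4;
  unsigned int max_ops_per_instr = 1;
  unsigned int op_addr_advance = 0;
  unsigned int op_index = 0;
  unsigned long long address = 0x1000;

#define advance_pc(adv) \
  run_advance_pc ((adv), minimum_instr_len, max_ops_per_instr, \
                  &op_addr_advance, &address, &op_index)

  advance_pc (op_advance);

#undef advance_pc
  return address;
}
```

Keeping the macro local (and undefining it afterwards) limits the name to the one function whose variables it captures.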
Noah [Thu, 10 Jun 2021 14:29:45 +0000 (10:29 -0400)]
debuginfod: PR25978 - Created the prefetch fdcache
The debuginfod fdcache-prefetch logic has been observed to show some
degeneracies in operation. Since fdcache evictions are done
frequently, and freshly prefetched archive elements are put at the
back of lru[], each eviction round can summarily nuke things that
were just prefetched .... and are just going to be prefetched again.
It would be better to have two lru lists, or to be able to insert
newly prefetched entries somewhere in the middle of the list rather
than at the very end.
Alice Zhang [Tue, 6 Jul 2021 20:12:43 +0000 (16:12 -0400)]
PR27531: retry within default retry_limit will be supported.
In debuginfod-client.c (debuginfod_query_server), insert a
goto statement for jumping back to the beginning of the curl
handle setup if the query fails and a non-ENOENT error is returned.
Also introduce DEBUGINFOD_RETRY_LIMIT_ENV_VAR and a default
DEBUGINFOD_RETRY_LIMIT (which is 2).
A corresponding test has been added to tests/run-debuginfod-find.sh.
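A minimal sketch of the retry control flow, under stated assumptions: the environment variable string, the `query_with_retry()` wrapper, the callback signature, and the two stub queries are hypothetical, and "limit" is taken to mean the number of extra attempts after the first one.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed value of the environment variable name; the default is 2.  */
#define RETRY_LIMIT_ENV_VAR "DEBUGINFOD_RETRY_LIMIT"
#define DEFAULT_RETRY_LIMIT 2

/* Read the retry limit from the environment, falling back to the default
   when the variable is unset or not a non-negative integer.  */
static long
get_retry_limit (void)
{
  const char *s = getenv (RETRY_LIMIT_ENV_VAR);
  if (s == NULL || *s == '\0')
    return DEFAULT_RETRY_LIMIT;
  char *end;
  errno = 0;
  long v = strtol (s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0)
    return DEFAULT_RETRY_LIMIT;
  return v;
}

/* Run QUERY, jumping back with goto while it fails with anything other
   than -ENOENT and retries remain.  QUERY returns 0 or a negative errno.  */
static int
query_with_retry (int (*query) (void *), void *arg, long retry_limit,
                  int *attempts)
{
  int rc;
  *attempts = 0;
 retry:
  rc = query (arg);
  ++*attempts;
  if (rc != 0 && rc != -ENOENT && retry_limit-- > 0)
    goto retry;
  return rc;
}

/* Stub query: fails with -ETIMEDOUT until the counter runs out.  */
static int
flaky_query (void *arg)
{
  int *failures_left = arg;
  if (*failures_left > 0)
    {
      --*failures_left;
      return -ETIMEDOUT;
    }
  return 0;
}

/* Stub query: a definite "not found", which must not be retried.  */
static int
missing_query (void *arg)
{
  (void) arg;
  return -ENOENT;
}
```

With a limit of 2, a query that times out twice succeeds on the third attempt, while an ENOENT result returns after a single attempt.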
debuginfod: PR27711 - Use -I/-X regexes during groom phase
The debuginfod -I/-X regexes operate during traversal to identify
those files in need of scanning. The regexes are not used during
grooming. This means that if from run to run, the regex changes so
that formerly indexed files are excluded from traversal, the data is
still retained in the index.
This is both good and bad. On one hand, if the underlying data is
still available, grooming will preserve the data, and let clients ask
for it. On the other hand, if the growing index size is a problem,
and one wishes to age no-longer-regex-matching index data out, there
is no way.
Let's add a debuginfod flag to use regexes during grooming.
Specifically, in groom(), where the stat() test exists, also check
for regex matching as in scan_source_paths(). Treat failure of the
regex the same way as though the file didn't exist.
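The proposed groom-time check might look roughly like this. It is a sketch in C using POSIX regexec(); the function names and the `regex_groom` flag are hypothetical, and the real debuginfod server code is organized differently.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* True if PATH matches the include regex and not the exclude regex,
   mirroring the -I/-X test applied during traversal.  */
static bool
path_passes_filters (const char *path, const regex_t *include,
                     const regex_t *exclude)
{
  bool included = regexec (include, path, 0, NULL, 0) == 0;
  bool excluded = regexec (exclude, path, 0, NULL, 0) == 0;
  return included && !excluded;
}

/* Groom decision for one indexed file: a file that has vanished is
   dropped, and with REGEX_GROOM set a file failing the regexes is
   treated exactly as if it had vanished.  */
static bool
groom_keeps_file (const char *path, bool regex_groom,
                  const regex_t *include, const regex_t *exclude)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return false;
  if (regex_groom && !path_passes_filters (path, include, exclude))
    return false;
  return true;
}
```

Leaving the flag off preserves the old behaviour, where only the stat() test decides whether index data is retained.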
Andrei Homescu [Tue, 29 Jun 2021 01:26:53 +0000 (18:26 -0700)]
libelf: Fix unaligned d_off offsets for input sections with large alignments
The mkl_memory_patched.o object inside the libmkl_core.a library from
the Intel Math Kernel Library version 2018.2.199 has this section
with an alignment of 4096 and offset of 0xb68:
[ 2] .data PROGBITS 0000000000000000 000b68 011000 00 WA 0 0 4096
Reading this file with libelf and trying to write it back to disk triggers
the following sequence of events:
1) code in elf_getdata.c clamps d_align for this section's data buffer
to the section's offset
2) code in elf32_updatenull.c checks if the alignment is a power of two
and incorrectly returns an error
This commit fixes this corner case by increasing the alignment to the
next power of two after the clamping, so the check passes.
A test that reproduces this bug using strip is also included.
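The arithmetic of the fix can be illustrated with a small sketch. The function names are hypothetical and the handling of a zero offset is an assumption; only the idea (clamp, then round up to a power of two) comes from the description above.

```c
#include <stdint.h>

/* Power-of-two test of the kind applied when the file is written back.  */
static int
is_power_of_two (uint64_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* If the section alignment exceeds its file offset, clamp it, but round
   the clamped value up to the next power of two so that the later
   power-of-two check still passes.  */
static uint64_t
clamp_alignment (uint64_t sh_addralign, uint64_t sh_offset)
{
  uint64_t align = sh_addralign;
  if (sh_offset != 0 && align > sh_offset)
    {
      /* Classic bit-smearing round-up to a power of two.  */
      align = sh_offset - 1;
      align |= align >> 1;
      align |= align >> 2;
      align |= align >> 4;
      align |= align >> 8;
      align |= align >> 16;
      align |= align >> 32;
      align++;
    }
  return align;
}
```

For the section quoted above (alignment 4096, offset 0xb68), plain clamping would give 0xb68, which is not a power of two; rounding up gives 0x1000 again.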
Frank Ch. Eigler [Wed, 16 Jun 2021 22:49:10 +0000 (18:49 -0400)]
debuginfod test: fix groom/stale race condition
Additional tracing, and use of "% make check VERBOSE=1" in a .spec
file allowed tracking down of this intermittent problem. The race was
between a SIGUSR1 or two to a debuginfod server (triggering two
traverse/scan phases), followed shortly by a SIGUSR2 (triggering a
groom). If those signals were received too close together, the groom
phase could be stopped early, and the rm'd files not noticed.
New testsuite code adds metric polls after SIGUSR1 & SIGUSR2 to ensure
the respective processing phases are complete. It also turns on "set -x"
tracing, so as to avoid pulling out quite as much hair next time.
"make check VERBOSE=1" is also important for spec files.
Frank Ch. Eigler [Wed, 16 Jun 2021 14:49:49 +0000 (10:49 -0400)]
debuginfod tests: tolerate 000-perm files in cache-copy test
It appears possible for 000-permission files to sneak into the
test debuginfod-cache, which cp (or find|cpio) refuse to copy.
These files are OK not to copy, so ignore the error and proceed.
Omar Sandoval [Thu, 10 Jun 2021 00:45:57 +0000 (17:45 -0700)]
libdwfl: fix potential NULL pointer dereference when reading link map
When read_addrs() was moved into file scope, there was a mistake in
converting "buffer" from a closure variable to a parameter: we are
checking whether the pointer argument is NULL, not whether the buffer
itself is NULL. This causes a NULL pointer dereference when we try
to use the NULL buffer later.
Fixes: 3bf41d458fb6 ("link_map: Pull read_addrs() into file scope")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
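The class of mistake is easy to reproduce in miniature. In this simplified sketch (not the libdwfl code; `ensure_buffer()` and its parameters are made up for illustration), the buffer is passed by address, so the test has to look through the pointer:

```c
#include <stddef.h>
#include <stdlib.h>

/* Make sure *BUFFER holds at least NEED bytes.  Returns 1 if the buffer
   was (re)allocated, 0 if it was already usable, -1 on failure.  */
static int
ensure_buffer (void **buffer, size_t *buffer_available, size_t need)
{
  /* The buggy form after such a refactoring would be "buffer == NULL":
     that tests the address of the caller's variable, which is never
     NULL, so a NULL buffer would slip through and be used later.  */
  if (*buffer == NULL || *buffer_available < need)
    {
      void *p = realloc (*buffer, need);
      if (p == NULL)
        return -1;
      *buffer = p;
      *buffer_available = need;
      return 1;
    }
  return 0;
}
```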
PR27863: debuginfod optimization for concurrent requests
Sometimes, due to configuration error, mishap, or DoS misadventure, a
debuginfod server may receive near-concurrent requests for the exact
same data from multiple clients. In practically all cases, it is
beneficial to the clients, as well as the server, to serialize these
requests. This way, debuginfod does not waste CPU in repeatedly &
concurrently decompressing large archives or querying upstream
servers. Second and later requesters can benefit from the fdcache /
client-cache and get their results, probably earlier!
This patch adds an "after-you" queueing phase to servicing
http-buildid requests, whereby threads serialize themselves on each
query URL being serviced at the moment. Prometheus metrics are added,
and the http GET trace line is modified to print the queue+service
times separately.
Hand-tested on large kernel-debuginfo's, and shows host CPU refusing
to multiply in the face of concurrent identical queries. The
automated test tries a hundred concurrent curls, at least some of
which are slow enough to trigger the "after-you" wait here.
```
CCLD elflint
ld: elflint.o: in function `check_attributes':
elflint.c:(.text+0xdcff): undefined reference to `buffer_left'
ld: elflint.c:(.text+0xe557): undefined reference to `buffer_left'
```
It happens due to possible external linkage of `buffer_left()`.
The change forces local linkage to always use local definition
(either inline or out-of-line).
Reported-by: Toralf Förster
Bug: https://bugs.gentoo.org/794601
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Fixes: e95d1fbb ("elflint: Pull left() in file scope")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Frank Ch. Eigler [Fri, 14 May 2021 22:37:30 +0000 (18:37 -0400)]
PR27859: correct 404-latch bug in debuginfod client reuse
PR27701 implemented curl handle reuse in debuginfod_client objects,
but with an unexpected bug. Server responses returning an error
"latched" because the curl_easy handles for error cases weren't all
systematically removed from the curl multi handle. This prevented
their proper re-addition the next time.
This version of the code simplifies matters by making only the
curl_multi handle long-lived. This turns out to be enough, because it
can maintain a pool of long-lived http/https connections and related
data, and lend them out to short-lived curl_easy handles. This mode
handles errors or hung downloads even better, because the easy handles
don't undergo complex state transitions between reuse.
A new test case confirms this correction via the federating debuginfod
instance (cleaning caches between subtests to make sure http* is being
used and reused).
Dmitry V. Levin [Wed, 12 May 2021 15:00:00 +0000 (15:00 +0000)]
elfcompress: fix exit status regression in case of "Nothing to do"
When elfcompress decides that no section data needs to be updated and
therefore the file does not have to be rewritten, it still has to exit
with a zero status indicating success.
Resolves: https://sourceware.org/bugzilla/show_bug.cgi?id=27856
Fixes: c497478390de ("elfcompress: Replace cleanup() with label")
Mark Wielaard [Sat, 1 May 2021 16:00:49 +0000 (18:00 +0200)]
libdw: Document and handle DW_FORM_indirect in __libdw_form_val_compute_len
Update the documentation in __libdw_form_val_compute_len for handling
DW_FORM_indirect and make sure the indirect form isn't DW_FORM_indirect
itself or DW_FORM_implicit_const.
debuginfod: debuginfod client should cache negative results.
Add debuginfod_config_cache for reading and writing to cache
configuration files, make use of the function within
debuginfod_clean_cache and debuginfod_query_server.
In debuginfod_query_server, create 000-permission file on failed
queries. Before querying each BUILDID, if corresponding 000 file
detected, compare its stat mtime with parameter from
.cache/cache_miss_s. If mtime is fresher, then return ENOENT and
exit; otherwise unlink the 000 file and proceed to a new query.
tests: add test in run-debuginfod-find.sh
test if the 000 file is created on failed query; if querying the
same failed BUILDID, whether the query should proceed without
going through server; set the cache_miss_s to 0 and query the same
buildid, and this time should go through the server.
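The negative-cache decision described above could be sketched like this. The helper names are hypothetical, and the exact freshness comparison (`<=` against the marker's mtime) is an assumption rather than a quote from the client code.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Record a failed query by creating an empty 000-permission marker.  */
static int
create_miss_marker (const char *path)
{
  int fd = open (path, O_CREAT | O_EXCL | O_WRONLY, 0);
  if (fd < 0)
    return -1;
  close (fd);
  return 0;
}

/* True if PATH is a still-fresh 000 marker, in which case the caller
   would return ENOENT without contacting any server.  A stale marker is
   unlinked so that a new query can proceed.  */
static bool
negative_cache_hit (const char *path, time_t cache_miss_s, time_t now)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return false;               /* Nothing cached at all.  */
  if ((st.st_mode & 0777) != 0)
    return false;               /* A real cached file, not a marker.  */
  if (now - st.st_mtime <= cache_miss_s)
    return true;                /* Marker is fresh: negative hit.  */
  unlink (path);                /* Marker expired: forget the miss.  */
  return false;
}
```

Passing `now` as a parameter keeps the function easy to test; a caller would normally pass `time (NULL)` and the value read from the cache_miss_s configuration file.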
With PR25365, we accidentally lost the ability to rmdir client-cache
directories corresponding to buildids. Bring this back, with some
attention to a possible race between a client doing cleanup and
another client doing lookups at the same time.
libdw: handle DW_FORM_indirect when reading attributes
Whenever we encounter an attribute with DW_FORM_indirect, we need to
read its true form from the DIE data. Then, we can continue normally.
This adds support to the most obvious places: __libdw_find_attr() and
dwarf_getattrs(). There may be more places that need to be updated.
I encountered this when inspecting a file that was processed by our BOLT
tool: https://github.com/facebookincubator/BOLT. This also adds a couple
of test cases using a file generated by that tool.
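As an illustration of the DW_FORM_indirect handling, here is a self-contained sketch that does not use the libdw API: `read_uleb128()` and `resolve_form()` are hypothetical helpers, while the two form codes are the values assigned by the DWARF specification. The real form is stored as a ULEB128 at the start of the attribute data, and an indirect form that is itself DW_FORM_indirect or DW_FORM_implicit_const is rejected.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_FORM_indirect       0x16
#define DW_FORM_implicit_const 0x21

/* Decode one ULEB128 value at *P, not reading at or past END.  Advances
   *P on success and returns 0; returns -1 if the data is truncated.  */
static int
read_uleb128 (const unsigned char **p, const unsigned char *end,
              uint64_t *value)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  while (*p < end)
    {
      unsigned char byte = *(*p)++;
      if (shift < 64)
        result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        {
          *value = result;
          return 0;
        }
    }
  return -1;
}

/* Resolve the form of an attribute whose value starts at *DATAP.  For
   DW_FORM_indirect the true form is read from the DIE data and *DATAP is
   advanced past it; any other form is passed through untouched.  */
static int
resolve_form (unsigned int form, const unsigned char **datap,
              const unsigned char *end, unsigned int *real_form)
{
  if (form == DW_FORM_indirect)
    {
      uint64_t v;
      if (read_uleb128 (datap, end, &v) != 0)
        return -1;
      if (v == DW_FORM_indirect || v == DW_FORM_implicit_const)
        return -1;
      form = (unsigned int) v;
    }
  *real_form = form;
  return 0;
}
```

After a successful call, the caller continues decoding the value at `*datap` using `*real_form`, exactly as it would for a directly encoded form.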
Client objects now carry long-lived curl handles for outgoing
connections. This makes it more efficient for multiple sequential
queries, because the TCP connections and/or TLS state info are kept
around awhile, avoiding O(100ms) setup latencies. debuginfod is
adjusted to take advantage of this for federation. Other clients
should gradually do this too, perhaps including elfutils itself (in
the libdwfl->debuginfod_client hooks).
A large gdb session with 117 debuginfo downloads was observed to run
twice as fast (45s vs. 1m30s wall-clock time), just in nuking this
extra setup latency. This was tested via a debuginfod intermediary:
it should be even faster once gdb reuses its own debuginfod_client.
Dmitry V. Levin [Sun, 21 Mar 2021 08:00:00 +0000 (08:00 +0000)]
po: update XGETTEXT_OPTIONS
Recognize sgettext as a macro which is used for translations.
Flag _, N_, and sgettext with pass-c-format. The effect of this
specification is that xgettext will propagate format string
requirements for _, N_, and sgettext calls to their first arguments,
and thus mark them as format strings.
debuginfod: only update database stats once per groom
On very large servers, each database-stat counting pass can take tens
of minutes (!), and doing it twice per groom pass does not seriously
improve data quality. Just do it once, after stale data removal &
basic sqlite vacuum.
Mark Wielaard [Sat, 3 Apr 2021 17:36:12 +0000 (19:36 +0200)]
unstrip: Fix small leak in handle_output_dir_module.
eu-unstrip might leak a string for each module found when using the -d
option. Make sure to free the output_file name when we are done with the
module.
Mark Wielaard [Sat, 3 Apr 2021 17:20:32 +0000 (19:20 +0200)]
ar: Always close newfd in do_oper_insert.
newfd is normally created by mkstemp given the original fd exists.
Otherwise it will be created by open from arfname. In the second case
newfd might not get closed. Prevent this by always trying to close
it after errout.
Frank Ch. Eigler [Tue, 30 Mar 2021 17:22:43 +0000 (13:22 -0400)]
debuginfod: Set child thread names via pthread_setname_np()
In order to assist problem diagnosis / monitoring, use this
gnu-flavoured pthread function to set purpose names to the various
child threads debuginfod starts. libmicrohttpd already sets this for
its threads.
Timm Bäder [Sun, 7 Mar 2021 18:02:29 +0000 (13:02 -0500)]
debuginfod-client: Don't compare a double to a long
Clang warns about this:
../../debuginfod/debuginfod-client.c:899:28: error: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-int-float-conversion]
pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
~ ^~~~~~~~
/usr/lib64/clang/10.0.1/include/limits.h:47:19: note: expanded from macro 'LONG_MAX'
^~~~~~~~~~~~
<built-in>:38:22: note: expanded from here
^~~~~~~~~~~~~~~~~~~~
Modified for jakub's observation about LONG_MAX overflow.
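One way to do such a conversion without the implicit long-to-double promotion is sketched below. The helper name is hypothetical and this is not necessarily the expression the patch ended up with; the point is that the comparison is done in double with explicit casts, and that `>=` is used because (double) LONG_MAX may round up to a value that no long can hold.

```c
#include <limits.h>

/* Convert D to long, saturating at the ends of the long range.  */
static long
double_to_long_saturating (double d)
{
  if (d != d)                   /* NaN has no sensible integer value.  */
    return 0;
  if (d >= (double) LONG_MAX)   /* Also catches the rounded-up 2^63.  */
    return LONG_MAX;
  if (d <= (double) LONG_MIN)
    return LONG_MIN;
  return (long) d;              /* In range: the cast is well defined.  */
}
```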
Mark Wielaard [Wed, 3 Mar 2021 20:40:53 +0000 (21:40 +0100)]
readelf: Sanity check verneed and verdef offsets in handle_symtab.
We are going through vna_next, vn_next and vd_next in a while loop.
Make sure that all offsets are sane. We don't want them to wrap
around, which would send us in cycles.
Timm Bäder [Wed, 17 Feb 2021 09:27:06 +0000 (10:27 +0100)]
build: Check for -Wimplicit-fallthrough=5 separately
GCC accepts the =5, which means it doesn't try to parse any comments
and only accepts the fallthrough attribute in code. Clang does not ever
parse any comments and always wants the fallthrough attribute anyway.
Clang also doesn't accept the =n parameter for -Wimplicit-fallthrough.
Test for =5 separately and use it if supported and fall back to just
-Wimplicit-fallthrough otherwise.
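In source code, the attribute form that both compilers accept looks roughly like this. The `FALLTHROUGH` macro name and the example function are assumptions for illustration; the `__has_attribute` guard is only there so the sketch also compiles with compilers that lack the attribute.

```c
/* Map FALLTHROUGH to the statement attribute where available.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif
#if __has_attribute (fallthrough)
# define FALLTHROUGH __attribute__ ((fallthrough))
#else
# define FALLTHROUGH ((void) 0)
#endif

/* Each case deliberately falls into the next one.  With
   -Wimplicit-fallthrough=5 a "fall through" comment would not silence
   the warning; only the attribute does.  */
static int
accumulate_from (int level)
{
  int n = 0;
  switch (level)
    {
    case 3:
      n += 4;
      FALLTHROUGH;
    case 2:
      n += 2;
      FALLTHROUGH;
    case 1:
      n += 1;
      break;
    default:
      break;
    }
  return n;
}
```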
Frank Ch. Eigler [Thu, 18 Feb 2021 00:34:09 +0000 (19:34 -0500)]
testsuite: run-debuginfod-find.sh: Fix grooming test indeterminacy
We were looking at a less-than-ideal metric to check the effects
of grooming on the database. It turns out there is a counter
just for removed files/archives, which will have the same value
regardless of the presence of other test configurations.
Mark Wielaard [Fri, 12 Feb 2021 15:42:44 +0000 (16:42 +0100)]
readelf: Type DIE offset is from start of CU.
While inspecting some type units I noticed the type offset seemed off.
We were printing the offset as is, but it should include the offset of
the unit. There was actually a testcase for this, run-readelf-types.sh
but that had the same bug in the expected output. Fixed both.
Mark Wielaard [Fri, 12 Feb 2021 15:28:50 +0000 (16:28 +0100)]
readelf, libdw: blocks aren't expressions for DWARF version 4
For DWARF version 4 or higher a block form really encodes a block,
not an expression location. Also constant offsets can be expressed
as DW_FORM_implicit_const in DWARF version 5.
Frank Ch. Eigler [Sun, 14 Feb 2021 21:02:05 +0000 (16:02 -0500)]
PR27413: use bsdtar to unpack deb-related formats
dpkg-deb has been reported to be fragile when running under
debuginfod, whereas bsdtar (libarchive) is happy with all these
flavors of files. Switch to a bsdtar based pipeline, now
equipped with an escaped glob pattern that adapts to a variety
of interior data.tar* compression formats.
No testsuite impact. .ipk format tested with some random openwrt and
kino-extension binaries found on the net. Some of these are built
without buildid, and hardly any with debuginfo, but whatever, bsdtar
and elfutils extract whatever info is there.
Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Dorinda Bassey <dorindabassey@gmail.com>
Commit eb922a1b8f3a ("tests: use ${CC} instead of 'gcc' in tests")
exports ${CC} into the test environment, but doesn't quote the
value for the assignment. That doesn't work properly if the value
contains whitespace. In a multilib/biarch environment however, it's
common to set CC="gcc -m32" or similar. That causes tests to print
error messages: "/bin/sh: line 2: -m32: command not found".
Fix that by adding quotes around all make variables (not just $CC)
used in setting up TESTS_ENVIRONMENT.
Signed-off-by: Alexander Miller <alex.miller@gmx.de>
A couple of closely related pieces of work allow more early warning
about low storage/memory conditions:
- New prometheus metrics to track filesystem freespace, and more
details about some errors.
- Frequent checking of $TMPDIR freespace, to trigger fdcache
emergency flushes.
- Switch to floating point prometheus metrics, to communicate
fractions - and short time intervals - accurately.
- Fix startup-time pthread-creation error handling.
Testing is smoke-test-level only as it is hard to create
free-space-limited $TMPDIRs. Locally tested against tiny through
medium tmpfs filesystems, with or without sqlite db also there. Shows
a pleasant stream of diagnostics and metrics during shortage but
generally does not fail outright. However, catching an actual
libstdc++- or kernel-level OOM is beyond our ken.
PR27323 debuginfod: improve query concurrency with grooming
Start using a second sqlite3 database connection for webapi query
servicing. This allows much better concurrency when long-running
grooming operations are in progress.
No testsuite impact. Grooming times are too short to try to hit with
concurrent requests. OTOH the existing tests did show some
interesting regressions that needed fixing, like needing not to
dual-wield db and dbq when doing rpm-dwz-related lookups during
scanning, and the way in which corrupted databases are reported.
These needed some automated invocations of gdb on the running
debuginfod binaries that just failed their testing, for in-situ
debugging.
Hand-tested for function on a huge 20GB index file. Allowed webapi
queries to be run throughout random points of the grooming process,
including especially the long count(*) report loops before & after.
Érico Rolim [Tue, 2 Feb 2021 00:16:56 +0000 (21:16 -0300)]
libdwfl: use GNU strerror_r only when available.
Some C libraries don't provide the GNU version of strerror_r, only the
XSI-compliant one. We use the GNU version when available, since it fits
the code better, and otherwise use the XSI-compliant one.
To better support cross-compilation Gentoo provides a way
to configure system without 'gcc' binary and only provide
tool-prefixed tools, like 'x86_64-pc-linux-gnu-gcc'.
The packages are built as ./configure --host=x86_64-pc-linux-gnu.
In https://bugs.gentoo.org/718872 Agostino Sarubbo found
a few test failures that use hardcoded 'gcc' instead of
expected ${CC}. The change propagates detected ${CC} at
configure time to test scripts.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Timm Bäder [Fri, 8 Jan 2021 08:04:49 +0000 (09:04 +0100)]
strip: Remove no_symtab_updates() function
The no_symtab_updates() function was being called at the beginning of
all case labels in this switch, so we can just call it once before the
switch. Then it only has one call-site, so inline this short function
there.
Timm Bäder [Fri, 8 Jan 2021 08:04:47 +0000 (09:04 +0100)]
strip: Pull relocate() into file scope
Pull relocate() into file scope and get rid of a nested function this
way. Refactor remove_debug_relocations() to minimize the parameters we
need to pass to relocate().
Mark Wielaard [Tue, 12 Jan 2021 10:35:10 +0000 (11:35 +0100)]
elflint: Recognize SHF_GNU_RETAIN as extra section flag.
SHF_GNU_RETAIN is like SHF_LINK_ORDER: it can appear on any section
and should be ignored by elflint. Add all such flags to a new
EXTRA_SHFLAGS and use it consistently in check_sections.
Before the change, section_flags_string() ignored unknown section
flags: snprintf() did write the numeric value into the buffer, but
"*cp = '\0'" negated the effect.
The change advances the 'cp' pointer.
While at it add a '|' separator between known and unknown flags.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
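The bug pattern and its fix can be shown with a small stand-alone sketch. This is not the elflint function: `flags_string()` and its three-entry table are made up for illustration (the bit values are the standard SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR ones). The key line is `cp += n` after each snprintf(); without it, the final `*cp = '\0'` would erase what was just written.

```c
#include <stdio.h>
#include <string.h>

/* Render FLAGS into BUF as "NAME|NAME|0x...", where any unknown bits are
   printed numerically after a '|' separator.  */
static char *
flags_string (unsigned long long flags, char *buf, size_t len)
{
  static const struct
  {
    unsigned long long bit;
    const char *name;
  } known[] = { { 0x1, "WRITE" }, { 0x2, "ALLOC" }, { 0x4, "EXECINSTR" } };

  char *cp = buf;
  char *end = buf + len;
  if (len == 0)
    return buf;
  *cp = '\0';

  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (flags & known[i].bit)
      {
        int n = snprintf (cp, (size_t) (end - cp), "%s%s",
                          cp == buf ? "" : "|", known[i].name);
        if (n < 0 || (size_t) n >= (size_t) (end - cp))
          return buf;           /* Truncated; already NUL-terminated.  */
        cp += n;                /* Advance past what was written.  */
        flags &= ~known[i].bit;
      }

  if (flags != 0)
    {
      int n = snprintf (cp, (size_t) (end - cp), "%s%#llx",
                        cp == buf ? "" : "|", flags);
      if (n < 0 || (size_t) n >= (size_t) (end - cp))
        return buf;
      cp += n;                  /* The step the original code missed.  */
    }

  *cp = '\0';
  return buf;
}
```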
Érico Rolim [Wed, 16 Dec 2020 22:30:12 +0000 (19:30 -0300)]
src/readelf: use qsort instead of qsort_r.
This program is single threaded, so using qsort with a global variable
isn't a danger. The interface for qsort_r isn't standardized (and
diverges between glibc and FreeBSD, for example), which makes usage of
qsort, where possible, preferable.
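The substitution can be sketched as follows, with hypothetical names: the context that qsort_r would have passed as an extra argument is parked in a file-scope variable for the duration of the sort. As noted above, this is only safe because the program is single threaded.

```c
#include <stddef.h>
#include <stdlib.h>

/* Context for the comparison function, formerly a qsort_r argument.  */
static const unsigned long *sort_keys;

/* Order two indices by the key each one refers to, breaking ties by
   index so the result is deterministic.  */
static int
compare_by_key (const void *a, const void *b)
{
  size_t ia = *(const size_t *) a;
  size_t ib = *(const size_t *) b;
  if (sort_keys[ia] < sort_keys[ib])
    return -1;
  if (sort_keys[ia] > sort_keys[ib])
    return 1;
  return (ia > ib) - (ia < ib);
}

/* Sort INDICES so that KEYS[INDICES[i]] is ascending.  */
static void
sort_indices_by_key (size_t *indices, size_t n, const unsigned long *keys)
{
  sort_keys = keys;
  qsort (indices, n, sizeof indices[0], compare_by_key);
  sort_keys = NULL;
}
```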
Dmitry V. Levin [Sun, 20 Dec 2020 08:00:00 +0000 (08:00 +0000)]
Split the top level .gitignore file
Move subdirectory parts of the top level .gitignore into appropriate
subdirectories. This would be consistent with ChangeLog files.
Currently one has to update the top level ChangeLog file when
the top level .gitignore file is changed in a way that affects
a specific subdirectory only.