I noticed that the "pretend language" handling in the DWARF reader
doesn't work as intended; the problem code in dwarf2_per_cu::set_lang
is:

  if (unit_type () == DW_UT_partial)
    return;

The issue here is that this subverts the very purpose of having a
"pretend" language.
Some background: when Jakub wrote dwz, we also added support for this
style of DWARF compression to gdb. Now, dwz only shares DIEs in a
"top level" way -- i.e., at the time (and as far as I know, continuing
to today), it would not emit a DW_TAG_imported_unit inside a
namespace. So, when implementing this we also implemented an
optimization, namely that gdb would not re-read every imported unit a
la '#include', but instead would make symtabs for each included unit
(partial units didn't yet exist).
However, an imported/partial unit might not have a language -- but a
language is necessary for interpreting the DIEs. This is where the
"pretend" language comes from. When reading a CU, any included
partial units that do not have a language of their own will inherit
that CU's language.
This patch started by removing the DW_UT_partial check. This of
course caused assertion failures in some modes, as set_lang also
asserts that the language cannot change. But, it's possible for a CU
to be prepared multiple times, and for different invocations to
provide different languages.
This is not a scenario we allowed for in the early days. Nowadays,
though, it seems to me that it's basically fine in practice, with the
reason being that sharing DIEs that differ semantically but not
syntactically across different languages is hard to achieve.
We do see some cross-language sharing in a limited way -- "dwz
-5" will emit inclusions from both C and C++ CUs for the
gdb.fortran/mixed-lang-stack.exp test -- but note that this sharing is
limited to things that are common between C and C++, like "float".
Therefore this patch replaces the assertions in set_lang with some
compare-exchanges.
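
To illustrate the idea, here is a minimal sketch of a compare-exchange
based setter.  It is not the code from the patch: per_cu_sketch, the
reduced language enum, and the "first writer wins" policy are all
assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for gdb's language enum; not the real definition.
enum language { language_unknown, language_c, language_cplus };

// Hypothetical stand-in for dwarf2_per_cu, reduced to the language field.
struct per_cu_sketch
{
  std::atomic<language> m_lang {language_unknown};

  // Record LANG only if no language has been recorded yet.  If an
  // earlier call already stored one, keep that value instead of
  // asserting that the two agree.
  void set_lang (language lang)
  {
    if (lang == language_unknown)
      return;
    language expected = language_unknown;
    m_lang.compare_exchange_strong (expected, lang);
  }

  language lang () const { return m_lang.load (); }
};
```

With this shape, preparing the same unit a second time with a
different language is harmless: the second call simply leaves the
stored value alone.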
Finally, I changed cutu_reader to use a std::optional for the pretend
language.  I think this makes it clearer what is happening.  And,
while doing this I found a spot in the cooked indexer where
language_minimal was passed in, but where the importing CU's language
should have been used.
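
As a rough sketch of why std::optional reads better here, consider a
helper that picks the language used to interpret a unit's DIEs.  The
function name effective_language, its parameters, and the final
fallback to language_minimal are assumptions for illustration, not
the actual cutu_reader interface.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for gdb's language enum; not the real definition.
enum language { language_unknown, language_minimal, language_c, language_cplus };

// Hypothetical helper.  OWN_LANG is the language the unit declares
// itself, if any.  PRETEND is the importing CU's language, if the
// caller supplied one.  An empty optional means "nothing supplied",
// which is distinct from any real language value.
language
effective_language (std::optional<language> own_lang,
                    std::optional<language> pretend)
{
  if (own_lang.has_value ())
    return *own_lang;
  // Assumed fallback when neither source provides a language.
  return pretend.value_or (language_minimal);
}
```

A caller that knows the importing CU's language passes it explicitly,
so it is harder to hand in language_minimal by accident when a real
language is available.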
I regression tested this on x86-64 Fedora 40 using the default board,
plus the cc-with-gdb-index, cc-with-debug-names, and cc-with-dwz-5
boards.