libctf: archive: format v2

This commit does a bunch of things, all tangled together tightly enough that
disentangling them seemed not to be worth doing.

The biggest is a new archive format, v2, identified by a magic number which
is one higher than the v1 format's magic number.  As usual with libctf we
can only write out the new format, but can still read the old one.
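
As a rough illustration of the "one higher" magic number scheme, a reader
could classify an archive as below.  The magic value and all names here are
hypothetical, chosen for the example only; the real constants live in
include/ctf.h.

```c
#include <stdint.h>

/* Hypothetical magic value, for illustration only: NOT the real one.  */
#define EXAMPLE_MAGIC_V1 UINT64_C (0x1122334455667788)
/* The v2 magic is one higher than the v1 magic.  */
#define EXAMPLE_MAGIC_V2 (EXAMPLE_MAGIC_V1 + 1)

/* Return the archive format version implied by MAGIC, or 0 if the magic
   number is not recognized.  */
static int
example_archive_version (uint64_t magic)
{
  if (magic == EXAMPLE_MAGIC_V1)
    return 1;			/* Old format: readable, never written.  */
  if (magic == EXAMPLE_MAGIC_V2)
    return 2;			/* New format: read and written.  */
  return 0;
}
```

A writer following the rule above would only ever emit EXAMPLE_MAGIC_V2,
while a reader accepts both values.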

The new format has multiple improvements over the old:

- It is written native-endian and aggressively endian-swapped at open time,
   just like CTF and BTF dicts; format v1 was little-endian, necessitating
   byteswapping all over the place at read and write time rather than
   localized in one pair of functions at read time.

- The modent array of name-offset -> archive-offset mappings for the CTF
   archives is explicitly pointed at via a new ctfa_modents header member
   rather than just starting after the end of the header.

- The length that prepends each archive member actually indicates its
   length rather than always being sizeof (uint64_t) bytes too high (this
   was an outright bug).

- There is a new shared properties table which in future we may be able to
   use to unify common values from the constituent CTF headers, reducing the
   size overhead of these (repeated, uncompressed) entities.  Right now it
   only contains one value, parent_name, which is the parent dict name if
   one is common across all dicts in the archive (always true for any
   archives derived from ctf_link()).  This is used to let
   ctf_archive_next() et al reliably open dicts in the archive even if they
   are child BTF dicts (which do not contain a header name).

   The properties table shares its property names with the CTF members,
   and its property values use the same length-prepended format (and the
   same code) as CTF archive members.  Properties and CTF dicts nonetheless
   get distinct member tables and distinct name->value tables ("modents"),
   so that the two are spatially separated in the file.  This maximizes
   compressibility if we end up with a lot of properties and people
   compress the whole thing.
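
The length-prefix and byte-order points in the list above can be sketched as
follows.  This is a minimal sketch, not the libctf code: the function names
and the 'version' and 'swap' parameters are hypothetical, and compensating
for the v1 overcount by subtraction on read is an assumption made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit value (portable, no builtins).  */
static uint64_t
example_bswap64 (uint64_t v)
{
  uint64_t out = 0;
  for (int i = 0; i < 8; i++)
    {
      out = (out << 8) | (v & 0xff);
      v >>= 8;
    }
  return out;
}

/* Return the payload length of the length-prepended member at
   BUF + OFFSET.  VERSION is the archive format version (1 or 2); SWAP is
   nonzero if the prefix is stored in the opposite byte order to the host.
   Both parameters are hypothetical.  */
static uint64_t
example_member_len (const char *buf, size_t offset, int version, int swap)
{
  uint64_t len;

  assert (buf != NULL);
  memcpy (&len, buf + offset, sizeof (len));	/* Avoid unaligned loads.  */
  if (swap)
    len = example_bswap64 (len);

  /* v1 recorded a length that was sizeof (uint64_t) bytes too high.
     Assumption: the reader corrects for this here.  */
  if (version == 1)
    {
      assert (len >= sizeof (uint64_t));
      len -= sizeof (uint64_t);
    }
  return len;
}
```

Because the property values are described as using the same length-prepended
layout as CTF members, one helper of this shape could plausibly serve both
tables.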

We can also restrict various old bug-workaround kludges to dicts found in v1
archives, the only place they apply.  In particular, we needed to dig out the
preamble of some CTF dicts without opening them to figure out whether they
used the .dynstr or .strtab sections; this whole workaround is now
unnecessary for v2 and above.

There are other changes for readability and consistency:

- The archive wrapper data structure, known outside ctf-archive.c as
   ctf_archive_t, is now consistently referred to inside ctf-archive.c as
   'struct ctf_archive_internal' and given the parameter name 'arci' rather
   than sometimes using ctf_archive_t and sometimes using 'wrapper' or 'arc'
   as parameter names.  The archive itself is always called 'struct
   ctf_archive' to emphasise that it is *not* a ctf_archive_t.
   ctf_archive_t remains the public typedef: the fact that it's not actually
   the same thing as the archive file format is an internal implementation
   detail.

- We keep the archive header around in a new ctfi_hdr member, distinct
   from the actual archive itself, to make upgrading from v1 and cross-
   endianness support easier.  The archive itself is now kept as a char *
   and used only to root pointer arithmetic.
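
The split between a private header copy and a raw byte pointer might look
roughly like the sketch below.  All struct, field and function names are
hypothetical stand-ins (loosely modelled on the ctfi_hdr and ctfa_modents
members mentioned above), and the header layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily cut-down header: NOT the real on-disk layout.  */
struct example_hdr
{
  uint64_t magic;
  uint64_t modents;	/* Byte offset of the modent array from the start.  */
  uint64_t nmodents;	/* Number of modents.  */
};

/* Hypothetical wrapper: a private header copy plus the raw archive.  */
struct example_archive_internal
{
  struct example_hdr hdr;	/* Can be swapped or upgraded in place.  */
  const char *base;		/* Raw archive; only a base for offsets.  */
  size_t size;			/* Size of the raw archive in bytes.  */
};

/* Copy the header out of BASE and remember BASE for later arithmetic.
   Returns 0 on success, -1 if the buffer is too small for a header.  */
static int
example_open (struct example_archive_internal *arci, const char *base,
	      size_t size)
{
  assert (arci != NULL && base != NULL);
  if (size < sizeof (struct example_hdr))
    return -1;
  memcpy (&arci->hdr, base, sizeof (arci->hdr));
  arci->base = base;
  arci->size = size;
  return 0;
}

/* Return a pointer to the modent array, or NULL if the header's offset
   lies outside the archive.  */
static const char *
example_modents (const struct example_archive_internal *arci)
{
  if (arci->hdr.modents > arci->size)
    return NULL;
  return arci->base + arci->hdr.modents;
}
```

Keeping the header as a copy means a v1 header could be converted, or a
foreign-endian header swapped, without writing to the underlying archive
bytes, which may be a read-only mapping.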

author	Nick Alcock <nick.alcock@oracle.com>
	Wed, 28 May 2025 12:42:11 +0000 (13:42 +0100)
committer	Nick Alcock <nick.alcock@oracle.com>
	Wed, 28 May 2025 14:11:37 +0000 (15:11 +0100)
commit	16e0dd9aab73d2ca270c26783cea85e1fdd95256
tree	9f59f8e6645f3d0092744da1cb77be9ef88c30fa
parent	4bdc7aed0373a384f0526db3f7da11efd08d7bf7

include/ctf.h
libctf/ctf-archive.c
libctf/ctf-impl.h
libctf/ctf-link.c
libctf/ctf-open-bfd.c