git.ipfire.org Git - thirdparty/binutils-gdb.git/commit

libctf: string: refs rework

This commit moves provisional (not-yet-serialized) string refs towards the
scheme to be used for CTF IDs in the future.  In particular:

- provisional string offsets now count downwards from just under the
   external string offset space (all bits on but the high bit).  This makes
   it possible to detect an overflowing strtab, and also makes it trivial to
   determine whether any string offset (ref) updates were missed -- where
   before we might get a slightly corrupted or incorrect string, we now get
   a huge high strtab offset corresponding to no string, and an error is
   emitted at read time.

- refs are emitted at serialization time during the pass through the types.
   They are strictly associated with the newly-written-out buffer: the
   existing opened CTF dict is not changed, though it does still get the new
   strtab so that new refs to the same string can just refer directly to it.
   The provisional strtab hash table that contains these strings is not
   deleted after serialization (because we might serialize again): instead,
   we keep track in the parent of the lowest-yet-used ("latest") provisional
   strtab offset, and any strtab offset above that, but not external
   (high-bit-on), is considered provisional.

   Adding refs only at serialization time is sort-of-enforced by moving most
   of the ref-addition function declarations (including ctf_str_add_ref) to
   a new ctf-ref.h, which is not included by ctf-create.c or ctf-open.c.

- because we don't add refs when adding types, we don't need to handle the
   case where we add things to expanding vlens (enums, struct members) and
   have to realloc() them.  So the entire painful movable refs system can
   just be deleted, along with the ability to remove refs piecemeal at all
   (purging all of them is still possible).  Strings added during type
   addition are added via ctf_str_add(), which adds no refs: the strings are
   picked up at serialization time and refs to their final, serialized
   resting place added.  The DTDs never have any refs in them, and their
   provisional strtab offsets are never updated by the ref system.
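
The offset scheme in the list above can be sketched roughly as follows.
This is a minimal illustration, not libctf's code: every sketch_* name is
hypothetical, each provisional string is assumed to consume one offset
value, and an offset equal to the lowest one handed out is treated as
provisional, which is an assumption about the exact boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, following the description above: external
   offsets have the high bit set, and provisional offsets start with all
   bits on but the high bit and count downwards.  */
#define SKETCH_EXTERNAL_BIT 0x80000000u
#define SKETCH_PROV_FIRST   0x7fffffffu

enum sketch_kind { SKETCH_REAL, SKETCH_PROVISIONAL, SKETCH_EXTERNAL };

struct sketch_strtab
{
  uint32_t real_len;    /* Size of the serialized (real) strtab.  */
  uint32_t prov_lowest; /* Lowest provisional offset handed out so far;
                           SKETCH_PROV_FIRST + 1 if none has been.  */
};

void
sketch_init (struct sketch_strtab *tab, uint32_t real_len)
{
  assert (tab != NULL);
  tab->real_len = real_len;
  tab->prov_lowest = SKETCH_PROV_FIRST + 1;
}

/* Hand out the next provisional offset, counting downwards.  Returns 0
   (never handed out as a provisional offset here) if the provisional
   space would run into the real strtab, i.e. the strtab has overflowed.  */
uint32_t
sketch_prov_alloc (struct sketch_strtab *tab)
{
  assert (tab != NULL);
  if (tab->prov_lowest <= tab->real_len || tab->prov_lowest <= 1)
    return 0;
  return --tab->prov_lowest;
}

/* Classify an offset: external if the high bit is on, provisional if it
   is at or above the lowest provisional offset in use, otherwise real.  */
enum sketch_kind
sketch_classify (const struct sketch_strtab *tab, uint32_t off)
{
  if (off & SKETCH_EXTERNAL_BIT)
    return SKETCH_EXTERNAL;
  if (off >= tab->prov_lowest)
    return SKETCH_PROVISIONAL;
  return SKETCH_REAL;
}

/* A non-external offset at or past the end of a serialized strtab names
   no string.  A provisional offset whose ref update was missed is huge,
   so it fails this check instead of silently naming the wrong string.  */
bool
sketch_offset_readable (uint32_t off, uint32_t strtab_len)
{
  return (off & SKETCH_EXTERNAL_BIT) != 0 || off < strtab_len;
}
```

Because the provisional range grows downwards while the real strtab grows
upwards, the two can only meet when the strtab is genuinely full, which is
what makes the overflow check a single comparison in this sketch.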

This rework exposed several bugs in the earlier work, which are fixed here.
In particular, attempts to look up a string in a child dict now search
the parent's provisional strtab too.  We also add some extra special casing
for the null string so we don't need to worry about deduplication
moving it somewhere other than offset zero.
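
A rough sketch of that lookup order, assuming a simplified dict with a
parent pointer and a small array standing in for the provisional strtab
hash table.  All sketch_* names are hypothetical and none of this is
libctf's actual data layout.

```c
#include <stddef.h>
#include <stdint.h>

/* One provisional string and the provisional offset assigned to it.  */
struct sketch_prov_str
{
  uint32_t offset;
  const char *str;
};

/* Simplified stand-in for a dict: a linear array replaces the hash table
   a real implementation would use.  */
struct sketch_dict
{
  const struct sketch_dict *parent;   /* NULL for a parent dict.  */
  const struct sketch_prov_str *prov; /* Provisional strings.  */
  size_t n_prov;
};

/* Look up a provisional offset in DICT, then in its parent.  The null
   string is special-cased at offset zero, so the result for it does not
   depend on where deduplication might otherwise have placed it.  */
const char *
sketch_prov_lookup (const struct sketch_dict *dict, uint32_t off)
{
  if (off == 0)
    return "";

  for (; dict != NULL; dict = dict->parent)
    for (size_t i = 0; i < dict->n_prov; i++)
      if (dict->prov[i].offset == off)
        return dict->prov[i].str;

  return NULL;  /* No such provisional string.  */
}
```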

Finally, the optimization that removes an unreferenced synthetic external
strtab (the record of the strings the linker has told us about, kept around
internally for lookup during late serialization) is faulty.  References to
a strtab entry only produce CTF-level refs if their value might change, and
an external string's offset won't change, so it produces no refs.  Worse
yet, even if we did get a ref (say, if the string was originally believed
to be internal and only later were we told that the linker knew about it
too), all of a strtab's refs are dropped when we serialize it (since
they've been updated and can no longer change).  So if we serialized it a
second time, its synthetic external strtab would be considered empty and
dropped, even though the same external strings as before still exist,
referencing it.  We must keep the synthetic external strtab around as long
as external strings exist that reference it, i.e. for the life of the dict.

One benefit of all this: now that we're emitting provisional string offsets
at really high values, they are out of the way of the consecutive,
deduplicated string offsets in child dicts.  So we can drop the constraint
that you cannot add strings to a dict with children, which allows us to add
types freely to parent dicts again.  What you can't do is write that dict
out again.  When we serialize, we currently update the dict being
serialized with the updated strtabs, so when you write a dict out, its
provisional strings become real strings, and suddenly the offsets would
overlap once more.  But opening a dict and its children, adding to it, and
then writing it out again is rare indeed, and we have a workaround: anyone
wanting to do this can just use ctf_link instead.
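
To illustrate the overlap, here is a small sketch.  It assumes that a
child's string offsets continue directly after the end of the parent's
serialized strtab; that layout is an assumption made for illustration, and
the sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where a non-provisional, non-external offset seen in a child lands.  */
struct sketch_where
{
  bool in_parent; /* True if the offset falls in the parent's strtab.  */
  uint32_t rel;   /* Offset relative to the strtab it falls in.  */
};

/* Resolve OFF given the length of the parent's serialized strtab, under
   the assumed layout: offsets below PARENT_LEN belong to the parent,
   and the child's own strings start at PARENT_LEN.  */
struct sketch_where
sketch_resolve (uint32_t parent_len, uint32_t off)
{
  struct sketch_where w;

  w.in_parent = off < parent_len;
  w.rel = w.in_parent ? off : off - parent_len;
  return w;
}
```

With a parent strtab of 10 bytes, offset 12 resolves to byte 2 of the
child's strtab.  If the parent's provisional strings then become real and
its strtab grows to 16 bytes, the same offset 12 resolves into the parent
instead, which is the overlap described above.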

author	Nick Alcock <nick.alcock@oracle.com>
	Fri, 7 Feb 2025 17:06:36 +0000 (17:06 +0000)
committer	Nick Alcock <nick.alcock@oracle.com>
	Fri, 28 Feb 2025 15:13:24 +0000 (15:13 +0000)
commit	a480362d88405301e28fefed895e390507354cae
tree	8de58886b6587eccf91e76572f3581ced2850382
parent	97a72b2a35dbd218bc61f39cc5fd7b4d18c3126b

libctf/ctf-create.c
libctf/ctf-impl.h
libctf/ctf-link.c
libctf/ctf-open.c
libctf/ctf-ref.h	[new file with mode: 0644]
libctf/ctf-serialize.c
libctf/ctf-string.c
libctf/ctf-util.c
libctf/testsuite/libctf-writable/error-propagation.c
libctf/testsuite/libctf-writable/reserialize-strtab-corruption.c
libctf/testsuite/libctf-writable/reserialize-strtab-corruption.lk