These are a bit trickier than previous things. Datasecs are unusual: the
content they contain for a given variable is conceptually part of that
variable, in that a variable can only appear in one datasec: so if two TUs
have different datasec values for a variable, you'll want to emit two
conflicting variables with different datasec entries. Equally, if they
have entries in different datasecs, they're conflicting. But the *index*
of a variable in a datasec has nothing to do with the variable: it's just
a property of how many other variables are in the datasec.
So we turn the type graph upside down for them. We track the variable ->
datasec mappings for every variable we are dedupping, and use this to hash
variables with datasec entries *twice*: firstly, as purely variable type,
name, and promoted-to-non-extern linkage, and secondly with all of that plus
the datasec name, offset and size: we indicate that the non-extern hash
*replaces* the extern one, and use this later on. The datasec itself is not
hashed at all! We skip it at both hashing and emission time (without
breaking anything else, because nothing points at datasecs, so nothing will
ever recurse down into one).
The popcount code (used to find the "most popular" type, the one to put in
the shared dict) changes to say that replaced types (extern vars) popcounts
are added to the counts of the types that replace them (the corresponding
non-extern vars).
At emission time, replaced variables (extern variables) are skipped,
ensuring that extern vars with non-conflicting non-extern counterparts are
skipped in favour of the non-extern ones. ctf_add_section_variable then
takes care of emitting both the var and its corresponding datasec for us.