RFC elf: Fix slow TLS access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter, but
_dl_update_slotinfo only updates the dtv up to the generation of the
accessed module. If the global generation is newer than the generation
of the module, then __tls_get_addr keeps hitting the slow path that
updates the dtv.
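To make the failure mode concrete, here is a minimal single-threaded model of the pre-patch behaviour. It is a sketch, not glibc code: `model_dlopen`, `model_update_slotinfo`, `model_tls_get_addr`, `run_scenario` and the plain counters are hypothetical stand-ins that keep only the generation comparison described above. After two dlopens the global generation is 2 while the accessed module's generation is 1, so every access compares 1 against 2 and takes the slow path again.

```c
#include <stddef.h>

/* Simplified, single-threaded model of the pre-patch behaviour.
   All names are hypothetical; this is not glibc's code.  */
#define MAX_MODULES 8

static size_t global_gen;               /* bumped by every dlopen */
static size_t module_gen[MAX_MODULES];  /* generation that loaded each module */
static size_t dtv_gen;                  /* generation this thread's dtv reflects */
static unsigned slow_path_hits;         /* how often the slow path ran */

static void model_reset(void)
{
  global_gen = 0;
  dtv_gen = 0;
  slow_path_hits = 0;
  for (size_t i = 0; i < MAX_MODULES; i++)
    module_gen[i] = 0;
}

/* dlopen of a module with TLS: it gets a new global generation.  */
static void model_dlopen(size_t modid)
{
  module_gen[modid] = ++global_gen;
}

/* Stand-in for _dl_update_slotinfo: updates the dtv only up to the
   generation of the module being accessed.  */
static void model_update_slotinfo(size_t modid)
{
  if (module_gen[modid] > dtv_gen)
    dtv_gen = module_gen[modid];
}

/* Stand-in for __tls_get_addr: the fast-path test compares against
   the *global* generation.  */
static void model_tls_get_addr(size_t modid)
{
  if (dtv_gen != global_gen)
    {
      ++slow_path_hits;
      model_update_slotinfo(modid);
    }
}

/* dlopen module 1, then module 2, then access module 1's TLS n times;
   returns how many of those accesses took the slow path.  */
static unsigned run_scenario(unsigned n)
{
  model_reset();
  model_dlopen(1);
  model_dlopen(2);
  for (unsigned i = 0; i < n; i++)
    model_tls_get_addr(1);
  return slow_path_hits;
}
```

For example, `run_scenario(100)` should report 100 slow-path hits: the dtv generation never moves past the module's generation, so the fast-path check never succeeds.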
Possible approaches I can see:
1. update the dtv to the global generation instead of the module's,
2. check the module's generation in the fast path.
This patch implements 1.: it needs additional synchronization (a load
acquire) so that the slotinfo list is up to date with the observed
global generation.
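The extra synchronization in approach 1 is the usual release/acquire pairing. The C11 sketch below assumes a single writer (the loader, holding its lock) that fills in a slotinfo entry and then release-stores the new global generation, and assumes slots are never reused; `slotinfo`, `publish_module`, `struct thread_dtv` and `update_dtv_to_global` are hypothetical names, not glibc's. Because the reader acquire-loads the global generation before walking the entries, every entry with a generation at or below the observed value is fully visible, so the dtv can safely be marked current up to that generation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model, not glibc's data structures.  Assumes a single
   writer and that slots are never reused (no dlclose).  */
#define MAX_MODULES 16

struct slot_entry
{
  _Atomic size_t gen;  /* 0 = unused, else generation that added it */
  size_t tls_size;     /* payload, written before publication */
};

static struct slot_entry slotinfo[MAX_MODULES];
static _Atomic size_t global_gen;

/* Writer side: fill in the entry, then publish the new generation.  */
static void publish_module(size_t modid, size_t tls_size)
{
  size_t g = atomic_load_explicit(&global_gen, memory_order_relaxed) + 1;
  slotinfo[modid].tls_size = tls_size;
  atomic_store_explicit(&slotinfo[modid].gen, g, memory_order_relaxed);
  /* Release: the writes above become visible to any thread that
     acquire-loads this (or a later) generation.  */
  atomic_store_explicit(&global_gen, g, memory_order_release);
}

/* Per-thread view of the loaded modules.  */
struct thread_dtv
{
  size_t gen;
  size_t tls_size[MAX_MODULES];
};

/* Reader side, approach 1: bring the dtv up to the *global* generation.  */
static void update_dtv_to_global(struct thread_dtv *dtv)
{
  /* Acquire pairs with the release in publish_module, so every entry
     with 0 < gen <= new_gen is fully initialized when read below.  */
  size_t new_gen = atomic_load_explicit(&global_gen, memory_order_acquire);
  for (size_t i = 0; i < MAX_MODULES; i++)
    {
      size_t g = atomic_load_explicit(&slotinfo[i].gen, memory_order_relaxed);
      if (g != 0 && g <= new_gen)
        dtv->tls_size[i] = slotinfo[i].tls_size;
    }
  dtv->gen = new_gen;
}
```

Without the acquire, a thread could observe the new generation yet still read a stale or half-written slotinfo entry, and then wrongly record its dtv as current.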
Approach 2. would require walking the slotinfo list on every access,
and I don't know how to make that fast with many modules.
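For comparison, approach 2 would need the accessed module's own generation on every access. Assuming the slotinfo list is a linked list of fixed-size arrays (the hypothetical `struct slotinfo_chunk` below, with an illustrative `CHUNK` of 4 and made-up generations from `build_list`), finding that generation means walking chunks until reaching the one that holds the module, so the fast path's cost grows with the module id.

```c
#include <stddef.h>

/* Hypothetical chunked slotinfo list; sizes and layout are illustrative.  */
#define CHUNK 4

struct slotinfo_chunk
{
  size_t len;                   /* entries used in this chunk */
  size_t gen[CHUNK];            /* generation of each module slot */
  struct slotinfo_chunk *next;  /* next chunk, or NULL */
};

/* Approach 2 lookup: walk the chunks to find module modid's generation.
   *steps receives the number of chunks visited; returns 0 if not found.  */
static size_t module_gen_lookup(const struct slotinfo_chunk *list,
                                size_t modid, size_t *steps)
{
  size_t idx = modid;
  *steps = 0;
  while (list != NULL)
    {
      ++*steps;
      if (idx < list->len)
        return list->gen[idx];
      idx -= list->len;
      list = list->next;
    }
  return 0;
}

/* Fast-path test under approach 2: the dtv is good enough for this
   module if it is at least as new as the module's generation.  */
static int fast_path_ok(size_t dtv_gen, const struct slotinfo_chunk *list,
                        size_t modid, size_t *steps)
{
  return module_gen_lookup(list, modid, steps) <= dtv_gen;
}

/* Test helper: links nchunks full chunks; module i gets generation i + 1.  */
static struct slotinfo_chunk *build_list(struct slotinfo_chunk *chunks,
                                         size_t nchunks)
{
  for (size_t c = 0; c < nchunks; c++)
    {
      chunks[c].len = CHUNK;
      for (size_t j = 0; j < CHUNK; j++)
        chunks[c].gen[j] = c * CHUNK + j + 1;
      chunks[c].next = c + 1 < nchunks ? &chunks[c + 1] : NULL;
    }
  return nchunks ? &chunks[0] : NULL;
}
```

With 8 chunks, looking up module 31 visits all 8 chunks, compared with 1 for module 0; that per-access walk is the cost approach 1 avoids.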
Note: in the x86_64 version of dl-tls.c the generation is loaded only
once, since a relaxed MO load is not faster than an acquire MO load
there.
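The x86_64 note can be illustrated with two shapes of the access path. Both functions below are a hypothetical sketch (`global_gen`, `struct thread_dtv`, `update_dtv`, `access_two_loads` and `access_one_load` are illustrative names): the generic shape does a relaxed load on the fast path and a separate acquire load only on the slow path, while the x86_64 shape does a single acquire load and reuses the value. On x86-64 an acquire load typically compiles to a plain MOV, the same instruction as a relaxed load, so loading once with acquire costs nothing extra.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, minimal model; names are illustrative only.  */
static _Atomic size_t global_gen;

struct thread_dtv
{
  size_t gen;
  unsigned updates;  /* how many times the slow path ran */
};

/* Slow path: bring the dtv up to the generation the caller observed.
   (A real implementation would walk the slotinfo list here.)  */
static void update_dtv(struct thread_dtv *dtv, size_t gen)
{
  dtv->gen = gen;
  ++dtv->updates;
}

/* Generic shape: relaxed load on the fast path, second acquire load
   only when the slow path is taken.  */
static void access_two_loads(struct thread_dtv *dtv)
{
  if (dtv->gen != atomic_load_explicit(&global_gen, memory_order_relaxed))
    update_dtv(dtv, atomic_load_explicit(&global_gen, memory_order_acquire));
}

/* x86_64 shape: one acquire load whose value the slow path reuses.  */
static void access_one_load(struct thread_dtv *dtv)
{
  size_t gen = atomic_load_explicit(&global_gen, memory_order_acquire);
  if (dtv->gen != gen)
    update_dtv(dtv, gen);
}
```

On architectures where acquire needs a barrier or a special load instruction, the two-load shape keeps the common fast path as cheap as a relaxed load.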