libstdc++: Use __libc_single_threaded for locale initialization
Most initialization of locales and facets happens before main() during
startup, when the program is likely to only have one thread. By using
the new __gnu_cxx::__is_single_threaded() function instead of checking
__gthread_active_p() we can avoid using pthread_once or atomics for the
common case.
That said, I'm not sure why we don't just use a local static variable
instead, as __cxa_guard_acquire() already optimizes for the
single-threaded case: