liburcu's rcu_cmpxchg_pointer() uses CMM_RELAXED ordering on the CAS
failure path. When a thread loses the CAS and gets another thread's
pointer back, reading fields through that pointer is a data race on
weakly-ordered architectures (ARM, POWER): the failing load has no
acquire semantics, so nothing orders the winning thread's initialization
of the object before the losing thread's reads of it.
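
A minimal sketch of the racy pattern follows. It uses the GCC/Clang
__atomic builtins instead of liburcu's macros so that it stands alone.
The names struct node, slot and publish_or_adopt_relaxed_failure() are
hypothetical. __ATOMIC_SEQ_CST on success is only a stand-in for whatever
liburcu uses there. The part that matters is the __ATOMIC_RELAXED failure
ordering.

```c
#include <stddef.h>

struct node {
	int value;
};

/* Shared slot; NULL until some thread publishes a node. */
static struct node *slot;

/*
 * Racy pattern: initialize our node, then try to publish it with a CAS
 * whose failure ordering is relaxed (SEQ_CST on success is a stand-in).
 */
static struct node *publish_or_adopt_relaxed_failure(struct node *mine,
						     int value)
{
	struct node *expected = NULL;

	mine->value = value;
	if (__atomic_compare_exchange_n(&slot, &expected, mine, 0,
					__ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		return mine; /* Won: our node is now published. */

	/*
	 * Lost: `expected` holds the winner's pointer, but it was loaded with
	 * __ATOMIC_RELAXED. No happens-before edge orders the winner's store
	 * to expected->value before a read of it here, so under the C11
	 * memory model the caller's dereference is a data race.
	 */
	return expected;
}
```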

Override rcu_cmpxchg_pointer() and rcu_xchg_pointer() to use the standard
__atomic builtins with __ATOMIC_ACQ_REL ordering, plus __ATOMIC_ACQUIRE on
the CAS failure path. Both CAS outcomes then have acquire semantics, which
fixes the race on all architectures. ThreadSanitizer also instruments the
__atomic builtins natively, so it sees this ordering.
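
A sketch of what such an override could look like follows. It assumes
GCC or Clang (statement expressions, __typeof__, __atomic builtins). It
also assumes liburcu's versions are function-like macros that can be
#undef'd first. The CAS macro keeps the return convention of liburcu's
rcu_cmpxchg_pointer(), evaluating to the value *p held before the call.
struct node, slot and publish_or_adopt() are hypothetical. They only show
a caller that loses the CAS and dereferences the winner's pointer.

```c
#include <stddef.h>

/*
 * Assumes liburcu defines these as function-like macros; #undef is a
 * no-op if they are not defined (as in this standalone sketch).
 */
#undef rcu_cmpxchg_pointer
#undef rcu_xchg_pointer

/*
 * Evaluates to the value *p held before the call: `old` on success, the
 * current (winning) pointer on failure. Success is acq_rel and failure is
 * acquire, so the returned pointer is safe to dereference either way.
 */
#define rcu_cmpxchg_pointer(p, old, _new)                               \
	__extension__ ({                                                \
		__typeof__(*(p)) _rcu_old = (old);                      \
		__atomic_compare_exchange_n((p), &_rcu_old, (_new),     \
					    0 /* strong CAS */,         \
					    __ATOMIC_ACQ_REL,           \
					    __ATOMIC_ACQUIRE);          \
		_rcu_old;                                               \
	})

/* Exchange has no failure path: acq_rel both publishes and acquires. */
#define rcu_xchg_pointer(p, v) __atomic_exchange_n((p), (v), __ATOMIC_ACQ_REL)

struct node {
	int value;
};

static struct node *slot;

/* Hypothetical caller: publish `mine`, or adopt whichever node won. */
static struct node *publish_or_adopt(struct node *mine, int value)
{
	struct node *prev;

	mine->value = value;
	prev = rcu_cmpxchg_pointer(&slot, NULL, mine);
	/* NULL means we won; otherwise prev->value may be read safely. */
	return prev == NULL ? mine : prev;
}
```

The macro returns the updated expected value rather than the builtin's
bool. A single call therefore tells the caller whether it won (result
equals old) or lost, and in the loss case hands back the winner's pointer.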