CAS instruction is expensive. From the x86 CPU's point of view, getting
a cache line for writing is more expensive than reading. See Appendix
A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and cause
excessive cache line bouncing.
Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
loop if compare may fail to reduce cache line bouncing on contended locks.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit
d672a98a1af106bd68deb15576710cd61363f7a6)
#define FORCE_ELISION(m, s)
#endif
+#ifndef LLL_MUTEX_READ_LOCK
+# define LLL_MUTEX_READ_LOCK(mutex) \
+ atomic_load_relaxed (&(mutex)->__data.__lock)
+#endif
+
static int __pthread_mutex_lock_full (pthread_mutex_t *mutex)
__attribute_noinline__;
break;
}
atomic_spin_nop ();
+ if (LLL_MUTEX_READ_LOCK (mutex) != 0)
+ continue;
}
while (LLL_MUTEX_TRYLOCK (mutex) != 0);