From: Willy Tarreau
Date: Thu, 15 Jul 2021 14:02:47 +0000 (+0200)
Subject: MEDIUM: atomic: relax the load/store barriers on x86_64
X-Git-Tag: v2.5-dev3~2
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=99198546f656ea440a03a18efb3308b4d11e35ba;p=thirdparty%2Fhaproxy.git

MEDIUM: atomic: relax the load/store barriers on x86_64

The x86-TSO model makes the load and store barriers unneeded for our
usage, as long as they still act as at least a compiler barrier: the
CPU already preserves the ordering of loads relative to other loads
and of stores relative to other stores; only a store followed by a
load may be reordered. It is thus safe to remove the lfence and
sfence, which are normally needed only to communicate with external
devices. Let's keep the mfence though, to make sure that reads of the
same memory location performed after writes report the value from
memory and not one snooped from the write buffer for too long.

An in-depth review of all use cases indicates that this is safe in
the rest of the code. Some parts could still be cleaned up to use
atomic stores and atomic loads instead of explicit barriers.

Doing this reliably increases the overall performance by about
2-2.5% on an 8c-16t Xeon thanks to less frequent flushes (the biggest
gain is likely in the MT lists, which use these barriers a lot and
thus benefit from fewer cache line flushes).
---

diff --git a/include/haproxy/atomic.h b/include/haproxy/atomic.h
index 70422f4c29..3198b381a9 100644
--- a/include/haproxy/atomic.h
+++ b/include/haproxy/atomic.h
@@ -530,13 +530,13 @@
 static __inline void
 __ha_barrier_load(void)
 {
-	__asm __volatile("lfence" ::: "memory");
+	__asm __volatile("" ::: "memory");
 }
 
 static __inline void
 __ha_barrier_store(void)
 {
-	__asm __volatile("sfence" ::: "memory");
+	__asm __volatile("" ::: "memory");
 }
 
 static __inline void
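
Note (illustrative sketch, not part of the patch): the reason the full
barrier must keep its mfence is the classic store-buffer litmus test
below. The names (flag0, flag1, r0, r1, t0, t1) and the standalone
barrier_full() helper are hypothetical, though the asm matches what
haproxy's full barrier emits on x86_64. Under x86-TSO, each CPU may
satisfy its load before its own store has left the write buffer, so
without the mfence both r0 and r1 can end up 0; a compiler-only
barrier cannot prevent this, which is exactly why only lfence/sfence
are relaxed here.

    #include <stdatomic.h>

    static _Atomic int flag0, flag1;
    static int r0, r1;

    /* same instruction as the x86_64 full barrier */
    static inline void barrier_full(void)
    {
            __asm __volatile("mfence" ::: "memory");
    }

    /* run concurrently with t1() */
    void t0(void)
    {
            atomic_store_explicit(&flag0, 1, memory_order_relaxed);
            barrier_full(); /* drain the write buffer so the store is
                             * globally visible before the load below */
            r0 = atomic_load_explicit(&flag1, memory_order_relaxed);
    }

    /* run concurrently with t0() */
    void t1(void)
    {
            atomic_store_explicit(&flag1, 1, memory_order_relaxed);
            barrier_full();
            r1 = atomic_load_explicit(&flag0, memory_order_relaxed);
    }

    /* With the mfence, at least one of r0/r1 is 1 once both threads
     * finish. Replace barrier_full() with a compiler barrier only
     * ("" ::: "memory") and r0 == r1 == 0 becomes possible, because
     * TSO permits a store to be reordered after a later load. */

By contrast, load-load and store-store ordering are guaranteed by TSO
itself, so the relaxed __ha_barrier_load()/__ha_barrier_store() only
need to stop the compiler from reordering, which the empty asm with
the "memory" clobber does.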