From: Willy Tarreau
Date: Thu, 15 Jul 2021 14:02:47 +0000 (+0200)
Subject: MEDIUM: atomic: relax the load/store barriers on x86_64
X-Git-Tag: v2.5-dev3~2
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=99198546f656ea440a03a18efb3308b4d11e35ba;p=thirdparty%2Fhaproxy.git

MEDIUM: atomic: relax the load/store barriers on x86_64

The x86-TSO model makes the load and store barriers unneeded for our
usage, as long as they still act as at least a compiler barrier: the
CPU already preserves the ordering of loads relative to other loads
and of stores relative to other stores; only a store followed by a
load may be reordered. It is thus safe to remove the lfence and
sfence, which are normally needed only to communicate with external
devices. Let's keep the mfence though, to make sure that reads of the
same memory location performed after writes report the value from
memory and not one snooped from the write buffer for too long.

An in-depth review of all use cases indicates that this is safe in
the rest of the code. Some parts could still be cleaned up to use
atomic stores and atomic loads instead of explicit barriers.

Doing this reliably increases the overall performance by about
2-2.5% on an 8c-16t Xeon thanks to less frequent flushes (the biggest
gain is likely in the MT lists, which use these barriers a lot and
thus benefit from fewer cache line flushes).
---

diff --git a/include/haproxy/atomic.h b/include/haproxy/atomic.h
index 70422f4c29..3198b381a9 100644
--- a/include/haproxy/atomic.h
+++ b/include/haproxy/atomic.h
@@ -530,13 +530,13 @@
 static __inline void
 __ha_barrier_load(void)
 {
-	__asm __volatile("lfence" ::: "memory");
+	__asm __volatile("" ::: "memory");
 }
 
 static __inline void
 __ha_barrier_store(void)
 {
-	__asm __volatile("sfence" ::: "memory");
+	__asm __volatile("" ::: "memory");
 }
 
 static __inline void
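
Note (illustrative sketch, not part of the patch): the reason the full
barrier must keep its mfence is the classic store-buffer litmus test
below. The names (flag0, flag1, r0, r1, t0, t1) and the standalone
barrier_full() helper are hypothetical, though the asm matches what
haproxy's full barrier emits on x86_64. Under x86-TSO, each CPU may
satisfy its load before its own store has left the write buffer, so
without the mfence both r0 and r1 can end up 0; a compiler-only
barrier cannot prevent this, which is exactly why only lfence/sfence
are relaxed here.

    #include <stdatomic.h>

    static _Atomic int flag0, flag1;
    static int r0, r1;

    /* same instruction as the x86_64 full barrier */
    static inline void barrier_full(void)
    {
            __asm __volatile("mfence" ::: "memory");
    }

    /* run concurrently with t1() */
    void t0(void)
    {
            atomic_store_explicit(&flag0, 1, memory_order_relaxed);
            barrier_full(); /* drain the write buffer so the store is
                             * globally visible before the load below */
            r0 = atomic_load_explicit(&flag1, memory_order_relaxed);
    }

    /* run concurrently with t0() */
    void t1(void)
    {
            atomic_store_explicit(&flag1, 1, memory_order_relaxed);
            barrier_full();
            r1 = atomic_load_explicit(&flag0, memory_order_relaxed);
    }

    /* With the mfence, at least one of r0/r1 is 1 once both threads
     * finish. Replace barrier_full() with a compiler barrier only
     * ("" ::: "memory") and r0 == r1 == 0 becomes possible, because
     * TSO permits a store to be reordered after a later load. */

By contrast, load-load and store-store ordering are guaranteed by TSO
itself, so the relaxed __ha_barrier_load()/__ha_barrier_store() only
need to stop the compiler from reordering, which the empty asm with
the "memory" clobber does.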