Implement ST_LD_MEM_BARRIER on x86 with a locked xor
Microbenchmarks on modern Intel architectures show that a memory barrier
implemented with a locked xor operation runs about 30% faster than one
implemented with mfence, while providing the same
memory ordering guarantees. This patch changes the implementation of
ST_LD_MEM_BARRIER on x86 architectures to use the faster, locked xor
operation. It also adds support for Microsoft's compiler.
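
For illustration only, a minimal sketch of what such a barrier could look
like. This is not the code in this patch: the function names
st_ld_barrier_sketch and store_then_load are hypothetical, and the choice
of the stack top as the xor target, the _InterlockedXor intrinsic for
Microsoft's compiler, and the fallback fence for non-x86 builds are all
assumptions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#endif

/* Hypothetical stand-in for ST_LD_MEM_BARRIER; not the patch's code. */
static inline void st_ld_barrier_sketch(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Locked xor of 0 into the word at the stack pointer. The value is
     * left unchanged; the lock prefix keeps earlier stores ordered
     * before later loads. */
    __asm__ __volatile__("lock; xorl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__("lock; xorl $0, (%%esp)" ::: "memory", "cc");
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    /* Assumption: a locked xor on a dummy local via the intrinsic. */
    volatile long dummy = 0;
    (void)_InterlockedXor(&dummy, 0);
#elif defined(__GNUC__)
    /* Not x86: fall back to a full fence so the sketch still builds. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#else
#error "st_ld_barrier_sketch: unsupported compiler or architecture"
#endif
}

/* Store-then-load pattern that needs the barrier: publish our flag,
 * then read the other side's flag without the load passing the store. */
static int store_then_load(volatile int *mine, volatile int *other)
{
    assert(mine != NULL && other != NULL);
    *mine = 1;
    st_ld_barrier_sketch();
    return *other;
}
```

The store_then_load helper shows the one reordering x86 otherwise allows
(a later load passing an earlier store), which is the case a store-load
barrier exists to prevent.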