Tests have shown that AMD systems really need to use a cpu_relax()
in these two loops. Performance improves from 10.03M to 10.56M
messages per second (+5%) on a 128-thread system, without affecting
Intel or ARM, so let's do this.
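
For readers unfamiliar with the pattern, here is a minimal sketch of a
polling loop that issues a CPU relax hint between reads. It is not the
code touched by this patch: relax_cpu() and spin_until_set() are
hypothetical names, the inline asm assumes GCC or Clang, and the choice
of "pause" on x86 and "yield" on AArch64 is an assumption about what a
relax hint typically expands to, not a description of __ha_cpu_relax().

```c
#include <stdatomic.h>

/* Hypothetical helper, not HAProxy's __ha_cpu_relax(): emit the
 * architecture's spin-wait hint where one is known, otherwise fall
 * back to a plain compiler barrier.
 */
static inline void relax_cpu(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("pause" ::: "memory");
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Poll *flag until it becomes non-zero, relaxing the CPU between two
 * reads. Gives up after <max_spins> failed polls so that the sketch
 * cannot loop forever. Returns the number of failed polls.
 */
static unsigned spin_until_set(atomic_int *flag, unsigned max_spins)
{
	unsigned spins = 0;

	while (!atomic_load_explicit(flag, memory_order_acquire)) {
		if (spins == max_spins)
			break;
		relax_cpu();
		spins++;
	}
	return spins;
}
```

The bound on the number of polls exists only to keep the sketch safe to
run in isolation; the loops changed below wait on another thread and
have no such limit.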
break;
}
#endif
- __ha_cpu_relax_for_read();
+ __ha_cpu_relax();
}
/* Here we own the tail. We can go on if we're still the leader,
*/
do {
next_cell = HA_ATOMIC_LOAD(&cell.next);
- } while (next_cell != &cell && __ha_cpu_relax_for_read());
+ } while (next_cell != &cell && __ha_cpu_relax());
/* OK our message was queued. Retrieving the sent size in the ring cell
* allows another leader thread to zero it if it finally couldn't send