s390/processor: Implement cpu_relax() with cpu serialization

There are many loops of the form

	while (READ_ONCE(*somelocation))
		cpu_relax();

Strictly speaking, the architecture requires serialization in the loop,
not just a compiler barrier, so that READ_ONCE() will see an updated value.
Real hardware does not require this (see IBM z Systems Processor
Optimization Primer - FAQ [1]), but adding serialization is still
recommended. Given that cpu_relax() does nothing useful, it does not hurt
to add the single, fast instruction that makes sure serialization happens,
and such loops may be left a bit faster.
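
For illustration only, a minimal user-space sketch of the idea, not the
actual patch. The names cpu_relax_sketch() and spin_until_clear() are
hypothetical, a volatile read stands in for READ_ONCE(), and the choice of
"bcr 15,0" as the serializing instruction is an assumption (the kernel may
select a different encoding depending on available facilities). On
non-s390 builds the sketch falls back to a plain compiler barrier.

```c
/* Hypothetical stand-in for cpu_relax(); illustrative, not the kernel code. */
static inline void cpu_relax_sketch(void)
{
#if defined(__s390__) || defined(__s390x__)
	/* Assumed encoding: a branch-never to register 0 that serializes the CPU. */
	__asm__ __volatile__("bcr 15,0" : : : "memory");
#else
	/* Other architectures: compiler barrier only, no CPU serialization. */
	__asm__ __volatile__("" : : : "memory");
#endif
}

/*
 * Spin while *flag is non-zero, giving up after max_spins iterations.
 * The volatile access forces a fresh load on every pass, much like
 * READ_ONCE(). Returns the number of iterations spent spinning.
 */
static unsigned long spin_until_clear(const volatile int *flag,
				      unsigned long max_spins)
{
	unsigned long spins = 0;

	while (*flag && spins < max_spins) {
		cpu_relax_sketch();
		spins++;
	}
	return spins;
}
```

The spin bound exists only so the sketch terminates when nothing else
clears the flag. The loops described above have no such bound.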