For device (agent) scope atomics - as needed when there is more than one teams,
a buffer_wbl2 followed by s_waitcnt is required. When doing the initial porting,
the pre-atomic instruction got accidentally replaced by buffer_inv sc1, which is
not quite the right instruction.