When generating 16-bit code via -m16, the NOP mcount code generation
emits a 5-byte NOP. However, that is neither a valid i8086 instruction
(long NOPs are PentiumPro+), nor would it get decoded as a 5-byte
instruction. It's a 4-byte 'nopw 0(%si)' followed by a zero byte. The
latter causes the following instruction to get misinterpreted as some
form of ADD.
Fix this by emiting a 3-byte no-op 'lea 0(%si)' instead which makes it
compatible with systems lacking long NOP support.
Add a test for this and change the existing one accordingly.