Add ix86_output_lcp_stall_peephole to generate LCP stall peepholes with
the previous scratch register:
1. Scan backward for the previous scratch register definition with
the same immediate operand in the same basic block.
2. The previous scratch register is unusable if it is set between the
previous scratch register definition and the current instruction.
3. If a usable previous scratch register is found, ignore the allocated
scratch register and use the previous scratch register. Otherwise, use
the allocated scratch register.
so that the same scratch register can be reused if possible. Test results:
1. When bootstrapping GCC 16 with only C and C++ enabled, this optimization
triggers 54 times. No regressions.
2. When building glibc 2.44, this optimization triggers 33 times. No
regressions.
3. When building Linux kernel 7.1.1, this optimization triggers 2099 times.
Kernel boots correctly.
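
For illustration only, the backward scan in steps 1-3 of the first list can
be sketched on a toy model of a basic block. This is not the code in
ix86_output_lcp_stall_peephole and does not use GCC's RTL API: the Insn
struct, the pick_scratch function and the integer register numbers are all
hypothetical names introduced for this sketch. It assumes the scan stops at
the nearest earlier definition that loads the same immediate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction; NOT GCC's RTL representation.
struct Insn {
  int dest_reg;      // register written by this instruction, -1 if none
  bool is_imm_load;  // true if it loads an immediate into dest_reg
  long imm;          // immediate value, meaningful only when is_imm_load
};

// Hypothetical helper: pick the scratch register for the instruction at
// index `cur` in basic block `bb`.  `imm` is the immediate it needs and
// `allocated` is the scratch register handed out by the allocator.
int pick_scratch(const std::vector<Insn>& bb, std::size_t cur, long imm,
                 int allocated) {
  assert(cur <= bb.size());
  std::vector<int> set_since;  // registers set between candidate and cur

  // Step 1: walk backward from the instruction just before `cur`.
  for (std::size_t i = cur; i-- > 0;) {
    const Insn& insn = bb[i];
    if (insn.dest_reg < 0)
      continue;  // writes no register, cannot affect the result

    if (insn.is_imm_load && insn.imm == imm) {
      // Step 2: the previous register is unusable if something set it
      // after this definition and before the current instruction.
      bool clobbered = std::find(set_since.begin(), set_since.end(),
                                 insn.dest_reg) != set_since.end();
      // Step 3: reuse it when usable, otherwise keep the allocated one.
      return clobbered ? allocated : insn.dest_reg;
    }
    set_since.push_back(insn.dest_reg);
  }
  return allocated;  // no earlier definition with the same immediate
}
```

In this model, a block that loads 42 into register 1, sets register 2 and
then needs 42 again yields register 1. If register 1 is overwritten in
between, or no earlier load of 42 exists, the allocated register is returned.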