After the perf trampoline assembly was split into per-architecture files,
the macOS universal2 build failed at the lipo step:

    fatal error: lipo: Python/asm_trampoline_aarch64.o and
    Python/asm_trampoline_x86_64.o have the same architectures (x86_64)
    and can't be in the same fat output file

On universal2, PY_CORE_CFLAGS contains "-arch arm64 -arch x86_64", so each
.S file was assembled into a fat .o containing both an arm64 and an x86_64
slice (one of them empty because of the #ifdef guards). lipo then refused
to merge two fat objects that share architectures.

Compile each per-architecture object with only its own -arch flag, so that
each .o is a thin single-architecture object, and then merge the two with
lipo.
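
The idea can be sketched in shell as follows. This is only an illustration
under stated assumptions, not the actual Makefile change: the helper
strip_arch_flags, the shortened PY_CORE_CFLAGS value, the .S source names,
and the merged output name are all hypothetical. The script only builds and
prints the command lines, so it runs on any platform.

```shell
#!/bin/sh
# Hypothetical helper: drop every "-arch <name>" pair from a flag string.
strip_arch_flags() {
    out=""
    skip=0
    for flag in $1; do
        if [ "$skip" -eq 1 ]; then
            skip=0              # this word is the argument of the preceding -arch
            continue
        fi
        if [ "$flag" = "-arch" ]; then
            skip=1
            continue
        fi
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Example value only; the real PY_CORE_CFLAGS is much longer.
PY_CORE_CFLAGS="-O2 -arch arm64 -arch x86_64 -fPIC"
BASE_CFLAGS=$(strip_arch_flags "$PY_CORE_CFLAGS")   # "-O2 -fPIC"

# One -arch per object, so each .o is thin (source file names are assumed).
ARM64_CMD="cc $BASE_CFLAGS -arch arm64 -c Python/asm_trampoline_aarch64.S -o Python/asm_trampoline_aarch64.o"
X86_64_CMD="cc $BASE_CFLAGS -arch x86_64 -c Python/asm_trampoline_x86_64.S -o Python/asm_trampoline_x86_64.o"

# Merge the two thin objects (output name is assumed).
LIPO_CMD="lipo -create Python/asm_trampoline_aarch64.o Python/asm_trampoline_x86_64.o -output Python/asm_trampoline.o"

echo "$ARM64_CMD"
echo "$X86_64_CMD"
echo "$LIPO_CMD"
```

Because each input to lipo now holds exactly one architecture, the two
objects no longer overlap and can be combined into a single fat file.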