gh-144157: Optimize bytes.translate() by deferring change detection (GH-144158)
Move the equality check out of the hot loop to allow better compiler
optimization. Instead of checking each byte during translation, perform
a single memcmp at the end to determine if the input can be returned
unchanged.
This allows compilers to unroll and pipeline the loops, giving roughly 2x
throughput on medium-to-large inputs (measured on an AMD Zen 2 CPU).
No change was observed on small inputs.
It is also faster for bytes subclasses: translate() always returns a new
object for them, so no change detection is needed.
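
A minimal sketch of the idea, for illustration only. It is not the actual
CPython code: the function names, the signatures and the need_change_info
flag are hypothetical. The first function tracks changes inside the loop;
the second translates unconditionally and compares once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "before" shape: change detection inside the hot loop.
 * Returns 1 if any byte was changed by the table, 0 otherwise. */
int
translate_checked(const unsigned char *in, unsigned char *out, size_t n,
                  const unsigned char table[256])
{
    assert(in != NULL && out != NULL && table != NULL);
    int changed = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        unsigned char t = table[c];
        out[i] = t;
        if (t != c) {
            changed = 1;    /* extra compare on every byte */
        }
    }
    return changed;
}

/* Hypothetical "after" shape: the loop body is a plain table lookup,
 * which is easier for a compiler to unroll. Change detection is one
 * memcmp at the end, skipped when the caller does not need it (for
 * example when a new object is returned regardless). In that case the
 * function conservatively reports 1. */
int
translate_deferred(const unsigned char *in, unsigned char *out, size_t n,
                   const unsigned char table[256], int need_change_info)
{
    assert(in != NULL && out != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = table[in[i]];
    }
    if (!need_change_info) {
        return 1;
    }
    return memcmp(in, out, n) != 0;
}
```

Both functions write the same output buffer; they differ only in where
the "did anything change" answer is computed.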