gh-152100: Fuse set-operation character classes into a single charset (GH-152214)
Add a compile-time optimization pass (Lib/re/_optimizer.py) that rewrites
set-operation character classes into a single character set wherever the
engine's charset() representation allows it. The engine's charset() treats
every NEGATE as a polarity toggle, so a NEGATE in the middle of the item
list expresses set difference, and a flat run of items with no NEGATE
expresses union.
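
As a rough illustration of the polarity rule, here is a toy Python model of
a charset walk. It is a sketch only: the tuple opcodes and the charset()
function below are hypothetical stand-ins and do not reproduce the engine's
real encoding or code.

```python
# Toy model of a charset walk. The tuple "opcodes" are illustrative only.
NEGATE = ("NEGATE",)


def charset(items, ch):
    """Return True if ch is accepted by the item list."""
    ok = True
    for item in items:
        if item == NEGATE:
            ok = not ok  # every NEGATE flips the current polarity
        elif item[0] == "LITERAL":
            if ch == item[1]:
                return ok
        elif item[0] == "RANGE":
            if item[1] <= ch <= item[2]:
                return ok
    return not ok  # nothing matched: the answer is the opposite polarity


# Difference a-z minus b, laid out as [NEGATE] B [NEGATE] A.
difference = [NEGATE, ("LITERAL", "b"), NEGATE, ("RANGE", "a", "z")]
# Union of 0-9 and a-f, laid out as a flat run with no NEGATE.
union = [("RANGE", "0", "9"), ("RANGE", "a", "f")]

assert charset(difference, "a")
assert not charset(difference, "b")
assert not charset(difference, "0")
assert charset(union, "7") and charset(union, "c")
assert not charset(union, "z")
```

In this model an item in the negated segment rejects the character
immediately, which is what lets a single pass express "A but not B".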
Set difference [A--B], which the parser emits as A(?<![B]), is fused into
the charset [NEGATE] B [NEGATE] A. This matches A minus B in one test
instead of a charset match followed by a lookbehind rescan.
_optimize_charset is made segment-aware so that the interior NEGATE
compiles correctly.
A union with a non-flat operand, such as [0-9||[a-z--b]], is emitted by the
parser as a BRANCH that the parser cannot merge. Once all of the BRANCH's
alternatives have been reduced to one-character matchers, their item lists
are concatenated into a single IN.
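
The same kind of check applies to the union case. The alternation below is
an assumed spelling of the BRANCH form for [0-9||[a-z--b]], written with
syntax the existing re module accepts; it is compared with a hand-written
single set rather than with the output of the new pass.

```python
import re

# Assumed BRANCH-style spelling: 0-9, or a-z with b excluded by lookbehind.
branch_form = re.compile(r"(?:[0-9]|[a-z](?<![b]))")
# Hand-written single character set with the same intended meaning.
fused_set = re.compile(r"[0-9ac-z]")

# Collect every ASCII character on which the two patterns disagree.
union_mismatches = [
    chr(code)
    for code in range(128)
    if bool(branch_form.fullmatch(chr(code)))
    != bool(fused_set.fullmatch(chr(code)))
]
print(union_mismatches)  # expected: []
```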
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>