--
* Character class escapes (``\d``, ``\D``, ``\s``, ``\S``, ``\w`` and ``\W``)
- outside a character set are now compiled to a single ``CATEGORY`` opcode
- instead of being wrapped in an ``IN`` block. This speeds up matching of
- patterns such as ``\d+`` and reduces the size of the compiled byte code.
- (Contributed by Serhiy Storchaka in :gh:`152033`.)
+ outside a character set, and character sets containing a single such escape
+ (such as ``[\d]`` or ``[^\s]``), are now compiled to a single ``CATEGORY``
+ opcode instead of being wrapped in an ``IN`` block. This speeds up matching
+ of patterns such as ``\d+`` and reduces the size of the compiled byte code.
+ (Contributed by Serhiy Storchaka in :gh:`152033` and Pieter Eendebak in
+ :gh:`152056`.)
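The optimization changes only the compiled byte code, so you can check that each character-set form matches exactly the same text as its escape. The minimal sketch below uses only the public `re.findall`. The sample strings, the flag combinations and the helper name `same_matches` are illustrative choices and are not part of the change.

```python
import re

# Character-set forms paired with the escape the entry says they now
# compile to; both sides must also match identically.
EQUIVALENT = [
    (r"[\d]+", r"\d+"),
    (r"[^\d]+", r"\D+"),
    (r"[^\s]+", r"\S+"),
    (r"[\w]+", r"\w+"),
]

# Illustrative inputs: ASCII, mixed whitespace, non-ASCII letters, empty.
SAMPLES = ["abc 123 def", "  x9 \t42\n", "ünïcødé 3.14", ""]

# str patterns accept the default Unicode mode, ASCII and IGNORECASE.
FLAGS = [0, re.ASCII, re.IGNORECASE]


def same_matches(set_pattern: str, escape_pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if both patterns find exactly the same substrings."""
    return re.findall(set_pattern, text, flags) == re.findall(escape_pattern, text, flags)


if __name__ == "__main__":
    for set_pat, esc_pat in EQUIVALENT:
        for flags in FLAGS:
            for text in SAMPLES:
                assert same_matches(set_pat, esc_pat, text, flags), (set_pat, text, flags)
    print("every character-set form matches like its escape")
```

Matching behaviour was already identical before the change. The check should therefore pass on any Python version, and only the compiled representation differs.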
module_name
-----------
                     subpatternappend((NOT_LITERAL, set[0][1]))
                 else:
                     subpatternappend(set[0])
+            elif _len(set) == 1 and set[0][0] is CATEGORY:
+                # optimization: a lone category like [\d] or [^\d]
+                if negate:
+                    subpatternappend((CATEGORY, CH_NEGATE[set[0][1]]))
+                else:
+                    subpatternappend(set[0])
             else:
                 if negate:
                     set.insert(0, (NEGATE, None))
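The new `elif` branch can be modelled outside the parser as a small pure function. In this sketch the opcodes are plain strings, and `NEGATED_CATEGORY` is a hypothetical stand-in for `CH_NEGATE` that covers only the digit, space and word categories. The name `simplify_set` and the returned-tuple shape are also assumptions made for illustration. The real parser appends to the subpattern and compares named constants with `is`.

```python
from typing import Any

# Hypothetical string stand-ins for the opcode constants used by the parser.
LITERAL = "LITERAL"
NOT_LITERAL = "NOT_LITERAL"
CATEGORY = "CATEGORY"
IN = "IN"
NEGATE = "NEGATE"

# Hypothetical subset of a category-negation table (the diff uses CH_NEGATE).
NEGATED_CATEGORY = {
    "CATEGORY_DIGIT": "CATEGORY_NOT_DIGIT",
    "CATEGORY_NOT_DIGIT": "CATEGORY_DIGIT",
    "CATEGORY_SPACE": "CATEGORY_NOT_SPACE",
    "CATEGORY_NOT_SPACE": "CATEGORY_SPACE",
    "CATEGORY_WORD": "CATEGORY_NOT_WORD",
    "CATEGORY_NOT_WORD": "CATEGORY_WORD",
}

Item = tuple[str, Any]


def simplify_set(items: list[Item], negate: bool) -> Item:
    """Collapse a one-member character set into a single opcode when possible."""
    if len(items) == 1 and items[0][0] == LITERAL:
        # [a] -> LITERAL a, [^a] -> NOT_LITERAL a (the pre-existing case)
        return (NOT_LITERAL, items[0][1]) if negate else items[0]
    if len(items) == 1 and items[0][0] == CATEGORY:
        # [\d] -> CATEGORY DIGIT, [^\d] -> CATEGORY NOT_DIGIT (the new case)
        if negate:
            return (CATEGORY, NEGATED_CATEGORY[items[0][1]])
        return items[0]
    # Otherwise keep a full IN block, with NEGATE as its first member.
    members = list(items)
    if negate:
        members.insert(0, (NEGATE, None))
    return (IN, members)


print(simplify_set([(CATEGORY, "CATEGORY_SPACE")], negate=True))
```

Negation is folded into the category code itself rather than into a `NEGATE` marker. This lets `[^\s]` end up as the same single opcode that `\S` produces.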
--- /dev/null
+Optimize matching of a character set that contains a single character
+category, such as ``[\d]`` or ``[^\s]``: it is now compiled to a single
+``CATEGORY`` opcode, the same as the corresponding ``\d`` or ``\S`` escape,
+instead of being wrapped in an ``IN`` block. This speeds up matching and
+reduces the size of the compiled byte code. Patch by Pieter Eendebak.
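To observe the speed-up on a particular build, you can take a rough timing of each character-set form against its escape with `timeit`. This is a minimal sketch. The haystack `TEXT`, the iteration count and the helper `time_pattern` are arbitrary choices. Results depend on the interpreter version and the machine, so no particular numbers are implied.

```python
import re
import timeit

# Arbitrary haystack with long runs of digits and non-space characters.
TEXT = "abc 1234567890 " * 1000

# Character-set form first, equivalent escape second.
PAIRS = [(r"[\d]+", r"\d+"), (r"[^\s]+", r"\S+")]


def time_pattern(pattern: str, number: int = 200) -> float:
    """Return total seconds spent running findall on TEXT `number` times."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.findall(TEXT), number=number)


if __name__ == "__main__":
    for set_pat, esc_pat in PAIRS:
        print(f"{set_pat:8} {time_pattern(set_pat):.4f}s   "
              f"{esc_pat:8} {time_pattern(esc_pat):.4f}s")
```

With this change applied, the two columns in each row would be expected to be close. On an earlier build, the character-set form may be measurably slower.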