git.ipfire.org Git - thirdparty/Python/cpython.git/commit
gh-95555: Support Unicode property escapes \p{...} in regular expressions (GH-151969)
author Serhiy Storchaka <storchaka@gmail.com>
Fri, 26 Jun 2026 04:33:33 +0000 (07:33 +0300)
committer GitHub <noreply@github.com>
Fri, 26 Jun 2026 04:33:33 +0000 (07:33 +0300)
commit 794b42ff8a75614898c98c56ab87090e9804c369
tree 4f6f67bf2c35673fde2883d1451eb33c9aa00d54
parent 908f438e198a753d40d1166b5f8725e650a9ed6e

Add support for \p{property} and \P{property} escapes in Unicode (str)
regular expressions, for the properties the engine can resolve without
the unicodedata database.  They are matched as CATEGORY opcodes or as
fixed sets of character ranges.
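
Because older interpreters reject \p in a pattern, code that wants to use
the new escapes may need to probe for them first. The following is a
minimal sketch, assuming only that an unsupported escape raises re.error
at compile time; the helper name supports_property_escapes is
hypothetical and not part of this change.

```python
import re


def supports_property_escapes() -> bool:
    """Return True if this re module accepts \\p{...} in str patterns."""
    try:
        re.compile(r"\p{Lu}")
    except re.error:
        # Interpreters without this feature treat \p as a bad escape.
        return False
    return True


if supports_property_escapes():
    # Only reached on an interpreter that includes this change.
    print(re.findall(r"\p{Lu}", "Hello World"))
else:
    print("\\p{...} is not supported by this re module")
```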

Supported in this change: many General_Category values (the groups L, N,
Z, C and the values Lu, Lt, Lm, Nd, Nl, No, Zs, Zl, Zp, Cc, Cf, Cs, Co
and Cn); the binary properties Alphabetic, Lowercase, Uppercase, Numeric,
Printable, XID_Start, XID_Continue, Cased and Case_Ignorable; the POSIX
compatibility classes; the code-point classes ASCII, Any, Assigned,
Noncharacter_Code_Point, Join_Control, Pattern_Syntax and
Pattern_White_Space; and Regional_Indicator, ASCII_Hex_Digit and
Hex_Digit.
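
To illustrate what "a fixed set of character ranges" means for a property
such as ASCII_Hex_Digit, here is a rough sketch that expands the ranges
into an ordinary character class usable on interpreters without \p
support. It is not the code added in Lib/re/_properties.py; the names
ASCII_HEX_DIGIT and ranges_to_class are assumptions made for the example.

```python
import re

# Inclusive code point ranges for ASCII_Hex_Digit: 0-9, A-F, a-f.
ASCII_HEX_DIGIT = [(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)]


def ranges_to_class(ranges, negate=False):
    """Build a character class pattern from inclusive code point ranges."""
    parts = []
    for lo, hi in ranges:
        # \UXXXXXXXX escapes sidestep quoting of characters like ']' or '^'.
        if lo == hi:
            parts.append(f"\\U{lo:08x}")
        else:
            parts.append(f"\\U{lo:08x}-\\U{hi:08x}")
    return "[" + ("^" if negate else "") + "".join(parts) + "]"


# Rough stand-ins for \p{ASCII_Hex_Digit}+ and \P{ASCII_Hex_Digit}.
hex_digits = re.compile(ranges_to_class(ASCII_HEX_DIGIT) + "+")
not_hex_digit = re.compile(ranges_to_class(ASCII_HEX_DIGIT, negate=True))

print(bool(hex_digits.fullmatch("DEADbeef09")))
print(not_hex_digit.findall("a-g"))
```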

Property and value names use loose matching (UAX #44, rule UAX44-LM3),
which ignores case, whitespace, underscores, hyphens and a leading "is".
A property may be spelled \p{Lu}, \p{gc=Lu} or \p{name=yes}.
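
The loose matching rule can be sketched as a normalization step applied
to both the name in the pattern and the names in the lookup table. This
is an illustration of UAX44-LM3 only, not the parser code from this
change; loose_key and same_property are hypothetical names, and leaving
a bare "is" untouched is an assumption of the sketch.

```python
import re

# UAX44-LM3 ignores whitespace, underscores and hyphens.
_IGNORED = re.compile(r"[\s_-]+")


def loose_key(name: str) -> str:
    """Normalize a property or value name for loose comparison."""
    key = _IGNORED.sub("", name).lower()
    # Drop an initial "is" prefix, but leave a bare "is" alone (assumption).
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def same_property(a: str, b: str) -> bool:
    """Return True if two spellings name the same property or value."""
    return loose_key(a) == loose_key(b)


print(same_property("General_Category", "general category"))
print(same_property("IsUppercase", "upper-case"))
```

Normalizing both sides with the same function keeps the comparison
consistent even for names that happen to begin with "is".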

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Doc/library/re.rst
Doc/whatsnew/3.16.rst
Lib/re/_constants.py
Lib/re/_parser.py
Lib/re/_properties.py [new file with mode: 0644]
Lib/test/test_re.py
Misc/NEWS.d/next/Library/2026-06-22-12-00-00.gh-issue-95555.Pr0p18.rst [new file with mode: 0644]
Modules/_sre/sre.c
Modules/_sre/sre_constants.h
Tools/unicode/makeunicodedata.py