git.ipfire.org Git - thirdparty/Python/cpython.git/commit
gh-150889: Improve performance of unicodedata.normalize() (GH-150890)
author Pieter Eendebak <pieter.eendebak@gmail.com>
Sat, 6 Jun 2026 08:34:33 +0000 (10:34 +0200)
committer GitHub <noreply@github.com>
Sat, 6 Jun 2026 08:34:33 +0000 (11:34 +0300)
commit 97dea30914a39bbfbe38ab0e31367309ba98ed22
tree 49edb73553c0dd1ff3edcb278aa326ec3c9ad877
parent 2452449b32a768cc088a110fd95390acb5e27f83
gh-150889: Improve performance of unicodedata.normalize() (GH-150890)

Scan the nfc_first/nfc_last reindex tables comparing only .start, range-check
the candidate once, and terminate on a sentinel above every codepoint, so each
entry costs a single comparison. ~2x faster on non-Latin and combining-heavy
NFC/NFKC input; no new data tables.
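
The scan described above might look roughly like the following sketch. It is
not the code from this commit: the struct layout, the names
(reindex_sketch, SKETCH_SENTINEL, sketch_table, find_index_sketch) and the
table contents are all hypothetical, chosen only to illustrate a
sentinel-terminated scan that compares .start alone and range-checks the
candidate entry once.

```c
#include <stdint.h>

/* Hypothetical stand-in for a reindex entry: a run of (count + 1)
   consecutive code points beginning at `start` maps to consecutive
   indices beginning at `index`. */
struct reindex_sketch {
    uint32_t start;
    uint16_t count;
    uint16_t index;
};

/* Assumed sentinel: one past the highest Unicode code point (0x10FFFF),
   so it compares greater than every valid input. */
#define SKETCH_SENTINEL 0x110000u

/* Illustrative data only, sorted by `start` and closed by the sentinel. */
static const struct reindex_sketch sketch_table[] = {
    {0x003C, 2, 0},          /* U+003C..U+003E -> 0..2  */
    {0x0041, 25, 3},         /* U+0041..U+005A -> 3..28 */
    {0x0391, 0, 29},         /* U+0391         -> 29    */
    {SKETCH_SENTINEL, 0, 0},
};

/* Returns the mapped index, or -1 if `code` is in no run.
   Precondition: code <= 0x10FFFF, otherwise the scan would run past
   the sentinel. */
static int
find_index_sketch(const struct reindex_sketch *table, uint32_t code)
{
    const struct reindex_sketch *entry = table;

    /* One comparison per entry: skip every run that starts at or
       before `code`. The sentinel stops the loop without a separate
       end-of-table test. */
    while (entry->start <= code) {
        entry++;
    }
    if (entry == table) {
        return -1;  /* `code` precedes the first run */
    }

    /* The only possible match is the last run that was skipped;
       range-check it once. */
    entry--;
    if (code - entry->start <= entry->count) {
        return (int)(entry->index + (code - entry->start));
    }
    return -1;
}
```

With this shape, the per-entry work in the loop is a single comparison
against .start; the upper-bound check against .count runs once, after the
loop, rather than on every entry.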

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Misc/NEWS.d/next/Library/2026-06-04-10-44-36.gh-issue-150889.UYNLR_.rst [new file with mode: 0644]
Modules/unicodedata.c
Modules/unicodedata_db.h
Tools/unicode/makeunicodedata.py