git.ipfire.org Git - thirdparty/Python/cpython.git/commit
gh-149079: Fix O(n^2) canonical ordering in unicodedata.normalize() (GH-149080)
author Seth Larson <seth@python.org>
Tue, 2 Jun 2026 09:39:50 +0000 (02:39 -0700)
committer GitHub <noreply@github.com>
Tue, 2 Jun 2026 09:39:50 +0000 (11:39 +0200)
commit 991224b1e8311c85f198f6dd8208bf8cff7fc26f
tree 51c3870eaa58ca20ef77c78de93437ae4d869f62
parent c52d2b16ddda3995f0f935b1a3815f1aac498da6

Replace the insertion sort used for canonical ordering of combining
characters with a hybrid approach: insertion sort for short runs (< 20)
and counting sort for longer runs, reducing worst-case complexity from
O(n^2) to O(n). This prevents denial of service via crafted Unicode
strings with many combining characters in alternating CCC order.
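
The hybrid approach described above can be illustrated with a minimal Python sketch. This is not the C code in Modules/unicodedata.c; the names `SHORT_RUN`, `_sort_run`, and `canonical_order` are hypothetical, and only the threshold of 20 comes from the message above. It assumes canonical combining class (CCC) values fit in the range 0..255, which is what makes counting sort applicable.

```python
import unicodedata

# Threshold from the commit message; the name is illustrative only.
SHORT_RUN = 20


def _sort_run(run):
    """Stably sort a list of combining characters by CCC."""
    if len(run) < SHORT_RUN:
        # Insertion sort: stable and cheap for short runs.
        for i in range(1, len(run)):
            ch = run[i]
            key = unicodedata.combining(ch)
            j = i
            while j > 0 and unicodedata.combining(run[j - 1]) > key:
                run[j] = run[j - 1]
                j -= 1
            run[j] = ch
        return run
    # Counting sort: one bucket per possible CCC value (assumed 0..255).
    # Appending in input order keeps equal-CCC characters stable.
    buckets = [[] for _ in range(256)]
    for ch in run:
        buckets[unicodedata.combining(ch)].append(ch)
    return [ch for bucket in buckets for ch in bucket]


def canonical_order(text):
    """Reorder each maximal run of non-starters (CCC != 0) in text."""
    out = []
    run = []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining characters.
            out.extend(_sort_run(run))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(_sort_run(run))
    return "".join(out)


# Alternating CCC order (230, 220, 230, 220, ...) is the worst case
# for insertion sort; the long run here takes the counting-sort path.
crafted = "a" + "\u0301\u0323" * 50
ordered = canonical_order(crafted)
```

Because the sort must be stable (characters with equal CCC keep their relative order), the buckets are filled in input order and read back in ascending CCC order. The work per run is then linear in the run length plus a fixed cost for the 256 buckets.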

Co-authored-by: ch4n3-yoon <ch4n3.yoon@gmail.com>
Co-authored-by: Seokchan Yoon <13852925+ch4n3-yoon@users.noreply.github.com>
Co-authored-by: Stan Ulbrych <stan@python.org>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Maurycy Pawłowski-Wieroński <maurycy@maurycy.com>
Lib/test/test_unicodedata.py
Misc/NEWS.d/next/Security/2026-04-27-16-36-11.gh-issue-149079.vKl-LM.rst [new file with mode: 0644]
Modules/unicodedata.c