git.ipfire.org Git - thirdparty/Python/cpython.git/commit
gh-62259: Add support of multi-byte encodings in the XML parser (GH-149860)
author Serhiy Storchaka <storchaka@gmail.com>
Tue, 26 May 2026 19:40:25 +0000 (22:40 +0300)
committer GitHub <noreply@github.com>
Tue, 26 May 2026 19:40:25 +0000 (19:40 +0000)
commit 8ab7b43a14bed4780febbd7586a41cfe459aa6d5
tree 88b033b66e06d7405e69dcdf0ef9bc714a7430ea
parent a34edf7446098fc143b101b22ab29b42ea002458

Supported encodings: "cp932", "cp949", "cp950", "Big5", "EUC-JP",
"GB2312", "GBK", "johab", and "Shift_JIS".

Partially supported encodings (only BMP characters): "Big5-HKSCS",
"EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004", "Shift_JISX0213",
"utf-8-sig" and non-standard aliases like "UTF8" (without hyphen).
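
To illustrate the "only BMP characters" restriction: the Basic Multilingual Plane covers code points U+0000 through U+FFFF. A minimal sketch for checking whether a piece of text stays within that range might look like the following. The helper name `is_bmp_only` is hypothetical and is not part of this commit.

```python
def is_bmp_only(text):
    """Return True if every character in text is in the BMP (<= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# U+65E5 is a BMP character; U+2000B lies outside the BMP (plane 2).
print(is_bmp_only("\u65e5"))
print(is_bmp_only("\U0002000b"))
```

Under this assumption, text for which the helper returns False is the kind of content that the partially supported encodings may not handle.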

The parser now raises ValueError for known unsupported
multi-byte encodings such as "ISO-2022-JP" or "raw-unicode-escape"
instead of failing later, when it encounters non-ASCII data.
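
The behaviour described above could be probed with a small helper like the one below. This is only a sketch: `try_parse` is a hypothetical name, not part of the commit, and the result for a multi-byte encoding depends on whether the running Python includes this change. The helper returns the exception rather than assuming which outcome occurs.

```python
import xml.etree.ElementTree as ET


def try_parse(text, encoding):
    """Encode a tiny XML document with the given codec and try to parse it.

    Returns the root element's text on success, or the exception that
    was raised while encoding or parsing.
    """
    document = f'<?xml version="1.0" encoding="{encoding}"?><a>{text}</a>'
    try:
        data = document.encode(encoding)
        return ET.fromstring(data).text
    except (ValueError, LookupError, ET.ParseError) as exc:
        # ValueError is what the commit message says is raised for known
        # unsupported multi-byte encodings; the other types are caught
        # only so the probe also works on interpreters without the change.
        return exc


for name in ("utf-8", "Shift_JIS", "ISO-2022-JP"):
    print(name, repr(try_parse("\u65e5\u672c\u8a9e", name)))
```

On an interpreter with this change, the "Shift_JIS" case would be expected to return the original text and the "ISO-2022-JP" case a ValueError. Without the change, the results may differ.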
47 files changed:
Doc/library/pyexpat.rst
Doc/whatsnew/3.16.rst
Include/internal/pycore_codecs.h
Lib/codecs.py
Lib/encodings/big5.py
Lib/encodings/big5hkscs.py
Lib/encodings/cp932.py
Lib/encodings/cp949.py
Lib/encodings/cp950.py
Lib/encodings/euc_jis_2004.py
Lib/encodings/euc_jisx0213.py
Lib/encodings/euc_jp.py
Lib/encodings/euc_kr.py
Lib/encodings/gb18030.py
Lib/encodings/gb2312.py
Lib/encodings/gbk.py
Lib/encodings/hz.py
Lib/encodings/idna.py
Lib/encodings/iso2022_jp.py
Lib/encodings/iso2022_jp_1.py
Lib/encodings/iso2022_jp_2.py
Lib/encodings/iso2022_jp_2004.py
Lib/encodings/iso2022_jp_3.py
Lib/encodings/iso2022_jp_ext.py
Lib/encodings/iso2022_kr.py
Lib/encodings/johab.py
Lib/encodings/punycode.py
Lib/encodings/raw_unicode_escape.py
Lib/encodings/shift_jis.py
Lib/encodings/shift_jis_2004.py
Lib/encodings/shift_jisx0213.py
Lib/encodings/unicode_escape.py
Lib/encodings/utf_16.py
Lib/encodings/utf_16_be.py
Lib/encodings/utf_16_le.py
Lib/encodings/utf_32.py
Lib/encodings/utf_32_be.py
Lib/encodings/utf_32_le.py
Lib/encodings/utf_7.py
Lib/encodings/utf_8.py
Lib/encodings/utf_8_sig.py
Lib/test/test_codecs.py
Lib/test/test_pyexpat.py
Lib/test/test_xml_etree.py
Misc/NEWS.d/next/Library/2026-05-14-17-01-19.gh-issue-62259.ytlFD5.rst [new file with mode: 0644]
Modules/pyexpat.c
Python/codecs.c