[3.9] gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930) (GH-136268) (#136293)
* Whitespaces no longer accepted between `</` and the tag name.
E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespaces no longer recognized
as whitespaces. The only whitespaces are `\t\n\r\f `.
* Null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
instead of terminating after the first `>` in quoted attribute value.
E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespaces between the last attribute and closing `>`
are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between attribute name and value are no longer collapsed.
E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespaces between the `=` separator and attribute name or value are no
longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
"=bar", both with value None; `<a foo= bar>` produces two attributes:
"foo" with value "" and "bar" with value None.
* Fix data loss after unclosed script or style tag (gh-86155).
Also backport test.support.subTests() (gh-135120).
[3.9] gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464) (GH-135486)
End-of-file errors are now handled according to the HTML5 specs --
comments and declarations are automatically closed, tags are ignored.
(cherry picked from commit 6eb6c5dbfb528bd07d77b60fd71fd05d81d45c41)
[3.9] gh-123409: fix `IPv6Address.reverse_pointer` for IPv4-mapped addresses (GH-123419) (GH-135085)
Fix functionality that was broken with better textual representation for IPv4-mapped addresses (gh-87799)
(cherry picked from commit 77a2fb4bf1a1b160d6ce105508288fc77f636943)
Co-authored-by: Seth Michael Larson <seth@python.org> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
[3.9] gh-87799: Improve the textual representation of IPv4-mapped IPv6 addresses (GH-29345) (GH-135078)
Represent IPv4-mapped IPv6 address as x:x:x:x:x:x:d.d.d.d,
where the 'x's are the hexadecimal values
of the six high-order 16-bit pieces of the address,
and the 'd's are the decimal values
of the four low-order 8-bit pieces of the address
(standard IPv4 representation).
[3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944) (#134346)
* [3.9] gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648) (GH-133944)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().
Brian Schubert [Mon, 2 Jun 2025 15:57:06 +0000 (11:57 -0400)]
[3.9] gh-80222: Fix email address header folding with long quoted-string (GH-122753) (GH-129111) (GH-132371)
Email generators using email.policy.default could incorrectly omit the
quote ('"') characters from a quoted-string during header refolding,
leading to invalid address headers and enabling header spoofing. This
change restores the quote characters on a bare-quoted-string as the
header is refolded, and escapes backslash and quote chars in the string.
Co-authored-by: R. David Murray <rdmurray@bitdance.com> Co-authored-by: Mike Edmunds <medmunds@gmail.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Email generators using email.policy.default may convert an RFC 2047
encoded-word to unencoded form during header refolding. In a structured
header, this could allow 'specials' chars outside a quoted-string,
leading to invalid address headers and enabling spoofing. This change
ensures a parsed encoded-word that contains specials is kept as an
encoded-word while the header is refolded.
[3.9] gh-119511: Fix a potential denial of service in imaplib (GH-119514) (#130248)
The IMAP4 client could consume an arbitrary amount of memory when trying
to connect to a malicious server, because it read a "literal" data with a
single read(size) call, and BufferedReader.read() allocates the bytes
object of the specified size before reading. Now the IMAP4 client reads data
by chunks, therefore the amount of used memory is limited by the
amount of the data actually been sent by the server.
(cherry picked from commit 735f25c5e3a0f74438c86468ec4dfbe219d93c91)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
Fix ThreadedVSOCKSocketStreamTest: if get_cid() returns the host
address or the "any" address, use the local communication address
(loopback): VMADDR_CID_LOCAL.
On Linux 6.9, apparently, the /dev/vsock device is now available but
get_cid() returns VMADDR_CID_ANY (-1).
Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Petr Viktorin <encukou@gmail.com>
[3.9] gh-107262: Update Tkinter tests for Tcl/Tk 8.6.14 (GH-119322) (#130275)
Co-authored-by: James De Bias <81095953+DBJim@users.noreply.github.com> Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Petr Viktorin [Thu, 23 Jan 2025 15:55:08 +0000 (16:55 +0100)]
[3.9] gh-121277: Allow .. versionadded:: next in docs (GH-121278) (#128117)
Make `versionchanged:: next`` expand to current (unreleased) version.
When a new CPython release is cut, the release manager will replace
all such occurences of "next" with the just-released version.
(See the issue for release-tools and devguide PRs.)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
(cherry picked from commit 7d24ea9db3e8fdca52058629c9ba577aba3d8e5c)
gh-121277: Raise nice error on `next` as second argument to deprecated-removed (GH-124623)
[3.9] gh-95588: Drop the safety claim from `ast.literal_eval` docs. (GH-95919) (GH-126729)
It was never really safe and this claim conflicts directly with the big warning in the docs about it being able to crash the interpreter.
(cherry picked from commit 8baef8ae367041a5cfefb40b19c7b87e9bcb56a2)
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Petr Viktorin [Tue, 12 Nov 2024 09:26:31 +0000 (10:26 +0100)]
[3.9] gh-113027: Fix test_variable_tzname in test_email (GH-113821) (GH-126477)
Determine the support of the Kyiv timezone by checking the result of
astimezone() which uses the system tz database and not the one
populated by zoneinfo.
Petr Viktorin [Fri, 6 Sep 2024 11:13:54 +0000 (13:13 +0200)]
[3.9] [CVE-2023-27043] gh-102988: Reject malformed addresses in email.parseaddr() (GH-111116) (#123769)
Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.
[3.9] gh-112275: Fix HEAD_LOCK deadlock in child process after fork (GH-112336) (#123688)
HEAD_LOCK is called from _PyEval_ReInitThreads->_PyThreadState_DeleteExcept before _PyRuntimeState_ReInitThreads reinit runtime->interpreters.mutex which might be locked before fork.
Co-authored-by: Seth Michael Larson <seth@python.org> Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru> Co-authored-by: Gregory P. Smith <greg@krypto.org>
[3.9] gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233) (#122610)
Per RFC 2047:
> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects
It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.
This should fail for custom fold() implementations that aren't careful
about newlines.
[3.9] gh-122133: Authenticate socket connection for `socket.socketpair()` fallback (GH-122134) (#122428)
Authenticate socket connection for `socket.socketpair()` fallback when the platform does not have a native `socketpair` C API. We authenticate in-process using `getsocketname` and `getpeername` (thanks to Nathaniel J Smith for that suggestion).
The _private_networks variables, used by various is_private
implementations, were missing some ranges and at the same time had
overly strict ranges (where there are more specific ranges considered
globally reachable by the IANA registries).
This patch updates the ranges with what was missing or otherwise
incorrect.
100.64.0.0/10 is left alone, for now, as it's been made special in [1].
The _address_exclude_many() call returns 8 networks for IPv4, 121
networks for IPv6.
In 3.10 and below, is_private checks whether the network and broadcast
address are both private.
In later versions (where the test wss backported from), it checks
whether they both are in the same private network.
For 0.0.0.0/0, both 0.0.0.0 and 255.225.255.255 are private,
but one is in 0.0.0.0/8 ("This network") and the other in
255.255.255.255/32 ("Limited broadcast").
[3.9] gh-117187: Fix XML tests for vanilla Expat <2.6.0 (GH-117203) (GH-117247)
This fixes XML unittest fallout from the https://github.com/python/cpython/issues/115398 security fix. When configured using `--with-system-expat` on systems with older pre 2.6.0 versions of libexpat, our unittests were failing.
Use of a proxy is intended to defer DNS for the hosts to the proxy itself, rather than a potential for information leak of the host doing DNS resolution itself for any reason. Proxy bypass lists are strictly name based. Most implementations of proxy support agree.
(cherry picked from commit c43b26d02eaa103756c250e8d36829d388c5f3be)
Co-authored-by: Weii Wang <weii.wang@canonical.com>
[3.9] Fix tests for XMLPullParser with Expat 2.6.0 (GH-115133) (GH-115535)
Feeding the parser by too small chunks defers parsing to prevent
CVE-2023-52425. Future versions of Expat may be more reactive.
(cherry picked from commit 4a08e7b3431cd32a0daf22a33421cd3035343dc4)
[3.9] gh-109858: Protect zipfile from "quoted-overlap" zipbomb (GH-110016) (GH-113915)
Raise BadZipFile when try to read an entry that overlaps with other entry or
central directory.
(cherry picked from commit 66363b9a7b9fe7c99eba3a185b74c5fdbf842eba)
[3.9] bpo-37013: Fix the error handling in socket.if_indextoname() (GH-13503) (GH-112600)
* Fix a crash when pass UINT_MAX.
* Fix an integer overflow on 64-bit non-Windows platforms.
(cherry picked from commit 0daf555c6fb3feba77989382135a58215e1d70a5)
Output with two wheels:
```
❯ GITHUB_ACTIONS=true ./Tools/build/verify_ensurepip_wheels.py
::error file=/Volumes/RAMDisk/cpython/Lib/ensurepip/_bundled/pip-22.0.4-py3-none-any.whl::Found more than one wheel for package pip.
::error file=/Volumes/RAMDisk/cpython/Lib/ensurepip/_bundled/pip-23.2.1-py3-none-any.whl::Found more than one wheel for package pip.
```
Output without wheels:
```
❯ GITHUB_ACTIONS=true ./Tools/build/verify_ensurepip_wheels.py
::error file=::Could not find a pip wheel on disk.
```
(cherry picked from commit f8a047941f2e4a1848700c21d58a08c9ec6a9c68)
Łukasz Langa [Thu, 24 Aug 2023 10:09:11 +0000 (12:09 +0200)]
[3.9] gh-108342: Make ssl TestPreHandshakeClose more reliable (GH-108370) (#108407)
* In preauth tests of test_ssl, explicitly break reference cycles
invoving SingleConnectionTestServerThread to make sure that the
thread is deleted. Otherwise, the test marks the environment as
altered because the threading module sees a "dangling thread"
(SingleConnectionTestServerThread). This test leak was introduced
by the test added for the fix of issue gh-108310.
* Use support.SHORT_TIMEOUT instead of hardcoded 1.0 or 2.0 seconds
timeout.
* SingleConnectionTestServerThread.run() catchs TimeoutError
* Fix a race condition (missing synchronization) in
test_preauth_data_to_tls_client(): the server now waits until the
client connect() completed in call_after_accept().
* test_https_client_non_tls_response_ignored() calls server.join()
explicitly.
* Replace "localhost" with server.listener.getsockname()[0].
(cherry picked from commit 592bacb6fc0833336c0453e818e9b95016e9fd47)
Co-authored-by: Victor Stinner <vstinner@python.org>
[3.9] gh-108342: Break ref cycle in SSLSocket._create() exc (GH-108344) (#108351)
Explicitly break a reference cycle when SSLSocket._create() raises an
exception. Clear the variable storing the exception, since the
exception traceback contains the variables and so creates a reference
cycle.