Tim Peters [Sat, 25 May 2024 03:08:21 +0000 (22:08 -0500)]
gh-119105: Differ.compare is too slow [for degenerate cases] (#119492)
``_fancy_replace()`` is no longer recursive. and a single call does a worst-case linear number of ratio() computations instead of quadratic. This renders toothless a universe of pathological cases. Some inputs may produce different output, but that's rare, and I didn't find a case where the final diff appeared to be of materially worse quality. To the contrary, by refusing to even consider synching on lines "far apart", there was more easy-to-digest locality in the output.
Alyssa Coghlan [Fri, 24 May 2024 14:29:19 +0000 (00:29 +1000)]
GH-119496: accept UTF-8 BOM in .pth files (GH-119503)
`Out-File -Encoding utf8` and similar commands in Windows Powershell 5.1 emit
UTF-8 with a BOM marker, which the regular `utf-8` codec decodes incorrectly.
`utf-8-sig` accepts a BOM, but also works correctly without one.
This change also makes .pth files match the way Python source files are handled.
Fix ThreadedVSOCKSocketStreamTest: if get_cid() returns the host
address or the "any" address, use the local communication address
(loopback): VMADDR_CID_LOCAL.
On Linux 6.9, apparently, the /dev/vsock device is now available but
get_cid() returns VMADDR_CID_ANY (-1).
Brett Simmers [Thu, 23 May 2024 20:59:35 +0000 (16:59 -0400)]
gh-118727: Don't drop the GIL in `drop_gil()` unless the current thread holds it (#118745)
`drop_gil()` assumes that its caller is attached, which means that the current
thread holds the GIL if and only if the GIL is enabled, and the enabled-state
of the GIL won't change. This isn't true, though, because `detach_thread()`
calls `_PyEval_ReleaseLock()` after detaching and
`_PyThreadState_DeleteCurrent()` calls it after removing the current thread
from consideration for stop-the-world requests (effectively detaching it).
Fix this by remembering whether or not a thread acquired the GIL when it last
attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()`
instead of `gil->enabled`.
This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've
reenabled it.
Petr Viktorin [Thu, 23 May 2024 16:01:37 +0000 (18:01 +0200)]
gh-117142: Slightly hacky fix for memory leak of StgInfo (GH-119424)
Add a funciton that inlines PyObject_GetTypeData and skips
type-checking, so it doesn't need access to the CType_Type object.
This will break if the memory layout changes, but should
be an acceptable solution to enable ctypes in subinterpreters in
Python 3.13.
Tim Peters [Wed, 22 May 2024 23:25:08 +0000 (18:25 -0500)]
gh-119105: difflib.py Differ.compare is too slow [for degenerate cases] (#119376)
Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer: avoid the creation of a temporary string.
Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().
Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().
Michael Vincent [Wed, 22 May 2024 17:59:47 +0000 (12:59 -0500)]
gh-117505: Run ensurepip in isolated env in Windows installer (GH-118257)
ensurepip forks a subprocess to run pip itself, but that subprocess only inherits a -I isolated mode flag (see _run_pip() in Lib/ensurepip/__init__.py), not the "-E -s" flags that the installer has been using. This means that parts of ensurepip don't actually run in an isolated environment and can make incorrect decisions based on packages installed in the user site-packages.
gh-119247: Add macros to use PySequence_Fast safely in free-threaded build (#119315)
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
Geoffrey Thomas [Wed, 22 May 2024 16:35:18 +0000 (12:35 -0400)]
Remove almost all unpaired backticks in docstrings (#119231)
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form
The variable `foo' should do xyz
to
The variable 'foo' should do xyz
and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).
No functional change is intended here other than in human-readable
docstrings.
Eric Snow [Wed, 22 May 2024 15:57:52 +0000 (11:57 -0400)]
gh-119213: Be More Careful About _PyArg_Parser.kwtuple Across Interpreters (gh-119331)
_PyArg_Parser holds static global data generated for modules by Argument Clinic. The _PyArg_Parser.kwtuple field is a tuple object, even though it's stored within a static global. In some cases the tuple is statically allocated and thus it's okay that it gets shared by multiple interpreters. However, in other cases the tuple is set lazily, allocated from the heap using the active interprepreter at the point the tuple is needed.
This is a problem once that interpreter is destroyed since _PyArg_Parser.kwtuple becomes at dangling pointer, leading to crashes. It isn't a problem if the tuple is allocated under the main interpreter, since its lifetime is bound to the lifetime of the runtime. The solution here is to temporarily switch to the main interpreter. The alternative would be to always statically allocate the tuple.
This change also fixes a bug where only the most recent parser was added to the global linked list.
Batuhan Taskaya [Wed, 22 May 2024 01:39:26 +0000 (04:39 +0300)]
gh-60191: Implement ast.compare (#19211)
* bpo-15987: Implement ast.compare
Add a compare() function that compares two ASTs for structural equality. There are two set of attributes on AST node objects, fields and attributes. The fields are always compared, since they represent the actual structure of the code. The attributes can be optionally be included in the comparison. Attributes capture things like line numbers of column offsets, so comparing them involves test whether the layout of the program text is the same. Since whitespace seems inessential for comparing ASTs, the default is to compare fields but not attributes.
ASTs are just Python objects that can be modified in arbitrary ways. The API for ASTs is under-specified in the presence of user modifications to objects. The comparison respects modifications to fields and attributes, and to _fields and _attributes attributes. A user could create obviously malformed objects, and the code will probably fail with an AttributeError when that happens. (For example, adding "spam" to _fields but not adding a "spam" attribute to the object.)
Co-authored-by: Jeremy Hylton <jeremy@alum.mit.edu>
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Josh Cannon [Tue, 21 May 2024 19:37:32 +0000 (14:37 -0500)]
gh-90562: Mention slots pitfall in dataclass docs (#107391)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> Co-authored-by: Erlend E. Aasland <erlend@python.org> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Alyssa Coghlan [Tue, 21 May 2024 03:32:15 +0000 (13:32 +1000)]
gh-74929: PEP 667 general docs update (gh-119201)
* expand on What's New entry for PEP 667 (including porting notes)
* define 'optimized scope' as a glossary term
* cover comprehensions and generator expressions in locals() docs
* review all mentions of "locals" in documentation (updating if needed)
* review all mentions of "f_locals" in documentation (updating if needed)
Victor Stinner [Mon, 20 May 2024 21:05:39 +0000 (17:05 -0400)]
gh-119050: Add XML support to libregrtest refleak checker (#119148)
regrtest test runner: Add XML support to the refleak checker
(-R option).
* run_unittest() now stores XML elements as string, rather than
objects, in support.junit_xml_list.
* runtest_refleak() now saves/restores XML strings before/after
checking for reference leaks. Save XML into a temporary file.
Toshio Kuratomi [Mon, 20 May 2024 19:10:47 +0000 (12:10 -0700)]
gh-92081: Fix for email.generator.Generator with whitespace between encoded words. (#92281)
* Fix for email.generator.Generator with whitespace between encoded words.
email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines. The
current generator will create an encoded word for each line. If the end
of the line happens to correspond with the end real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.
A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.
The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words. This
fix places the space inside of the second encoded word.
A second problem happens with continuation lines. A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character. When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line. This is because normal
words are filded on syntactic breaks by encoded words are not.
The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.
Test cases are from #92081
* Rename a variable so it's not confused with the final variable.
Hood Chatham [Mon, 20 May 2024 17:42:15 +0000 (13:42 -0400)]
DOCS: Suggest always calling exec with a globals argument and no locals argument (GH-119235)
Many users think they want a locals argument for various reasons but they do not
understand that it makes code be treated as a class definition. They do not want
their code treated as a class definition and get surprised. The reason not
to pass locals specifically is that the following code raises a `NameError`:
```py
exec("""
def f():
print("hi")
f()
def g():
f()
g()
""", {}, {})
```
The reason not to leave out globals is as follows: