Tim Peters [Wed, 22 May 2024 23:25:08 +0000 (18:25 -0500)]
gh-119105: difflib.py Differ.compare is too slow [for degenerate cases] (#119376)
Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output.
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer: avoid the creation of a temporary string.
Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().
Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().
Michael Vincent [Wed, 22 May 2024 17:59:47 +0000 (12:59 -0500)]
gh-117505: Run ensurepip in isolated env in Windows installer (GH-118257)
ensurepip forks a subprocess to run pip itself, but that subprocess only inherits a -I isolated mode flag (see _run_pip() in Lib/ensurepip/__init__.py), not the "-E -s" flags that the installer has been using. This means that parts of ensurepip don't actually run in an isolated environment and can make incorrect decisions based on packages installed in the user site-packages.
gh-119247: Add macros to use PySequence_Fast safely in free-threaded build (#119315)
Add `Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST` and
`Py_END_CRITICAL_SECTION_SEQUENCE_FAST` macros and update `str.join` to use
them. Also add a regression test that would crash reliably without this
patch.
Geoffrey Thomas [Wed, 22 May 2024 16:35:18 +0000 (12:35 -0400)]
Remove almost all unpaired backticks in docstrings (#119231)
As reported in #117847 and #115366, an unpaired backtick in a docstring
tends to confuse e.g. Sphinx running on subclasses of standard library
objects, and the typographic style of using a backtick as an opening
quote is no longer in favor. Convert almost all uses of the form
The variable `foo' should do xyz
to
The variable 'foo' should do xyz
and also fix up miscellaneous other unpaired backticks (extraneous /
missing characters).
No functional change is intended here other than in human-readable
docstrings.
Eric Snow [Wed, 22 May 2024 15:57:52 +0000 (11:57 -0400)]
gh-119213: Be More Careful About _PyArg_Parser.kwtuple Across Interpreters (gh-119331)
_PyArg_Parser holds static global data generated for modules by Argument Clinic. The _PyArg_Parser.kwtuple field is a tuple object, even though it's stored within a static global. In some cases the tuple is statically allocated and thus it's okay that it gets shared by multiple interpreters. However, in other cases the tuple is set lazily, allocated from the heap using the active interprepreter at the point the tuple is needed.
This is a problem once that interpreter is destroyed since _PyArg_Parser.kwtuple becomes at dangling pointer, leading to crashes. It isn't a problem if the tuple is allocated under the main interpreter, since its lifetime is bound to the lifetime of the runtime. The solution here is to temporarily switch to the main interpreter. The alternative would be to always statically allocate the tuple.
This change also fixes a bug where only the most recent parser was added to the global linked list.
Batuhan Taskaya [Wed, 22 May 2024 01:39:26 +0000 (04:39 +0300)]
gh-60191: Implement ast.compare (#19211)
* bpo-15987: Implement ast.compare
Add a compare() function that compares two ASTs for structural equality. There are two set of attributes on AST node objects, fields and attributes. The fields are always compared, since they represent the actual structure of the code. The attributes can be optionally be included in the comparison. Attributes capture things like line numbers of column offsets, so comparing them involves test whether the layout of the program text is the same. Since whitespace seems inessential for comparing ASTs, the default is to compare fields but not attributes.
ASTs are just Python objects that can be modified in arbitrary ways. The API for ASTs is under-specified in the presence of user modifications to objects. The comparison respects modifications to fields and attributes, and to _fields and _attributes attributes. A user could create obviously malformed objects, and the code will probably fail with an AttributeError when that happens. (For example, adding "spam" to _fields but not adding a "spam" attribute to the object.)
Co-authored-by: Jeremy Hylton <jeremy@alum.mit.edu>
The PEP 649 implementation will require a way to load NotImplementedError
from the bytecode. @markshannon suggested implementing this by converting
LOAD_ASSERTION_ERROR into a more general mechanism for loading constants.
This PR adds this new opcode. I will work on the rest of the implementation
of the PEP separately.
Josh Cannon [Tue, 21 May 2024 19:37:32 +0000 (14:37 -0500)]
gh-90562: Mention slots pitfall in dataclass docs (#107391)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> Co-authored-by: Erlend E. Aasland <erlend@python.org> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Alyssa Coghlan [Tue, 21 May 2024 03:32:15 +0000 (13:32 +1000)]
gh-74929: PEP 667 general docs update (gh-119201)
* expand on What's New entry for PEP 667 (including porting notes)
* define 'optimized scope' as a glossary term
* cover comprehensions and generator expressions in locals() docs
* review all mentions of "locals" in documentation (updating if needed)
* review all mentions of "f_locals" in documentation (updating if needed)
Victor Stinner [Mon, 20 May 2024 21:05:39 +0000 (17:05 -0400)]
gh-119050: Add XML support to libregrtest refleak checker (#119148)
regrtest test runner: Add XML support to the refleak checker
(-R option).
* run_unittest() now stores XML elements as string, rather than
objects, in support.junit_xml_list.
* runtest_refleak() now saves/restores XML strings before/after
checking for reference leaks. Save XML into a temporary file.
Toshio Kuratomi [Mon, 20 May 2024 19:10:47 +0000 (12:10 -0700)]
gh-92081: Fix for email.generator.Generator with whitespace between encoded words. (#92281)
* Fix for email.generator.Generator with whitespace between encoded words.
email.generator.Generator currently does not handle whitespace between
encoded words correctly when the encoded words span multiple lines. The
current generator will create an encoded word for each line. If the end
of the line happens to correspond with the end real word in the
plaintext, the generator will place an unencoded space at the start of
the subsequent lines to represent the whitespace between the plaintext
words.
A compliant decoder will strip all the whitespace from between two
encoded words which leads to missing spaces in the round-tripped
output.
The fix for this is to make sure that whitespace between two encoded
words ends up inside of one or the other of the encoded words. This
fix places the space inside of the second encoded word.
A second problem happens with continuation lines. A continuation line that
starts with whitespace and is followed by a non-encoded word is fine because
the newline between such continuation lines is defined as condensing to
a single space character. When the continuation line starts with whitespace
followed by an encoded word, however, the RFCs specify that the word is run
together with the encoded word on the previous line. This is because normal
words are filded on syntactic breaks by encoded words are not.
The solution to this is to add the whitespace to the start of the encoded word
on the continuation line.
Test cases are from #92081
* Rename a variable so it's not confused with the final variable.
Hood Chatham [Mon, 20 May 2024 17:42:15 +0000 (13:42 -0400)]
DOCS: Suggest always calling exec with a globals argument and no locals argument (GH-119235)
Many users think they want a locals argument for various reasons but they do not
understand that it makes code be treated as a class definition. They do not want
their code treated as a class definition and get surprised. The reason not
to pass locals specifically is that the following code raises a `NameError`:
```py
exec("""
def f():
print("hi")
f()
def g():
f()
g()
""", {}, {})
```
The reason not to leave out globals is as follows:
pulkin [Sun, 19 May 2024 21:46:37 +0000 (23:46 +0200)]
gh-119105: difflib: improve recursion for degenerate cases (#119131)
Code from https://github.com/pulkin, in PR
https://github.com/python/cpython/pull/119131
Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines.
Jelle Zijlstra [Sun, 19 May 2024 02:15:14 +0000 (22:15 -0400)]
marshal docs: Remove reference to "Sun" (#119161)
Nobody has been using a Sun machine for a long time. When I saw
this sentence in a lightning talk just now, I thought it was talking
about sending Python code on a spacecraft.
Tim Peters [Sun, 19 May 2024 01:54:23 +0000 (20:54 -0500)]
Try to repair oddball test bots timing out in test_int (#119166)
Various test bots (outside the ones GH normally runs) are timing out during test_int after ecd8664 (asymptotically faster str->int). Best guess is that they don't build the C _decimal module. So require that module in the most likely tests to time out then. Flying mostly blind, though!
Asymptotically faster (O(n log n)) str->int for very large strings, leveraging the faster multiplication scheme in the C-coded `_decimal` when available. This is used instead of the current Karatsuba-limited method starting at 2 million digits.
Lots of opportunity remains for fine-tuning. Good targets include changing BYTELIM, and possibly changing the internal output base (from 256 to a higher number of bytes).
Doing this was substantial work, and many of the new lines are actually comments giving correctness proofs. The obvious approaches sticking to integers were too slow to be useful, so this is doing variable-precision decimal floating-point arithmetic. Much faster, but worst-possible rounding errors have to be wholly accounted for, using as little precision as possible.
Special thanks to Serhiy Storchaka for asking many good questions in his code reviews!