Bernát Gábor [Tue, 2 Jun 2026 07:45:30 +0000 (00:45 -0700)]
gh-150717: Avoid mark-array allocation for groupless regex patterns (GH-150719)
state_init() always did PyMem_New(state->mark, groups*2), which for a
pattern with no capturing groups is PyMem_Malloc(0) -- a real allocation
(plus matching free) on every match/search/fullmatch call, for an array
that is never read: groupless patterns emit no MARK opcodes and group 0's
span is taken from state->start/ptr.
Guard the allocation with `if (pattern->groups)`. state->mark stays NULL
(set by the preceding memset), and both the error path and state_fini
already PyMem_Free(NULL) safely.
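A small Python-level illustration of what "groupless" means here. It is a sketch of the observable behaviour only, not of the C change: a pattern without capturing groups reports `groups == 0`, and the span of group 0 is still available on the match object. The pattern strings and variable names are chosen for illustration.

```python
import re

# A pattern with no capturing groups: nothing needs per-group bookkeeping.
groupless = re.compile(r"ab+")
match = groupless.search("xxabbb")

print(groupless.groups)   # number of capturing groups in the pattern
print(match.span())       # span of group 0 (the whole match)
print(match.groups())     # tuple of captured groups (empty here)

# For comparison, a pattern with two capturing groups.
grouped = re.compile(r"(a)(b+)")
grouped_match = grouped.search("xxabbb")
print(grouped.groups, grouped_match.span(2))
```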
Barry Warsaw [Tue, 2 Jun 2026 01:43:18 +0000 (18:43 -0700)]
gh-150228: Improve the PEP 829 batch processing APIs (#150542)
* gh-150228: Improve the PEP 829 batch processing APIs
As previously discussed with @ncoghlan and approved for 3.15b2 by @hugovk,
this implements the batch processing APIs for addsitedir() and friends. We
remove the `defer_processing_start_files` flag which required some implicit
module global state, and promote StartupState to the public documented API.
This also moves the bulk of the module global functions into methods of the
`StartupState` class, which removes the awkward APIs in 3.15b1. Instances of
this class now accumulate startup state, and `StartupState.process()`
processes it. Callers can now batch up startup state themselves by using
the methods on this class. The module global functions are shims for this
which preserve the legacy APIs and semantics using the new state class.
This PR also fixes the interleaving regression identified by @ncoghlan in the
same issue. Now, .pth file sys.path extensions are added to sys.path after
the sitedir that the .pth file is found in, restoring the legacy behavior.
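The legacy ordering described above can be observed through the long-standing module-level `site.addsitedir()` function. The following is a minimal sketch, assuming the legacy semantics (the site directory is appended first, then the directories named in its .pth files); the helper name `entries_added_by_addsitedir`, the file name `example.pth`, and the directory name `extra` are made up for the demonstration.

```python
import os
import site
import sys
import tempfile


def entries_added_by_addsitedir():
    """Return (new sys.path entries in order, the temporary site dir)."""
    saved = list(sys.path)
    with tempfile.TemporaryDirectory() as sitedir:
        # A subdirectory that the .pth file will ask to be added.
        os.mkdir(os.path.join(sitedir, "extra"))
        pth = os.path.join(sitedir, "example.pth")
        with open(pth, "w", encoding="utf-8") as f:
            f.write("extra\n")
        try:
            site.addsitedir(sitedir)
            new_entries = [p for p in sys.path if p not in saved]
        finally:
            # Leave the interpreter's sys.path as we found it.
            sys.path[:] = saved
    return new_entries, sitedir


new_entries, sitedir = entries_added_by_addsitedir()
for entry in new_entries:
    print(entry)
```

With the legacy behaviour, the site directory itself is listed before the `extra` directory that its .pth file contributes.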
Along the way, I've made a lot of improvements to function docstrings,
site.rst documentation, and comments in the code explaining what's going on.
* Add a note that if known_paths is provided to StartupState.__init__(), it
will get mutated in place.
* Improve some conditional flows.
* Improve some comments.
* Improve the what's new entry.
* Make test_impl_exec_imports_suppressed_by_matching_start() more robust
Based on a PR comment, we need to read both the .pth and .start files, and prove
that the .pth file's import line (which passes a bigger increment) is not
called, but the .start file's entry point (which uses the default increment)
is called.
* As per review, move some methods to the private API
_read_pth_file() and _read_start_file() are not intended to be part of the
public API surface outside of the site module, so even though they are used by
methods outside of the StartupState class, make them privately named.
* Resolve several review comments
* Move a `versionadded`
* Better list comprehension formatting (use the output from
`ruff format --line-length 78`)
* Add docs for site.makepath() and point the case-normalization requirement to
this utility function.
* Note that StartupState.process() is not idempotent.
* Address another feedback comment
This time, we get rid of the legacy implementation `reset` local, which was
always difficult to understand, and just implement a return value based on the
processing mode selected.
* Changes based on gh-150228 review
The comment by @encukou that started this change:
```
I still see two red flags here though: an argument that doesn't combine with
other arguments, and (another instance of) changing the return type based on
an argument.
Did you consider adding a StartupState.addsitedir(sitedir) method, instead of
the startup_state argument?
```
As it turns out, this is an even cleaner design. By moving the bulk of the
previous module global functions into `StartupState` methods, we can get rid
of all the awkward `startup_state` keyword-only arguments which conflict
with `known_paths` (Petr's first point). We can also get rid of the
return value dichotomy (Petr's second point) because now we can preserve
exactly the Python 3.14 API in the module global functions, and implement
the better APIs in the class methods. We also generally don't have to
pass around `process_known_sitedirs`.
Now the following module global functions are essentially shims around
class methods:
* site.addsitedir() -> StartupState.addsitedir()
* site.addusersitepackages() -> StartupState.addusersitepackages()
* site.addsitepackages() -> StartupState.addsitepackages()
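The shim arrangement can be sketched generically as follows. This is an illustration of the design pattern only: `PathAccumulator`, its methods, and the module-level `add_dir()` are hypothetical names, and none of the signatures are taken from the `site` module or from `StartupState`.

```python
class PathAccumulator:
    """Hypothetical accumulator; NOT the real site.StartupState."""

    def __init__(self):
        self.pending = []

    def add_dir(self, directory):
        # Only record the request; nothing is applied yet.
        self.pending.append(directory)

    def process(self, target):
        # Apply everything that was accumulated, in order.
        target.extend(self.pending)
        self.pending.clear()


def add_dir(directory, target):
    """Legacy-style one-shot function, kept as a thin shim."""
    state = PathAccumulator()
    state.add_dir(directory)
    state.process(target)


# One-shot use through the shim.
one_shot = []
add_dir("a", one_shot)

# Batched use through the class: accumulate first, process once.
batched = []
batch = PathAccumulator()
batch.add_dir("a")
batch.add_dir("b")
batch.process(batched)
print(one_shot, batched)
```

Keeping the one-shot function as a wrapper lets its signature and return value stay unchanged while the batching behaviour lives on the class.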
* Additional minor changes
* Remove a now unused parameter
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
gh-150644: Tag Apple system log messages as public. (#150645)
macOS 26 changed the default visibility of "dynamic" system messages. This
changes the logging strategy to tag all messages as "public" so they are
visible in the system log without special configuration.
* Separate tests for module-level API and for the MimeTypes class.
* Add tests for mimetypes.init() and MimeTypes() with knownfiles and with
explicitly passed files.
Mark Shannon [Mon, 1 Jun 2026 16:56:16 +0000 (17:56 +0100)]
GH-148960: Reduce the size of the debug stencils to less than half. (GH-150551)
For AArch64 linux, reduces the total bytes in the code bodies from 489kb to 218kb.
Reduces the size of the stencils files from 394k lines to 167k lines.
Victor Stinner [Mon, 1 Jun 2026 14:50:15 +0000 (16:50 +0200)]
gh-150436: Skip subprocess test on STATUS_DLL_INIT_FAILED (#150704)
If a subprocess spawned with the CREATE_NEW_CONSOLE creation flag fails
with a STATUS_DLL_INIT_FAILED return code, skip the test. The likely cause
is a memory allocation failure in the desktop heap, which makes DLL
initialization fail.
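A minimal sketch of this kind of skip, assuming the check is made on the child's return code. The helper name `skip_if_dll_init_failed` is hypothetical and is not taken from the test suite; the numeric value is the Windows NTSTATUS code for STATUS_DLL_INIT_FAILED.

```python
import unittest

# Windows NTSTATUS value for STATUS_DLL_INIT_FAILED.
STATUS_DLL_INIT_FAILED = 0xC0000142


def skip_if_dll_init_failed(returncode):
    """Raise SkipTest when the child died during DLL initialization."""
    if returncode == STATUS_DLL_INIT_FAILED:
        raise unittest.SkipTest(
            "subprocess failed with STATUS_DLL_INIT_FAILED"
        )


# An ordinary exit code passes through without skipping.
skip_if_dll_init_failed(0)

try:
    skip_if_dll_init_failed(STATUS_DLL_INIT_FAILED)
    skipped = False
except unittest.SkipTest:
    skipped = True
print(skipped)
```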
Serhiy Storchaka [Fri, 29 May 2026 21:04:50 +0000 (00:04 +0300)]
gh-149489: Fix ElementTree serialization to HTML (GH-149490)
* The content of comments, processing instructions and elements "xmp",
"iframe", "noembed", "noframes", and "plaintext" is no longer escaped.
* The "plaintext" element no longer has a closing tag.
* Add support of empty attributes (with value None).
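For context, a short sketch of how the `html` output method already differs from the `xml` one. It deliberately sticks to `script` and `br`, whose handling predates this change; the output for the elements listed above depends on whether the interpreter includes this fix, so it is not shown. The element contents are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("div")
script = ET.SubElement(root, "script")
script.text = "if (a < b) { run(); }"
ET.SubElement(root, "br")
para = ET.SubElement(root, "p")
para.text = "1 < 2"

# Serialize the same tree with both output methods.
as_html = ET.tostring(root, method="html", encoding="unicode")
as_xml = ET.tostring(root, method="xml", encoding="unicode")

print(as_html)  # script text is written raw; <br> has no closing tag
print(as_xml)   # "<" in text is escaped as &lt; everywhere
```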
gh-139398: [Enum] Add supported sunder names to `__dir__` for REPL completions (GH-139985)
* Add supported sunder names to Enum `__dir__`
This change adds the sunder names `_generate_next_value_`
and `_missing_` to the `__dir__` method of `EnumType` and `Enum`.
In addition, the instance-level sunder names
`_add_alias_` and `_add_value_alias_` are added to `Enum.__dir__`.
With the sunder names exposed in the `dir()` method,
the REPL autocomplete will also show them.
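As a reminder of what two of these sunder names do, here is a small sketch using `_generate_next_value_` and `_missing_`. The `Color` enum is invented for the example. Whether the names show up in `dir()` depends on the Python version, so the last lines only report what they find.

```python
from enum import Enum, auto


class Color(Enum):
    # Must be defined before any member that uses auto().
    def _generate_next_value_(name, start, count, last_values):
        return name.lower()

    RED = auto()
    GREEN = auto()

    @classmethod
    def _missing_(cls, value):
        # Fall back to a case-insensitive lookup by value.
        if isinstance(value, str):
            for member in cls:
                if member.value == value.lower():
                    return member
        return None


print(Color.RED.value)   # value produced by _generate_next_value_
print(Color("GREEN"))    # resolved through _missing_

# Report which of the sunder names this interpreter lists in dir().
listed = [n for n in ("_missing_", "_generate_next_value_") if n in dir(Color)]
print(listed)
```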
Pradyun Gedam [Thu, 28 May 2026 16:48:51 +0000 (17:48 +0100)]
gh-150046: Fix `test_add_python_opts` to ignore `PYTHON*` env vars (#150089)
Prevent the runtime environment from affecting the test's behaviour.
The test notably checks the warning filters, which can be controlled by
various PYTHON* environment variables.
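One common way to get this kind of isolation is to drop every `PYTHON*` variable from the environment passed to a child interpreter. The sketch below shows that idea only; the helper names `clean_env` and `child_warnoptions` are hypothetical, and this is not necessarily how the test itself was changed.

```python
import os
import subprocess
import sys


def clean_env(environ):
    """Return a copy of environ without any PYTHON* variables."""
    return {
        key: value
        for key, value in environ.items()
        if not key.upper().startswith("PYTHON")
    }


def child_warnoptions(env):
    """Run a child interpreter and return its repr(sys.warnoptions)."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.warnoptions)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


sample = {"PYTHONWARNINGS": "error", "PYTHONPATH": "/tmp", "PATH": "/bin"}
print(clean_env(sample))
```

A caller would pass `clean_env(os.environ)` to `child_warnoptions()` so that settings such as `PYTHONWARNINGS` in the developer's shell cannot leak into the child.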
Armaan Sandhu [Thu, 28 May 2026 12:38:39 +0000 (18:08 +0530)]
gh-150311: Fix minor issues in configure.ac for Cygwin (#150328)
- Use 'CYGWIN' (uppercase) for ac_sys_system to match the casing used
in all case-pattern references throughout configure.ac.
- Fix LDLIBRARY for static builds: use '.a' extension instead of
'.dll.a' when shared libraries are disabled.
- Replace hardcoded 'gcc' and 'g++' with '$(CC)' and '$(CXX)' in
LDSHARED/LDCXXSHARED for Cygwin.
Co-authored-by: Victor Stinner <vstinner@python.org>
Petr Viktorin [Wed, 27 May 2026 12:32:33 +0000 (14:32 +0200)]
gh-141984: Reword docs on "enclosed" atom grammar (GH-148622)
Reorganize and reword the docs on atoms in parentheses, brackets and braces:
parenthesized groups, list/set/dict/tuple displays, and comprehensions.
(Generator expressions and yield atoms are left for later.)
In the spirit of better matching the underlying grammar, *comprehensions* are
covered separately from non-comprehension displays. Also, parenthesized forms
(with a single expression) and tuple displays are separated.
All sections are rewritten to start with simple cases and build up to the full
formal grammar.
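The distinctions drawn above can be seen directly at the interpreter. The variable names below are illustrative only.

```python
# Parenthesized form with a single expression: just that expression's value.
group = (1 + 2)

# Tuple displays: the comma, not the parentheses, makes the tuple.
single = (1 + 2,)
empty = ()

# Non-comprehension displays list their items explicitly.
list_display = [1, 2, 3]
set_display = {1, 2, 2}
dict_display = {"a": 1}
empty_braces = {}          # empty braces give a dict, not a set

# Comprehensions compute their items from a for clause.
list_comp = [n * n for n in range(3)]
set_comp = {n % 2 for n in range(5)}
dict_comp = {n: n * n for n in range(3)}

print(type(group).__name__, type(single).__name__, type(empty_braces).__name__)
```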
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Partially supported encodings (only BMP characters): "Big5-HKSCS",
"EUC_JIS-2004", "EUC_JISX0213", "Shift_JIS-2004", "Shift_JISX0213",
"utf-8-sig" and non-standard aliases like "UTF8" (without hyphen).
The parser now raises ValueError for known unsupported
multi-byte encodings such as "ISO-2022-JP" or "raw-unicode-escape"
instead of failing later, when encountering non-ASCII data.
Clément Péron [Tue, 26 May 2026 15:57:08 +0000 (17:57 +0200)]
gh-148557: Use em-config to locate trampoline clang (#148556)
When CC is wrapped by ccache, the Emscripten trampoline rule cannot derive the
matching clang path by treating CC as a single executable path. Query the active
LLVM toolchain path with em-config instead.
Mark Shannon [Tue, 26 May 2026 14:14:17 +0000 (15:14 +0100)]
GH-126910: Make `_Py_get_machine_stack_pointer` return the actual stack pointer (GH-149103)
* Make _Py_ReachedRecursionLimit inline again
* Remove _Py_MakeRecCheck replacing its use with _Py_ReachedRecursionLimit
* Move the check for C stack switching into _Py_CheckRecursiveCall
Victor Stinner [Tue, 26 May 2026 02:39:22 +0000 (04:39 +0200)]
gh-149879: Fix test_math and test_statistics on Cygwin (#150432)
* Skip tests which fail on Cygwin: when Python is linked to
the newlib C library.
* Rename test_random() to test_fma_random().
* Move tests on large integer values from testLog2() to
testLog2Exact().
After the perf trampoline assembly was split into per-architecture files,
the macOS universal2 build failed at the lipo step:
fatal error: lipo: Python/asm_trampoline_aarch64.o and
Python/asm_trampoline_x86_64.o have the same architectures (x86_64)
and can't be in the same fat output file
PY_CORE_CFLAGS on universal2 contains "-arch arm64 -arch x86_64", so each
.S file was assembled into a fat .o containing both slices (with one slice
empty because of the #ifdef guards). lipo then refused to merge two fat
objects that share architectures.
Compile each per-arch object with a single -arch flag before merging.