Barney Gale [Fri, 28 Feb 2025 20:33:51 +0000 (20:33 +0000)]
GH-116380: Speed up `glob.[i]glob()` by making fewer system calls. (#116392)
## Filtered recursive walk
Expanding a recursive `**` segment entails walking the entire directory
tree, and so any subsequent pattern segments (except special segments) can
be evaluated by filtering the expanded paths through a regex. For example,
`glob.glob("foo/**/*.py", recursive=True)` recursively walks `foo/` with
`os.scandir()`, and then filters paths through a regex based on `**/*.py`,
with no further filesystem access needed.
This fixes an issue where `glob()` could return duplicate results.
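The idea can be illustrated with a minimal sketch. This is not the code from the commit: `walk()`, `filtered_walk()` and the hand-written `PY_FILES` regex are hypothetical names chosen for illustration, and hidden-file rules are ignored for brevity.

```python
import os
import re
import tempfile

def walk(root):
    """Yield every path below root (relative, '/'-separated) using a stack."""
    stack = [""]
    while stack:
        rel = stack.pop()
        with os.scandir(os.path.join(root, rel)) as entries:
            for entry in entries:
                child = f"{rel}/{entry.name}" if rel else entry.name
                yield child
                if entry.is_dir(follow_symlinks=False):
                    stack.append(child)

# Hand-written regex standing in for the segments "**/*.py" (an assumption,
# not the regex that the glob module generates).
PY_FILES = re.compile(r"(?:[^/]+/)*[^/]*\.py")

def filtered_walk(root):
    """Walk once, then filter purely on the path strings (no extra syscalls)."""
    return sorted(p for p in walk(root) if PY_FILES.fullmatch(p))

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg", "sub"))
    for name in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    result = filtered_walk(tmp)
    print(result)
```

Each path is produced exactly once by the walk, so the filter step cannot introduce duplicates.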
## Tracking path existence
We store a flag alongside each path indicating whether the path is
guaranteed to exist. As we process the pattern:
- Certain special pattern segments (`""`, `"."` and `".."`) leave the flag
unchanged
- Literal pattern segments (e.g. `foo/bar`) set the flag to false
- Wildcard pattern segments (e.g. `*/*.py`) set the flag to true (because
children are found via `os.scandir()`)
- Recursive pattern segments (e.g. `**`) leave the flag unchanged for the
root path, and set it to true for descendants discovered via
`os.scandir()`.
If the flag is false at the end, we call `lstat()` on each path to filter
out missing paths.
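The rules above amount to a small state machine. The sketch below is illustrative only: `next_flag()`, `drop_missing()` and the `at_recursion_root` parameter are hypothetical names, and the final check uses `os.path.lexists()`, which is built on `lstat()`.

```python
import os
import re
import tempfile

MAGIC = re.compile(r"[*?\[]")   # characters that make a segment a wildcard
SPECIAL = {"", ".", ".."}

def next_flag(flag, segment, at_recursion_root=False):
    """Return the 'guaranteed to exist' flag after processing one segment."""
    if segment in SPECIAL:
        return flag                       # unchanged
    if segment == "**":
        # Unchanged for the root of the walk, true for scanned descendants.
        return flag if at_recursion_root else True
    if MAGIC.search(segment):
        return True                       # children came from a directory scan
    return False                          # literal: appended without checking

def drop_missing(paths_with_flags):
    """Keep trusted paths as-is; verify the rest with an lstat()-based check."""
    return [p for p, exists in paths_with_flags if exists or os.path.lexists(p)]

with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real.txt")
    open(real, "w").close()
    missing = os.path.join(tmp, "missing.txt")
    kept = drop_missing([(real, False), (missing, False)])
    print(kept == [real])
```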
## Minor speed-ups
- Exclude paths that don't match a non-terminal non-recursive wildcard
pattern _prior_ to calling `is_dir()`.
- Use a stack rather than recursion to implement recursive wildcards.
- This fixes a recursion error when globbing deep trees.
- Pre-compile regular expressions and pre-join literal pattern segments.
- Convert to/from `bytes` (a minor use-case) in `iglob()` rather than
supporting `bytes` throughout. This particularly simplifies the code
needed to handle relative bytes paths with `dir_fd`.
- Avoid calling `os.path.join()`; instead we keep paths in a normalized
form and append trailing slashes when needed.
- Avoid calling `os.path.normcase()`; instead we use case-insensitive regex
matching.
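Two of these points (matching before `is_dir()`, and case-insensitive regexes in place of `os.path.normcase()`) can be sketched as follows. The helper names `compile_segment()` and `matching_dirs()` are hypothetical, and `fnmatch.translate()` stands in for whatever pattern compiler the implementation uses.

```python
import fnmatch
import os
import re
import tempfile

def compile_segment(pattern, case_sensitive=True):
    """Compile one wildcard segment once, up front."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(fnmatch.translate(pattern), flags).match

def matching_dirs(parent, match):
    """Return matching subdirectory names, testing the name before is_dir()."""
    found = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if not match(entry.name):
                continue              # cheap string test first
            if entry.is_dir():        # may need a system call, so do it last
                found.append(entry.name)
    return sorted(found)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "Tests"))
    os.mkdir(os.path.join(tmp, "tools"))
    for name in ("TODO.txt", "temp.txt"):
        open(os.path.join(tmp, name), "w").close()
    sensitive = matching_dirs(tmp, compile_segment("t*"))
    insensitive = matching_dirs(tmp, compile_segment("t*", case_sensitive=False))
    print(sensitive, insensitive)
```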
## Implementation notes
Much of this functionality is already present in pathlib's implementation
of globbing. The specific additions we make are:
1. Support for `dir_fd`
2. Support for `include_hidden`
3. Support for generating paths relative to `root_dir`
This unifies the implementations of globbing in the `glob` and `pathlib`
modules.
Sam Gross [Fri, 28 Feb 2025 14:27:51 +0000 (09:27 -0500)]
gh-124878: Add temporary TSAN suppression for free_threadstate (gh-130602)
The race condition with `free_threadstate` and daemon threads exists in
both the free threading and default builds. We were missing a
suppression in the default build.
Postpone <stdbool.h> inclusion after Python.h (#130641)
Remove inclusions prior to Python.h.
<stdbool.h> will cause <features.h> to be included before Python.h can
define some macros to enable some additional features, causing multiple
types not to be defined down the line.
gh-129200: Add locking to the iOS testbed startup sequence. (#130564)
Add a lock to ensure that only one iOS testbed per user can start at a time, so
that the simulator discovery process doesn't collide between instances.
Sam Gross [Thu, 27 Feb 2025 18:57:19 +0000 (13:57 -0500)]
gh-130091: Reorder `_PyThreadState_Attach` to avoid data race (gh-130092)
This moves `tstate_activate()` down to avoid a data race in the free
threading build on the `_PyRuntime`'s thread-local `autoTSSkey`. This
key is deleted during runtime finalization, which may happen
concurrently with a call to `_PyThreadState_Attach`.
The earlier `tstate_try/wait_attach` ensures that the thread is blocked
before it attempts to access the deleted `autoTSSkey`.
This fixes a TSAN reported data race in
`test_threading.test_import_from_another_thread`.
Sam Gross [Thu, 27 Feb 2025 13:27:54 +0000 (08:27 -0500)]
gh-130421: Fix data race on timebase initialization (gh-130592)
Windows and macOS require precomputing a "timebase" in order to convert
OS timestamps into nanoseconds. Retrieve and compute this value during
runtime initialization to avoid data races when accessing the time.
Fredrik Ahlberg [Thu, 27 Feb 2025 12:51:47 +0000 (13:51 +0100)]
gh-129288: Add optional l2_cid and l2_bdaddr_type in BTPROTO_L2CAP socket address tuple (#129293)
Add two optional, trailing elements in the AF_BLUETOOTH socket address tuple:
- l2_cid, to allow e.g. raw LE ATT connections
- l2_bdaddr_type. To be able to connect L2CAP sockets to Bluetooth LE devices,
the l2_bdaddr_type must be set to BDADDR_LE_PUBLIC or BDADDR_LE_RANDOM.
Sam Gross [Wed, 26 Feb 2025 21:36:53 +0000 (16:36 -0500)]
gh-130605: Temporarily disable test_concurrent_futures in TSAN CI job (gh-130606)
There are a number of data races in the default build without
suppressions that are exposed by this test. Disable the test for now
under TSAN until we have suppressions or fix the data races.
Barney Gale [Wed, 26 Feb 2025 21:07:27 +0000 (21:07 +0000)]
GH-125413: Add private `pathlib.Path` method to write metadata (#130238)
Replace `WritablePath._copy_writer` with a new `_write_info()` method. This
method allows the target of a `copy()` to preserve metadata.
Replace `pathlib._os.CopyWriter` and `LocalCopyWriter` classes with new
`copy_file()` and `copy_info()` functions. The `copy_file()` function uses
`source_path.info` wherever possible to save on `stat()`s.
Sam Gross [Wed, 26 Feb 2025 19:55:15 +0000 (14:55 -0500)]
gh-130519: Fix crash in QSBR when destructor reenters QSBR (gh-130553)
The `free_work_item()` function in QSBR may call arbitrary code via
Python object destructors, which may reenter the QSBR code. Reorder
the processing of work items to be robust to reentrancy.
Also fix the TODO for the out of memory situation.
Petr Viktorin [Wed, 26 Feb 2025 14:42:39 +0000 (15:42 +0100)]
gh-128982: Revert "#128982: Substitute regular expression in http.cookiejar.join_header_words for an efficient alternative (GH-128983)" and add tests (GH-130584)
* Revert "gh-128982: Substitute regular expression in `http.cookiejar.join_header_words` for an efficient alternative (GH-128983)"
Neil Schemenauer [Wed, 26 Feb 2025 05:24:20 +0000 (21:24 -0800)]
gh-117657: Use an atomic store to set type flags. (gh-127588)
The `PyType_HasFeature()` function reads the flags with a relaxed atomic
load and without holding the type lock. To avoid data races, use atomic
stores if `PyType_Ready()` has already been called.
Serhiy Storchaka [Tue, 25 Feb 2025 21:04:27 +0000 (23:04 +0200)]
gh-130163: Fix crashes related to PySys_GetObject() (GH-130503)
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
Sam Gross [Tue, 25 Feb 2025 17:03:28 +0000 (12:03 -0500)]
gh-130202: Fix bug in `_PyObject_ResurrectEnd` in free threaded build (gh-130281)
This fixes a fairly subtle bug involving finalizers and resurrection in
debug free threaded builds: if `_PyObject_ResurrectEnd` returns `1`
(i.e., the object was resurrected by a finalizer), it's not safe to
access the object because it might still be deallocated. For example:
* The finalizer may have exposed the object to another thread. That
thread may hold the last reference and concurrently deallocate it any
time after `_PyObject_ResurrectEnd()` returns `1`.
* `_PyObject_ResurrectEnd()` may call `_Py_brc_queue_object()`, which
may internally deallocate the object immediately if the owning thread
is dead.
Therefore, it's important not to access the object after it's
resurrected. We only violate this in two cases, and only in debug
builds:
* We assert that the object is tracked appropriately. This is now moved
up between the finalizer and the `_PyObject_ResurrectEnd()` call.
* The `--with-trace-refs` builds may need to remember the object if
it's resurrected. This is now handled by `_PyObject_ResurrectStart()`
and `_PyObject_ResurrectEnd()`.
Note that `--with-trace-refs` is currently disabled in `--disable-gil`
builds because the refchain hash table isn't thread-safe, but this
refactoring avoids an additional thread-safety issue.
Sam Gross [Tue, 25 Feb 2025 15:33:04 +0000 (10:33 -0500)]
gh-129824: Temporarily skip InterpreterPoolMixin tests under TSAN (gh-129826)
There are multiple data races reported when running the
InterpreterPoolMixin tests, but it's still useful to run the other
test_concurrent_futures tests under TSAN.
Add test_concurrent_futures to the TSAN test suite.
Bénédikt Tran [Tue, 25 Feb 2025 10:44:59 +0000 (11:44 +0100)]
gh-111178: fix UBSan failures in `Objects/typeobject.c` (#129799)
Fix UBSan failures for `PyTypeObject`.
Introduce a macro cast for `superobject` and remove redundant casts.
Rename the unused parameter in getter/setter methods to `closure`
for semantic purposes.
Barney Gale [Mon, 24 Feb 2025 19:10:50 +0000 (19:10 +0000)]
GH-125413: Fix stale metadata from `pathlib.Path.copy()` and `move()` (#130424)
In `pathlib.Path.copy()` and `move()`, return a fresh `Path` object with an
unpopulated `info` attribute, rather than a `Path` object with information
recorded *prior* to the path's creation.
Bénédikt Tran [Mon, 24 Feb 2025 12:38:18 +0000 (13:38 +0100)]
gh-111178: fix UBSan failures in `Modules/selectmodule.c` (GH-129792)
Fix some UBSan failures for `pollObject`, `devpollObject`, `pyEpoll_Object` as well as
for `kqueue_event_Object`, `kqueue_queue_Object` and `kqueue_tracking_after_fork`.
Suppress unused return values.
Rename the unused parameter in `METH_NOARGS` and getter/setter methods to
`dummy` and `closure` respectively for semantic purposes.
Explicitly declare `_select_exec` as a `static` function.
Kanishk Pachauri [Mon, 24 Feb 2025 02:02:34 +0000 (07:32 +0530)]
gh-130160: use `.. program::` directive for documenting `idle` CLI (#130278)
---------
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
mingyu [Sun, 23 Feb 2025 20:07:33 +0000 (05:07 +0900)]
gh-129948: Add `set()` to `multiprocessing.managers.SyncManager` (#129949)
The SyncManager provided support for various data structures such as dict, list, and queue, but oddly, not set.
This introduces support for set by defining SetProxy and registering it with SyncManager.
---
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Y5 [Sun, 23 Feb 2025 19:30:33 +0000 (03:30 +0800)]
gh-124096: Enable REPL virtual terminal support on Windows (#124119)
To support virtual terminal mode in Windows PYREPL, we need a scanner
to read over the supported escaped VT sequences.
Windows REPL input was using virtual key mode, which does not support
terminal escape sequences. This patch calls `SetConsoleMode` properly
when initializing and sends sequences to enable bracketed-paste modes
to support verbatim copy-and-paste.
Signed-off-by: y5c4l3 <y5c4l3@proton.me>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Dustin L. Howett <dustin@howett.net>
Co-authored-by: wheeheee <104880306+wheeheee@users.noreply.github.com>
Bénédikt Tran [Sun, 23 Feb 2025 10:34:11 +0000 (11:34 +0100)]
gh-111178: fix UBSan failures in `Modules/_struct.c` (#129793)
Fix some UBSan failures for `PyStructObject` and `unpackiterobject`.
We also perform some cleanup by suppressing unused return values and renaming the
unused parameter in `METH_NOARGS` and getter methods to `dummy` and `closure`
respectively for semantic purposes.