gh-96143: Allow Linux perf profiler to see Python calls (GH-96123)
:warning: :warning: Note for reviewers, hackers and fellow systems/low-level/compiler engineers :warning: :warning:
If you have a lot of experience with this kind of shenanigans and want to improve the **first** version, **please make a PR against my branch** or **reach out by email** or **suggest code changes directly on GitHub**.
If you have any **refinements or optimizations**, please wait until the first version is merged before you start hacking on or proposing them, so we can keep this PR productive.
Christian Heimes [Tue, 30 Aug 2022 05:56:26 +0000 (07:56 +0200)]
gh-95853: Improve WASM build script (GH-96389)
- pre-build Emscripten ports and system libraries
- check for broken EMSDK versions
- use EMSDK's node for wasm32-emscripten
- warn when PKG_CONFIG_PATH is set
- add support level information
Victor Stinner [Mon, 29 Aug 2022 12:55:46 +0000 (14:55 +0200)]
Fix Py_INCREF() statistics in limited C API 3.10 (#96120)
In the limited C API with a debug build, Py_INCREF() is implemented
by calling _Py_IncRef() which calls Py_INCREF(). Only call
_Py_INCREF_STAT_INC() once.
Petr Viktorin [Mon, 29 Aug 2022 11:10:52 +0000 (13:10 +0200)]
gh-90814: Correct NEWS wording re. optional C11 features (GH-96309)
The previous wording of this entry suggests that CPython
won't work if optional compiler features are enabled.
That's not the case. The change is that we require C11 rather
than C89.
Note that PEP 7 does say "Python 3.11 and newer versions use C11
without optional features." It is correct there: that's
not a guide for users who compile Python, but for CPython devs
who must avoid the features.
TW [Sun, 28 Aug 2022 21:27:42 +0000 (23:27 +0200)]
gh-69142: add %:z strftime format code (gh-95983)
datetime.isoformat generates the tzoffset with colons, but there
was no format code to make strftime output the same format.
For simplicity and consistency, the %:z formatting behaves mostly
like %z, except that it adds colons. This includes the
dynamic behaviour of adding seconds and microseconds only when
needed (when not 0).
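The rule described above can be sketched in pure Python. This is only an illustration of the output format, not the code added by the commit: `colon_offset` is a hypothetical helper, and the final `strftime("%:z")` call is guarded because the format code is assumed to be available only from Python 3.12 on.

```python
import sys
from datetime import datetime, timedelta, timezone


def colon_offset(dt):
    """Hypothetical helper: render dt's UTC offset as +HH:MM[:SS[.ffffff]]."""
    offset = dt.utcoffset()
    if offset is None:
        # Naive datetimes have no offset, so nothing is emitted.
        return ""
    sign = "-" if offset < timedelta(0) else "+"
    offset = abs(offset)
    hours, rest = divmod(offset, timedelta(hours=1))
    minutes, rest = divmod(rest, timedelta(minutes=1))
    text = f"{sign}{hours:02d}:{minutes:02d}"
    if rest:
        # Seconds (and microseconds) appear only when they are non-zero.
        text += f":{rest.seconds:02d}"
        if rest.microseconds:
            text += f".{rest.microseconds:06d}"
    return text


tz = timezone(timedelta(hours=5, minutes=30))
dt = datetime(2022, 8, 28, 12, 0, tzinfo=tz)

# isoformat() already writes the offset with colons.
assert dt.isoformat().endswith(colon_offset(dt))

if sys.version_info >= (3, 12):
    # Assumption: %:z is only understood by datetime.strftime on 3.12+.
    print(dt.strftime("%:z"))
```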
This fixes the still-open "generate" part of this issue:
Pankaj Pandey [Fri, 26 Aug 2022 14:20:48 +0000 (10:20 -0400)]
bpo-33587: inspect.getsource: reorder stat on file in linecache (GH-6805)
* inspect.getsource: avoid stat on file in linecache
The os.path.exists() check on the source file in
inspect.getsourcefile() is postponed until it is needed, which avoids an
expensive filesystem stat call. The PEP 302 module loader check is moved
last for performance, since it is an uncommon case.
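The reordering can be pictured as a chain of checks running from cheapest to most expensive. The sketch below is a simplified illustration, not the actual body of inspect.getsourcefile(): `resolve_source_file` and its `module` parameter are hypothetical names, and the real function does more work before reaching this point.

```python
import linecache
import os


def resolve_source_file(filename, module=None):
    """Hypothetical sketch: return filename if its source can be obtained."""
    # 1. In-memory lookup: no filesystem access at all.
    if filename in linecache.cache:
        return filename
    # 2. Only now pay for a stat() call on the filesystem.
    if os.path.exists(filename):
        return filename
    # 3. Uncommon case, checked last: a PEP 302 loader may still
    #    be able to provide the source for a non-existent file.
    if getattr(module, "__loader__", None) is not None:
        return filename
    spec = getattr(module, "__spec__", None)
    if getattr(spec, "loader", None) is not None:
        return filename
    return None
```

With this ordering, a file whose lines are already cached is resolved without touching the disk.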
Eric Snow [Thu, 25 Aug 2022 21:46:08 +0000 (15:46 -0600)]
gh-90110: Update the c-analyzer Tool (gh-96255)
Here we automatically ignore uses of _PyArg_Parser, "kwlist" arrays, and module/type defs. That way new uses don't trigger false positives in the c-analyzer check script.
gh-96272: Replace `test_source_encoding`'s `test_pep263` with `test_import_encoded_module` from `test_imp` (GH-96275)
Editors don't agree that `test_source_encoding.py` was valid koi8-r, making it
hard to edit that file without the editor breaking it in some way (see gh-96272).
Only one test actually relied on the koi8-r encoding and it was a duplicate of a
test from the deprecated `imp` module's `test_imp`, so here we replace
`test_pep263` with `test_import_encoded_module` stolen from `test_imp` and
set `test_source_encoding.py`'s encoding to utf-8 to make editing it easier
going forward.
Serhiy Storchaka [Wed, 24 Aug 2022 12:07:20 +0000 (15:07 +0300)]
gh-96021: Explicitly close the IsolatedAsyncioTestCase runner in tests (GH-96135)
Tests for IsolatedAsyncioTestCase.debug() rely on the runner being closed
in __del__. This makes the tests depend on the GC and unreliable on other
implementations. It is better to close the runner explicitly, even if
there is currently no public API for this.
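The general pattern, explicit cleanup instead of relying on a finalizer, can be sketched as follows. This is not the test code from the commit: `SketchRunner` is a hypothetical stand-in for a resource-owning runner, since the real runner is reached through private attributes.

```python
import contextlib


class SketchRunner:
    """Hypothetical stand-in for a runner that owns a resource."""

    def __init__(self):
        self.closed = False

    def run(self, func):
        if self.closed:
            raise RuntimeError("runner is closed")
        return func()

    def close(self):
        self.closed = True

    def __del__(self):
        # Safety net only: when this runs depends on the garbage
        # collector, so it differs between Python implementations.
        self.close()


def run_with_explicit_close(func):
    runner = SketchRunner()
    # contextlib.closing() calls runner.close() deterministically,
    # even if func() raises.
    with contextlib.closing(runner):
        result = runner.run(func)
    return result, runner.closed
```

Because `close()` runs when the `with` block exits, the outcome no longer depends on when, or whether, `__del__` is invoked.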
Christian Heimes [Fri, 19 Aug 2022 06:36:12 +0000 (08:36 +0200)]
gh-95853: WASM: better version and asset handling in scripts (GH-96045)
- support EMSDK tot-upstream and git releases
- allow WASM assets for wasm64-emscripten and WASI. This makes single file distributions on WASI easier.
- decouple WASM assets from browser builds
Christian Heimes [Fri, 19 Aug 2022 06:08:43 +0000 (08:08 +0200)]
gh-96017: Fix some compiler warnings (GH-96018)
- "comparison of integers of different signs" in typeobject.c
- only define static_builtin_index_is_set in DEBUG builds
- only define recreate_gil with ifdef HAVE_FORK
gh-90536: Add support for the BOLT post-link binary optimizer (gh-95908)
* Add support for the BOLT post-link binary optimizer
Using [bolt](https://github.com/llvm/llvm-project/tree/main/bolt)
provides a fairly large speedup without any code or functionality
changes. It provides roughly a 1% speedup on pyperformance, and a
4% improvement on the Pyston web macrobenchmarks.
It is gated behind an `--enable-bolt` configure arg because not all
toolchains and environments are supported. It has been tested on a
Linux x86_64 toolchain, using llvm-bolt built from the LLVM 14.0.6
sources (their binary distribution of this version did not include bolt).
Compared to [a previous attempt](https://github.com/faster-cpython/ideas/issues/224),
this commit uses bolt's preferred "instrumentation" approach and adds some non-PIE
flags, which enable much better optimizations from bolt.
The effects of this change are a bit more dependent on CPU microarchitecture
than other changes, since it optimizes i-cache behavior which seems
to be a bit more variable between architectures. The 1%/4% numbers
were collected on an Intel Skylake CPU, and on an AMD Zen 3 CPU I
got a slightly larger speedup (2%/4%), and on a c6i.xlarge EC2 instance
I got a slightly lower speedup (1%/3%).
The low speedup on pyperformance is not entirely unexpected, because
BOLT improves i-cache behavior, and the benchmarks in the pyperformance
suite are small and tend to fit in i-cache.
This change uses the existing pgo profiling task (`python -m test --pgo`),
though I was able to measure about a 1% macrobenchmark improvement by
using the macrobenchmarks as the training task. I personally think that
both the PGO and BOLT tasks should be updated to use macrobenchmarks,
but for the sake of splitting up the work this PR uses the existing pgo task.
* Simplify the build flags
* Add a NEWS entry
* Update Makefile.pre.in
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Update configure.ac
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>
* Add myself to ACKS
* Add docs
* Other review comments
* fix tab/space issue
* Make it more clear that --enable-bolt is experimental
* Add link to bolt's github page
Co-authored-by: Dong-hee Na <donghee.na92@gmail.com>