author	Gregory P. Smith <68491+gpshead@users.noreply.github.com>
	Fri, 2 Jan 2026 06:03:05 +0000 (22:03 -0800)
committer	GitHub <noreply@github.com>
	Fri, 2 Jan 2026 06:03:05 +0000 (22:03 -0800)
commit	61fc72a4a431cbfd42f22e2af76177c73431c3e6
tree	7085b1c323b2ea9b4a5bf64f211308e35f458279
parent	6b9a6c6ec3bbc9795df67b87340e2ea58f42b3d4

gh-124951: Optimize base64 encode & decode for an easy 2-3x speedup [no SIMD] (GH-143262)

Optimize base64 encoding/decoding by eliminating loop-carried dependencies. Key changes:
- Add `base64_encode_trio()` and `base64_decode_quad()` helper functions that process complete groups independently
- Add `base64_encode_fast()` and `base64_decode_fast()` wrappers
- Update `b2a_base64` and `a2b_base64` to use the fast path for complete groups
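
As a rough illustration of the group-at-a-time idea in the list above, here is a minimal C sketch. It is not the code from `Modules/binascii.c`: every `sketch_*` name, the signatures, and the arithmetic character lookup are assumptions made for this example. Each 3-byte group is encoded, and each 4-character group decoded, purely from its own input, so no accumulator or leftover-bit count is carried from one loop iteration to the next.

```c
#include <stddef.h>
#include <stdint.h>

static const char SKETCH_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode exactly 3 bytes into 4 characters.
   The result depends only on in[0..2], not on any earlier group. */
static inline void
sketch_encode_trio(const unsigned char *in, char *out)
{
    uint32_t v = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    out[0] = SKETCH_ALPHABET[(v >> 18) & 0x3f];
    out[1] = SKETCH_ALPHABET[(v >> 12) & 0x3f];
    out[2] = SKETCH_ALPHABET[(v >> 6) & 0x3f];
    out[3] = SKETCH_ALPHABET[v & 0x3f];
}

/* Hypothetical wrapper: encode len bytes and return the number of
   characters written (no newline, no NUL terminator).
   out must have room for 4 * ((len + 2) / 3) characters. */
static size_t
sketch_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    /* Fast path: complete 3-byte groups, one independent call each. */
    for (; i + 3 <= len; i += 3, o += 4) {
        sketch_encode_trio(in + i, out + o);
    }

    /* Tail: 1 or 2 leftover bytes, zero-filled and padded with '='. */
    size_t rem = len - i;
    if (rem > 0) {
        uint32_t v = (uint32_t)in[i] << 16;
        if (rem == 2) {
            v |= (uint32_t)in[i + 1] << 8;
        }
        out[o++] = SKETCH_ALPHABET[(v >> 18) & 0x3f];
        out[o++] = SKETCH_ALPHABET[(v >> 12) & 0x3f];
        out[o++] = (rem == 2) ? SKETCH_ALPHABET[(v >> 6) & 0x3f] : '=';
        out[o++] = '=';
    }
    return o;
}

/* Map one character to its 6-bit value, or -1 if it is not in the
   alphabet. Computed arithmetically to keep the sketch short; a
   lookup table is the more usual choice. */
static inline int
sketch_b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decode exactly 4 characters into 3 bytes.
   Returns 1 on success, or 0 if any character (including '=') is
   outside the alphabet, so a caller can fall back to a slower,
   more permissive path for that group. */
static inline int
sketch_decode_quad(const unsigned char *in, unsigned char *out)
{
    int a = sketch_b64_value(in[0]);
    int b = sketch_b64_value(in[1]);
    int c = sketch_b64_value(in[2]);
    int d = sketch_b64_value(in[3]);

    if ((a | b | c | d) < 0) {
        return 0;  /* at least one lookup returned -1 */
    }
    uint32_t v = ((uint32_t)a << 18) | ((uint32_t)b << 12)
               | ((uint32_t)c << 6) | (uint32_t)d;
    out[0] = (unsigned char)(v >> 16);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)(v & 0xff);
    return 1;
}
```

In this sketch only the fast loop handles complete groups; padding, and any input the quad decoder rejects, is left to separate tail or fallback handling, mirroring the "fast path for complete groups" split described above.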

Performance gains (encode/decode speedup vs main, PGO builds):
```
             64 bytes    64K        1M
  Zen2:      1.2x/1.8x   1.7x/2.8x  1.5x/2.8x
  Zen4:      1.2x/1.7x   1.6x/3.0x  1.5x/3.0x  [old data, likely faster]
  M4:        1.3x/1.9x   2.3x/2.8x  2.4x/2.9x  [old data, likely faster]
  RPi5-32:   1.2x/1.2x   2.4x/2.4x  2.0x/2.1x
```

Based on my exploratory work done in https://github.com/python/cpython/compare/main...gpshead:cpython:claude/vectorize-base64-c-S7Hku

See the PR and issue for further thoughts on SIMD-vectorized versions of this, which are sometimes MUCH faster.

Doc/whatsnew/3.15.rst
Misc/NEWS.d/next/Library/2025-12-29-00-42-26.gh-issue-124951.OsC5K4.rst	[new file with mode: 0644]
Modules/binascii.c