author    Gregory P. Smith <68491+gpshead@users.noreply.github.com>
          Fri, 2 Jan 2026 06:03:05 +0000 (22:03 -0800)
committer GitHub <noreply@github.com>
          Fri, 2 Jan 2026 06:03:05 +0000 (22:03 -0800)
commit    61fc72a4a431cbfd42f22e2af76177c73431c3e6
tree      7085b1c323b2ea9b4a5bf64f211308e35f458279
parent    6b9a6c6ec3bbc9795df67b87340e2ea58f42b3d4

gh-124951: Optimize base64 encode & decode for an easy 2-3x speedup [no SIMD] (GH-143262)

Optimize base64 encoding/decoding by eliminating loop-carried dependencies. Key changes:
- Add `base64_encode_trio()` and `base64_decode_quad()` helper functions that process complete groups independently
- Add `base64_encode_fast()` and `base64_decode_fast()` wrappers
- Update `b2a_base64` and `a2b_base64` to use fast path for complete groups
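
A minimal sketch of the group-at-a-time idea, not the code in `Modules/binascii.c`: every `sketch_*` name below is hypothetical, and the decoder uses branches for readability where a table lookup would be the usual choice. Each 3-byte group is turned into 4 characters (and each 4-character group into 3 bytes) using only that group's own bytes, so no leftover-bit state is carried from one loop iteration to the next. Padding and other incomplete groups are left to a separate tail path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char sketch_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode exactly 3 input bytes into 4 output chars.
   The result depends only on this group, never on a previous iteration. */
static inline void
sketch_encode_trio(const unsigned char *in, char *out)
{
    uint32_t v = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    out[0] = sketch_alphabet[(v >> 18) & 0x3f];
    out[1] = sketch_alphabet[(v >> 12) & 0x3f];
    out[2] = sketch_alphabet[(v >> 6) & 0x3f];
    out[3] = sketch_alphabet[v & 0x3f];
}

/* Fast path: encode every complete 3-byte group.
   Returns the number of input bytes consumed (a multiple of 3). */
static size_t
sketch_encode_fast(const unsigned char *in, size_t len, char *out)
{
    size_t groups = len / 3;
    for (size_t i = 0; i < groups; i++) {
        sketch_encode_trio(in + 3 * i, out + 4 * i);
    }
    return groups * 3;
}

/* Full encoder: fast path for complete groups, then a padded tail.
   Writes a NUL terminator; `out` needs 4 * ((len + 2) / 3) + 1 bytes.
   Returns the encoded length, not counting the terminator. */
static size_t
sketch_b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t done = sketch_encode_fast(in, len, out);
    char *p = out + (done / 3) * 4;
    size_t rem = len - done;
    if (rem > 0) {
        uint32_t v = (uint32_t)in[done] << 16;
        if (rem == 2) {
            v |= (uint32_t)in[done + 1] << 8;
        }
        *p++ = sketch_alphabet[(v >> 18) & 0x3f];
        *p++ = sketch_alphabet[(v >> 12) & 0x3f];
        *p++ = (rem == 2) ? sketch_alphabet[(v >> 6) & 0x3f] : '=';
        *p++ = '=';
    }
    *p = '\0';
    return (size_t)(p - out);
}

/* Map one base64 character to its 6-bit value, or -1 if it is not
   in the standard alphabet (padding, whitespace, garbage). */
static int
sketch_decode_char(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical helper: decode exactly 4 chars into 3 bytes.
   Returns 1 on success, 0 if any char is outside the alphabet, in
   which case the caller would fall back to a slower, general path. */
static inline int
sketch_decode_quad(const unsigned char *in, unsigned char *out)
{
    int a = sketch_decode_char(in[0]);
    int b = sketch_decode_char(in[1]);
    int c = sketch_decode_char(in[2]);
    int d = sketch_decode_char(in[3]);
    if ((a | b | c | d) < 0) {  /* any -1 makes the OR negative */
        return 0;
    }
    uint32_t v = ((uint32_t)a << 18) | ((uint32_t)b << 12)
               | ((uint32_t)c << 6) | (uint32_t)d;
    out[0] = (unsigned char)(v >> 16);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)(v & 0xff);
    return 1;
}
```

With these helpers, `sketch_b64_encode((const unsigned char *)"Man", 3, buf)` fills `buf` with `TWFu`, and `sketch_decode_quad` on `TWFu` yields the bytes `Man`. Because each iteration reads and writes disjoint memory and shares no state with its neighbours, the compiler and CPU are free to overlap work on successive groups.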

Performance gains (encode/decode speedup vs main, PGO builds):
```
             64 bytes    64K        1M
  Zen2:      1.2x/1.8x   1.7x/2.8x  1.5x/2.8x
  Zen4:      1.2x/1.7x   1.6x/3.0x  1.5x/3.0x  [old data, likely faster]
  M4:        1.3x/1.9x   2.3x/2.8x  2.4x/2.9x  [old data, likely faster]
  RPi5-32:   1.2x/1.2x   2.4x/2.4x  2.0x/2.1x
```

Based on my exploratory work done in https://github.com/python/cpython/compare/main...gpshead:cpython:claude/vectorize-base64-c-S7Hku

See PR and issue for further thoughts on sometimes MUCH faster SIMD vectorized versions of this.
Doc/whatsnew/3.15.rst
Misc/NEWS.d/next/Library/2025-12-29-00-42-26.gh-issue-124951.OsC5K4.rst [new file with mode: 0644]
Modules/binascii.c