gh-150717: Avoid mark-array allocation for groupless regex patterns (GH-150719)
author Bernát Gábor <gaborjbernat@gmail.com>
Tue, 2 Jun 2026 07:45:30 +0000 (00:45 -0700)
committer GitHub <noreply@github.com>
Tue, 2 Jun 2026 07:45:30 +0000 (10:45 +0300)
state_init() always ran state->mark = PyMem_New(const void *, groups * 2),
which for a pattern with no capturing groups amounts to PyMem_Malloc(0):
a real allocation (plus matching free) on every match/search/fullmatch
call, for an array that is never read. Groupless patterns emit no MARK
opcodes, and group 0's span is taken from state->start/ptr.

Guard the allocation with `if (pattern->groups)`. state->mark stays NULL
(set by the preceding memset), and both the error path and state_fini
already pass it to PyMem_Free, which is a no-op for NULL.
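
The guard described above can be sketched outside CPython roughly as follows. This is a minimal illustration, not the code in Modules/_sre/sre.c: toy_state, toy_alloc, toy_alloc_calls, toy_state_init and toy_state_fini are hypothetical names, and plain malloc/free stand in for PyMem_New/PyMem_Free. The counter exists only to make the skipped allocation observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SRE_STATE; only the fields touched here. */
typedef struct {
    const void **mark;   /* capture-group boundaries, two slots per group */
    int lastmark;
    int lastindex;
} toy_state;

/* Hypothetical counter: how many times the allocator was actually called. */
static size_t toy_alloc_calls = 0;

/* Assumption: like PyMem_Malloc, a zero-byte request still allocates. */
static void *toy_alloc(size_t nbytes)
{
    toy_alloc_calls++;
    return malloc(nbytes ? nbytes : 1);
}

/* Returns 0 on success, -1 if the mark array could not be allocated. */
static int toy_state_init(toy_state *state, size_t groups)
{
    assert(state != NULL);
    memset(state, 0, sizeof(*state));   /* leaves state->mark == NULL */

    /* Groupless patterns never read the mark array, so skip the call. */
    if (groups) {
        state->mark = toy_alloc(groups * 2 * sizeof(const void *));
        if (!state->mark) {
            return -1;
        }
    }
    state->lastmark = -1;
    state->lastindex = -1;
    return 0;
}

static void toy_state_fini(toy_state *state)
{
    assert(state != NULL);
    free(state->mark);   /* free(NULL) is a no-op, so no guard is needed */
    state->mark = NULL;
}
```

Calling toy_state_init(&s, 0) leaves s.mark as NULL without touching the allocator, while toy_state_init(&s, 3) allocates six slots. toy_state_fini(&s) is safe in both cases, which is why the cleanup side needs no matching guard.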

Misc/NEWS.d/next/Library/2026-06-01-08-12-34.gh-issue-150717.LVRJXH.rst [new file with mode: 0644]
Modules/_sre/sre.c

diff --git a/Misc/NEWS.d/next/Library/2026-06-01-08-12-34.gh-issue-150717.LVRJXH.rst b/Misc/NEWS.d/next/Library/2026-06-01-08-12-34.gh-issue-150717.LVRJXH.rst
new file mode 100644
index 0000000..da7171f
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2026-06-01-08-12-34.gh-issue-150717.LVRJXH.rst
@@ -0,0 +1,2 @@
+Avoid an unnecessary per-call memory allocation when matching :mod:`re`
+patterns that have no capturing groups. Patch by Bernát Gábor.
diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c
index 7a07ed1d7aca20cecae6ba03e706032f6dcf23ee..058a03148c823fbf7f22cf2d9232f53e43ee5b3b 100644
--- a/Modules/_sre/sre.c
+++ b/Modules/_sre/sre.c
@@ -548,10 +548,16 @@ state_init(SRE_STATE* state, PatternObject* pattern, PyObject* string,
 
     memset(state, 0, sizeof(SRE_STATE));
 
-    state->mark = PyMem_New(const void *, pattern->groups * 2);
-    if (!state->mark) {
-        PyErr_NoMemory();
-        goto err;
+    /* Patterns with no capturing groups never emit MARK opcodes and never
+       read state->mark (group 0's span comes from state->start/ptr), so skip
+       the allocation entirely -- state->mark stays NULL, which both the err
+       path and state_fini already free safely. */
+    if (pattern->groups) {
+        state->mark = PyMem_New(const void *, pattern->groups * 2);
+        if (!state->mark) {
+            PyErr_NoMemory();
+            goto err;
+        }
     }
     state->lastmark = -1;
     state->lastindex = -1;