]> git.ipfire.org Git - thirdparty/glibc.git/commit
Improve performance of strstr
authorWilco Dijkstra <wdijkstr@arm.com>
Wed, 12 Jun 2019 10:38:52 +0000 (11:38 +0100)
committerWilco Dijkstra <wdijkstr@arm.com>
Fri, 13 Sep 2019 14:50:02 +0000 (15:50 +0100)
commit373f8b06a3ea763efd36e8b08af7ed7e726de6f8
tree0b15918e843c3327aec88d6407c608f1040cd8ae
parent4ec1b9e91382e66fb9a89361c93f9f560abc67f7
Improve performance of strstr

This patch significantly improves performance of strstr using a novel
modified Horspool algorithm.  Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.

By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.

Small needles up to size 3 use a dedicated linear search.  Very long needles
use the Two-Way algorithm.

The performance gain using the improved bench-strstr on Cortex-A72 is 5.8
times basic_strstr and 3.7 times twoway_strstr.

Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid
warning.
(two_way_long_needle): Block inlining.
* string/strstr.c (strstr2): Add new function.
(strstr3): Likewise.
(STRSTR): Completely rewrite strstr to improve performance.

(cherry picked from commit 5e0a7ecb6629461b28adc1a5aabcc0ede122f201)
ChangeLog
string/str-two-way.h
string/strstr.c