git.ipfire.org Git - thirdparty/git.git/commit
grep: correctly identify utf-8 characters with \{b,w} in -P
author Carlo Marcelo Arenas Belón <carenas@gmail.com>
Sun, 8 Jan 2023 15:52:17 +0000 (07:52 -0800)
committer Junio C Hamano <gitster@pobox.com>
Wed, 18 Jan 2023 23:24:52 +0000 (15:24 -0800)
commit acabd2048ee0ee53728100408970ab45a6dab65e
tree 52f0571afd786b179b89b71a42c675758b7993e6
parent c48035d29b4e524aed3a32f0403676f0d9128863

When UTF is enabled for a PCRE match, the corresponding flags are
added to the pcre2_compile() call, but PCRE2_UCP wasn't included.

Without it, the meaning of the character classes is not extended to
include those newly valid characters, which results in failed matches
for expressions that rely on that extension, for example:

  $ git grep -P '\bÆvar'

Add PCRE2_UCP so that \w includes Æ and \b can therefore correctly
match the beginning of that word.
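
The effect on \b can be illustrated outside of git with a rough analogy.
The sketch below uses Python's re module, which is not PCRE2; it assumes
only that re.ASCII (ASCII-only \w) behaves loosely like PCRE2_UTF without
PCRE2_UCP, and that the default str mode (Unicode-aware \w) behaves
loosely like PCRE2_UTF with PCRE2_UCP. The subject string is made up for
illustration.

```python
import re

# Same expression as in the commit message; the subject line is hypothetical.
PATTERN = r"\bÆvar"
SUBJECT = "Acked-by: Ævar"

# ASCII-only \w: "Æ" is not a word character, so there is no word
# boundary between the preceding space and "Æ", and \b cannot match there.
ascii_match = re.search(PATTERN, SUBJECT, re.ASCII)

# Unicode-aware \w: "Æ" is a word character, so \b matches right before it.
unicode_match = re.search(PATTERN, SUBJECT)

print("ASCII-only \\w matched:", ascii_match is not None)
print("Unicode-aware \\w matched:", unicode_match is not None)
```

Only the second search finds a match, which mirrors the failure described
above: the literal "Æ" is matched fine in both modes, but \b depends on
whether \w classifies "Æ" as a word character.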

This has an impact on performance that has been estimated to be
between 20% and 40%, as shown by the added performance test.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep.c
t/perf/p7822-grep-perl-character.sh [new file with mode: 0755]