run_accel() passed c_end - 1 to shuftiDoubleExec().
shuftiDoubleExecReal already handles the last-byte boundary internally
via check_last_byte(), so shortening the buffer caused it to miss
valid matches near the end and apply the wildcard check to the wrong
byte. Changed to pass c_end.
Fixes: ca70a3d9beca61b58c6709fead60ec662482d36e ("Fix double shufti's
vector end false positive (#325)")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
* shufti-double: Preserve first_char_mask after peel-off
first_char_mask was reset to Ones() after the peel-off block,
discarding carry-over state for cross-boundary pattern detection.
Remove the reset.
Fixes: ca70a3d9beca61b58c6709fead60ec662482d36e ("Fix double shufti's
vector end false positive (#325)")
Add three tests exercising the double shufti edge cases.
- ExecMatchVectorEdge: Two-byte pair ("ab") spanning the peel-off to
aligned block boundary is correctly detected. Validates that
first_char_mask state carries over and is not reset after peel-off.
- ExecNoMatchLastByte: First character of a double-byte pair ('x' from
"xy") at the last buffer byte does not cause a false positive when
the second character is absent.
- ExecMatchLastByte: Single-byte pattern ('a') at the last buffer byte
is detected via check_last_byte's reduce.
Add missing sentinel element to state transition vectors for both
16-state and 32-state DFA test configurations. The alpha_size includes
a sentinel entry at index alpha_size-1, so each state's next vector
must have alpha_size elements.
Add tests covering edge cases and broader scenarios:
- Early termination via callback (single and double char)
- No-match scenarios for single and double patterns
- Empty and minimal-length buffer handling
- Large buffer scanning (multi-vector iteration)
- Case-insensitive matching for single and double patterns
- Unaligned buffer scanning
- Various alignment boundary conditions
- All-match dense buffers for single and double patterns
simd: convert rshift64 macros to functions and fix simd_utils bugs (#376)
Convert rshift64_m128/m256/m512 macros to inline functions that
support runtime (non-constant) shift amounts on x86, matching the
existing lshift64 function implementations.
Also fix:
- lshift64_m256/rshift64_m256 parameter type from int to unsigned in
the non-256-bit fallback path (common/simd_utils.h)
- isnonzero512: remove redundant self-OR operations
- load512: fix alignment assertion to check m512 instead of m256
Fixes: 3f0f9e60526d ("move x86 implementations of simd_utils.h to util/arch/x86/")
Fixes: 6ff47528ba22 ("add scalar versions of the vectorized functions for architectures that don't support 256-bit/512-bit SIMD vectors such as ARM")
Fixes: 75aadb76f82e ("split arch-agnostic simd_utils.h functions into the common file")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
In nvermicelliExecReal, the tail path loads a vector from
(buf_end - S) so the unprocessed tail bytes sit at the END of the
vector (high offsets). However, first_zero_match_inverted<64>
applies a mask selecting only the FIRST 'len' bytes (low offsets),
which means it re-checks the already-scanned overlap region and
completely misses the actual tail bytes.
This only affects AVX-512 (S=64) because the 16-byte and 32-byte
specializations of first_zero_match_inverted mark 'len' as UNUSED and
always check the full vector.
Fix by passing S instead of (buf_end - d) as the length, so the full
vector is checked. The overlap bytes are guaranteed to already match,
so no false positives are possible, and the existing (rv < buf_end)
guard prevents out-of-range results.
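The window geometry described above can be modeled in plain C (a sketch with hypothetical names, not the nvermicelli source; a plain equality match stands in for the negated compare): the window is the last S bytes of the buffer, so the unprocessed tail sits at the window's high offsets, and masking only the first `len` lanes inspects the already-scanned overlap instead of the tail.

```c
#include <assert.h>
#include <stddef.h>

#define S 8  /* stand-in for the 64-byte AVX-512 vector width */

/* Scan only the first active_len lanes of an S-byte window,
 * mirroring the low-lane mask applied by the 64-byte path. */
static int scan_window(const unsigned char *win, size_t active_len,
                       unsigned char target) {
    for (size_t i = 0; i < active_len; i++) {
        if (win[i] == target) {
            return (int)i;
        }
    }
    return -1;
}

/* Tail path model: `done` bytes are already scanned. The window is
 * loaded from buf_end - S, so the unscanned tail occupies the window's
 * HIGH offsets. Bug: passing the tail length selects the LOW lanes
 * (the overlap). Fix: pass S so the whole window is checked. */
static int find_in_tail(const unsigned char *buf, size_t buflen,
                        size_t done, unsigned char target, int fixed) {
    const unsigned char *win = buf + buflen - S;
    size_t tail_len = buflen - done;
    int r = scan_window(win, fixed ? S : tail_len, target);
    return r < 0 ? -1 : (int)(buflen - S + (size_t)r);
}
```

With a match in the last byte, the buggy length misses it while the fixed call finds it; as the commit notes, re-checking the overlap is harmless because those bytes are known to contain no match.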
fix: correct SVE accelerator bugs in shufti, truffle, and noodle (#377)
* shufti-double: use predicate mask to prevent false positives
doubleMatched() in shufti_sve.hpp used svptrue_b8() for the final
comparison instead of the caller-provided predicate pg. When called
from dshuftiOnce() with a partial predicate (buffer shorter than SVE
vector length), inactive lanes loaded as zero could satisfy the match
condition, producing false positive matches.
Changed the return statement to use pg for svnot_z, ensuring inactive
lanes are excluded from match results.
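The predication difference can be shown with a one-lane scalar model of svnot_z's zeroing semantics (my reading of the commit; the helper name is illustrative): the result is the logical NOT where the governing predicate is active, and zero where it is inactive.

```c
#include <assert.h>

/* One-lane model of svnot_z (zeroing predication): !x where the
 * predicate lane is active, 0 where it is inactive. */
static int not_z(int pred, int x) {
    return pred ? !x : 0;
}
```

An inactive lane loads as x = 0. With an all-true predicate (the svptrue_b8 bug), not_z(1, 0) yields 1, a spurious match; with the caller's partial predicate pg, not_z(0, 0) yields 0 and the lane is excluded.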
Added 5 unit tests covering short/variable-length buffers with
null-byte pair patterns and mixed single/double-byte patterns to
catch regressions.
Fixes: 60b211250562626d6536e992cc1d0d52cd128f44 ("Use SVE for double shufti")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
* truffle: fix off-by-one in rtruffleExecSVE tail and add unit tests
Fix a bug in rtruffleExecSVE where the tail processing for short
buffers used svwhilele_b8 instead of svwhilelt_b8. svwhilele_b8(0, N)
activates lanes 0..N (N+1 lanes), reading one byte past the buffer
end. The forward path (truffleExecSVE) already correctly uses
svwhilelt_b8, which activates lanes 0..N-1 (N lanes).
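The lane counts can be written down as a scalar model (based on the architectural semantics quoted above; the helper names are illustrative): the inclusive form activates one more lane than the exclusive form.

```c
#include <assert.h>

/* Scalar models of the SVE while-predicate lane counts for a
 * base..bound range: "le" is inclusive of the bound, "lt" exclusive. */
static int lanes_whilele(int base, int bound) { return bound - base + 1; }
static int lanes_whilelt(int base, int bound) { return bound - base; }
```

For an N-byte tail exactly N active lanes are wanted: whilelt(0, N) gives N, while whilele(0, N) gives N+1 and reads one byte past the buffer end.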
Add 26 new unit tests for the truffle accelerator covering:
- Compile roundtrip: character ranges, empty class, same-nibble chars
- Forward exec: single byte buffers, high byte (>=0x80) matching,
same-nibble non-match, NUL char, dot (all chars), buffer-end match,
varying lengths (1-130), alignment sweep, multi-char classes,
all 256 single-char classes, 0x7F/0x80 boundary
- Reverse exec: single byte, high byte, NUL, buffer-start match,
varying lengths, large buffer (4K), alignment sweep, all 256
single-char classes, multiple matches, boundary chars
Fixes: c67076ce22452bdfe423063b273ded8bd7444aae ("Add truffle SVE implementation")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
* hwlm: correct return types and scan length in SVE noodle engine
- Change return type from hwlmcb_rv_t to hwlm_error_t to match the
actual return type of checkMatched() and singleCheckMatched()
- Fix scanDouble short-path condition: use (e - d) instead of scan_len
which could be stale after adjusting d for history
- Fix formatting: add space after 'if' keyword
Fixes: 0ba1cbb32b5b ("Add SVE2 support for noodle")
Fixes: b2332218a474 ("Remove possibly undefined behaviour from Noodle.")
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
---------
Fix out-of-bounds read in shuftiDoubleExecReal tail handling (#381)
Commit 9e9a10ad ("Fix double shufti's vector end false positive", #325)
changed the tail code in shuftiDoubleExecReal to read a full
vector from the current pointer (loadu(d)), which can overread past
buf_end by up to S-1 bytes. When the buffer ends at a page boundary
followed by unmapped memory, this causes a SIGSEGV.
Fix by reading the last S bytes backward from buf_end (matching the
approach used in shuftiExecReal), and falling back to memcpy for
buffers shorter than one vector width.
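The fixed tail-load strategy can be sketched in C (a hypothetical helper, not the shufti source): read the last S bytes ending exactly at buf_end when the buffer is long enough, and fall back to a zero-padded memcpy otherwise, so no load ever crosses buf_end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define S 16  /* stand-in vector width */

/* Fill `win` with exactly S bytes without ever reading past buf_end.
 * Long buffers: read backward from buf_end (may overlap bytes already
 * scanned, but stays inside the buffer). Short buffers: copy what
 * exists and zero-pad the rest. */
static void load_tail(const unsigned char *buf, const unsigned char *buf_end,
                      unsigned char win[S]) {
    size_t len = (size_t)(buf_end - buf);
    if (len >= S) {
        memcpy(win, buf_end - S, S);
    } else {
        memset(win, 0, S);
        memcpy(win, buf, len);
    }
}
```

Unlike loadu(d), which can touch up to S-1 bytes past buf_end, neither branch reads outside [buf, buf_end), so a page boundary right after the buffer is safe.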
Fix critical bugs found in the SuperVector SIMD abstraction layer.
SuperVector operator!() — x86 (SSE/AVX2/AVX512), ppc64el:
- Was XOR-ing with self (always returns Zeroes instead of bitwise
NOT).
- Note that some other operators depend on operator!().
SuperVector<16> Ones_vshl() — x86:
- Called vshr_128() instead of vshl_128().
Element-wise shift boundary and Unroller range — x86:
- vshl_32/vshr_32 on SuperVector<16>: zero-boundary was N==16,
instead of N>=32; Unroller range was <1,16> not <1,32>.
- vshl_64/vshr_64 on SuperVector<16>: same issue.
- vshl_64/vshr_64 on SuperVector<32>: same issue.
- vshr_64 on SuperVector<64>: same issue.
SuperVector<32> vshr_256_imm — x86:
- Was a copy-paste of vshl_256_imm.
SuperVector<64> vshr_256_imm — x86:
- Operated on v256[0] only with broken SuperVector<32> logic.
SuperVector<64> vsh{l,r}_* — x86, ppc64el, arm:
- Were incorrectly delegating to vshl_128/vshr_128. (x86)
- Did not have boundary checks. PPC wraps when it tries to shift
more than bit length. (ppc64el)
- Had signed rshifts. (arm)
comparison operators - arm:
- operator>=: used vcgeq_u8 (unsigned) instead of vcgeq_s8 (signed).
- operator<=: used vcgeq_s8 (>=) instead of vcleq_s8 (<=).
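Two of the listed bugs are easy to state in scalar form (a model of single lanes, not the SuperVector code): bitwise NOT must XOR with all-ones rather than with self, and a 32-bit lane is only fully shifted out at a count of 32 (the lane width), not 16 (the vector's byte width).

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of operator!(): bitwise NOT is XOR with Ones.
 * XOR-ing with self, as the buggy code did, always yields zero. */
static uint32_t lane_not(uint32_t x) {
    return x ^ 0xFFFFFFFFu;
}

/* Scalar model of a per-lane vshl_32: the zero boundary is the lane
 * width (32), and it must be checked explicitly because `x << n` is
 * undefined behaviour in C for n >= 32. */
static uint32_t lane_shl32(uint32_t x, unsigned n) {
    if (n >= 32) {
        return 0;
    }
    return x << n;
}
```

With the buggy N==16 boundary, shift counts in [16, 32) were wrongly zeroed and counts of 32 and above fell through to undefined shifts.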
Fixes: 1af82e395fdce3117a1e18d9f8198a626b07cc2f
Fixes: f0e6b8459c4f1e9fb54d637c3666a4fab97f45cb
Fixes: 2f55e5b54f70693bb844015306793d29a29cd51c
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
* supervector: Add more unit tests for SuperVector
These tests cover the NOT operator and many of the element-wise shifts,
especially for AVX2 and AVX512.
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
* shufti: Use operator!= for non-zero test
The match result vector (c_lo & c_hi) was compared using operator>
against Zeroes to detect non-zero (matching) bytes. On ARM, operator>
delegates to signed comparison (vcgtq_s8), which treats byte values
with the high bit set (0x80–0xFF) as negative, making them compare
as less than zero and falsely reporting no match.
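The signed-compare pitfall in scalar form (a model of one lane, not the SuperVector operators): bytes 0x80..0xFF are negative as int8_t, so a "greater than zero" test misses them, while "not equal to zero" does not.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy non-zero test: on ARM, operator> compares signed bytes, so a
 * match lane with the high bit set compares as negative. */
static int nonzero_via_gt(uint8_t b) { return (int8_t)b > 0; }

/* Fixed non-zero test: inequality is sign-agnostic. */
static int nonzero_via_ne(uint8_t b) { return b != 0; }
```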
Fixes: 92e0b9a35192ae975de9fd032fb66ddcb1682c5a ("simplify shufti and
provide arch-specific block functions")
accel: Fix offset clamping in do_accel_block to not go before start (#365)
When applying the accel offset, the result was clamped to buf which
could produce a position before *start, potentially causing the caller
to re-scan bytes that were already processed. Clamp to ptr (buf +
*start) instead, so that the accelerator never rewinds past the original
start position.
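In index form (a hypothetical helper mirroring the description, with positions as offsets from buf), the fix moves the rewind floor from offset 0 (buf) to *start:

```c
#include <assert.h>
#include <stddef.h>

/* Apply the accel offset to a scan position, clamping so the
 * accelerator never rewinds past the caller's start offset. */
static size_t apply_accel_offset(size_t pos, size_t offset, size_t start) {
    size_t p = pos > offset ? pos - offset : 0;
    return p < start ? start : p;  /* clamp to start, not to 0 (buf) */
}
```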
Tomer Lev [Thu, 12 Feb 2026 13:23:09 +0000 (15:23 +0200)]
cmake: add PKGCONFIG_EXTRA_LIBS option for pkg-config (#361)
Add a CMake cache variable to pass arbitrary flags to the pkg-config
Libs line. This enables CGO builds using GCC (not g++) to link correctly
by allowing users to specify C++ stdlib dependencies.
* remove the use of macros for critical loops, easier to debug
removed switch, merged get_conf_stride functions into 1
split FDR implementations into arch specific files (same for now)
The nm implementations from GNU binutils and FreeBSD parse the format
argument value by looking only at the first character.
They accept both 'p' and 'posix'.
Reference in Linux kernel for the same change:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/Makefile?h=v6.17-rc7&id=76ebc6a429ec2becc2fa738c85ab9688ea4b9006
Multiple changes since the last release; this will be the last release
that is 100% ABI- and API-compatible with Hyperscan.
Next versions will include major refactors and API extensions; they
will be mostly backwards compatible, however.
In no particular order, platform support is now:
In total more than 200 configurations in the CI are tested for every PR.
Other features:
- Fat Runtime supported for Arm as well (ASIMD/SVE/SVE2).
- Initial implementations for Arm SVE/SVE2 algorithms added, thanks to
Yoan Picchi from Arm.
- SIMDe support added, used as an alternative backend for existing
platforms, but mostly interesting for allowing Vectorscan to build in
new platforms without a supported SIMD engine.
- Various speedups and optimizations.
- Cppcheck and clang-tidy fixes throughout the code, both have been
added to CI for multiple configurations, but only cppcheck triggers a
build failure for now.
Various bugfixes, most important listed:
- Speed up truffle with 256b TBL instructions (#290)
- Fix Clang Tidy warnings (#295)
- Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove
warning (#332)
- partial_load_u64 will fail if buf == NULL/c_len == 0 (#331)
- Bugfix/fix avx512vbmi regressions (#335)
- fix missing hs_version.h header (closes #198)
- hs_valid_platform: Fix check for SSE4.2 (#310)
- Fixed out of bounds read in AVX512VBMI version of fdr_exec_fat_teddy …
(#333)
- Fix noodle SVE2 off by one bug (#313)
- Make vectorscan accept \0 starting pattern (#312)
- Fix 5.4.11's config step regression (#327)
- Fix double shufti's vector end false positive (#325)
cmake - guard against failed GNUCC_ARCH extraction (#339)
Prevents overwriting GNUCC_ARCH with an empty value when parsing output
of gcc -Q --help=target. Ensures robustness if detection fails and
returns an empty string.
Signed-off-by: Ibrahim Kashif <ibrahim.kashif@arm.com>
Double shufti used to offset one vector, resulting in losing one character
at the end of every vector. This was replaced by a magic value indicating a
match. This meant that if the first char of a pattern fell on the last char of
a vector, double shufti would assume the second character is present and
report a match.
This patch fixes it by keeping the previous vector and feeding its data to the
new one when we shift it, preventing any loss of data.
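The carry idea can be modeled in scalar C (a sketch, not the SIMD implementation): the one-byte-shifted view of the current block takes its first byte from the last byte of the previous block, so a two-byte pair straddling the boundary is still visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define S 4  /* tiny stand-in vector width */

/* Look for the pair (c1, c2) within the current S-byte block, where
 * the shifted view carries in the last byte of the previous block so
 * a pair straddling the block boundary is not lost. */
static int pair_in_block(const unsigned char *prev, const unsigned char *cur,
                         unsigned char c1, unsigned char c2) {
    unsigned char shifted[S];
    shifted[0] = prev[S - 1];          /* carry from previous vector */
    memcpy(shifted + 1, cur, S - 1);
    for (size_t i = 0; i < S; i++) {
        if (shifted[i] == c1 && cur[i] == c2) {
            return 1;                  /* pair found, possibly straddling */
        }
    }
    return 0;
}
```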
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
* vshl() will call the correct implementation
* implement missing vshr_512_imm(), simplifies caller x86 code
* Fix x86 case, use alignr instead
* it's the reverse, the avx512 alignr is incorrect, need to fix
* added static libraries in cmake to fix the unit-internal segfault on freebsd/ppc64le and the gcc13 error
* Moved gcc13 flags for freebsd-gcc13 in cmake/cflags-ppc64le.make
src/nfa/mcsheng_compile.cpp: No need for an assert here, impl_id can be set to 0
src/nfa/nfa_api_queue.h: Make sure this compiles on both C++ and C
src/nfagraph/ng_fuzzy.cpp: Fix compilation error when DEBUG_OUTPUT=on
src/runtime.c: Fix crash when data == NULL
unit/internal/sheng.cpp: Unit test has to enable AVX512VBMI manually as autodetection does not get triggered, causing the test to fail
src/fdr/teddy_fat.cpp: AVX512 loads need to be 64-bit aligned, caused a crash on clang-18
ypicchi-arm [Wed, 14 May 2025 21:58:01 +0000 (22:58 +0100)]
Fix 5.4.11's config step regression (#327)
An old commit (24ae1670d) had the side effect of moving cmake defines after
they were being used. This patch moves them back to be defined before being used.
Speed hsbench back up by ~0.8%
* Fix noodle spurious match with \0 chars for SVE2
When SVE2's noodle processes a non-full vector (before the main loop or
at the end of it), a fake \0 was being parsed, triggering a match for
patterns that end with \0. This patch fixes this.
Michael Tremer [Thu, 22 Aug 2024 07:34:05 +0000 (08:34 +0100)]
hs_valid_platform: Fix check for SSE4.2 (#310)
Vectorscan requires SSE4.2 as a minimum on x86_64. For Hyperscan this
used to be SSSE3.
Applications that use the library call hs_valid_platform() to check if
the CPU fulfils this minimum requirement. However, when Vectorscan
upgraded to SSE4.2, the check was not updated. This leads to the library
trying to execute instructions that are not supported, causing the
application to crash.
This might not have been noticed as the CPUs that do not support SSE4.2
are rather old and unlikely to run any load where performance is an
issue. However, I believe that the library should not let the
application crash.
Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
ypicchi-arm [Thu, 22 Aug 2024 07:32:53 +0000 (08:32 +0100)]
Make vectorscan accept \0 starting pattern (#312)
Vectorscan used to reject such patterns because they were being compared
to "" and found to be an empty string. We now check the pattern length
instead.
ypicchi-arm [Mon, 5 Aug 2024 06:42:56 +0000 (07:42 +0100)]
Fix noodle SVE2 off by one bug (#309)
By using svmatch on 16-bit lanes with an 8-bit predicate, we end up
including an undefined character in the pattern checks. The inactive
lane after load contains an undefined value, usually \0. Patterns
using \0 as the last character would then match this spurious
character, returning a match beyond the buffer's end. The fix checks
for such matches and rejects them.
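The rejection step in miniature (an illustrative helper, not the noodle code): any candidate offset the vector compare reports at or beyond the buffer length comes from an undefined inactive lane and is discarded.

```c
#include <assert.h>

/* Accept a candidate match offset only if it lies inside the buffer;
 * -1 means the spurious beyond-the-end match is rejected. */
static long accept_match(long match_off, long buf_len) {
    return (match_off >= 0 && match_off < buf_len) ? match_off : -1;
}
```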
Fixes some of the clang-tidy warnings
clang-analyzer-deadcode.DeadStores
clang-analyzer-cplusplus.NewDelete
clang-analyzer-core.uninitialized.UndefReturn
closes some of #253
ignored in this pr:
/usr/include/boost/smart_ptr/detail/shared_count.hpp:432:24
/usr/include/boost/smart_ptr/detail/shared_count.hpp:443:24
51 in build/src/parser
gtest ones
src/fdr/teddy_compile.cpp:600:5 refactoring on way
src/fdr/fdr_compile.cpp:209:5 refactoring on way
Speed up truffle with 256b TBL instructions (#290)
256b wide SVE vectors allow some simplification of truffle. Up to 40%
speedup on graviton3, going from 12500 MB/s to 17000 MB/s on the
microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of
around 25% compared to plain SVE.
Add unit tests and a benchmark for this wide variant.
G.E. [Mon, 20 May 2024 15:03:56 +0000 (18:03 +0300)]
Revert a change to an assert; the original logic might have been
subtly clever (or else totally useless all these years). When we
see which of the two it is, we might delete that assert entirely.
For now, put it back as it was.