Multiple changes since the last release. This will be the last release
that is 100% ABI and API compatible with Hyperscan.
Future versions will include major refactors and API extensions, but
they will remain mostly backwards compatible.
In no particular order, platform support is now:
In total, more than 200 CI configurations are tested for every PR.
Other features:
- Fat Runtime supported for Arm as well (ASIMD/SVE/SVE2).
- Initial implementations for Arm SVE/SVE2 algorithms added, thanks to
Yoan Picchi from Arm.
- SIMDe support added, used as an alternative backend for existing
platforms, but mostly interesting for allowing Vectorscan to build on
new platforms without a supported SIMD engine.
- Various speedups and optimizations.
- Cppcheck and clang-tidy fixes throughout the code, both have been
added to CI for multiple configurations, but only cppcheck triggers a
build failure for now.
Various bugfixes, the most important listed below:
- Speed up truffle with 256b TBL instructions (#290)
- Fix Clang Tidy warnings (#295)
- Clang 17+ is more restrictive on rebind<T> on MacOS/Boost, remove
warning (#332)
- partial_load_u64 will fail if buf == NULL/c_len == 0 (#331)
- Bugfix/fix avx512vbmi regressions (#335)
- fix missing hs_version.h header (closes #198)
- hs_valid_platform: Fix check for SSE4.2 (#310)
- Fixed out of bounds read in AVX512VBMI version of fdr_exec_fat_teddy …
(#333)
- Fix noodle SVE2 off by one bug (#313)
- Make vectorscan accept \0 starting pattern (#312)
- Fix 5.4.11's config step regression (#327)
- Fix double shufti's vector end false positive (#325)
cmake - guard against failed GNUCC_ARCH extraction (#339)
Prevents overwriting GNUCC_ARCH with an empty value when parsing output
of gcc -Q --help=target. Ensures robustness if detection fails and
returns an empty string.
Signed-off-by: Ibrahim Kashif <ibrahim.kashif@arm.com>
Double shufti used to offset one vector, which lost one character at the
end of every vector. The lost character was replaced by a magic value
indicating a match. As a result, if the first char of a pattern fell on the
last char of a vector, double shufti assumed the second character was
present and reported a match.
This patch fixes that by keeping the previous vector and feeding its data to
the new one when it is shifted, preventing any loss of data.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
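The false positive described above can be illustrated with a scalar model. This is only a sketch under stated assumptions, not the SIMD code from the patch: the block width `W`, the function names `find_pair_old` and `find_pair_fixed`, and the use of single characters instead of shufti character classes are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t W = 4; // hypothetical vector width, in characters

// Model of the old behaviour: the last lane of each block has no data for
// the following character, so it is treated as if the second character matched.
std::vector<std::size_t> find_pair_old(const std::string &buf, char a, char b) {
    std::vector<std::size_t> hits;
    for (std::size_t base = 0; base < buf.size(); base += W) {
        for (std::size_t lane = 0; lane < W && base + lane < buf.size(); ++lane) {
            const std::size_t i = base + lane;
            const bool first = buf[i] == a;
            const bool second = (lane == W - 1)
                                    ? true // magic "match" value
                                    : (i + 1 < buf.size() && buf[i + 1] == b);
            if (first && second) hits.push_back(i);
        }
    }
    return hits;
}

// Model of the fix: remember whether the last lane of the previous block
// matched the first character, and test it against lane 0 of the next block.
std::vector<std::size_t> find_pair_fixed(const std::string &buf, char a, char b) {
    std::vector<std::size_t> hits;
    bool carry_first = false;
    for (std::size_t base = 0; base < buf.size(); base += W) {
        for (std::size_t lane = 0; lane < W && base + lane < buf.size(); ++lane) {
            const std::size_t i = base + lane;
            const bool prev_first = (lane == 0) ? carry_first : buf[i - 1] == a;
            if (prev_first && buf[i] == b) hits.push_back(i - 1);
        }
        const std::size_t last = base + W - 1;
        carry_first = last < buf.size() && buf[last] == a;
    }
    return hits;
}
```

With `W = 4`, the input "xxxaxxxx" puts 'a' on the last lane of the first block: the old model reports a match for the pair ('a', 'b') at offset 3, while the fixed model reports none.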
* vshl() will call the correct implementation
* implement missing vshr_512_imm(), simplifies caller x86 code
* Fix x86 case, use alignr instead
* it's the reverse, the avx512 alignr is incorrect, need to fix
* added static libraries in cmake to fix unit-internal seg fault in freebsd, ppc64le, gcc13 error
* Moved gcc13 flags for freebsd-gcc13 in cmake/cflags-ppc64le.make
src/nfa/mcsheng_compile.cpp: No need for an assert here, impl_id can be set to 0
src/nfa/nfa_api_queue.h: Make sure this compiles on both C++ and C
src/nfagraph/ng_fuzzy.cpp: Fix compilation error when DEBUG_OUTPUT=on
src/runtime.c: Fix crash when data == NULL
unit/internal/sheng.cpp: The unit test has to enable AVX512VBMI manually because autodetection is not triggered, which caused the test to fail
src/fdr/teddy_fat.cpp: AVX512 loads need to be 64-byte aligned, caused a crash on clang-18
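Aligned 512-bit loads require the source address to be a multiple of 64 bytes. One common way to guarantee this in C++ is `alignas(64)`. The sketch below is illustrative only; `Table512` and `is_aligned_to` are hypothetical names and this is not the actual change made in teddy_fat.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 64-byte table, aligned so that an aligned 512-bit load
// from its start address is legal.
struct alignas(64) Table512 {
    std::uint8_t bytes[64];
};

// Returns true if p is a multiple of `alignment` bytes.
bool is_aligned_to(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```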
ypicchi-arm [Wed, 14 May 2025 21:58:01 +0000 (22:58 +0100)]
Fix 5.4.11's config step regression (#327)
An old commit (24ae1670d) had the side effect of moving cmake defines after
the point where they were used. This patch moves them back so that they are
defined before being used.
Speeds hsbench back up by ~0.8%.
* Fix noodle spurious match with \0 chars for SVE2
When SVE2's noodle processes a non-full vector (before the main loop or
at the end of it), a fake \0 was being parsed, triggering a match for
patterns that end with \0. This patch fixes this.
Michael Tremer [Thu, 22 Aug 2024 07:34:05 +0000 (08:34 +0100)]
hs_valid_platform: Fix check for SSE4.2 (#310)
Vectorscan requires SSE4.2 as a minimum on x86_64. For Hyperscan this
used to be SSSE3.
Applications that use the library call hs_valid_platform() to check if
the CPU fulfils this minimum requirement. However, when Vectorscan
upgraded to SSE4.2, the check was not updated. This leads to the library
trying to execute instructions that are not supported, causing the
application to crash.
This might not have been noticed as the CPUs that do not support SSE4.2
are rather old and unlikely to run any load where performance is an
issue. However, I believe that the library should not let the
application crash.
Signed-off-by: Michael Tremer <michael.tremer@ipfire.org>
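The mismatch can be modelled as follows. This is a simplified sketch, not the library's implementation: the feature bits and both function names are hypothetical, standing in for the CPU feature detection behind hs_valid_platform().

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
constexpr std::uint32_t FEATURE_SSSE3  = 1u << 0;
constexpr std::uint32_t FEATURE_SSE4_2 = 1u << 1;

// Stale check: still tests the old Hyperscan minimum (SSSE3).
bool valid_platform_stale(std::uint32_t cpu_features) {
    return (cpu_features & FEATURE_SSSE3) != 0;
}

// Updated check: tests the minimum the library is actually built for (SSE4.2).
bool valid_platform_fixed(std::uint32_t cpu_features) {
    return (cpu_features & FEATURE_SSE4_2) != 0;
}
```

A CPU that has SSSE3 but not SSE4.2 passes the stale check and then hits unsupported instructions; with the updated check it is rejected up front, so the application can fail gracefully.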
ypicchi-arm [Thu, 22 Aug 2024 07:32:53 +0000 (08:32 +0100)]
Make vectorscan accept \0 starting pattern (#312)
Vectorscan used to reject such patterns because they were compared
to "" and found to be empty strings. We now check the pattern length
instead.
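The difference between the two checks can be shown with a short sketch. The helper names are hypothetical and this is not the code from the patch; it only demonstrates why a C-string comparison treats a pattern starting with \0 as empty while a length check does not.

```cpp
#include <cstring>
#include <string>

// C-string comparison stops at the first \0, so a pattern that starts
// with \0 compares equal to "".
bool looks_empty_as_c_string(const std::string &pattern) {
    return std::strcmp(pattern.c_str(), "") == 0;
}

// A length check sees every byte of the pattern, including a leading \0.
bool is_empty_by_length(const std::string &pattern) {
    return pattern.size() == 0;
}
```

For a 4-byte pattern consisting of \0 followed by "abc", the first helper returns true and the second returns false.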
ypicchi-arm [Mon, 5 Aug 2024 06:42:56 +0000 (07:42 +0100)]
Fix noodle SVE2 off by one bug (#309)
By using svmatch on 16-bit lanes with an 8-bit predicate, we end up
including an undefined character in the pattern checks. The inactive
lane after load contains an undefined value, usually \0. Patterns
using \0 as the last character would then match this spurious
character, returning a match beyond the buffer's end. The fix checks
for such matches and rejects them.
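A scalar model of the spurious match and of the rejection step is sketched below. It is an illustration under stated assumptions, not the SVE2 code: the width `W`, the function name `scan_pair`, and the zero-filled `lanes` array (standing in for inactive lanes that usually read as \0) are all hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t W = 8; // hypothetical vector width, in bytes

// Searches buf for the two-byte pattern (c0, c1). Bytes past the end of
// the buffer are modelled as \0, like the inactive lanes of a partial load.
std::vector<std::size_t> scan_pair(const std::string &buf, char c0, char c1,
                                   bool reject_past_end) {
    std::vector<std::size_t> hits;
    for (std::size_t base = 0; base < buf.size(); base += W) {
        char lanes[W + 1] = {}; // zero-filled: inactive lanes read as \0
        for (std::size_t i = 0; i < W + 1 && base + i < buf.size(); ++i) {
            lanes[i] = buf[base + i];
        }
        for (std::size_t i = 0; i < W && base + i < buf.size(); ++i) {
            if (lanes[i] == c0 && lanes[i + 1] == c1) {
                // The fix: drop matches whose last byte lies beyond the buffer.
                if (reject_past_end && base + i + 2 > buf.size()) continue;
                hits.push_back(base + i);
            }
        }
    }
    return hits;
}
```

Scanning the 4-byte buffer "abcx" for the pattern 'x' followed by \0 yields a match at offset 3 without the rejection step and no match with it; a buffer that really ends in 'x' followed by \0 still matches.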
Fixes some of the clang-tidy warnings
clang-analyzer-deadcode.DeadStores
clang-analyzer-cplusplus.NewDelete
clang-analyzer-core.uninitialized.UndefReturn
closes some of #253
ignored in this pr:
/usr/include/boost/smart_ptr/detail/shared_count.hpp:432:24
/usr/include/boost/smart_ptr/detail/shared_count.hpp:443:24
51 in build/src/parser
gtest ones
src/fdr/teddy_compile.cpp:600:5 refactoring on the way
src/fdr/fdr_compile.cpp:209:5 refactoring on the way
Speed up truffle with 256b TBL instructions (#290)
256b wide SVE vectors allow some simplification of truffle. Up to 40%
speedup on Graviton3, going from 12500 MB/s to 17000 MB/s on the
microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of
around 25% compared to normal SVE.
Add unit tests and a benchmark for this wide variant.
G.E. [Mon, 20 May 2024 15:03:56 +0000 (18:03 +0300)]
Revert a change to an assert. The original logic might have been
subtly clever (or else totally useless all these years); once we
see which of the two it is, we might delete that assert entirely.
For now, put it back as it was.