]>
git.ipfire.org Git - thirdparty/vectorscan.git/log
Justin Viiret [Thu, 11 Feb 2016 03:38:12 +0000 (14:38 +1100)]
Rename reduceQueue to reduceInfixQueue
Justin Viiret [Thu, 11 Feb 2016 02:55:48 +0000 (13:55 +1100)]
Use rose_inline rather than really_inline
Justin Viiret [Thu, 11 Feb 2016 02:54:51 +0000 (13:54 +1100)]
roseTestLeftfix: unify common "nfa is dead" code
Justin Viiret [Mon, 15 Feb 2016 01:35:03 +0000 (12:35 +1100)]
Add CATCH_UP to report_block, not "parent" program
Also ensure that exhaustion check happens after catch up, as catch up
may fire reports (which could exhaust).
Justin Viiret [Fri, 12 Feb 2016 02:58:17 +0000 (13:58 +1100)]
limex_runtime.h: scratch header no longer needed
Justin Viiret [Fri, 12 Feb 2016 02:52:39 +0000 (13:52 +1100)]
Remove more unused structures from unit tests
The NFA, LBR no longer need scratch or the NFAContext structure stored
outside the NFA stack.
Justin Viiret [Thu, 11 Feb 2016 05:51:59 +0000 (16:51 +1100)]
NFA API: Remove unused scratch ptr from struct mq
Justin Viiret [Thu, 11 Feb 2016 05:46:15 +0000 (16:46 +1100)]
NFA API: Remove nfaBlockExecReverse scratch arg
Scratch is no longer used by this function's implementations.
Justin Viiret [Thu, 11 Feb 2016 05:40:16 +0000 (16:40 +1100)]
NFA: Move NFAContext to stack (from scratch)
Justin Viiret [Mon, 8 Feb 2016 23:01:53 +0000 (10:01 +1100)]
Rose: allow block-mode merge of small prefixes
Previously, we disallowed the merging of all Rose prefixes in block mode
where the literal sets are not identical.
This change allows merging if the prefix graphs to be merged are very
small, as a small performance improvement for cases with lots of tiny
prefixes.
This check is deliberately conservative: graphs must have some common
vertices, and the result of the merge must not give up any
accelerability.
Justin Viiret [Wed, 10 Feb 2016 04:14:49 +0000 (15:14 +1100)]
NFA merging: permit different reports
For cases where the edges from start are not to a mix of accept and
acceptEod, report sets can be combined.
Justin Viiret [Wed, 10 Feb 2016 04:33:48 +0000 (15:33 +1100)]
Dump: don't call dumpNfaNotes for SOM reverse NFAs
These NFAs have no queue index.
Justin Viiret [Mon, 8 Feb 2016 02:32:21 +0000 (13:32 +1100)]
SET_GROUPS instr: don't generate more than one
Justin Viiret [Mon, 8 Feb 2016 05:04:41 +0000 (16:04 +1100)]
dedupeCatchup: only call when necessary at runtime
Justin Viiret [Sun, 7 Feb 2016 23:21:17 +0000 (10:21 +1100)]
DEDUPE instr: generate only when necessary
Justin Viiret [Thu, 4 Feb 2016 01:46:53 +0000 (12:46 +1100)]
Rose: clean up use of scratch, RoseContext
Justin Viiret [Mon, 1 Feb 2016 22:42:00 +0000 (09:42 +1100)]
Rose: pack global state bits into one u8
Eliminate the RoseRuntimeState structure in favour of a single status
byte that is stored in scratch and copied to/from stream state.
Justin Viiret [Mon, 1 Feb 2016 00:07:07 +0000 (11:07 +1100)]
Rose: remove alignment req for anchored DFA state
Justin Viiret [Mon, 18 Jan 2016 00:56:01 +0000 (11:56 +1100)]
Rose: move more report handling work into program
Move report preconditions (bounds, exhaustion, etc) into program
instructions and use a more direct path to the user match callback than
the adaptor functions.
Report handling has been moved to new file src/report.h. Reporting from
EOD now uses the same instructions as normal report handling, rather
than its own.
Jump target tracking in rose_build_bytecode.cpp has been cleaned up.
Justin Viiret [Thu, 28 Jan 2016 06:16:53 +0000 (17:16 +1100)]
ng_filter: Fix bug introduced in
98eff64
If the max width is modified for a region, use the modified version when
checking to see if a self-loop must be added on the last vertex.
Justin Viiret [Thu, 14 Jan 2016 05:55:39 +0000 (16:55 +1100)]
RoseRuntimeState no longer needs to be packed
This structure only contains u8 values now. In the future we may wish to
eliminate it entirely and store the few bits we need more directly.
Justin Viiret [Thu, 14 Jan 2016 02:45:44 +0000 (13:45 +1100)]
Docs for Rose callback types
Justin Viiret [Wed, 13 Jan 2016 02:20:38 +0000 (13:20 +1100)]
Make Rose callback types explicitly take scratch
Justin Viiret [Wed, 13 Jan 2016 01:39:28 +0000 (12:39 +1100)]
Remove RoseContext::userCtx
All Rose callbacks receive scratch as their context.
Justin Viiret [Sun, 10 Jan 2016 22:25:32 +0000 (09:25 +1100)]
Move cyclic path redundancy into reduce loop
Sometimes cyclic path redundancy can uncover further reduction work that
can be done by the other passes in the reduce loop.
Justin Viiret [Sun, 17 Jan 2016 22:18:19 +0000 (09:18 +1100)]
roseEodRunMatcher: correct early return value
Justin Viiret [Mon, 11 Jan 2016 04:19:09 +0000 (15:19 +1100)]
nfaCheckFinalState: define return value
Make nfaCheckFinalState return MO_HALT_MATCHING when the user instructs
us (via the callback return value) to halt matching. In the caller,
check this value and stop matching if told.
Anatoly Burakov [Tue, 12 Jan 2016 16:21:20 +0000 (16:21 +0000)]
Don't look for accel friends for multibyte acceleration
Justin Viiret [Thu, 14 Jan 2016 03:48:22 +0000 (14:48 +1100)]
scratch: correctly align fatbit arrays
This fixes an assertion failure on 32-bit targets.
Justin Viiret [Thu, 7 Jan 2016 00:56:57 +0000 (11:56 +1100)]
Use fatbit for anch log, delay slots in scratch
Since these structures are in scratch, they do not have to be as small
as possible and we can use fatbit instead of multibit to improve
performance.
Justin Viiret [Wed, 13 Jan 2016 23:38:24 +0000 (10:38 +1100)]
rose_build_anchored: take ref, not pointer
Justin Viiret [Wed, 13 Jan 2016 23:24:19 +0000 (10:24 +1100)]
Account for multi-dfa case with ANCHORED_DELAY
Specifically, we must set build_context::floatingMinLiteralMatchOffset
to 1 when thew anchored table contains multiple DFAs, as they can
produce unordered matches.
This check was already been done, but too late to affect the generation
of ANCHORED_DELAY instructions.
Justin Viiret [Wed, 13 Jan 2016 22:53:15 +0000 (09:53 +1100)]
Use correct type for anchored matcher build
Justin Viiret [Wed, 13 Jan 2016 21:52:18 +0000 (08:52 +1100)]
Fix release build (unused var)
Justin Viiret [Tue, 12 Jan 2016 03:10:23 +0000 (14:10 +1100)]
Remove dupe engine, state ptrs from RoseContext
Remove the RoseEngine and stream state pointers frose RoseContext, as
they are also present in core_info.
Unify stream state handing in Rose to always use a char * (we were often
a u8 * for no particularly good reason) and tidy up.
Matthew Barr [Tue, 12 Jan 2016 03:48:35 +0000 (14:48 +1100)]
Coverity: Restore output stream format
Justin Viiret [Fri, 18 Dec 2015 04:24:52 +0000 (15:24 +1100)]
Rose: Move all literal operations into program
Replace the RoseLiteral structure with more program instructions; now,
instead of each literal ID leading to a RoseLiteral, it simply has a
program to run (and a delay rebuild program).
This commit also makes some other improvements:
* CHECK_STATE instruction, for use instead of a sparse iterator over a
single element.
* Elide some checks (CHECK_LIT_EARLY, ANCHORED_DELAY, etc) when not
needed.
* Flatten PUSH_DELAYED behaviour to one instruction per delayed
literal, rather than the mask/index-list approach used before.
* Simple program cache at compile time for deduplication.
Alex Coyte [Mon, 11 Jan 2016 02:14:58 +0000 (13:14 +1100)]
squashing: prevent generation of pairs of squash states
Matthew Barr [Mon, 11 Jan 2016 03:28:27 +0000 (14:28 +1100)]
alignof() should operate on a type-id
Justin Viiret [Sun, 10 Jan 2016 21:58:08 +0000 (08:58 +1100)]
Delete unused build_context::depths
Justin Viiret [Thu, 7 Jan 2016 23:10:10 +0000 (10:10 +1100)]
Remove use of depth from Rose entirely
Justin Viiret [Thu, 7 Jan 2016 22:58:20 +0000 (09:58 +1100)]
Don't use depth for in-flight check
Justin Viiret [Thu, 7 Jan 2016 22:11:09 +0000 (09:11 +1100)]
Remove CHECK_DEPTH instruction
Justin Viiret [Wed, 6 Jan 2016 00:45:31 +0000 (11:45 +1100)]
Remove "dot" entries from leftfix lookarounds
Note that we have to be careful to leave the first lookaround entry in
place, if it's a dot. This should eventually be done with a program
instruction.
Justin Viiret [Mon, 4 Jan 2016 04:33:05 +0000 (15:33 +1100)]
ComponentRepeat: remove firsts_cache, precalc code
Firsts are easy to compute in ComponentRepeat::first() now.
Justin Viiret [Mon, 4 Jan 2016 04:23:28 +0000 (15:23 +1100)]
ComponentRepeat: wire X{0,N} and (X?){N} the same
Justin Viiret [Tue, 15 Dec 2015 03:34:25 +0000 (14:34 +1100)]
ComponentRepeat: wire R{0,N} as (R{1,N})?
Change the way that we wire up the edges in a bounded repeat to avoid
large fan-out from predecessors.
Justin Viiret [Sun, 13 Dec 2015 23:08:57 +0000 (10:08 +1100)]
ng_prefilter: turn large max bound into inf
During prefilter region replacement, turn regions with very large max
bounds into repeats with inf max bound. This improves compile time and
the likelihood that we will actually be able to build an implementation
for such patterns.
Anatoly Burakov [Wed, 9 Dec 2015 13:39:16 +0000 (13:39 +0000)]
Multibyte matcher unit-tests
Anatoly Burakov [Wed, 9 Dec 2015 11:24:16 +0000 (11:24 +0000)]
Bitmatcher unit-tests
Anatoly Burakov [Wed, 9 Dec 2015 13:38:58 +0000 (13:38 +0000)]
Multibyte acceleration compile side
Anatoly Burakov [Wed, 9 Dec 2015 13:10:57 +0000 (13:10 +0000)]
Multibyte truffle runtime
Anatoly Burakov [Wed, 9 Dec 2015 12:20:34 +0000 (12:20 +0000)]
Multibyte shufti runtime
Anatoly Burakov [Wed, 9 Dec 2015 11:46:19 +0000 (11:46 +0000)]
Multibyte vermicelli runtime
Anatoly Burakov [Wed, 9 Dec 2015 11:11:49 +0000 (11:11 +0000)]
Adding bitmatchers
Anatoly Burakov [Wed, 9 Dec 2015 12:36:12 +0000 (12:36 +0000)]
Adding AVX2 version of truffle
Justin Viiret [Mon, 4 Jan 2016 05:04:19 +0000 (16:04 +1100)]
scratch: remove sparse iter state (now unused)
Justin Viiret [Mon, 4 Jan 2016 05:02:20 +0000 (16:02 +1100)]
roseRunProgram: iter state on stack
Justin Viiret [Mon, 4 Jan 2016 02:46:51 +0000 (13:46 +1100)]
roseCatchUpLeftfixes: iter state on stack
Justin Viiret [Mon, 4 Jan 2016 02:44:26 +0000 (13:44 +1100)]
roseBlockHasEodWork: iter state on stack
Justin Viiret [Mon, 4 Jan 2016 02:42:21 +0000 (13:42 +1100)]
roseFlushLastByteHistory: iter state on stack
Justin Viiret [Wed, 23 Dec 2015 04:12:28 +0000 (15:12 +1100)]
roseCheckNfaEod: use sparse iterator for EOD
Rather than checking all active outfix/suffix engines, use a sparse
iterator to check only those engines that accept at EOD.
Justin Viiret [Fri, 18 Dec 2015 04:50:56 +0000 (15:50 +1100)]
runtime: hoist broken check in streaming mode
Matthew Barr [Tue, 8 Dec 2015 03:40:20 +0000 (14:40 +1100)]
Build the tools dir only if the cmake file exists
Justin Viiret [Fri, 18 Dec 2015 00:48:33 +0000 (11:48 +1100)]
writeEodProgram: avoid make_move_iterator warning
Avoid an ambiguity between std:: and boost::make_move_iterator on builds
against libc++.
Justin Viiret [Thu, 10 Dec 2015 00:41:47 +0000 (11:41 +1100)]
rose: Extend program to handle literals, iterators
- cleanups
- add sparse iter instructions
- merge "root" and "sparse iter" programs together
- move program execution to new file program_runtime.h
- simplify EOD execution
Justin Viiret [Mon, 14 Dec 2015 23:10:59 +0000 (10:10 +1100)]
make_disjoint: Remove dead code
Justin Viiret [Mon, 14 Dec 2015 23:05:22 +0000 (10:05 +1100)]
convertAnchPrefixToBounds: check size of delay_adj
Avoid subtracting delay_adj from a smaller max bound.
Justin Viiret [Thu, 10 Dec 2015 04:35:12 +0000 (15:35 +1100)]
Perform an early removeRedundancy call on graph
This allows sibling character classes to be merged together before graph
component splitting is done by calcComponents().
In particular, this transforms (A|a)(B|b)(C|c) into [Aa][Bb][Cc]
earlier.
Justin Viiret [Wed, 9 Dec 2015 22:04:36 +0000 (09:04 +1100)]
Remove dead code: EdgeSourceStateCompare
Justin Viiret [Fri, 4 Dec 2015 05:17:28 +0000 (16:17 +1100)]
rose: Extend the interpreter to handle more work
- Use program for EOD sparse iterator
- Use program for literal sparse iterator
- Eliminate RoseRole, RosePred, RoseVertexProps::role
- Small performance optimizations
Justin Viiret [Wed, 18 Nov 2015 22:32:05 +0000 (09:32 +1100)]
rose: Use an interpreter for role runtime
Replace much of the RoseRole structure with an interpreted program,
simplifying the Rose runtime and making it much more flexible.
Alex Coyte [Thu, 26 Nov 2015 01:44:56 +0000 (12:44 +1100)]
detach the sidecar
Alex Coyte [Wed, 2 Dec 2015 04:49:49 +0000 (15:49 +1100)]
make nfaExecCastle0_QR() more efficent
1. Reverse scan for the last escape and only process later events.
2. Only scheck subcastles which may expire for staleness
Alex Coyte [Wed, 2 Dec 2015 04:15:02 +0000 (15:15 +1100)]
Rework literal overlap checks for merging engines
Also increase the size of chunks we consider merging for castles.
Alex Coyte [Wed, 2 Dec 2015 03:41:57 +0000 (14:41 +1100)]
Introduce REPEAT_ALWAYS model for {0,} castle repeats
As Castle guards the repeats, no more state is needed for these repeats
Alex Coyte [Wed, 2 Dec 2015 03:23:02 +0000 (14:23 +1100)]
Allow lag on castle infixes to be reduced
Reducing lag allows for castles to be merged more effectively
Alex Coyte [Sun, 6 Dec 2015 23:23:32 +0000 (10:23 +1100)]
Use add_edge_if_not_present in somMayGoBackwards()
As somMayGoBackwards() operates on a copy of the graph where virtual
starts have been collapsed on to startDs, we need to be careful not to
create parallel edges.
Matthew Barr [Fri, 18 Dec 2015 03:41:50 +0000 (14:41 +1100)]
Merge branch develop into master
Matthew Barr [Fri, 18 Dec 2015 03:37:29 +0000 (14:37 +1100)]
Bump version number
Justin Viiret [Wed, 16 Dec 2015 03:31:43 +0000 (14:31 +1100)]
Small updates to documentation for 4.1
Justin Viiret [Wed, 16 Dec 2015 03:10:46 +0000 (14:10 +1100)]
Add ChangeLog
Xiang Wang [Wed, 2 Dec 2015 12:24:57 +0000 (07:24 -0500)]
simplify max clique analysis
Justin Viiret [Wed, 2 Dec 2015 07:16:49 +0000 (18:16 +1100)]
Add per-top findMinWidth etc for NFA graphs
Justin Viiret [Wed, 2 Dec 2015 22:27:57 +0000 (09:27 +1100)]
CastleProto: track next top explicitly
Repeats may be removed (e.g. by pruning in role aliasing passes)
leaving "holes" in the top map. Track the next top to use explicitly,
rather than using repeats.size().
Justin Viiret [Wed, 2 Dec 2015 00:31:09 +0000 (11:31 +1100)]
CastleProto: track mapping of reports to tops
This allows us to speed up report-based queries, like dedupe checking.
Justin Viiret [Tue, 1 Dec 2015 23:39:32 +0000 (10:39 +1100)]
assignDkeys: use flat_set<ReportID>, not set
Justin Viiret [Tue, 1 Dec 2015 23:24:54 +0000 (10:24 +1100)]
findMinWidth, findMaxWidth: width for a given top
Currently only implemented for Castle suffixes.
Justin Viiret [Tue, 1 Dec 2015 22:54:55 +0000 (09:54 +1100)]
RoseDedupeAuxImpl: collect unique suffixes first
Justin Viiret [Tue, 1 Dec 2015 22:47:59 +0000 (09:47 +1100)]
role aliasing: simplify hashRightRoleProperties
Using the full report set for a suffix as an input to this hash was very
slow at scale.
Justin Viiret [Tue, 1 Dec 2015 22:38:20 +0000 (09:38 +1100)]
castle: simplify find_next_top
Tops are no longer sparse in CastleProto, so the linear scan for holes
isn't necessary.
Justin Viiret [Fri, 27 Nov 2015 02:30:59 +0000 (13:30 +1100)]
Make key 64 bits where large shifts may be used.
This fixes a long-standing issue with large multibit structures.
Justin Viiret [Wed, 25 Nov 2015 06:05:36 +0000 (17:05 +1100)]
PCRE includes U+180E in /[:print:]/8W
Justin Viiret [Wed, 25 Nov 2015 03:53:27 +0000 (14:53 +1100)]
Update defn of class [:punct:] for PCRE 8.38
Justin Viiret [Tue, 17 Nov 2015 06:23:52 +0000 (17:23 +1100)]
Unify handling of caseless flag in class parser
Apply caselessness to each element added to a class, rather than all at
finalize time (which required separated ucp dnf and-ucp working data).
Unifies the behaviour of AsciiComponentClass and Utf8ComponentClass in
this respect.
Justin Viiret [Mon, 16 Nov 2015 05:43:43 +0000 (16:43 +1100)]
Fix defn of POSIX graph, print, punct classes
The POSIX classes [:graph:], [:print:] and [:punct:] are handled
specially in UCP mode by PCRE. This change matches that behaviour.
Mohammad Abdul Awal [Tue, 17 Nov 2015 17:50:23 +0000 (17:50 +0000)]
FDR runtime simplification
Removed static specialisation of domains.
Justin Viiret [Fri, 13 Nov 2015 03:36:28 +0000 (14:36 +1100)]
ng_execute: update interface to use flat_set
This changes all the execute_graph() interfaces so that instead of
mutating a std::set of vertices, they accept an initial flat_set of
states and return a resultant flat_set of states after execution.
(Note that internally execute_graph() still uses bitsets)
This is both faster and more flexible.
Justin Viiret [Thu, 12 Nov 2015 02:27:55 +0000 (13:27 +1100)]
Restore \Q..\E support in character classes
Justin Viiret [Thu, 12 Nov 2015 04:27:11 +0000 (15:27 +1100)]
Introduce copy_bytes for writing into bytecode
Protects memcpy from nullptr sources, which triggers failures in GCC's
UB sanitizer.