Victor Julien [Fri, 28 Oct 2016 10:27:35 +0000 (12:27 +0200)]
pcre: new way of specifying var names
Until now the way to specify a var name in pcre substring capture
into pkt and flow vars was to use the pcre named substring support:
e.g. /(?P<pkt_somename>.*)/
This had 2 drawbacks:
1. limitations of the name. The name could be max 32 chars, only have
alphanumeric and the underscore characters. This imposed limitations
that are not present in flowbits/ints.
2. we didn't actually use the named substrings in pcre through the
API. We parsed the names separately. So putting the names in pcre
would actually be wasteful.
This patch introduces a new way of mapping captures with names:
The order of the captures and the order of the names are mapped 1 on 1.
This method is no longer limited by the pcre API's naming limits. The
'flow:' and 'pkt:' prefixes indicate what the type of variable is. It's
mandatory to specify one.
Victor Julien [Thu, 20 Oct 2016 12:38:33 +0000 (14:38 +0200)]
var-names: expose outside of detect engine
Until now variable names, such as flowbit names, were local to a detect
engine. This made sense as they were only ever used in that context.
For the purpose of logging these names, this needs a different approach.
The loggers live outside of the detect engine. Also, in the case of
reloads and multi-tenancy, there are even multiple detect engines, so
it would be even more tricky to access them from the outside.
This patch brings a new approach. A any time, there is a single active
hash table mapping the variable names and their id's. For multiple
tenants the table is shared between tenants.
The table is set up in a 'staging' area, where locking makes sure that
multiple loading threads don't mess things up. Then when the preparing
of a detection engine is ready, but before the detect threads are made
aware of the new detect engine, the active varname hash is swapped with
the staging instance.
For this to work, all the mappings from the 'current' or active mapping
are added to the staging table.
After the threads have reloaded and the new detection engine is active,
the old table can be freed.
For multi tenancy things are similar. The staging area is used for
setting up until the new detection engines / tenants are applied to
the system.
This patch also changes the variable 'id'/'idx' field to uint32_t. Due
to data structure padding and alignment, this should have no practical
drawback while allowing for a lot more vars.
Victor Julien [Mon, 19 Dec 2016 10:25:58 +0000 (11:25 +0100)]
detect: http_header_names sticky buffer keyword
A sticky buffer that allows content inspection on a contructed buffer
of HTTP header names. The buffer starts with \r\n, the names are
separated by \r\n and the end of the buffer contains an extra \r\n.
E.g. \r\nHost\r\nUser-Agent\r\n\r\n
The leading \r\n is to make sure one can match on a full name in all
cases.
Victor Julien [Mon, 19 Dec 2016 10:25:27 +0000 (11:25 +0100)]
detect: global registery for keyword thread data
Some keywords need a scratch space where they can do store the results
of expensive operations that remain valid for the time of a packets
journey through the detection engine.
An example is the reconstructed 'http_header' field, that is needed
in MPM, and then for each rule that manually inspects it. Storing this
data in the flow is a waste, and reconstructing multiple times on
demand as well.
This API allows for registering a keyword with an init and free function.
It it mean to be used an initialization time, when the keyword is
registered.
Victor Julien [Sat, 15 Oct 2016 16:15:17 +0000 (18:15 +0200)]
detect-engine: memory handling of sm_lists
For lists that are registered multiple times, like http_header and
http_cookie, making the engines owner of the lists is complicated.
Multiple engines in a sig may be pointing to the same list. To
address this the 'free' code needs to be extra careful about not
double freeing, so it takes an approach to first fill an array
of the to-free pointers before freeing them.