From 261f7bc4d21f7cb3e2e4e10b32a91e46da68c3f2 Mon Sep 17 00:00:00 2001
From: "Mike Stepanek (mstepane)"
Date: Wed, 17 Nov 2021 21:15:17 +0000
Subject: [PATCH] Pull request #3144: doc: update wizard's information

Merge in SNORT/snort3 from ~YVELYKOZ/snort3:wizard_testing to master

Squashed commit of the following:

commit 4465a1347f1ec17336c5751f111d6fe87f7df3c9
Author: Yehor Velykozhon
Date:   Tue Nov 2 16:24:30 2021 +0200

    doc: update wizard documentation
---
 doc/user/wizard.txt                         | 106 ++++++++++++++-
 src/service_inspectors/wizard/dev_notes.txt | 142 ++++++++++++++++++--
 2 files changed, 234 insertions(+), 14 deletions(-)

diff --git a/doc/user/wizard.txt b/doc/user/wizard.txt
index 4702e32ed..92698ff63 100644
--- a/doc/user/wizard.txt
+++ b/doc/user/wizard.txt
@@ -6,7 +6,111 @@ service bindings are reevaluated so the session can be handed off to the
 appropriate inspector. The wizard is still under development; if you find
 you need to tweak the defaults please let us know.
 
-Additional Details:
+==== Wizard patterns
+
+Wizard supports 3 kinds of patterns:
+
+ 1. Hexes
+ 2. Spells
+ 3. Curses
+
+Each kind of pattern has its own purpose and features.
+It should be noted that the types of patterns are evaluated
+exactly in the order in which they are described above.
+Thus, if some data matches a hex, it will not be
+processed by spells and curses.
+
+The depth of search for a pattern in the data can be configured using
+the `max_search_depth` option.
+
+'TCP' packets form a flow, so wizard checks all data
+in the flow for a match. If no pattern matches
+and 'max_search_depth' is reached, the flow is abandoned by wizard.
+
+'UDP' packets form a "meta-flow" based on the addresses and ports of the packets.
+However, unlike TCP processing, for UDP wizard only looks at the first arriving
+packet from the meta-flow. If no pattern matches that packet or wizard's 'max_search_depth'
+is reached, the meta-flow is abandoned by wizard.
+
+==== Wizard patterns - Spells
+
+A spell is a text-based pattern. It is best suited to text protocols: http, smtp, sip, etc.
+Spells are:
+
+ * Case insensitive
+ * Whitespace sensitive
+ * Able to match by a wildcard symbol
+
+To match any sequence of characters, use the
+"`*`" (glob) symbol in the pattern.
+
+    Example:
+        Pattern: '220-*FTP'
+        Traffic that would match: '220- Hello world! It's a new FTP server'
+
+To match a literal "`*`" symbol, put "`**`" in the pattern.
+
+Spells are configured as a Lua array, each element of which can
+contain the following options:
+
+ * 'service' - name of the service that will be assigned
+ * 'proto' - protocol to scan
+ * 'client_first' - indicator of which end initiates data transfer
+ * 'to_server' - list of text patterns to search in the data sent to the server
+ * 'to_client' - list of text patterns to search in the data sent to the client
+
+    Example of a spell definition in Lua:
+    {
+        service = 'smtp',
+        proto = 'tcp',
+        client_first = true,
+        to_server = { 'HELO', 'EHLO' },
+        to_client = { '220*SMTP', '220*MAIL' }
+    }
+
+==== Wizard patterns - Hexes
+
+Hexes can be used to match binary protocols: dnp3, http2, ssl, etc.
+Hexes use hexadecimal representation of the data for pattern matching.
+
+A wildcard in a hex pattern is a placeholder for exactly one occurrence
+of any hexadecimal digit and is denoted by the symbol "`?`".
+
+    Example:
+        Pattern: '|05 ?4|'
+        Traffic that would match: '|05 84|'
+
+Hexes are configured in the same way as spells and have an identical set of options.
+
+    Example of a hex definition in Lua:
+    {
+        service = 'dnp3',
+        proto = 'tcp',
+        client_first = true,
+        to_server = { '|05 64|' },
+        to_client = { '|05 64|' }
+    }
+
+==== Wizard patterns - Curses
+
+Curses are internal algorithms for service identification.
+They are implemented as state machines in C++ code
+and can have their own unique state information stored on the flow.
+
+A list of available curses can be obtained using `snort --help-config wizard | grep curses`.
+
+    A configuration which enables some curses:
+        curses = {'dce_udp', 'dce_tcp', 'dce_smb', 'sslv2'}
+
+==== Additional Details:
+
+* Note that usually more specific patterns have higher precedence.
+
+    For example:
+        Given the following spells and a 'foobar' payload, the 3rd spell matches.
+        { service = 'first', to_server = { 'foo' } },
+        { service = 'second', to_server = { 'bar' } },
+        { service = 'third', to_server = { 'foobar' } }
 
 * If the wizard and one or more service inspectors are configured w/o
 explicitly configuring the binder, default bindings will be generated which
diff --git a/src/service_inspectors/wizard/dev_notes.txt b/src/service_inspectors/wizard/dev_notes.txt
index f46a21c2c..128ab4867 100644
--- a/src/service_inspectors/wizard/dev_notes.txt
+++ b/src/service_inspectors/wizard/dev_notes.txt
@@ -3,24 +3,144 @@ on a flow. It does not determine the service with certainty; that is the
 job of the service inspector or appId. The goal is to get the most likely
 service inspector engaged as quickly as possible.
 
+The wizard is detached from the flow upon finding a match or finding
+that there is no possible match.
+
+Hexes and spells differ in the following aspects:
+
+ * spells allow wildcards matching any number of consecutive characters
+   whereas hexes allow a single wild char.
+ * spells are case insensitive whereas hexes are case sensitive.
+ * spells automatically skip leading whitespace (at the very start of the flow).
+
+To match the "`*`" symbol in traffic, put "`**`" in a spell.
+
+    For example: traffic "* OK" will match the pattern "** OK".
+
+A series of asterisks is matched from left to right. '***' is seen as "*" followed by a glob.
+
+==== Concepts
+
+ * `MagicPage` - node of a trie. Represents a symbol of a pattern.
+ * `MagicBook` - trie itself. Represents a set of patterns for the wizard instance.
+ ** `SpellBook` - `MagicBook` implementation for spells.
+ ** `HexBook` - `MagicBook` implementation for hexes.
+ * `MagicSplitter` - object related to a stream. Applies wizard logic to a stream.
+ * `Wand` - contains state of wizard patterns for a stream.
+ * `CurseDetails` - settings of a curse. Contains identifiers and algorithm.
+ * `CurseTracker` - state of a curse.
+ * `CurseBook` - contains all configured curses.
+ * `CurseServiceTracker` - instance of a curse. Contains settings and state.
+
+==== MagicSplitter
+
 For TCP, the wizard uses a stream splitter to examine the in order data as
 it becomes available. If the splitter finds a match, it sets the service
-on the flow which will result in a reevaluation of the bindings. If a
+on the flow, which will result in a reevaluation of the bindings. If a
 service inspector is bound, its splitter is activated and the stream is
 rewound to the start.
 
-The wizard is deactivated from the flow upon finding a match or finding
-that there is no possible match.
+Each flow contains two `MagicSplitter` objects: client-to-server and server-to-client.
+Each `MagicSplitter` contains a `Wand` that stores pointers unique to the flow:
+
+ 1. MagicPage of Hex
+ 2. MagicPage of Spell
+ 3. Vector of all curses
+
+Items 1 and 2 point to the current page in the pattern.
+
+==== Spell matching algorithm
+
+The spell matching algorithm is defined in the `SpellBook::find_spell()` method.
+In general, the `MagicPage::next` array is an alphabet (ASCII table),
+each element of which can exist or be absent. Thus, if an element exists
+in the position of a certain symbol, it means that there is a pattern with
+such a sequence of symbols.
+
+    Example:
+        User configured only one pattern: "ABC"
+        MagicPage(root)::next - all elements besides (int)A are nullptr.
+        MagicPage(A)::next - all elements besides (int)B are nullptr.
+        MagicPage(B)::next - all elements besides (int)C are nullptr.
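The "ABC" example above can be pictured with a small trie sketch. This is only an illustration, not the Snort implementation: `Page`, `add_pattern` and `find_service` are hypothetical names, and the lookup assumes that the walk stops at the first missing transition and reports whatever service is stored on the page reached so far.

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical stand-in for MagicPage: one trie node per pattern symbol.
struct Page {
    std::array<std::unique_ptr<Page>, 256> next{};  // one slot per byte value
    std::string value;  // service name; empty unless a pattern ends here
};

// Add a pattern to the trie rooted at `root`, creating pages on demand.
void add_pattern(Page& root, const std::string& pattern, const std::string& service) {
    Page* page = &root;
    for (unsigned char c : pattern) {
        if (!page->next[c])
            page->next[c] = std::make_unique<Page>();
        page = page->next[c].get();
    }
    page->value = service;
}

// Follow transitions for each data byte. Assumption: when no transition
// exists the walk stops (no rollback) and the value of the current page,
// possibly empty, is the result.
std::string find_service(const Page& root, const std::string& data) {
    const Page* page = &root;
    for (unsigned char c : data) {
        const Page* next = page->next[c].get();
        if (!next)
            break;
        page = next;
    }
    return page->value;
}
```

With only "ABC" added, the root page has a single non-null slot at index 'A', the page for "A" has one at 'B', and so on; `find_service(root, "ABC")` then returns the service stored on the page for "C".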
+
+Wizard iterates over the data from beginning to end, checking at each iteration
+if there is a transition from the current character of the pattern to the
+next character of the data.
 
-Hexes, which support binary protocol matching, and spells, which support
-text protocol matching, are similar but deliberately different:
+`MagicPage::any` reflects a glob (wildcard). If wizard transitions to a glob of the pattern,
+a loop is started, in which wizard tries to match the pattern from the current symbol of the
+data. If it fails to match the pattern from the current symbol of the data, it moves
+to the next symbol and tries again, until either it matches the pattern or the
+data runs out.
 
-* spells allow wild cards matching any number of consecutive characters
-  whereas hexes allow a single wild char.
+`MagicPage::value` is not empty only in those positions that are the ends of some pattern.
+Thus, if, after a complete pass through the data, the wizard has reached a position in which this
+field is not empty, it has matched the pattern.
 
-* spells are case insensitive whereas hexes are case sensitive.
+It should be mentioned that the matching of spells is case-insensitive; this is
+implemented by converting each character to uppercase.
 
-* spells automatically skip leading whitespace (at very start of flow).
+Because we want to be able to match patterns in data split into
+several packets, wizard saves the position of the glob into `SpellBook::glob`,
+which is then saved to the `MagicSplitter::bookmark` local to the flow.
+
+==== Hex matching algorithm
+
+The algorithm for matching hexes is defined in `HexBook::find_spell()` and is
+identical to the algorithm for spells, but lacks:
+
+ 1) converting to uppercase, since hexes work with raw data;
+ 2) loops for working with the glob, since glob in hexes replaces exactly one symbol;
+ 3) saving the position of the glob between packets.
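The glob loop and case folding that spells use (and hexes omit) can be sketched for a single pattern. This is a simplified illustration under stated assumptions, not the trie-based code: `spell_matches` is a hypothetical helper, it matches the pattern against a prefix of the data, it lets a glob match an empty run of characters, and it treats "`**`" as a literal asterisk.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `pattern` matches a prefix of `data`.
// `p` and `d` are the current positions in the pattern and in the data.
bool spell_matches(const std::string& pattern, const std::string& data,
                   std::size_t p = 0, std::size_t d = 0) {
    if (p == pattern.size())
        return true;  // whole pattern consumed
    if (pattern[p] == '*') {
        if (p + 1 < pattern.size() && pattern[p + 1] == '*') {
            // "**" is an escaped, literal asterisk
            return d < data.size() && data[d] == '*' &&
                   spell_matches(pattern, data, p + 2, d + 1);
        }
        // Glob: try the rest of the pattern from the current data symbol;
        // on failure move to the next symbol, until the data runs out.
        for (std::size_t i = d; i <= data.size(); ++i)
            if (spell_matches(pattern, data, p + 1, i))
                return true;
        return false;
    }
    if (d == data.size())
        return false;  // data ended before the pattern did
    // Case-insensitive comparison by folding both sides to uppercase.
    auto up = [](char c) { return std::toupper(static_cast<unsigned char>(c)); };
    return up(pattern[p]) == up(data[d]) &&
           spell_matches(pattern, data, p + 1, d + 1);
}
```

Dropping the uppercase folding and replacing the glob loop with a single-symbol wildcard would give the hex-style behaviour described in the list above. The sketch keeps no state between calls, so it does not model the bookmark that carries a glob across packets.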
+
+==== TCP traffic processing
+
+Execution starts from `MagicSplitter::scan()`.
+
+Since we want to be able to match patterns between packets in a stream, wizard needs to
+save the state of the pattern at the end of the processing of a particular packet.
+The state of the pattern is saved in the `MagicSplitter::wand`. However, for spells,
+it needs to keep the presence of a glob between packets. This is implemented with
+`MagicSplitter::bookmark`.
+
+Spells, hexes and curses are called inside `Wizard::cast_spell()`.
+There, wizard determines the search depth and sequentially calls the processing methods.
+
+If wizard matched the pattern in `Wizard::cast_spell()`, it increments `tcp_hits`.
+If it didn't, then it checks whether it reached the limit of `max_search_depth`.
+If wizard has reached the limit of `max_search_depth` and hasn't matched a pattern,
+then it nullifies `Wand::spell` and `Wand::hex`, so that later, in `Wizard::finished()`, it will
+know that this flow can be abandoned and raise `tcp_misses` by 1.
+
+==== UDP traffic processing
+
+UDP processing is similar to TCP but has some differences:
+
+ 1. Instead of `MagicSplitter::scan()`, processing starts from `Wizard::eval()`;
+ 2. Wizard processes only the first packet of UDP "meta-flow", so for
+    every packet the amount of previously processed bytes is set to 0;
+ 3. There isn't any bookmark - UDP doesn't support a wildcard spanning several packets;
+ 4. The wizard doesn't need to check `Wizard::finished()`, because it processes only the
+    first packet of UDP "meta-flow". So, if it hasn't matched anything in
+    `Wizard::cast_spell()`, it increments `udp_misses` and unbinds itself from the flow.
+
+==== Additional info
+
+Every flow gets a context (in `MagicSplitter`), where wizard stores the flow's processing state.
+Each flow is processed independently of the others.
+
+Currently wizard cannot roll back on the pattern, so if it reaches a certain symbol
+of the pattern, it cannot go back.
+In some cases, a pattern that could have been matched will therefore not be matched.
+
+    For example:
+        Patterns: "foobar", "foo*"
+        Content: "foobaz"
+        Unfortunately, none of the available patterns will match in such a case.
+        This is due to the fact that the symbols have a higher priority than
+        the glob. So from the MagicPage(O) wizard will transition to the MagicPage(B)
+        of the "foobar" pattern and will not process the glob. Further, in MagicPage(A)::next[]
+        it will not find MagicPage by the symbol "z" and will consider the pattern unmatched.
 
 Binary protocols are difficult to match with just a short stream prefix.
 For example suppose one has the pattern "0x12 ?" and another has "? 0x34".
@@ -33,10 +153,6 @@ and different implementation and different pattern logic and syntax.
 Encapsulating everything in the wizard allows the patterns to be easily
 tweaked as well.
 
-The current implementation of the magic is very straightforward. Due to
-the limited number of patterns, space is not a concern and each state has
-256 byte array of pointers to the next.
-
 Curses are presently used for binary protocols that require more than pattern
 matching. They use internal algorithms to identify services, implemented
 with custom FSMs.
-- 
2.47.3