appropriate inspector. The wizard is still under development; if you find you
need to tweak the defaults please let us know.
-Additional Details:
+==== Wizard patterns
+
+Wizard supports 3 kinds of patterns:
+
+ 1. Hexes
+ 2. Spells
+ 3. Curses
+
+Each kind of pattern has its own purpose and features.
+Pattern types are evaluated
+in exactly the order listed above.
+Thus, if some data matches a hex, it is not
+processed by spells or curses.
+
+The depth of search for a pattern in the data can be configured using
+the `max_search_depth` option.
+
+'TCP' packets form a flow, so wizard checks all data
+in the flow for a match. If no pattern matches
+and 'max_search_depth' is reached, the flow is abandoned by wizard.
+
+'UDP' packets form a "meta-flow" based on the addresses and ports of the packets.
+However, unlike TCP processing, for UDP wizard only looks at the first arriving
+packet from the meta-flow. If no pattern matches that packet or wizard's 'max_search_depth'
+is reached, the meta-flow is abandoned by wizard.
+
+==== Wizard patterns - Spells
+
+A spell is a text-based pattern. Spells are best suited to text protocols: http, smtp, sip, etc.
+Spells are:
+
+ * Case insensitive
+ * Whitespace sensitive
+ * Able to match by a wildcard symbol
+
+To match any sequence of characters, use the
+"`*`" (glob) symbol in the pattern.
+
+ Example:
+ Pattern: '220-*FTP'
+ Traffic that would match: '220- Hello world! It's a new FTP server'
+
+To match a literal "`*`" symbol, put "`**`" in the pattern.
+
+Spells are configured as a Lua array, each element of which can
+contain the following options:
+
+ * 'service' - name of the service that would be assigned
+ * 'proto' - protocol to scan
+ * 'client_first' - indicator of which end initiates data transfer
+ * 'to_server' - list of text patterns to search in the data sent to the server
+ * 'to_client' - list of text patterns to search in the data sent to the client
+
+ Example of a spell definition in Lua:
+ {
+ service = 'smtp',
+ proto = 'tcp',
+ client_first = true,
+ to_server = { 'HELO', 'EHLO' },
+ to_client = { '220*SMTP', '220*MAIL' }
+ }
+
+==== Wizard patterns - Hexes
+
+Hexes can be used to match binary protocols: dnp3, http2, ssl, etc.
+Hexes use hexadecimal representation of the data for pattern matching.
+
+A wildcard in a hex pattern is a placeholder for exactly one occurrence
+of any hexadecimal digit and is denoted by the symbol "`?`".
+
+ Example:
+ Pattern: '|05 ?4|'
+ Traffic that would match: '|05 84|'
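+
The single-digit wildcard can be pictured as a mask over each byte. The following is a minimal C++ sketch under that assumption, with the pattern anchored at the start of the data. `HexByte`, `parse_hex` and `hex_matches` are hypothetical names used only for illustration; they are not part of Snort, where `HexBook` does the matching with a trie.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One pattern position: which bits must match, and what they must equal.
struct HexByte
{
    uint8_t mask;   // nibble bits set where the pattern gives a digit
    uint8_t value;  // expected bits under the mask
};

// Hypothetical helper: parse a pattern such as "|05 ?4|" into masked bytes.
// Returns an empty vector on malformed input (bad character, odd digit count).
std::vector<HexByte> parse_hex(const std::string& pattern)
{
    std::vector<HexByte> out;
    HexByte cur{0, 0};
    int nibbles = 0;

    for (char ch : pattern)
    {
        if (ch == '|' || ch == ' ')
            continue;                       // delimiters carry no data

        uint8_t mask = 0x0F, val = 0;
        if (ch == '?')
            mask = 0;                       // wildcard: any hex digit
        else if (std::isxdigit(static_cast<unsigned char>(ch)))
            val = static_cast<uint8_t>(std::stoi(std::string(1, ch), nullptr, 16));
        else
            return {};

        cur.mask  = static_cast<uint8_t>((cur.mask  << 4) | mask);
        cur.value = static_cast<uint8_t>((cur.value << 4) | val);

        if (++nibbles == 2)
        {
            out.push_back(cur);
            cur = {0, 0};
            nibbles = 0;
        }
    }
    return nibbles == 0 ? out : std::vector<HexByte>{};
}

// True if the first bytes of data satisfy every position of the pattern.
bool hex_matches(const std::string& pattern, const std::vector<uint8_t>& data)
{
    const std::vector<HexByte> bytes = parse_hex(pattern);
    if (bytes.empty() || data.size() < bytes.size())
        return false;

    for (std::size_t i = 0; i < bytes.size(); ++i)
        if ((data[i] & bytes[i].mask) != bytes[i].value)
            return false;
    return true;
}
```

With this sketch, `hex_matches("|05 ?4|", {0x05, 0x84})` holds, while a second byte of `0x85` is rejected because its low digit is fixed by the pattern.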
+
+Hexes are configured in the same way as spells and have an identical set of options.
+
+ Example of a hex definition in Lua:
+ {
+ service = 'dnp3',
+ proto = 'tcp',
+ client_first = true,
+ to_server = { '|05 64|' },
+ to_client = { '|05 64|' }
+ }
+
+==== Wizard patterns - Curses
+
+Curses are internal algorithms of service identification.
+They are implemented as state machines in C++ code
+and can have their own unique state information stored on the flow.
+
+A list of available services can be obtained using `snort --help-config wizard | grep curses`.
+
+ A configuration which enables some curses:
+ curses = {'dce_udp', 'dce_tcp', 'dce_smb', 'sslv2'}
+
+==== Additional Details:
+
+* More specific patterns usually have higher precedence.
+
+ For example:
+ The following spells are evaluated against the payload 'foobar'. The third spell matches.
+ { service = 'first', to_server = { 'foo' } },
+ { service = 'second', to_server = { 'bar' } },
+ { service = 'third', to_server = { 'foobar' } }
* If the wizard and one or more service inspectors are configured w/o
explicitly configuring the binder, default bindings will be generated which
the service inspector or appId. The goal is to get the most likely service
inspector engaged as quickly as possible.
+The wizard is detached from the flow upon finding a match or finding
+that there is no possible match.
+
+Hexes and spells differ in the following aspects:
+
+ * spells allow wildcards matching any number of consecutive characters
+ whereas hexes allow a single wild char.
+ * spells are case insensitive whereas hexes are case sensitive.
+ * spells automatically skip leading whitespace (at very start of flow).
+
+To match "`*`" symbol in traffic, put "`**`" in a spell.
+
+ For example: traffic "* OK" will match the pattern "** OK".
+
+A series of asterisks is matched from left to right. '***' is seen as "*<glob>".
+
+==== Concepts
+
+ * `MagicPage` - node of a trie. Represents a symbol of a pattern.
+ * `MagicBook` - trie itself. Represents a set of patterns for the wizard instance.
+ ** `SpellBook` - `MagicBook` implementation for spells.
+ ** `HexBook` - `MagicBook` implementation for hexes.
+ * `MagicSplitter` - object related to a stream. Applies wizard logic to a stream.
+ * `Wand` - contains state of wizard patterns for a stream.
+ * `CurseDetails` - settings of a curse. Contains identifiers and algorithm.
+ * `CurseTracker` - state of a curse.
+ * `CurseBook` - contains all configured curses.
+ * `CurseServiceTracker` - instance of a curse. Contains settings and state.
+
+==== MagicSplitter
+
For TCP, the wizard uses a stream splitter to examine the in order data as
it becomes available. If the splitter finds a match, it sets the service
-on the flow which will result in a reevaluation of the bindings. If a
+on the flow, which will result in a reevaluation of the bindings. If a
service inspector is bound, its splitter is activated and the stream is
rewound to the start.
-The wizard is deactivated from the flow upon finding a match or finding
-that there is no possible match.
+Each flow contains two `MagicSplitter` objects: client-to-server and server-to-client.
+Each `MagicSplitter` contains a `Wand` that stores pointers unique to the flow:
+
+ 1. MagicPage of Hex
+ 2. MagicPage of Spell
+ 3. Vector of all curses
+
+Items 1 and 2 point to the current page in the pattern.
+
+==== Spell matching algorithm
+
+The spell matching algorithm is defined in the `SpellBook::find_spell()` method.
+In general, the `MagicPage::next` array is an alphabet (ASCII table),
+each element of which may be present or absent. If an element is present
+at the position of a certain symbol, there is a pattern that continues with
+that symbol.
+
+ Example:
+ User configured only one pattern: "ABC"
+ MagicPage(root)::next - all elements except (int)A are nullptr.
+ MagicPage(A)::next - all elements except (int)B are nullptr.
+ MagicPage(B)::next - all elements except (int)C are nullptr.
+
+Wizard iterates over the data from beginning to end, checking at each iteration
+whether there is a transition from the current symbol of the pattern on the
+next character of the data.
-Hexes, which support binary protocol matching, and spells, which support
-text protocol matching, are similar but deliberately different:
+`MagicPage::any` represents a glob (wildcard). When wizard transitions to a glob in the pattern,
+it starts a loop in which it tries to match the rest of the pattern from the current symbol of the
+data. If that fails, it moves
+to the next symbol and tries again, until either the pattern matches or the
+data runs out.
-* spells allow wild cards matching any number of consecutive characters
- whereas hexes allow a single wild char.
+`MagicPage::value` is non-empty only at positions that are the end of some pattern.
+Thus, if after a complete pass through the data the wizard has reached a position in which this
+field is non-empty, it has matched the pattern.
-* spells are case insensitive whereas hexes are case sensitive.
+Spell matching is case-insensitive; this is
+implemented by converting each character to uppercase.
-* spells automatically skip leading whitespace (at very start of flow).
+Because patterns must be matched in data split across
+several packets, wizard saves the position of the glob in `SpellBook::glob`,
+which is then saved to the flow-local `MagicSplitter::bookmark`.
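+
The literal part of the walk described above (the `next` array, the `value` field and the uppercase conversion) can be sketched in C++ as follows. This is a simplified illustration, not the Snort source: `Page` and `Book` are hypothetical stand-ins for `MagicPage` and `SpellBook`, glob handling and the bookmark are left out, and the sketch assumes that the walk stops at the first byte with no transition.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Simplified stand-in for MagicPage: one trie node per pattern symbol.
struct Page
{
    std::array<std::unique_ptr<Page>, 256> next;  // indexed by byte value
    std::string value;                            // service, set only at a pattern end
};

// Simplified stand-in for SpellBook (literal symbols only, no glob).
class Book
{
public:
    // Insert a pattern, storing symbols in uppercase.
    void add(const std::string& pattern, const std::string& service)
    {
        Page* p = &root;
        for (unsigned char c : pattern)
        {
            std::unique_ptr<Page>& slot = p->next[std::toupper(c)];
            if (!slot)
                slot = std::make_unique<Page>();
            p = slot.get();
        }
        p->value = service;
    }

    // Walk the trie over the data and stop at the first byte with no transition.
    // Returns the service of the page reached, or "" if it is not a pattern end.
    std::string find(const std::string& data) const
    {
        const Page* p = &root;
        for (unsigned char c : data)
        {
            const Page* q = p->next[std::toupper(c)].get();
            if (!q)
                break;
            p = q;
        }
        return p->value;
    }

private:
    Page root;
};
```

After `book.add("ABC", "svc")`, a call such as `book.find("abcdef")` reaches the page for `C` and returns its service, whereas `book.find("ab")` stops on a page whose `value` is empty.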
+
+==== Hex matching algorithm
+
+The algorithm for matching hexes is defined in `HexBook::find_spell()` and is
+identical to the algorithm for spells, except that it lacks:
+
+ 1) conversion to uppercase, since hexes work with raw data;
+ 2) loops for working with the glob, since a glob in hexes replaces exactly one symbol;
+ 3) saving the position of the glob between packets.
+
+==== TCP traffic processing
+
+Execution starts in `MagicSplitter::scan()`.
+
+Because patterns must be matched across packets in a stream, wizard needs to
+save the state of the pattern when it finishes processing a particular packet.
+The state of the pattern is saved in `MagicSplitter::wand`. For spells,
+it also needs to remember the presence of a glob between packets. This is implemented with
+`MagicSplitter::bookmark`.
+
+Spells, hexes and curses are called inside `Wizard::cast_spell()`.
+There wizard determines the search depth and calls the processing methods in sequence.
+
+If wizard matches a pattern in `Wizard::cast_spell()`, it increments `tcp_hits`.
+If it does not, it checks whether the `max_search_depth` limit has been reached.
+If wizard has reached the `max_search_depth` limit and hasn't matched a pattern,
+it nullifies `Wand::spell` and `Wand::hex`, so that later `Wizard::finished()`
+knows that this flow can be abandoned and increments `tcp_misses` by 1.
+
+==== UDP traffic processing
+
+UDP processing is similar to TCP but has some differences:
+
+ 1. Instead of `MagicSplitter::scan()`, processing starts from `Wizard::eval()`.
+ 2. Wizard processes only the first packet of a UDP "meta-flow", so for
+ every packet the number of previously processed bytes is set to 0.
+ 3. There is no bookmark: UDP doesn't support a wildcard spanning several packets.
+ 4. Wizard doesn't need to check `Wizard::finished()`, because it processes only the
+ first packet of a UDP "meta-flow". So, if it hasn't matched anything in
+ `Wizard::cast_spell()`, it increments `udp_misses` and unbinds itself from the flow.
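+
The difference between the two paths is mostly about where state lives. The C++ sketch below contrasts them with a toy wizard that knows a single case-sensitive literal pattern (assumed non-empty). `ToyWizard`, `FlowState`, `Verdict`, `scan_tcp` and `eval_udp` are hypothetical names, and counting a miss on the first mismatching byte is a simplification; only the counter names and `max_search_depth` come from the text above.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical per-direction state: a pattern cursor plus a byte counter.
struct FlowState
{
    std::size_t matched = 0;    // pattern symbols matched so far
    std::size_t seen = 0;       // bytes examined so far on this flow
    bool abandoned = false;     // no match is possible any more
};

struct Stats
{
    unsigned tcp_hits = 0, tcp_misses = 0, udp_hits = 0, udp_misses = 0;
};

enum class Verdict { Match, Searching, Abandoned };

class ToyWizard
{
public:
    ToyWizard(std::string spell, std::size_t max_search_depth)
        : pattern(std::move(spell)), depth(max_search_depth) {}

    // TCP: called once per in-order segment; state persists between calls.
    Verdict scan_tcp(FlowState& st, const std::string& segment)
    {
        if (st.abandoned)
            return Verdict::Abandoned;
        if (cast(st, segment))
        {
            ++stats.tcp_hits;
            return Verdict::Match;
        }
        if (st.abandoned || st.seen >= depth)
        {
            st.abandoned = true;
            ++stats.tcp_misses;
            return Verdict::Abandoned;
        }
        return Verdict::Searching;  // keep the cursor for the next segment
    }

    // UDP: each call starts from a clean state, as for a first datagram.
    Verdict eval_udp(const std::string& datagram)
    {
        FlowState st;               // nothing is carried over
        if (cast(st, datagram))
        {
            ++stats.udp_hits;
            return Verdict::Match;
        }
        ++stats.udp_misses;
        return Verdict::Abandoned;
    }

    Stats stats;

private:
    // Advance the cursor over data without looking past the depth limit.
    bool cast(FlowState& st, const std::string& data)
    {
        for (char c : data)
        {
            if (st.seen >= depth)
                break;
            ++st.seen;
            if (c != pattern[st.matched])
            {
                st.abandoned = true;
                return false;
            }
            if (++st.matched == pattern.size())
                return true;
        }
        return false;
    }

    std::string pattern;
    std::size_t depth;
};
```

Feeding `"HE"` and then `"LO"` through `scan_tcp` with the same `FlowState` matches the pattern `"HELO"`, while the same two payloads passed to `eval_udp` are two separate misses.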
+
+==== Additional info
+
+Every flow gets a context (in `MagicSplitter`), where wizard stores the flow's processing state.
+Each flow is processed independently of the others.
+
+Currently wizard cannot roll back within a pattern: once it reaches a certain symbol
+of the pattern, it cannot go back. In some cases this means that
+a pattern that could have matched is not matched.
+
+ For example:
+ Patterns: "foobar", "foo*"
+ Content: "foobaz"
+ Unfortunately, none of the available patterns will match in this case.
+ This is because literal symbols have a higher priority than
+ the glob. So from MagicPage(O) wizard transitions to MagicPage(B)
+ of the "foobar" pattern and does not process the glob. Then, in MagicPage(A)::next[],
+ it does not find a MagicPage for the symbol "z" and considers the pattern unmatched.
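+
The behaviour in this example can be reproduced with a small C++ sketch of a trie in which a literal transition is always preferred to the glob and is never revisited. `Page`, `add_spell` and `match_spell` are hypothetical names for illustration only; the real `SpellBook::find_spell()` differs in detail, and this sketch omits "`**`" escaping and the cross-packet bookmark.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>

// Simplified stand-in for MagicPage, with a separate glob transition.
struct Page
{
    std::array<std::unique_ptr<Page>, 256> next;  // literal transitions
    std::unique_ptr<Page> any;                    // glob transition ('*')
    std::string value;                            // service at a pattern end
};

// Add a pattern; every '*' becomes a glob transition.
void add_spell(Page& root, const std::string& pattern, const std::string& service)
{
    Page* p = &root;
    for (unsigned char c : pattern)
    {
        std::unique_ptr<Page>& slot = (c == '*') ? p->any : p->next[std::toupper(c)];
        if (!slot)
            slot = std::make_unique<Page>();
        p = slot.get();
    }
    p->value = service;
}

// Match data from offset i starting at page p. Returns the service, or "".
std::string match_spell(const Page* p, const std::string& data, std::size_t i = 0)
{
    while (i < data.size())
    {
        const int c = std::toupper(static_cast<unsigned char>(data[i]));

        if (p->next[c])
        {
            p = p->next[c].get();   // literal wins; the glob is not considered
            ++i;
        }
        else if (p->any)
        {
            // Glob: retry the rest of the pattern at each remaining offset.
            for (std::size_t j = i; j < data.size(); ++j)
            {
                const std::string hit = match_spell(p->any.get(), data, j);
                if (!hit.empty())
                    return hit;
            }
            return "";
        }
        else
            break;                  // no transition from this page
    }
    return p->value;                // empty unless p ends some pattern
}
```

With the patterns "foobar" and "foo*" loaded, `match_spell(&root, "fooqux")` succeeds through the glob, but `match_spell(&root, "foobaz")` returns an empty string because the walk has already committed to the `B` page of "foobar".
+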
Binary protocols are difficult to match with just a short stream prefix.
For example suppose one has the pattern "0x12 ?" and another has "? 0x34".
Encapsulating everything in the wizard allows the patterns to be easily
tweaked as well.
-The current implementation of the magic is very straightforward. Due to
-the limited number of patterns, space is not a concern and each state has
-256 byte array of pointers to the next.
-
Curses are presently used for binary protocols that require more than pattern
matching. They use internal algorithms to identify services,
implemented with custom FSMs.