From: Jason Ish
Date: Wed, 2 Dec 2015 13:12:02 +0000 (-0600)
Subject: doc: fast-pattern
X-Git-Tag: suricata-3.2beta1~270
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=33e96c508768b6d4b7c519d1ed9925c23efe4d96;p=thirdparty%2Fsuricata.git

doc: fast-pattern
---

diff --git a/doc/sphinx/fast-pattern-explained.rst b/doc/sphinx/fast-pattern-explained.rst
new file mode 100644
index 0000000000..86654e5954
--- /dev/null
+++ b/doc/sphinx/fast-pattern-explained.rst
@@ -0,0 +1,186 @@
+Suricata Fast Pattern Determination Explained
+=============================================
+
+If the 'fast_pattern' keyword is explicitly set in a rule, Suricata
+will use that as the fast pattern match. The 'fast_pattern' keyword
+can only be set once per rule. If 'fast_pattern' is not set, Suricata
+automatically determines the content to use as the fast pattern match.
+
+The following explains the logic Suricata uses to automatically
+determine the fast pattern match to use.
+
+Be aware that if there are positive (i.e. non-negated) content
+matches, then negated content matches are ignored for fast pattern
+determination. Otherwise, negated content matches are considered.
+
+Suricata 1.1.x - 1.4.x
+----------------------
+
+#. The longest (in terms of character/byte length) content match is
+   used as the fast pattern match.
+
+#. If multiple content matches qualify for the longest length, the one
+   with the highest character/byte diversity score ("Pattern
+   Strength") is used as the fast pattern match. See :ref:`Appendix C
+   <fast-pattern-explained-appendix-c>` for details on the algorithm
+   used to determine Pattern Strength.
+
+#. If multiple content matches qualify for the longest length and have
+   the same highest Pattern Strength, the buffer that has the *lower
+   "list_id"* is used as the fast pattern match. See :ref:`Appendix A
+   <fast-pattern-explained-appendix-a>` for the list_id of each
+   buffer/list.
+
+#. If multiple content matches qualify for the longest length and have
+   the same highest Pattern Strength, and have the same list_id
+   (i.e. are looking in the same buffer), then the one that comes
+   first (from left-to-right) in the rule is used as the fast pattern
+   match.
+
+It is worth noting that for content matches that have the same length
+and Pattern Strength, regular 'content' matches take precedence over
+matches that use the 'http_*' buffers.
+
+Suricata 2.0.x
+--------------
+
+#. Suricata first identifies all content matches used in the signature
+   that have the highest "priority". The priority is based on the
+   buffer being matched, and generally 'http_*' buffers have a
+   higher priority (lower number is higher priority). See
+   :ref:`Appendix B <fast-pattern-explained-appendix-b>` for details
+   on which buffers have what priority.
+#. Within the content matches identified in step 1 (the highest
+   priority content matches), the longest (in terms of character/byte
+   length) content match is used as the fast pattern match.
+#. If multiple content matches have the same highest priority and
+   qualify for the longest length, the one with the highest
+   character/byte diversity score ("Pattern Strength") is used as the
+   fast pattern match. See :ref:`Appendix C
+   <fast-pattern-explained-appendix-c>` for details on the algorithm
+   used to determine Pattern Strength.
+#. If multiple content matches have the same highest priority, qualify
+   for the longest length, and have the same highest Pattern Strength,
+   the buffer ("list_id") that was *registered last* is used as the fast
+   pattern match. See :ref:`Appendix B
+   <fast-pattern-explained-appendix-b>` for the registration order of
+   the different buffers/lists.
+#. If multiple content matches have the same highest priority, qualify
+   for the longest length, the same highest Pattern Strength, and have
+   the same list_id (i.e. are looking in the same buffer), then the
+   one that comes first (from left-to-right) in the rule is used as
+   the fast pattern match.
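
The five Suricata 2.0.x steps above amount to a chain of tie-breakers, which can be sketched as a comparison function. This is only an illustration of the ordering described in the text, not code from Suricata: the `ContentCandidate` struct, its field names, and both function names are hypothetical, and the handling of negated content matches is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one content match. The field names are
 * illustrative and are not taken from the Suricata source. */
typedef struct {
    int priority;      /* buffer priority, lower number is higher priority */
    int reg_order;     /* buffer registration order, later (higher) wins ties */
    uint16_t len;      /* content length in bytes */
    uint32_t strength; /* Pattern Strength score */
    int rule_pos;      /* position in the rule, lower is further left */
} ContentCandidate;

/* Return nonzero if a should be preferred over b under the 2.0.x steps. */
int PreferCandidate(const ContentCandidate *a, const ContentCandidate *b)
{
    if (a->priority != b->priority)
        return a->priority < b->priority;   /* step 1: highest priority */
    if (a->len != b->len)
        return a->len > b->len;             /* step 2: longest content */
    if (a->strength != b->strength)
        return a->strength > b->strength;   /* step 3: Pattern Strength */
    if (a->reg_order != b->reg_order)
        return a->reg_order > b->reg_order; /* step 4: registered last */
    return a->rule_pos < b->rule_pos;       /* step 5: leftmost in rule */
}

/* Return the index of the preferred candidate, or -1 if there are none. */
int PickFastPattern(const ContentCandidate *c, size_t n)
{
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (PreferCandidate(&c[i], &c[best]))
            best = i;
    }
    return (int)best;
}
```

With the Appendix B values, a short 'http_uri' content (priority 2) is preferred over a much longer regular content (priority 3), because priority is compared before length.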
+
+It is worth noting that for content matches that have the same
+priority, length, and Pattern Strength, 'http_stat_msg',
+'http_stat_code', and 'http_method' take precedence over regular
+'content' matches.
+
+Appendices
+----------
+
+.. _fast-pattern-explained-appendix-a:
+
+Appendix A - Buffers, list_id values, and Registration Order for Suricata 1.3.4
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 1.1.x - 1.4.x.
+
+======= ============================== ======================== ==================
+list_id Content Modifier Keyword       Buffer Name              Registration Order
+======= ============================== ======================== ==================
+1       (regular content match)        DETECT_SM_LIST_PMATCH    1 (first)
+2       http_uri                       DETECT_SM_LIST_UMATCH    2
+6       http_client_body               DETECT_SM_LIST_HCBDMATCH 3
+7       http_server_body               DETECT_SM_LIST_HSBDMATCH 4
+8       http_header                    DETECT_SM_LIST_HHDMATCH  5
+9       http_raw_header                DETECT_SM_LIST_HRHDMATCH 6
+10      http_method                    DETECT_SM_LIST_HMDMATCH  7
+11      http_cookie                    DETECT_SM_LIST_HCDMATCH  8
+12      http_raw_uri                   DETECT_SM_LIST_HRUDMATCH 9
+13      http_stat_msg                  DETECT_SM_LIST_HSMDMATCH 10
+14      http_stat_code                 DETECT_SM_LIST_HSCDMATCH 11
+15      http_user_agent                DETECT_SM_LIST_HUADMATCH 12 (last)
+======= ============================== ======================== ==================
+
+Note: registration order doesn't matter when it comes to determining the fast pattern match for Suricata 1.3.4 but list_id value does.
+
+.. _fast-pattern-explained-appendix-b:
+
+Appendix B - Buffers, list_id values, Priorities, and Registration Order for Suricata 2.0.7
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 2.0.x.
+
+========================================== ================== ============================== ============================= =======
+Priority (lower number is higher priority) Registration Order Content Modifier Keyword       Buffer Name                   list_id
+========================================== ================== ============================== ============================= =======
+3                                          11                 (regular content match)        DETECT_SM_LIST_PMATCH         1
+3                                          12                 http_method                    DETECT_SM_LIST_HMDMATCH       12
+3                                          13                 http_stat_code                 DETECT_SM_LIST_HSCDMATCH      9
+3                                          14                 http_stat_msg                  DETECT_SM_LIST_HSMDMATCH      8
+2                                          1 (first)          http_client_body               DETECT_SM_LIST_HCBDMATCH      4
+2                                          2                  http_server_body               DETECT_SM_LIST_HSBDMATCH      5
+2                                          3                  http_header                    DETECT_SM_LIST_HHDMATCH       6
+2                                          4                  http_raw_header                DETECT_SM_LIST_HRHDMATCH      7
+2                                          5                  http_uri                       DETECT_SM_LIST_UMATCH         2
+2                                          6                  http_raw_uri                   DETECT_SM_LIST_HRUDMATCH      3
+2                                          7                  http_host                      DETECT_SM_LIST_HHHDMATCH      10
+2                                          8                  http_raw_host                  DETECT_SM_LIST_HRHHDMATCH     11
+2                                          9                  http_cookie                    DETECT_SM_LIST_HCDMATCH       13
+2                                          10                 http_user_agent                DETECT_SM_LIST_HUADMATCH      14
+2                                          15 (last)          dns_query                      DETECT_SM_LIST_DNSQUERY_MATCH 20
+========================================== ================== ============================== ============================= =======
+
+Note: list_id value doesn't matter when it comes to determining the
+fast pattern match for Suricata 2.0.7 but registration order does.
+
+.. _fast-pattern-explained-appendix-c:
+
+Appendix C - Pattern Strength Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+From detect-engine-mpm.c. Basically the Pattern Strength "score"
+starts at zero and looks at each character/byte in the passed-in byte
+array from left to right. If the character/byte has not been seen
+before in the array, it adds 3 to the score if it is an alpha
+character; else it adds 4 to the score if it is a printable character,
+0x00, 0x01, or 0xFF; else it adds 6 to the score. If the
+character/byte has been seen before it adds 1 to the score. The final
+score is returned.
+
+.. code-block:: c
+
+   /** \brief Predict a strength value for patterns
+    *
+    *  Patterns with high character diversity score higher.
+    *  Alpha chars score not so high
+    *  Other printable + a few common codes a little higher
+    *  Everything else highest.
+    *  Longer patterns score better than short patterns.
+    *
+    *  \param pat pattern
+    *  \param patlen length of the pattern
+    *
+    *  \retval s pattern score
+    */
+   uint32_t PatternStrength(uint8_t *pat, uint16_t patlen) {
+       uint8_t a[256];
+       memset(&a, 0, sizeof(a));
+       uint32_t s = 0;
+       uint16_t u = 0;
+       for (u = 0; u < patlen; u++) {
+           if (a[pat[u]] == 0) {
+               if (isalpha(pat[u]))
+                   s += 3;
+               else if (isprint(pat[u]) || pat[u] == 0x00 || pat[u] == 0x01 || pat[u] == 0xFF)
+                   s += 4;
+               else
+                   s += 6;
+               a[pat[u]] = 1;
+           } else {
+               s++;
+           }
+       }
+       return s;
+   }
diff --git a/doc/sphinx/fast-pattern.rst b/doc/sphinx/fast-pattern.rst
index 17be899879..c32b79844f 100644
--- a/doc/sphinx/fast-pattern.rst
+++ b/doc/sphinx/fast-pattern.rst
@@ -1,4 +1,63 @@
 Fast Pattern
 ============
 
-Just a place holder now to demontrate linking.
+.. toctree::
+
+   fast-pattern-explained
+
+Only one content of a signature will be used in the Multi Pattern
+Matcher (MPM). If there are multiple contents, then Suricata uses the
+'strongest' content. This means a combination of length, how varied a
+content is, and what buffer it is looking in. Generally, the longer
+and more varied the better. For full details on how Suricata
+determines the fast pattern match, see :doc:`fast-pattern-explained`.
+
+Sometimes a signature writer wants Suricata to use a different
+content than the one it selects by default.
+
+For instance::
+
+  User-agent: Mozilla/5.0 Badness;
+
+  content:"User-Agent|3A|";
+  content:"Badness"; distance:0;
+
+In this example the first content is longer and more varied
+than the second one, so Suricata will use the first content for
+the MPM. Because 'User-Agent:' will match very often, and
+'Badness' appears less often in network traffic, you can make Suricata
+use the second content by using 'fast_pattern'.
+
+::
+
+  content:"User-Agent|3A|";
+  content:"Badness"; distance:0; fast_pattern;
+
+The fast_pattern keyword modifies the content that precedes it.
+
+.. image:: fast-pattern/fast_pattern.png
+
+Fast-pattern can also be combined with all previously mentioned
+keywords, and all mentioned HTTP-modifiers.
+
+fast_pattern:only
+-----------------
+
+Sometimes a signature contains only one content. In that case Suricata
+does not need to check it any further after a match has been found
+by the MPM: if there is only one content, the whole signature
+matches. Suricata notices this automatically. In some signatures this
+is still indicated with 'fast_pattern:only;'. Although Suricata does
+not need fast_pattern:only, it does support it.
+
+Fast_pattern: 'chop'
+--------------------
+
+If you do not want the MPM to use the whole content, you can use
+fast_pattern 'chop'.
+
+For example::
+
+  content: "aaaaaaaaabc"; fast_pattern:8,4;
+
+This way, MPM uses only the last four characters.
diff --git a/doc/sphinx/fast-pattern/fast_pattern.png b/doc/sphinx/fast-pattern/fast_pattern.png
new file mode 100644
index 0000000000..97163a50ab
Binary files /dev/null and b/doc/sphinx/fast-pattern/fast_pattern.png differ
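
To make the Pattern Strength scoring from Appendix C concrete, here is a small self-contained sketch that applies the same rules to the two contents from the 'User-Agent' example in fast-pattern.rst. The scoring logic follows the function quoted in the patch. The names `ExampleStrength` and `ExampleStrengthText` are hypothetical, and the text wrapper assumes NUL-terminated input with hex escapes such as `|3A|` already decoded to their byte values.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Same scoring rules as the PatternStrength() function quoted in
 * Appendix C. The name ExampleStrength is hypothetical. */
uint32_t ExampleStrength(const uint8_t *pat, uint16_t patlen)
{
    uint8_t seen[256];
    memset(seen, 0, sizeof(seen));
    uint32_t s = 0;
    for (uint16_t u = 0; u < patlen; u++) {
        if (seen[pat[u]] == 0) {
            if (isalpha(pat[u]))
                s += 3; /* first occurrence of an alpha character */
            else if (isprint(pat[u]) || pat[u] == 0x00 ||
                     pat[u] == 0x01 || pat[u] == 0xFF)
                s += 4; /* other printable bytes and a few common codes */
            else
                s += 6; /* everything else */
            seen[pat[u]] = 1;
        } else {
            s++;        /* repeated byte */
        }
    }
    return s;
}

/* Convenience wrapper for NUL-terminated text. This is an assumption made
 * for the example; real content matches are byte arrays with a length. */
uint32_t ExampleStrengthText(const char *text)
{
    return ExampleStrength((const uint8_t *)text, (uint16_t)strlen(text));
}
```

Under these rules `ExampleStrengthText("User-Agent:")` yields 33 and `ExampleStrengthText("Badness")` yields 19. In that example both contents are regular content matches, so the 11-byte 'User-Agent:' already wins on length; Pattern Strength would only decide between contents of equal length.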