--- /dev/null
+Suricata Fast Pattern Determination Explained
+=============================================
+
+If the 'fast_pattern' keyword is explicitly set in a rule, Suricata
+will use that as the fast pattern match. The 'fast_pattern' keyword
+can only be set once per rule. If 'fast_pattern' is not set, Suricata
+automatically determines the content to use as the fast pattern match.
+
+The following explains the logic Suricata uses to automatically
+determine the fast pattern match to use.
+
+Be aware that if there are positive (i.e. non-negated) content
+matches, then negated content matches are ignored for fast pattern
+determination. Otherwise, negated content matches are considered.
+
+Suricata 1.1.x - 1.4.x
+----------------------
+
+#. The longest (in terms of character/byte length) content match is
+ used as the fast pattern match.
+
+#. If multiple content matches qualify for the longest length, the one
+ with the highest character/byte diversity score ("Pattern
+ Strength") is used as the fast pattern match. See :ref:`Appendix C
+ <fast-pattern-explained-appendix-c>` for details on the algorithm
+ used to determine Pattern Strength.
+
+#. If multiple content matches qualify for the longest length and have
+ the same highest Pattern Strength, the buffer that has the *lower
+ "list_id"* is used as the fast pattern match. See :ref:`Appendix A
+ <fast-pattern-explained-appendix-a>` for the list_id of each
+ buffer/list.
+
+#. If multiple content matches qualify for the longest length and have
+ the same highest Pattern Strength, and have the same list_id
+ (i.e. are looking in the same buffer), then the one that comes
+ first (from left-to-right) in the rule is used as the fast pattern
+ match.
+
+It is worth noting that for content matches that have the same length
+and Pattern Strength, regular 'content' matches take precedence over
+matches that use the 'http_*' buffers.
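+
+The four steps above can be sketched as a single selection loop. This
+is only an illustration of the ordering, not Suricata's
+implementation: the 'content_candidate' struct and the
+'pick_fast_pattern_1x' function are hypothetical names, and the
+candidates are assumed to be listed in rule order (left to right).
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* Hypothetical summary of one content match; not a Suricata struct. */
+   struct content_candidate {
+       uint16_t len;      /* pattern length in bytes */
+       uint32_t strength; /* Pattern Strength score (Appendix C) */
+       int list_id;       /* list_id of the buffer (Appendix A) */
+   };
+
+   /* Return the index of the chosen candidate, or -1 if n is 0.
+    * Candidates must be given in rule order. */
+   int pick_fast_pattern_1x(const struct content_candidate *c, size_t n)
+   {
+       if (n == 0)
+           return -1;
+       size_t best = 0;
+       for (size_t i = 1; i < n; i++) {
+           if (c[i].len != c[best].len) {
+               if (c[i].len > c[best].len)            /* step 1 */
+                   best = i;
+           } else if (c[i].strength != c[best].strength) {
+               if (c[i].strength > c[best].strength)  /* step 2 */
+                   best = i;
+           } else if (c[i].list_id < c[best].list_id) {
+               best = i;                              /* step 3 */
+           }
+           /* Full tie: the earlier candidate stays selected (step 4). */
+       }
+       return (int)best;
+   }
+
+Because a regular 'content' match has the lowest list_id, it wins a
+tie on length and Pattern Strength against any 'http_*' buffer in
+this sketch.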
+
+Suricata 2.0.x
+--------------
+
+#. Suricata first identifies all content matches in the signature that
+ have the highest "priority". The priority is based on the buffer
+ being matched, and 'http_*' buffers generally have a higher priority
+ (a lower number means a higher priority). See
+ :ref:`Appendix B <fast-pattern-explained-appendix-b>` for details
+ on which buffers have what priority.
+#. Within the content matches identified in step 1 (the highest
+ priority content matches), the longest (in terms of character/byte
+ length) content match is used as the fast pattern match.
+#. If multiple content matches have the same highest priority and
+ qualify for the longest length, the one with the highest
+ character/byte diversity score ("Pattern Strength") is used as the
+ fast pattern match. See :ref:`Appendix C
+ <fast-pattern-explained-appendix-c>` for details on the algorithm
+ used to determine Pattern Strength.
+#. If multiple content matches have the same highest priority, qualify
+ for the longest length, and have the same highest Pattern Strength,
+ the one whose buffer ("list_id") was *registered last* is used as
+ the fast pattern match. See :ref:`Appendix B
+ <fast-pattern-explained-appendix-b>` for the registration order of
+ the different buffers/lists.
+#. If multiple content matches have the same highest priority, qualify
+ for the longest length, have the same highest Pattern Strength, and
+ have the same list_id (i.e. are looking in the same buffer), then
+ the one that comes first (from left-to-right) in the rule is used
+ as the fast pattern match.
+
+It is worth noting that for content matches that have the same
+priority, length, and Pattern Strength, 'http_stat_msg',
+'http_stat_code', and 'http_method' take precedence over regular
+'content' matches.
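+
+The 2.0.x ordering can be sketched the same way. This is an
+illustration only, not Suricata's implementation: the
+'content_candidate_2x' struct and the 'pick_fast_pattern_2x' function
+are hypothetical names, the priority and registration order values
+are assumed to come from Appendix B, and the candidates are assumed
+to be listed in rule order.
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* Hypothetical summary of one content match; not a Suricata struct. */
+   struct content_candidate_2x {
+       int priority;      /* lower number means higher priority */
+       uint16_t len;      /* pattern length in bytes */
+       uint32_t strength; /* Pattern Strength score (Appendix C) */
+       int reg_order;     /* registration order of the buffer */
+   };
+
+   /* Return the index of the chosen candidate, or -1 if n is 0.
+    * Candidates must be given in rule order. */
+   int pick_fast_pattern_2x(const struct content_candidate_2x *c, size_t n)
+   {
+       if (n == 0)
+           return -1;
+       size_t best = 0;
+       for (size_t i = 1; i < n; i++) {
+           if (c[i].priority != c[best].priority) {
+               if (c[i].priority < c[best].priority)  /* step 1 */
+                   best = i;
+           } else if (c[i].len != c[best].len) {
+               if (c[i].len > c[best].len)            /* step 2 */
+                   best = i;
+           } else if (c[i].strength != c[best].strength) {
+               if (c[i].strength > c[best].strength)  /* step 3 */
+                   best = i;
+           } else if (c[i].reg_order > c[best].reg_order) {
+               best = i;                              /* step 4 */
+           }
+           /* Full tie: the earlier candidate stays selected (step 5). */
+       }
+       return (int)best;
+   }
+
+Note that, unlike the earlier versions, a long regular 'content'
+match loses here to a shorter match in a higher-priority buffer,
+because priority is compared before length.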
+
+Appendices
+----------
+
+.. _fast-pattern-explained-appendix-a:
+
+Appendix A - Buffers, list_id values, and Registration Order for Suricata 1.3.4
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 1.1.x - 1.4.x.
+
+======= ============================== ======================== ==================
+list_id Content Modifier Keyword Buffer Name Registration Order
+======= ============================== ======================== ==================
+1 <none> (regular content match) DETECT_SM_LIST_PMATCH 1 (first)
+2 http_uri DETECT_SM_LIST_UMATCH 2
+6 http_client_body DETECT_SM_LIST_HCBDMATCH 3
+7 http_server_body DETECT_SM_LIST_HSBDMATCH 4
+8 http_header DETECT_SM_LIST_HHDMATCH 5
+9 http_raw_header DETECT_SM_LIST_HRHDMATCH 6
+10 http_method DETECT_SM_LIST_HMDMATCH 7
+11 http_cookie DETECT_SM_LIST_HCDMATCH 8
+12 http_raw_uri DETECT_SM_LIST_HRUDMATCH 9
+13 http_stat_msg DETECT_SM_LIST_HSMDMATCH 10
+14 http_stat_code DETECT_SM_LIST_HSCDMATCH 11
+15 http_user_agent DETECT_SM_LIST_HUADMATCH 12 (last)
+======= ============================== ======================== ==================
+
+Note: registration order doesn't matter when it comes to determining the fast pattern match for Suricata 1.3.4 but list_id value does.
+
+.. _fast-pattern-explained-appendix-b:
+
+Appendix B - Buffers, list_id values, Priorities, and Registration Order for Suricata 2.0.7
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This should be pretty much the same for Suricata 2.0.x.
+
+========================================== ================== ============================== ============================= =======
+Priority (lower number is higher priority) Registration Order Content Modifier Keyword Buffer Name list_id
+========================================== ================== ============================== ============================= =======
+3 11 <none> (regular content match) DETECT_SM_LIST_PMATCH 1
+3 12 http_method DETECT_SM_LIST_HMDMATCH 12
+3 13 http_stat_code DETECT_SM_LIST_HSCDMATCH 9
+3 14 http_stat_msg DETECT_SM_LIST_HSMDMATCH 8
+2 1 (first) http_client_body DETECT_SM_LIST_HCBDMATCH 4
+2 2 http_server_body DETECT_SM_LIST_HSBDMATCH 5
+2 3 http_header DETECT_SM_LIST_HHDMATCH 6
+2 4 http_raw_header DETECT_SM_LIST_HRHDMATCH 7
+2 5 http_uri DETECT_SM_LIST_UMATCH 2
+2 6 http_raw_uri DETECT_SM_LIST_HRUDMATCH 3
+2 7 http_host DETECT_SM_LIST_HHHDMATCH 10
+2 8 http_raw_host DETECT_SM_LIST_HRHHDMATCH 11
+2 9 http_cookie DETECT_SM_LIST_HCDMATCH 13
+2 10 http_user_agent DETECT_SM_LIST_HUADMATCH 14
+2 15 (last) dns_query DETECT_SM_LIST_DNSQUERY_MATCH 20
+========================================== ================== ============================== ============================= =======
+
+Note: list_id value doesn't matter when it comes to determining the
+fast pattern match for Suricata 2.0.7 but registration order does.
+
+.. _fast-pattern-explained-appendix-c:
+
+Appendix C - Pattern Strength Algorithm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The code below is from detect-engine-mpm.c. The Pattern Strength
+"score" starts at zero, and the function looks at each character/byte
+in the passed-in byte array from left to right. If the character/byte
+has not been seen before in the array, it adds 3 to the score if it
+is an alpha character; else it adds 4 to the score if it is a
+printable character, 0x00, 0x01, or 0xFF; else it adds 6 to the
+score. If the character/byte has been seen before, it adds 1 to the
+score. The final score is returned.
+
+.. code-block:: c
+
+ /** \brief Predict a strength value for patterns
+ *
+ * Patterns with high character diversity score higher.
+ * Alpha chars score not so high
+ * Other printable + a few common codes a little higher
+ * Everything else highest.
+ * Longer patterns score better than short patterns.
+ *
+ * \param pat pattern
+ * \param patlen length of the pattern
+ *
+ * \retval s pattern score
+ */
+ uint32_t PatternStrength(uint8_t *pat, uint16_t patlen) {
+ uint8_t a[256];
+ memset(&a, 0, sizeof(a));
+ uint32_t s = 0;
+ uint16_t u = 0;
+ for (u = 0; u < patlen; u++) {
+ if (a[pat[u]] == 0) {
+ if (isalpha(pat[u]))
+ s += 3;
+ else if (isprint(pat[u]) || pat[u] == 0x00 || pat[u] == 0x01 || pat[u] == 0xFF)
+ s += 4;
+ else
+ s += 6;
+ a[pat[u]] = 1;
+ } else {
+ s++;
+ }
+ }
+ return s;
+ }
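+
+To experiment with the scoring rules outside of Suricata, they can be
+restated in a small standalone function. This is a sketch written
+for this document, not the Suricata source: the name
+'pattern_strength' is hypothetical, and the default "C" locale is
+assumed for 'isalpha' and 'isprint'.
+
+.. code-block:: c
+
+   #include <ctype.h>
+   #include <stdint.h>
+   #include <string.h>
+
+   /* Standalone restatement of the scoring rules described above. */
+   uint32_t pattern_strength(const uint8_t *pat, uint16_t patlen)
+   {
+       uint8_t seen[256];
+       uint32_t score = 0;
+
+       memset(seen, 0, sizeof(seen));
+       for (uint16_t i = 0; i < patlen; i++) {
+           uint8_t b = pat[i];
+           if (seen[b]) {
+               score += 1;      /* repeated byte */
+               continue;
+           }
+           seen[b] = 1;
+           if (isalpha(b))
+               score += 3;      /* first occurrence of a letter */
+           else if (isprint(b) || b == 0x00 || b == 0x01 || b == 0xFF)
+               score += 4;      /* other printable or common byte */
+           else
+               score += 6;      /* everything else */
+       }
+       return score;
+   }
+
+With these rules, the 11-byte pattern 'User-Agent:' scores 33 (eight
+distinct letters, two other printable characters, and one repeated
+'e'), while the 7-byte pattern 'Badness' scores 19.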
Fast Pattern
============
-Just a place holder now to demontrate linking.
+.. toctree::
+
+ fast-pattern-explained
+
+Only one content of a signature will be used in the Multi Pattern
+Matcher (MPM). If there are multiple contents, then Suricata uses the
+'strongest' content. This means a combination of length, how varied a
+content is, and what buffer it is looking in. Generally, the longer
+and more varied the better. For full details on how Suricata
+determines the fast pattern match, see :doc:`fast-pattern-explained`.
+
+Sometimes a signature writer wants Suricata to use a different
+content than the one it selects by default.
+
+For instance::
+
+ User-Agent: Mozilla/5.0 Badness;
+
+ content:"User-Agent|3A|";
+ content:"Badness"; distance:0;
+
+In this example the first content is longer and more varied than the
+second one, so Suricata will use it for the MPM by default. Because
+'User-Agent:' matches very often while 'Badness' appears far less
+often in network traffic, the second content is the better choice.
+You can make Suricata use it by adding 'fast_pattern'.
+
+::
+
+ content:"User-Agent|3A|";
+ content:"Badness"; distance:0; fast_pattern;
+
+The keyword fast_pattern modifies the content that precedes it.
+
+.. image:: fast-pattern/fast_pattern.png
+
+fast_pattern can also be combined with all of the previously
+mentioned keywords and HTTP modifiers.
+
+fast_pattern:only
+-----------------
+
+Sometimes a signature contains only one content. In that case
+Suricata does not need to check the content again after the MPM has
+found a match: if the only content matches, the whole signature
+matches. Suricata notices this automatically. Some signatures still
+indicate this with 'fast_pattern:only;'. Although Suricata does not
+need fast_pattern:only, it does support it.
+
+Fast_pattern: 'chop'
+--------------------
+
+If you do not want the MPM to use the whole content, you can use
+fast_pattern 'chop', which takes an offset and a length
+('fast_pattern:<offset>,<length>;').
+
+For example::
+
+ content: "aaaaaaaaabc"; fast_pattern:8,4;
+
+This way, MPM uses only the last four characters.
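+
+The effect of an offset and a length can be pictured as selecting a
+sub-range of the content bytes. The following is a minimal sketch in
+C, assuming the selected range has to fit entirely inside the
+content; 'fast_pattern_chop' is a hypothetical helper written for
+this document, not a Suricata function.
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   /* Return a pointer to the first byte selected by offset and length,
+    * or NULL if the range is empty or does not fit in the content. */
+   const uint8_t *fast_pattern_chop(const uint8_t *content,
+                                    size_t content_len,
+                                    size_t offset, size_t length)
+   {
+       if (length == 0 || offset > content_len ||
+           length > content_len - offset)
+           return NULL;
+       return content + offset;
+   }
+
+For a hypothetical 12-byte content 'abcdefghijkl', an offset of 8 and
+a length of 4 select the bytes 'ijkl'. The same offset and length do
+not fit in an 11-byte content, so the sketch returns NULL in that
+case.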