From: Victor Julien
Date: Thu, 25 Apr 2019 14:32:29 +0000 (+0200)
Subject: doc/dataset: initial documentation
X-Git-Tag: suricata-5.0.0-rc1~93
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=refs%2Fpull%2F4166%2Fhead;p=thirdparty%2Fsuricata.git

doc/dataset: initial documentation
---

diff --git a/doc/userguide/rules/datasets.rst b/doc/userguide/rules/datasets.rst
new file mode 100644
index 0000000000..f66f610b2d
--- /dev/null
+++ b/doc/userguide/rules/datasets.rst
@@ -0,0 +1,152 @@
+Datasets
+========
+
+Using the ``dataset`` and ``datarep`` keywords it is possible to match
+large amounts of data against any sticky buffer.
+
+For example, to match against a DNS black list called ``dns-bl``::
+
+    dns.query; dataset:isset,dns-bl;
+
+These keywords are aware of transforms. So to look up a DNS query against
+an MD5 black list::
+
+    dns.query; to_md5; dataset:isset,dns-bl;
+
+Global config (optional)
+------------------------
+
+Datasets can optionally be defined in the main config. Sets can also be
+declared from the rule syntax.
+
+Example of sets for tracking unique values::
+
+    datasets:
+      - ua-seen:
+          type: string
+          state: ua-seen.lst
+      - dns-sha256-seen:
+          type: sha256
+          state: dns-sha256-seen.lst
+
+Rules to go with the above::
+
+    http.user_agent; dataset:set,ua-seen;
+    dns.query; to_sha256; dataset:set,dns-sha256-seen;
+
+
+Rule keywords
+-------------
+
+dataset
+~~~~~~~
+
+Datasets are binary: something is in the set or it's not.
+
+Syntax::
+
+    dataset:,,;
+
+    dataset:, \
+        [, type , save , load , state ];
+
+type
+  the data type: string, md5, sha256
+save
+  file name for saving the in-memory data when Suricata exits
+load
+  file name for loading the data when Suricata starts up
+state
+  sets both 'save' and 'load' to the same value
+
+datarep
+~~~~~~~
+
+Data Reputation allows matching data against a reputation list.
+
+Syntax::
+
+    datarep:,,, \
+        [, type , load ];
+
+Example rules could look like::
+
+    alert dns any any -> any any (dns.query; to_md5; datarep:dns_md5, >, 200, load dns_md5.rep, type md5; sid:1;)
+    alert dns any any -> any any (dns.query; to_sha256; datarep:dns_sha256, >, 200, load dns_sha256.rep, type sha256; sid:2;)
+    alert dns any any -> any any (dns.query; datarep:dns_string, >, 200, load dns_string.rep, type string; sid:3;)
+
+In these examples the DNS query string is checked against three different
+reputation lists: an MD5 list, a SHA256 list, and a raw string (buffer) list.
+The rules will only match if the data is in the list and the reputation
+value is higher than 200.
+
+
+Unix Socket
+-----------
+
+dataset-add
+~~~~~~~~~~~
+
+Unix Socket command to add data to a set. On success, the addition becomes
+active instantly.
+
+Syntax::
+
+    dataset-add
+
+set name
+  Name of an already defined dataset
+type
+  Data type: string, md5, sha256
+data
+  Data to add in serialized form (base64 for string, hex notation for md5/sha256)
+
+Example adding 'google.com' to set 'myset'::
+
+    dataset-add myset string Z29vZ2xlLmNvbQ==
+
+
+File formats
+------------
+
+Datasets use a simple CSV format with one entry per line in the file.
+
+data types
+~~~~~~~~~~
+
+string
+  in the file as base64 encoded string
+md5
+  in the file as hex encoded string
+sha256
+  in the file as hex encoded string
+
+
+dataset
+~~~~~~~
+
+Datasets have a simple structure, where there is one piece of data
+per line in the file.
+
+Syntax::
+
+
+
+e.g. for ua-seen with type string::
+
+    TW96aWxsYS80LjAgKGNvbXBhdGlibGU7ICk=
+
+which when piped to ``base64 -d`` reveals its value::
+
+    Mozilla/4.0 (compatible; )
+
+
+datarep
+~~~~~~~
+
+The datarep format follows the dataset format, except that there is one more
+CSV field:
+
+Syntax::
+
+    ,
diff --git a/doc/userguide/rules/index.rst b/doc/userguide/rules/index.rst
index f11c9e9ae9..1e2c17ecd0 100644
--- a/doc/userguide/rules/index.rst
+++ b/doc/userguide/rules/index.rst
@@ -28,5 +28,6 @@ Suricata Rules
    xbits
    thresholding
    ip-reputation-rules
+   datasets
    rule-lua-scripting
    differences-from-snort
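Not part of the patch: a minimal Python sketch of the serialization the new documentation describes (base64 for ``string`` entries, hex notation for ``md5`` and ``sha256`` entries). The helper names ``serialize_entry`` and ``dataset_add_command`` are hypothetical and do not exist in Suricata. Passing the raw digest bytes for the hash types is an assumption about what a caller would have on hand. The command string follows the argument order of the ``dataset-add myset string ...`` example in the patch.

```python
import base64
import hashlib

# Expected digest length in bytes for the hash-based set types.
DIGEST_SIZES = {"md5": 16, "sha256": 32}


def serialize_entry(data: bytes, set_type: str) -> str:
    """Serialize one entry: base64 for 'string', hex for 'md5'/'sha256'.

    Hypothetical helper for illustration only, not a Suricata API.
    """
    if set_type == "string":
        return base64.b64encode(data).decode("ascii")
    if set_type in DIGEST_SIZES:
        if len(data) != DIGEST_SIZES[set_type]:
            raise ValueError(f"{set_type} digest must be {DIGEST_SIZES[set_type]} bytes")
        return data.hex()
    raise ValueError(f"unknown set type: {set_type}")


def dataset_add_command(set_name: str, set_type: str, data: bytes) -> str:
    """Build the text of a dataset-add command in the order shown in the docs."""
    return f"dataset-add {set_name} {set_type} {serialize_entry(data, set_type)}"


if __name__ == "__main__":
    # Reproduces the example from the patch.
    print(dataset_add_command("myset", "string", b"google.com"))
    # One line of a 'string' dataset file, as in the ua-seen example.
    print(serialize_entry(b"Mozilla/4.0 (compatible; )", "string"))
    # One line of a 'sha256' dataset file (digest computed here by assumption).
    print(serialize_entry(hashlib.sha256(b"google.com").digest(), "sha256"))
```

The first two printed values match the ``Z29vZ2xlLmNvbQ==`` and ``TW96aWxsYS80LjAgKGNvbXBhdGlibGU7ICk=`` strings quoted in the documentation, which is a quick way to check that an external tool writes entries in the documented encoding.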