From: Eric Leblond
Date: Sun, 2 Mar 2025 16:35:47 +0000 (+0100)
Subject: doc/userguide: add dataset with json
X-Git-Tag: suricata-8.0.0-rc1~66
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=9873c5d2e176fc4045de12420abe5f24ebb9cd0e;p=thirdparty%2Fsuricata.git

doc/userguide: add dataset with json
---

diff --git a/doc/userguide/rules/datasets.rst b/doc/userguide/rules/datasets.rst
index bf6ab9b1ed..2a6ccc6217 100644
--- a/doc/userguide/rules/datasets.rst
+++ b/doc/userguide/rules/datasets.rst
@@ -3,8 +3,8 @@
 Datasets
 ========
 
-Using the ``dataset`` and ``datarep`` keyword it is possible to match on
-large amounts of data against any sticky buffer.
+Using the ``dataset`` and ``datarep`` keyword it is possible
+to match on large amounts of data against any sticky buffer.
 
 For example, to match against a DNS black list called ``dns-bl``::
 
@@ -79,7 +79,8 @@ Syntax::
     dataset:,,;
 
     dataset:, \
-        [, type , save , load , state , memcap , hashsize ];
+        [, type , save , load , state , memcap , hashsize
+         , format , enrichment_key , value_key , array_key ];
 
 type
   the data type: string, md5, sha256, ipv4, ip
@@ -94,6 +95,20 @@ memcap
   maximum memory limit for the respective dataset
 hashsize
   allowed size of the hash for the respective dataset
+format
+  the format of the file: csv, json. Defaults to csv. See
+  :ref:`dataset with json format ` for json
+  option
+enrichment_key
+  the key to use for the enrichment of the alert event
+  for json format
+value_key
+  the key to use for the value of the alert
+  for json format
+array_key
+  the key to use for the array of the alert
+  for json format
+
 
 .. note:: 'type' is mandatory and needs to be set.
 
@@ -146,6 +161,37 @@ The rules will only match if the data is in the list and the reputation
 value is higher than 200.
 
 
+.. _datasets_datajson:
+
+dataset with json
+~~~~~~~~~~~~~~~~~
+
+DataJSON allows matching data against a set and outputting data attached to the matching
+value in the event.
+
+Syntax::
+
+    dataset:,,;
+
+    dataset:, \
+        [, type , load , format json, memcap , hashsize , enrichment_key \
+         , value_key , array_key ];
+
+Example rules could look like::
+
+    alert http any any -> any any (msg:"IP match"; ip.dst; dataset:isset,bad_ips, type ip, load bad_ips.json, format json, enrichment_key bad_ones, value_key ip; sid:8000001;)
+
+In this example, the match will occur if the destination IP is in the set and the
+alert will have an ``alert.extra.bad_ones`` subobject that will contain the JSON
+data associated to the value.
+
+If ``json_key`` is present then the data file has to contain a valid JSON object containing an array
+where every element has to contain a key equal to ``json_key``.
+If ``array_key`` is present, Suricata will extract the corresponding subobject that has to be
+a JSON array.
+
+See :ref:`Datajson format ` for more information.
+
 Rule Reloads
 ------------
 
@@ -243,6 +289,28 @@ Syntax::
 
     dataset-dump
 
+dataset-add-json
+~~~~~~~~~~~~~~~~
+
+Unix Socket command to add data to a set. On success, the addition becomes
+active instantly.
+
+Syntax::
+
+    dataset-add-json
+
+set name
+  Name of an already defined dataset
+type
+  Data type: string, md5, sha256, ipv4, ip
+data
+  Data to add in serialized form (base64 for string, hex notation for md5/sha256, string representation for ipv4/ip)
+
+Example adding 'google.com' to set 'myset'::
+
+    dataset-add-json myset string Z29vZ2xlLmNvbQ== {"city":"Mountain View"}
+
+
 File formats
 ------------
 
@@ -285,13 +353,41 @@ which when piped to ``base64 -d`` reveals its value::
 datarep
 ~~~~~~~
 
-The datarep format follows the dataset, expect that there are 1 more CSV
+The datarep format follows the dataset, except that there are 1 more CSV
 field:
 
 Syntax::
 
     ,
 
+.. _datajson_data:
+
+dataset with JSON enrichment
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If ``format json`` is used in the parameters of a dataset keyword, then the loaded
+file has to contain a valid JSON object.
+
+If the ``value_key`` option is present then the file has to contain a valid JSON
+object containing an array where the key equal to the ``value_key`` value is present.
+
+For example, if the file ``file.json`` is like the following example (typical of the return of a REST API call) ::
+
+    {
+      "time": "2024-12-21",
+      "response": {
+        "threats":
+          [
+            {"host": "toto.com", "origin": "japan"},
+            {"host": "grenouille.com", "origin": "french"}
+          ]
+      }
+    }
+
+then the match to check the list of threats using datajson can be defined as ::
+
+    http.host; dataset:isset,threats,load file.json, enrichment_key threat, value_key host, array_key response.threats;
+
 .. _datasets_file_locations:
 
 File Locations
diff --git a/doc/userguide/rules/payload-keywords.rst b/doc/userguide/rules/payload-keywords.rst
index 3a4a123067..955a1f78ef 100644
--- a/doc/userguide/rules/payload-keywords.rst
+++ b/doc/userguide/rules/payload-keywords.rst
@@ -864,6 +864,67 @@ qualities of pcre as well. These are:
 .. note:: The following characters must be escaped inside the content:
              ``;`` ``\`` ``"``
 
+PCRE extraction
+~~~~~~~~~~~~~~~
+
+It is possible to capture groups from the regular expression and log them into the
+alert events.
+
+There are 3 capabilities:
+
+* pkt: the extracted group is logged as pkt variable in ``metadata.pktvars``
+* alert: the extracted group is logged to the ``alert.extra`` subobject
+* flow: the extracted group is stored in a flow variable and ends up in the ``metadata.flowvars``
+
+To use the feature, parameters of the pcre keyword need to be updated.
+After the regular pcre regex and options, add a comma separated list of variable names.
+The prefix here is ``flow:``, ``pkt:`` or ``alert:`` and the names can contain special
+characters now. The names map to the capturing substring expressions in order ::
+
+    pcre:"/([a-z]+)\/[a-z]+\/(.+)\/(.+)\/changelog$/GUR, \
+        flow:ua/ubuntu/repo,flow:ua/ubuntu/pkg/base, \
+        flow:ua/ubuntu/pkg/version";
+
+This would result in the alert event having something like ::
+
+    "metadata": {
+      "flowvars": [
+        {"ua/ubuntu/repo": "fr"},
+        {"ua/ubuntu/pkg/base": "curl"},
+        {"ua/ubuntu/pkg/version": "2.2.1"}
+      ]
+    }
+
+The other events on the same flow such as the ``flow`` one will
+also have the flow vars.
+
+If this is not wanted, you can use the ``alert:`` construct to only
+get the event in the alert ::
+
+    pcre:"/([a-z]+)\/[a-z]+\/(.+)\/(.+)\/changelog$/GUR, \
+        alert:ua/ubuntu/repo,alert:ua/ubuntu/pkg/base, \
+        alert:ua/ubuntu/pkg/version";
+
+With that syntax, the result of the extraction will appear like ::
+
+    "alert": {
+      "extra": {
+        "ua/ubuntu/repo": "fr",
+        "ua/ubuntu/pkg/base": "curl",
+        "ua/ubuntu/pkg/version": "2.2.1"
+      }
+    }
+
+The extraction scopes can be combined.
+
+It is also possible to extract a key/value pair in the ``pkt`` scope.
+One capture would be the key, the second the value. The notation is similar to the last ::
+
+    pcre:"^/([A-Z]+) (.*)\r\n/, pkt:key,pkt:value";
+
+``key`` and ``value`` are simply hardcoded names to trigger the key/value extraction.
+As a consequence, they can't be used as names for the variables.
+
 Suricata's modifiers
 ~~~~~~~~~~~~~~~~~~~~