.. _File Extraction:

File Extraction
===============

Architecture
~~~~~~~~~~~~

The file extraction code works on top of selected protocol parsers (see supported protocols below). The application layer parsers run on top of the stream reassembly engine and the UDP flow tracking.

In the case of HTTP, the parser takes care of dechunking and unzipping the request and/or response data if necessary.

This means that settings in the stream engine, the reassembly engine and the application layer parsers all affect how file extraction works.

The rule language controls which files are extracted and stored on disk.

Supported protocols are:

- HTTP
- SMTP
- FTP
- NFS
- SMB

Settings
~~~~~~~~

*stream.checksum-validation* controls whether the stream engine rejects packets with invalid checksums. This is normally a good idea, but if the network interface performs checksum offloading, many packets may appear to be broken. This setting is enabled by default and can be disabled by setting it to "no". Note that checksum handling can also be controlled per interface; see "checksum-checks" in the example configuration.

*file-store.stream-depth* controls how far into a stream reassembly is done. Beyond this value no reassembly will be done, which means that the HTTP session will no longer be tracked past that point. By default a setting of 1 Megabyte is used. 0 sets it to unlimited. If set to no, it is disabled and ``stream.reassembly.depth`` is used instead. Non-zero values must be greater than ``stream.reassembly.depth`` to be used.

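As a rough illustration of the two settings above, the fragment below shows where they usually sit in ``suricata.yaml``. The key layout is assumed from the stock example configuration and the depth values are made up for the example, so check the configuration shipped with your version before copying anything::

  stream:
    checksum-validation: yes   # "no" disables rejection of bad checksums
    reassembly:
      depth: 1mb               # illustrative value

  outputs:
    - file-store:
        version: 2
        enabled: yes
        stream-depth: 10mb     # illustrative; 0 would mean unlimited

In this sketch the file-store depth is larger than the reassembly depth, so it would be accepted rather than ignored.
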
*libhtp.default-config.request-body-limit* / *libhtp.server-config.<config>.request-body-limit* controls how much of the HTTP request body is tracked for inspection by the `http_client_body` keyword. It is also used to limit file inspection. A value of 0 means unlimited.

*libhtp.default-config.response-body-limit* / *libhtp.server-config.<config>.response-body-limit* is like the request body limit, except that it applies to the HTTP response body.


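The body limits above might be set as in the following sketch. The nesting under ``app-layer.protocols.http.libhtp`` is assumed from the stock configuration, the ``apache`` server-config name and its address range are placeholders, and the sizes are illustrative only::

  app-layer:
    protocols:
      http:
        libhtp:
          default-config:
            request-body-limit: 100kb    # illustrative value
            response-body-limit: 100kb   # illustrative value
          server-config:
            - apache:                    # hypothetical config name
                address: [192.168.1.0/24]
                request-body-limit: 0    # 0 = unlimited
                response-body-limit: 0   # 0 = unlimited
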
Output
~~~~~~

File-Store and Eve Fileinfo
---------------------------

There are two output modules for logging information about extracted files.
The first is ``eve.files``, which is an ``eve`` sub-logger
that logs ``fileinfo`` records. These ``fileinfo`` records provide
metadata about the file, but not the actual file contents.

This must be enabled in the ``eve`` output::

  outputs:
    - eve-log:
        types:
          - files:
              force-magic: no
              force-hash: [md5,sha256]

See :ref:`suricata-yaml-outputs-eve` for more details on working
with the `eve` output.

The other output module, ``file-store``, stores the actual files to
disk.

The ``file-store`` module uses its own log directory (default: `filestore` in
the default logging directory) and logs files using the SHA256 of the
contents as the filename. Each file is then placed in a directory
named `00` to `ff` where the directory shares the first 2 characters
of the filename. For example, if the SHA256 hex string of an extracted
file starts with "f9bc6d..." the file will be placed in the directory
`filestore/f9`.

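A minimal sketch of enabling the ``file-store`` output is shown below. The ``dir`` value simply restates the default mentioned above, and the exact set of supported keys may differ between versions::

  outputs:
    - file-store:
        version: 2
        enabled: yes
        dir: filestore   # relative to the default logging directory
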
The size of a file that can be stored depends on ``file-store.stream-depth``.
If this value is reached, the file can be truncated and might not be stored completely.
If ``file-store.stream-depth`` is not enabled, ``stream.reassembly.depth`` will be considered.

Setting ``file-store.stream-depth`` to 0 permits storing the entire file;
here, 0 means "unlimited."

``file-store.stream-depth`` will always override ``stream.reassembly.depth``
when the filestore keyword is used. However, it is not possible to set ``file-store.stream-depth``
to a value less than ``stream.reassembly.depth``. Values less than this amount are ignored
and a warning message will be displayed.

A protocol parser, such as modbus, may allow a different
store-depth value to be set and use it rather than ``file-store.stream-depth``.

Using the SHA256 for file names allows for automatic de-duplication of
extracted files. However, the timestamp of a pre-existing file will be
updated if the same file is extracted again, similar to the `touch`
command.

Optionally a ``fileinfo`` record can be written to its own file
sharing the same SHA256 as the file it references. To handle recording
the metadata of each occurrence of an extracted file, these filenames
include some extra fields to ensure uniqueness. Currently the format
is::

   <SHA256>.<SECONDS>.<ID>.json

where ``<SECONDS>`` is the seconds from the packet that triggered the
stored file to be closed and ``<ID>`` is a unique ID for the runtime
of the Suricata instance. These values should not be depended on, and
are simply used to ensure uniqueness.

These ``fileinfo`` records are identical to the ``fileinfo`` records
logged to the ``eve`` output.

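Assuming the option is named ``write-fileinfo``, as in the stock configuration, writing these per-file records could be switched on as sketched here::

  outputs:
    - file-store:
        version: 2
        enabled: yes
        write-fileinfo: yes   # one <SHA256>.<SECONDS>.<ID>.json per occurrence
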
See :ref:`suricata-yaml-file-store` for more information on
configuring the file-store output.

.. note:: This section documents version 2 of the ``file-store``. Version 1 of the file-store has been removed as of Suricata version 6.

Rules
~~~~~

Without rules in place no extraction will happen. The simplest rule would be:

::

  alert http any any -> any any (msg:"FILE store all"; filestore; sid:1; rev:1;)

This will simply store all files to disk.


Want to store all files with a pdf extension?

::

  alert http any any -> any any (msg:"FILE PDF file claimed"; fileext:"pdf"; filestore; sid:2; rev:1;)


Or rather all actual pdf files?

::

  alert http any any -> any any (msg:"FILE pdf detected"; filemagic:"PDF document"; filestore; sid:3; rev:1;)


Or only store files whose MD5 checksum is on a blacklist?

::

  alert http any any -> any any (msg:"Black list checksum match and extract MD5"; filemd5:fileextraction-chksum.list; filestore; sid:4; rev:1;)


Or only store files whose SHA1 checksum is on a blacklist?

::

  alert http any any -> any any (msg:"Black list checksum match and extract SHA1"; filesha1:fileextraction-chksum.list; filestore; sid:5; rev:1;)


Or, finally, store files whose SHA256 checksum is on a blacklist?

::

  alert http any any -> any any (msg:"Black list checksum match and extract SHA256"; filesha256:fileextraction-chksum.list; filestore; sid:6; rev:1;)

A file with more example rules is bundled with the Suricata download. In the archive, go to the `rules` directory and check the ``files.rules`` file.


MD5
~~~

Suricata can calculate MD5 checksums of files on the fly and log them. See :doc:`md5` for an explanation on how to enable this.


.. toctree::

   md5
   public-sha1-md5-data-sets

Updating Filestore Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. toctree::

   config-update