and legacy normalizers are mutually exclusive, so you cannot enable both at the
same time (doing so will cause Snort to fail to load). When the configured depth
is reached, normalization stops. The depth is applied per script.
Setting js_normalization_depth = -1 allows unlimited normalization depth. By
default the value is 0, which means the normalizer is disabled. The enhanced
normalizer provides more precise whitespace normalization of JavaScript: it
removes all whitespace and line terminators that are redundant from the
JavaScript syntax point of view (between an identifier and a punctuator,
between an identifier and an operator, etc.) according to the ECMAScript 5.1
standard. The normalized data is available through the script_data rule option.
This feature is currently experimental and still under development.
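A minimal configuration sketch (the depth value here is illustrative) that
enables the enhanced normalizer with unlimited depth:

----
http_inspect =
{
    -- -1 = unlimited normalization depth; 0 (the default) disables the normalizer
    js_normalization_depth = -1,
}
----

Since the two normalizers are mutually exclusive, the legacy JavaScript
normalization option must remain disabled in the same configuration.
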
===== xff_headers
===== script_data
script_data is a sticky buffer that contains the normalized JavaScript text
collected from the whole PDU. It requires the enhanced normalizer to be
enabled: http_inspect = { js_normalization_depth = N }; the
js_normalization_depth option is described above. Regardless of what
script_data contains, file_data still holds the whole HTTP message body with
the original JavaScript in it.
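A sketch of a rule that matches on the normalized buffer (the message, content
match, and SID are illustrative):

----
alert http any any -> any any (
    msg:"suspicious eval in normalized script data";
    script_data;
    content:"eval(";
    sid:1000001;
)
----
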
==== Timing issues and combining rule options
3. The 2.X multi_slash and directory options are combined into a single option called
simplify_path.
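For example, a minimal 3.0 configuration sketch (assuming simplify_path is a
boolean http_inspect parameter; the text above only gives the option name):

----
http_inspect =
{
    -- replaces the 2.X multi_slash and directory normalizations
    simplify_path = true,
}
----
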
The HttpJsNorm class serves as a script Normalizer and currently has two
implementations: the Legacy Normalizer and the Enhanced Normalizer.

During message body analysis the Enhanced Normalizer does one of the following:
1. If the Content-Type header says it is an external script, the Normalizer
   processes the whole message body as script text.
2. If it is an HTML page, the Normalizer searches for an opening tag and
   processes the subsequent bytes in stream mode until it finds a closing tag.
   It then continues scanning the rest of the message body for further inline
   scripts.

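An illustrative sketch of the second case (a hypothetical HTML response, not
taken from the source); only the content of each script element is normalized,
while a response whose Content-Type identifies an external script is
normalized in its entirety:

----
Content-Type: text/html

<html><body>
<script> var a = 1 ; </script>   <!-- normalized as an inline script -->
<script> var b = 2 ; </script>   <!-- scanning continues for further scripts -->
</body></html>
----
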
Enhanced Normalizer is a stateful JavaScript whitespace normalizer.
The following whitespace code points are normalized:
 * \u0009 Tab <TAB>
 * \u000B Vertical Tab <VT>
 * \u000C Form Feed <FF>
 * \u0020 Space <SP>
 * \u00A0 No-break space <NBSP>
 * \uFEFF Byte Order Mark <BOM>
 * any other Unicode "space separator" <USP>
 * also new-line and carriage-return line-break characters

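A rough before/after sketch of the whitespace normalization (illustrative
only; the exact output is an assumption, not taken from the source):

----
input:   var    x   =
         foo ( a ,  b ) ;

output:  var x=foo(a,b);
----
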
Additionally, the Normalizer validates the syntax with respect to the ECMA-262
standard and checks the restrictions on the contents of script elements (since
it is HTML-embedded JavaScript).

The following rules apply:
 * no nested tags are allowed, i.e. two opening tags in a row
 * a script closing tag is not allowed inside a string literal, comment, or
   regular expression literal

When a bad token is seen, the Normalizer fires the corresponding built-in rule
and abandons the current script, though the already-processed data remains in
the output buffer.

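For instance (illustrative snippets, not from the source), both of the
following would end with a bad token:

----
<script> var a; <script> ...              <!-- nested opening tag -->
<script> var s = "</script>"; ...         <!-- closing tag inside a string literal -->
----
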
Enhanced Normalizer supports scripts that span multiple PDUs. If a script has
not ended when the PDU ends, the Normalizer's context is saved in HttpFlowData,
and the script continuation is processed with the saved context. This has some
limitations:
 * a split in the middle of an identifier will result in two identifiers in
   the output
 * a "<script" or "</script>" sequence split over two PDUs will not be detected

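A rough illustration of the first limitation (hypothetical split, assumed
output):

----
PDU 1:  ... var long_ident
PDU 2:  ifier = 1; ...

normalized output contains two identifiers: long_ident ifier
----
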
In order to support the Script Detection feature for inline scripts, the
Normalizer ensures that after reaching the end of a script (a legitimate
closing tag or a bad token) it falls back to its initial state, so that the
next script can be processed by the same context. If a PDU starts with a
script continuation, it is not possible to restore the Normalizer to the right
state later (because the context on the flow is not in its initial state). A
dedicated buffer handles this scenario: it holds the normalized data from the
script continuation, so that it can later be prepended to subsequent
normalizations.

Algorithm for reassembling chunked message bodies:
NHI parses chunked message bodies using an algorithm based on the HTTP RFC. Chunk headers are not