Pull request #3223: Enhanced JavaScript normalizer doc updates

author Mike Stepanek (mstepane) <mstepane@cisco.com>

Thu, 6 Jan 2022 11:44:44 +0000 (11:44 +0000)

committer Mike Stepanek (mstepane) <mstepane@cisco.com>

Thu, 6 Jan 2022 11:44:44 +0000 (11:44 +0000)
diff --git a/doc/reference/builtin_stubs.txt b/doc/reference/builtin_stubs.txt

index 18db0beee018443d0777a2686a87fe4ddef28a47..bbad9a9238504fce01b9e9151a07c00674b6ec69 100644 (file)
--- a/doc/reference/builtin_stubs.txt
+++ b/doc/reference/builtin_stubs.txt
@@ -1218,30 +1218,32 @@ A server granted a request to upgrade a connection from HTTP/1 to HTTP/2.
  
  119:265
  
-JavaScript normalizer has encountered a symbol that is not expected as a part of a valid
+Enhanced JavaScript normalizer has encountered a symbol that is not expected as a part of a valid
  JavaScript statement, making further normalization impossible.
  
  119:266
  
  HTML <script> tag must not have a nested <script> tag inside it. If a nested tag is
-encountered, this alert is raised.
+encountered, this alert is raised. This alert is raised by the enhanced JavaScript normalizer.
  
  119:267
  
  This alert is raised when </script> end-tag is encountered inside a JavaScript comment
  or literal, which is a syntax error, as the last comment or literal is not closed before
-script end.
+script end. This alert is raised by the enhanced JavaScript normalizer.
  
  119:268
  
  When HTML <script> tag contains a reference to an external script, it must not contain
  any executable JavaScript code. This alert is raised if executable (i.e. not comment) code
-is found inside a script tag that has an external reference.
+is found inside a script tag that has an external reference. This alert is raised
+by the enhanced JavaScript normalizer.
  
  119:269
  
  In HTML, a script tag must not be self-closing (written as <script /> without a following
  end-tag). If a self-closing "short-form" script tag is encountered, this alert is raised.
+This alert is raised by the enhanced JavaScript normalizer.
  
  119:270
  
@@ -1250,6 +1252,7 @@ identifiers to a common form. Amount of unique identifiers to normalize is limit
  for memory considerations, with http_inspect.js_norm_identifier_depth parameter. When this
  threshold is reached, a corresponding alert is raised. This alert is not expected for typical
  network traffic and may be an indication that an attacker is trying to exhaust resources.
+This alert is raised by the enhanced JavaScript normalizer.
  
  119:271
  
@@ -1259,7 +1262,7 @@ Also, the normalization tracks the current bracket scope, which requires a stack
  When the depth of nesting exceeds limit set in http_inspect.js_norm_max_tmpl_nest or in
  http_inspect.js_norm_max_bracket_depth, this alert is raised. This alert is not expected
  for typical network traffic and may be an indication that an attacker is trying to exhaust
-resources.
+resources. This alert is raised by the enhanced JavaScript normalizer.
  
  119:272
  
@@ -1274,16 +1277,15 @@ PDUs can be missed and not normalized. Usually it happens when rules have file_d
  js_data ips options and fast-pattern (FP) search is applying to file_data. Some PDUs don’t
  match file_data FP search and JavaScript normalization won't be executed for these PDUs.
  The normalization of the following PDUs for inline/external scripts will be stopped for
-current request within the flow.
+current request within the flow. This alert is raised by the enhanced JavaScript normalizer.
  
  119:274
  
-In JavaScript, a program is split into several scopes such as a global scope, function scope,
-if block, block of code, object, etc. The scope has a nesting nature which requires a stack
-to track it for proper normalization of JavaScript identifiers. When the depth of nesting
-exceeds limit set in http_inspect.js_norm_max_scope_depth, this alert is raised. This alert is
-not expected for typical network traffic and may be an indication that an attacker is trying to
-exhaust resources.
+To resolve variable names in JavaScript, the current stack of variable scopes has to be tracked.
+When the depth of nesting exceeds the limit set in http_inspect.js_norm_max_scope_depth,
+this alert is raised. This alert is not expected for typical network traffic and may be
+an indication that an attacker is trying to exhaust resources. This alert is raised
+by the enhanced JavaScript normalizer.
  
  121:1
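
Reviewer note on the depth-limit alerts in this file (119:271 and 119:274): the idea of a nesting stack bounded by a configured limit can be pictured with the minimal Python sketch below. This is not Snort code. The function name `check_bracket_depth` and its boolean return convention are hypothetical; only the notion of a limit such as http_inspect.js_norm_max_bracket_depth comes from the text above. For brevity the sketch does not skip brackets inside string literals, comments or regular expression literals.

```python
def check_bracket_depth(script: str, max_depth: int = 256) -> bool:
    """Return True if bracket nesting stays within max_depth, else False.

    Hypothetical illustration only: False marks the point where a
    depth-limit alert would be raised by a real normalizer.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in script:
        if ch in "([{":
            stack.append(ch)
            if len(stack) > max_depth:
                return False  # nesting is deeper than the configured limit
        elif ch in closers:
            # Pop only on a matching opener; mismatches are ignored in this sketch.
            if stack and stack[-1] == closers[ch]:
                stack.pop()
    return True
```

With a small limit, `check_bracket_depth("((((1))))", max_depth=3)` reports that the limit was exceeded, while the same input passes with `max_depth=4`.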
  
diff --git a/doc/user/http_inspect.txt b/doc/user/http_inspect.txt

index ea35edf85ad691a675a2ac20e31250ddfab29fdf..c32787ccfe1e7bb2e5d3417d02a2c3f2b3f7f5a4 100755 (executable)
--- a/doc/user/http_inspect.txt
+++ b/doc/user/http_inspect.txt
@@ -208,9 +208,10 @@ through the js_data rule option.
  
  js_norm_identifier_depth = N {0 : 65536} will set a number of unique
  JavaScript identifiers to normalize. When the depth is reached, a built-in
-alert is generated. Every HTTP Response has its own identifier substitution
-context. Thus, all scripts from the same response will be normalized as if
-they are a single script.. By default, the value is set to 65536, which
+alert is generated. Every HTTP response has its own identifier substitution context,
+which means that an identifier retains the same normal form across multiple scripts
+if they are part of the same HTTP response, and that this limit applies to a single
+HTTP response rather than to a single script. By default, the value is set to 65536, which
  is the max allowed number of unique identifiers. The generated names are in
  the range from var_0000 to var_ffff.
  
@@ -222,43 +223,54 @@ to be processed. Introduced in ES6, template literals provide syntax to define
  a literal multiline string, which can have arbitrary JavaScript substitutions,
  that will be evaluated and inserted into the string. Such substitutions can be
  nested, and require keeping track of every layer for proper normalization. This option
-is present to limit the amount of memory dedicated to this tracking.
+is present to limit the amount of memory dedicated to template nesting tracking.
  
  ===== js_norm_max_bracket_depth
  
  js_norm_max_bracket_depth = N {1 : 65535} (default 256) is an option of the enhanced
-JavaScript normalizer that determines the deepest level of nested bracket scope. The scope
-term includes code sections("{}"), parentheses("()") and brackets("[]"). This option
-is present to limit the amount of memory dedicated to this tracking.
+JavaScript normalizer that determines the maximum depth of nesting brackets, i.e. parentheses,
+braces and square brackets, nested within a matching pair, in any combination. This option
+is present to limit the amount of memory dedicated to bracket tracking.
  
  ===== js_norm_max_scope_depth
  
  js_norm_max_scope_depth = N {1 : 65535} (default 256) is an option of the enhanced
-JavaScript normalizer that determines the deepest level of nested scope. The scope
-term includes any type of JavaScript program scope such as the global one, function scope,
-if block, loops, code block, object scope, etc. This option is present to limit the amount
-of memory dedicated to this tracking.
+JavaScript normalizer that determines the deepest level of nested variable scope,
+i.e. functions, code blocks, etc. including the global scope.
+This option is present to limit the amount of memory dedicated to scope tracking.
  
  ===== js_norm_ident_ignore
  
-js_norm_ident_ignore = {<a list of ignored identifiers>}.
-The default list is present in "snort_defaults.lua".
+js_norm_ident_ignore = {<list of ignored identifiers>} is an option of the enhanced
+JavaScript normalizer that defines a list of identifiers to keep intact.
  
-The Normalizer does not substitute ignored identifiers, keeping their name unchanged.
-Additionally, the Normalizer tracks expressions with ignored identifiers, so
-the subsequent identifiers are not substituted in the chain of dots, bracket
-accessors and function calls. For example:
+Identifiers in this list will not be put into normal form (var_0000). Subsequent accessors,
+after a dot, in square brackets or after a function call, will not be normalized either.
+
+For example:
  
      console.log("bar")
      document.getElementById("id").text
      eval("script")
-    foo["bar"]
+    console["log"]
  
-The list must contain object and function names only.
+Every entry has to be a simple identifier, i.e. not include dots, brackets, etc.
  For example:
  
      http_inspect.js_norm_ident_ignore = { 'console', 'document', 'eval', 'foo' }
  
+When a variable assignment that 'aliases' an identifier from the list is found,
+the assignment will be tracked, and subsequent occurrences of the variable will be
+replaced with the stored value. This substitution respects JavaScript variable
+scoping rules.
+
+For example:
+
+    var a = console.log
+    a("hello") // will be replaced with 'console.log("hello")'
+
+The default list of ignore-identifiers is present in "snort_defaults.lua".
+
  ===== xff_headers
  
  This configuration supports defining custom x-forwarded-for type headers. In a
diff --git a/src/service_inspectors/http_inspect/dev_notes.txt b/src/service_inspectors/http_inspect/dev_notes.txt

index ed7400496ccf7c76964490734bd50e69037684f1..4e35ec715d02499c97628de64669a681d2f6f0ed 100755 (executable)
--- a/src/service_inspectors/http_inspect/dev_notes.txt
+++ b/src/service_inspectors/http_inspect/dev_notes.txt
@@ -221,16 +221,10 @@ During message body analysis the Enhanced Normalizer does one of the following:
     It proceeds and scans the entire message body for inline scripts.
  
  Enhanced Normalizer is a stateful JavaScript whitespace and identifiers normalizer.
-So, the following whitespace codes will be normalized:
- * \u0009 Tab <TAB>
- * \u000B Vertical Tab <VT>
- * \u000C Form Feed <FF>
- * \u0020 Space <SP>
- * \u00A0 No-break space <NBSP>
- * \uFEFF Byte Order Mark <BOM>
- * Any other Unicode “space separator” <USP>
- * Also including new-line and carriage-return line-break characters
-
+Normalizer will remove all extraneous whitespace and newlines, keeping a single space where 
+syntactically necessary. Comments will be removed, but contents of string literals will
+be kept intact. Semicolons will be inserted, if not already present, according to ECMAScript
+automatic semicolon insertion rules.
  All JavaScript identifier names, except those from the ignore list,
  will be substituted with unified names in the following format: var_0000 -> var_ffff.
  So, the number of unique identifiers available is 65536 names per HTTP transaction.
@@ -238,10 +232,10 @@ If Normalizer overruns the configured limit, built-in alert is generated.
  A config option to set the limit manually:
   * http_inspect.js_norm_identifier_depth.
  
-Identifiers from the ignore list will be placed as is, without substitution.
-Additionally, Normalizer tracks expressions with ignored identifiers, skipping
-name normalization for subsequent identifiers in the chain of dots, bracket
-accessors and function calls. For example:
+Identifiers from the ignore list will be placed as is, without substitution. Starting with
+the listed identifier, any chain of dot accessors, brackets and function calls will be kept
+intact.
+For example:
   * console.log("bar")
   * document.getElementById("id").text
   * eval("script")
@@ -251,13 +245,24 @@ Ignored identifiers are configured via the following config option,
  it accepts a list of object and function names:
   * http_inspect.js_norm_ident_ignore = { 'console', 'document', 'eval', 'foo' }
  
-JS Normalizer's syntax parser relies on ECMA-262 Standard. Particularly,
-it tracks brackets, program scope, and applies some restrictions on the
-script syntax (since, it is HTML-embedded JavaScript):
+When a variable assignment that 'aliases' an identifier from the list is found,
+the assignment will be tracked, and subsequent occurrences of the variable will be
+replaced with the stored value. This substitution respects JavaScript variable
+scoping rules.
+
+For example:
+
+    var a = console.log
+    a("hello") // will be replaced with 'console.log("hello")'
+
+JS Normalizer's syntax parser follows the ECMA-262 standard. For various features,
+tracking of variable scope and individual brackets is done in accordance with the standard.
+Additionally, Normalizer enforces standard limits on HTML content in JavaScript:
   * no nesting tags allowed, i.e. two opening tags in a row
- * script closing tag is not allowed in string literal, comment, regex
+ * script closing tag is not allowed in string literal, comment, regular expression literal, etc.
  
-Upon a bad token seen, Normalizer fires corresponding built-in rule and abandons the current script,
+If the source JavaScript is syntactically incorrect (containing a bad token, mismatched brackets,
+HTML tags, etc.), Normalizer fires the corresponding built-in rule and abandons the current script,
  though the already-processed data remains in the output buffer.
  
  Enhanced Normalizer supports scripts over multiple PDUs.
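
Reviewer note: the identifier substitution and ignore-list behavior documented in this change can be illustrated with the short Python sketch below. It is a simplified model, not the Snort implementation. The names `normalize_identifiers`, `KEYWORDS` and `TOKEN` are hypothetical, the keyword set is deliberately incomplete, and raising `OverflowError` merely stands in for the built-in alert. The sketch keeps string literals intact, maps identifiers to var_0000, var_0001, ... in order of first appearance, and leaves an ignored identifier and its dot/call chain unchanged. It does not model alias tracking, scopes, comments or semicolon insertion, and it treats any identifier after a dot outside an ignored chain as normalizable, which is an assumption.

```python
import re

# Deliberately small keyword set (assumption); a real parser knows them all.
KEYWORDS = {"var", "let", "const", "function", "return", "if", "else", "for", "while", "new"}
CHAIN_CHARS = set(".()[]")

# String literals come first so that their contents are never rewritten.
TOKEN = re.compile(r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<other>.)
""", re.VERBOSE | re.DOTALL)


def normalize_identifiers(script, ignore, limit=65536):
    names = {}        # original identifier -> normalized name
    out = []
    in_chain = False  # True while inside a chain started by an ignored identifier
    prev = ""         # previous non-whitespace token
    for m in TOKEN.finditer(script):
        text = m.group(0)
        kind = m.lastgroup
        if kind == "ident" and text not in KEYWORDS:
            if in_chain and prev == ".":
                pass                      # member of an ignored chain: keep as is
            elif text in ignore:
                in_chain = True           # start of an ignored chain
            else:
                in_chain = False
                if text not in names:
                    if len(names) >= limit:
                        # Stand-in for the built-in alert described in the docs.
                        raise OverflowError("identifier limit reached")
                    names[text] = "var_%04x" % len(names)
                text = names[text]
        elif kind != "string" and not text.isspace() and text not in CHAIN_CHARS:
            in_chain = False              # an operator, comma or keyword ends the chain
        if not m.group(0).isspace():
            prev = m.group(0)
        out.append(text)
    return "".join(out)
```

For instance, with `ignore={"document"}` the input `document.getElementById("id").text` comes back unchanged, while identifiers outside the ignore list receive var_xxxx names that stay stable within one call.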