From: Steve Chew (stechew) Date: Thu, 17 Nov 2022 00:56:37 +0000 (+0000) Subject: Pull request #3621: Doc updates: move Enhanced JS Normalizer from NHI to a standalone... X-Git-Tag: 3.1.47.0~2 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=984d8606e5061c359d6af8255494c4281870828d;p=thirdparty%2Fsnort3.git Pull request #3621: Doc updates: move Enhanced JS Normalizer from NHI to a standalone component Merge in SNORT/snort3 from ~OSERHIIE/snort3:doc_js_module to master Squashed commit of the following: commit da8da5ac9b34f6917ade0e7d2036119c90fe10c3 Author: Oleksandr Serhiienko Date: Tue Aug 30 12:53:32 2022 +0200 doc: add JavaScript Normalization section to user manual commit 9b3f22bc70d9dc2e35cf2521dad22dd504b5cac0 Author: Oleksandr Serhiienko Date: Tue Aug 30 11:26:31 2022 +0200 doc: add js_norm alerts to builtin_stubs.txt --- diff --git a/doc/reference/builtin_stubs.txt b/doc/reference/builtin_stubs.txt index 407c639c6..1cef38c44 100644 --- a/doc/reference/builtin_stubs.txt +++ b/doc/reference/builtin_stubs.txt @@ -1207,22 +1207,6 @@ A client sent a request to upgrade an HTTP/1 connection to HTTP/2. A server granted a request to upgrade a connection from HTTP/1 to HTTP/2. -119:265 - -Enhanced JavaScript normalizer has encountered a symbol that is not expected as a part of a valid -JavaScript statement, making further normalization impossible. - -119:266 - -HTML end-tag is encountered inside a JavaScript comment -or literal, which is a syntax error, as the last comment or literal is not closed before -script end. This alert is raised by the enhanced JavaScript normalizer. - 119:268 When HTML end-tag is encountered inside a JavaScript comment +or literal, which is a syntax error, as the last comment or literal is not closed before +script end. This alert is raised by the enhanced JavaScript normalizer. + +154:6 + +JavaScript normalization includes identifier substitution, which brings arbitrary JavaScript +identifiers to a common form. 
The number of unique identifiers to normalize is limited, +for memory considerations, by the http_inspect.js_norm_identifier_depth parameter. When this +threshold is reached, a corresponding alert is raised. This alert is not expected for typical +network traffic and may be an indication that an attacker is trying to exhaust resources. +This alert is raised by the enhanced JavaScript normalizer. + +154:7 + +In JavaScript, template literals can have substitutions, which in turn can have nested +template literals; tracking them for proper whitespace normalization requires a stack. +Also, the normalization tracks the current bracket scope, which requires a stack as well. +When the depth of nesting exceeds the limit set in http_inspect.js_norm_max_tmpl_nest or in +http_inspect.js_norm_max_bracket_depth, this alert is raised. This alert is not expected +for typical network traffic and may be an indication that an attacker is trying to exhaust +resources. This alert is raised by the enhanced JavaScript normalizer. + +154:8 + +This alert is raised in the following situation. During JavaScript normalization, +some data can be lost and not normalized. This usually happens when rules have file_data and +js_data IPS options and the fast-pattern (FP) search is applied to file_data. Some data +doesn't match the file_data FP search, and JavaScript normalization won't be executed for it. +Further normalization of inline/external scripts is stopped for the current +request within the flow. This alert is raised by the enhanced JavaScript normalizer. + +154:9 + +To resolve variable names in JavaScript, the current stack of variable scopes has to be tracked. +When the depth of nesting exceeds the limit set in http_inspect.js_norm_max_scope_depth, +this alert is raised. This alert is not expected for typical network traffic and may be +an indication that an attacker is trying to exhaust resources. This alert is raised +by the enhanced JavaScript normalizer. 
+ 175:1 (domain_filter) configured domain detected diff --git a/doc/user/CMakeLists.txt b/doc/user/CMakeLists.txt index 1511b9273..502cb2e92 100644 --- a/doc/user/CMakeLists.txt +++ b/doc/user/CMakeLists.txt @@ -22,6 +22,7 @@ set ( http_inspect.txt http2_inspect.txt iec104.txt + js_norm.txt mms.txt overview.txt params.txt diff --git a/doc/user/features.txt b/doc/user/features.txt index 564066a81..1bcb323d5 100644 --- a/doc/user/features.txt +++ b/doc/user/features.txt @@ -77,6 +77,10 @@ include::http2_inspect.txt[] include::iec104.txt[] +=== JavaScript Normalization + +include::js_norm.txt[] + === MMS Inspector include::mms.txt[] diff --git a/doc/user/http_inspect.txt b/doc/user/http_inspect.txt index 0498cb9f4..9dee396a2 100755 --- a/doc/user/http_inspect.txt +++ b/doc/user/http_inspect.txt @@ -59,6 +59,12 @@ normalization. Both normalizers are independent and can be configured separately. The Legacy normalizer should be considered deprecated. The Enhanced Normalizer is encouraged to use for JavaScript normalization in the first place as we continue improving functionality and quality. +The Enhanced JavaScript Normalizer has to be configured as a separate +module: + + js_norm = {} + +Refer to JavaScript Normalization section for Enhanced Normalizer specifics. ===== Legacy Normalizer @@ -71,26 +77,6 @@ Normalizer is deprecated preferably to use Enhanced Normalizer. After supporting backward compatibility in the Enhanced Normalizer, Legacy Normalizer will be removed. -===== Enhanced Normalizer - -Having ips option 'js_data' in the rules automatically enables Enhanced -Normalizer. The Enhanced Normalizer can normalize inline/external scripts. -It supports scripts over multiple PDUs. It is a stateful JavaScript whitespace -and identifiers normalizer. Normalizer concatenates string literals whenever -it's possible to do. This also works with any other normalizations that result -in string literals. 
All JavaScript identifier names, except those from -the ignore lists, will be substituted with unified names in the following -format: var_0000 -> var_ffff. But the unescape-like function names will be removed -from the normalized data. The Normalizer tries to expand an escaped text, -so it will appear in a usual form in the output. Moreover, Normalizer validates -the syntax concerning ECMA-262 Standard, including scope tracking and restrictions -for script elements. For more information on how additionally configure -Enhanced Normalizer check with the following configuration options: -js_norm_bytes_depth, js_norm_identifier_depth, js_norm_max_tmpl_nest, -js_norm_max_bracket_depth, js_norm_max_scope_depth, js_norm_ident_ignore, -js_norm_prop_ignore. -Eventually Enhanced Normalizer will completely replace Legacy Normalizer. - ==== Configuration Configuration can be as simple as adding: @@ -229,120 +215,6 @@ uXXXXi. http_inspect also replaces consecutive whitespaces with a single space and normalizes the plus by concatenating the strings. Such normalizations refer to basic JavaScript normalization. -===== js_norm_bytes_depth - -js_norm_bytes_depth = N {-1 : max53} will set a number of input JavaScript -bytes to normalize. When the depth is reached, normalization will be stopped. -It's implemented per-script. By default js_norm_bytes_depth = -1, will set -unlimited depth. The enhanced normalizer provides more precise whitespace -normalization of JavaScript, that removes all redundant whitespaces and line -terminators from the JavaScript syntax point of view (between identifier and -punctuator, between identifier and operator, etc.) according to ECMAScript 5.1 -standard. Additionally, it performs normalization of JavaScript identifiers making -a substitution of unique names with unified names representation: var_0000:var_ffff. -The identifiers are variables and function names. The normalized data is available -through the js_data rule option. 
- -===== js_norm_identifier_depth - -js_norm_identifier_depth = N {0 : 65536} will set a number of unique -JavaScript identifiers to normalize. When the depth is reached, a built-in -alert is generated. Every HTTP response has its own identifier substitution context, -which means that identifier will retain same normal form in multiple scripts, -if they are a part of the same HTTP response, and that this limit is set for a single -HTTP response and not a single script. By default, the value is set to 65536, which -is the max allowed number of unique identifiers. The generated names are in -the range from var_0000 to var_ffff. - -===== js_norm_max_tmpl_nest - -js_norm_max_tmpl_nest = N {0 : 255} (default 32) is an option of the enhanced -JavaScript normalizer that determines the deepest level of nested template literals -to be processed. Introduced in ES6, template literals provide syntax to define -a literal multiline string, which can have arbitrary JavaScript substitutions, -that will be evaluated and inserted into the string. Such substitutions can be -nested, and require keeping track of every layer for proper normalization. This option -is present to limit the amount of memory dedicated to template nesting tracking. - -===== js_norm_max_bracket_depth - -js_norm_max_bracket_depth = N {1 : 65535} (default 256) is an option of the enhanced -JavaScript normalizer that determines the maximum depth of nesting brackets, i.e. parentheses, -braces and square brackets, nested within a matching pair, in any combination. This option -is present to limit the amount of memory dedicated to bracket tracking. - -===== js_norm_max_scope_depth - -js_norm_max_scope_depth = N {1 : 65535} (default 256) is an option of the enhanced -JavaScript normalizer that determines the deepest level of nested variable scope, -i.e. functions, code blocks, etc. including the global scope. -This option is present to limit the amount of memory dedicated to scope tracking. 
- -===== js_norm_ident_ignore - -js_norm_ident_ignore = {} is an option of the enhanced -JavaScript normalizer that defines a list of identifiers to keep intact. - -Identifiers in this list will not be put into normal form (var_0000). Subsequent accessors, -after dot, in square brackets or after function call, will not be normalized as well. - -For example: - - console.log("bar") - document.getElementById("id").text - eval("script") - console["log"] - -Every entry has to be a simple identifier, i.e. not include dots, brackets, etc. -For example: - - http_inspect.js_norm_ident_ignore = { 'console', 'document', 'eval', 'foo' } - -When a variable assignment that 'aliases' an identifier from the list is found, -the assignment will be tracked, and subsequent occurrences of the variable will be -replaced with the stored value. This substitution will follow JavaScript variable scope -limits. - -For example: - - var a = console.log - a("hello") // will be substituted to 'console.log("hello")' - -For class names and constructors in the list, when the class is used with the -keyword 'new', created object will be tracked, and its properties will be kept intact. -Identifier of the object itself, however, will be brought to unified form. - -For example: - - var o = new Array() // normalized to 'var var_0000=new Array()' - o.push(10) // normalized to 'var_0000.push(10)' - -The default list of ignore-identifiers is present in "snort_defaults.lua". - -Unescape function names should remain intact in the output. They ought to be -included in the ignore list. If for some reason the user wants to disable unescape -related features, then removing function's name from the ignore list does the trick. - -===== js_norm_prop_ignore - -js_norm_prop_ignore = {} is an option of the enhanced -JavaScript normalizer that defines a list of object properties and methods that -will be kept intact during the identifiers normalization. 
This list should include -methods and properties of objects that will not be tracked by assignment substitution -functionality, for example, those that can be created implicitly. - -Subsequent accessors, after dot, in square brackets or after function call, will not be -normalized as well. - -For example: - - http_inspect.js_norm_prop_ignore = { 'split' } - - in: "string".toUpperCase().split("").reverse().join(""); - out: "string".var_0000().split("").reverse().join(""); - -The default list of ignored properties is present in "snort_defaults.lua". - ===== xff_headers This configuration supports defining custom x-forwarded-for type headers. In a @@ -505,35 +377,6 @@ server. Due to this potential evasion tactic, the HTTP inspector will not cut ov it sees any early client-to-server traffic, but will continue normal HTTP processing of the flow regardless of the eventual server response. -==== Trace messages - -When a user needs help to sort out things going on inside HTTP inspector, Trace module becomes handy. - - $ snort --help-module trace | grep http_inspect - -Messages for the enhanced JavaScript Normalizer follow -(more verbosity available in debug build): - -===== trace.module.http_inspect.js_proc - -Messages from script processing flow and their verbosity levels: - -1. Script opening tag location. - -2. Attributes of the detected script. - -3. Return codes from Normalizer. - -===== trace.module.http_inspect.js_dump - -JavaScript data dump and verbosity levels: - -1. js_data buffer as it is passed to detection. - -2. (no messages available currently) - -3. Current script as it is passed to Normalizer. - ==== Detection rules http_inspect parses HTTP messages into their components and makes them @@ -820,8 +663,7 @@ normalize_javascript. The js_data contains normalized JavaScript text collected from the whole PDU (inline or external scripts). 
It requires the Enhanced Normalizer enabled: -http_inspect = { js_norm_bytes_depth = N }, -js_norm_bytes_depth option is described above. +js_norm = { }, Despite what js_data has, file_data still contains the whole HTTP body with an original JavaScript in it. diff --git a/doc/user/js_norm.txt b/doc/user/js_norm.txt new file mode 100644 index 000000000..ff7905979 --- /dev/null +++ b/doc/user/js_norm.txt @@ -0,0 +1,210 @@ +One of the improvements in Snort 3 is Enhanced JavaScript Normalizer which has its +own module and can be used with any service inspectors where JavaScript code might occur. +Currently it is only used by HTTP inspector. + +==== Overview + +You can configure it by adding: + + js_norm = {} + +to your snort.lua configuration file. Or you can read about it in the +source code under src/js_norm. + +Having 'js_norm' module configured and ips option 'js_data' in the rules automatically +enables Enhanced Normalizer. The Enhanced Normalizer can normalize inline/external +scripts. It supports scripts over multiple PDUs. It is a stateful JavaScript whitespace +and identifiers normalizer. Normalizer concatenates string literals whenever +it's possible to do. This also works with any other normalizations that result +in string literals. All JavaScript identifier names, except those from +the ignore lists, will be substituted with unified names in the following +format: var_0000 -> var_ffff. But the unescape-like function names will be removed +from the normalized data. The Normalizer tries to expand an escaped text, +so it will appear in a usual form in the output. Moreover, Normalizer validates +the syntax concerning ECMA-262 Standard, including scope tracking and restrictions +for script elements. For more information on how additionally configure +Enhanced Normalizer check with the following configuration options: +bytes_depth, identifier_depth, max_tmpl_nest, max_bracket_depth, max_scope_depth, +ident_ignore, prop_ignore. 
+Eventually the Enhanced Normalizer will completely replace the Legacy Normalizer in the HTTP inspector. + +==== Configuration + +Configuration can be as simple as adding: + + js_norm = {} + +to your snort.lua file. The default configuration provides a thorough +normalization and may be all that you need. But there are some options that +provide extra features, tweak how things are done, or conserve resources by +doing less. + +Default lists of ignored identifiers and object properties are also provided. +To get a complete default configuration, use 'default_js_norm' from lua/snort_defaults.lua +by adding: + + js_norm = default_js_norm + +to your snort.lua file. + +The Enhanced JavaScript Normalizer implements a JIT approach. Actual normalization takes place +only when the js_data option is evaluated. This option is also used as a buffer selector for +normalized JavaScript data. + +===== bytes_depth + +bytes_depth = N {-1 : max53} sets the number of input JavaScript +bytes to normalize. When the depth is reached, normalization stops. +The limit is applied per script. By default, bytes_depth = -1, which sets an +unlimited depth. The enhanced normalizer provides more precise whitespace +normalization of JavaScript: it removes all whitespace and line +terminators that are redundant from the JavaScript syntax point of view (between identifier and +punctuator, between identifier and operator, etc.) according to the ECMAScript 5.1 +standard. Additionally, it normalizes JavaScript identifiers by +substituting unique names with a unified name representation: var_0000:var_ffff. +The identifiers are variable and function names. The normalized data is available +through the 'js_data' rule option. + +===== identifier_depth + +identifier_depth = N {0 : 65536} sets the number of unique +JavaScript identifiers to normalize. When the depth is reached, a built-in +alert is generated.
Every response has its own identifier substitution context, +which means that an identifier will retain the same normal form across multiple scripts, +if they are part of the same response, and that this limit is set for a single +response and not a single script. By default, the value is set to 65536, which +is the maximum allowed number of unique identifiers. The generated names are in +the range from var_0000 to var_ffff. + +===== max_tmpl_nest + +max_tmpl_nest = N {0 : 255} (default 32) is an option of the enhanced +JavaScript normalizer that determines the deepest level of nested template literals +to be processed. Introduced in ES6, template literals provide a syntax for defining +multiline string literals, which can contain arbitrary JavaScript substitutions +that are evaluated and inserted into the string. Such substitutions can be +nested, and every layer has to be tracked for proper normalization. This option +is present to limit the amount of memory dedicated to template nesting tracking. + +===== max_bracket_depth + +max_bracket_depth = N {1 : 65535} (default 256) is an option of the enhanced +JavaScript normalizer that determines the maximum depth of nested brackets, i.e. parentheses, +braces and square brackets, nested within a matching pair, in any combination. This option +is present to limit the amount of memory dedicated to bracket tracking. + +===== max_scope_depth + +max_scope_depth = N {1 : 65535} (default 256) is an option of the enhanced +JavaScript normalizer that determines the deepest level of nested variable scope, +i.e. functions, code blocks, etc., including the global scope. +This option is present to limit the amount of memory dedicated to scope tracking. + +===== ident_ignore + +ident_ignore = {} is an option of the enhanced +JavaScript normalizer that defines a list of identifiers to keep intact. + +Identifiers in this list will not be put into normal form (var_0000).
Subsequent accessors, +after a dot, in square brackets or after a function call, will not be normalized either. + +For example: + + console.log("bar") + document.getElementById("id").text + eval("script") + console["log"] + +Every entry has to be a simple identifier, i.e. it must not include dots, brackets, etc. +For example: + + js_norm.ident_ignore = { 'console', 'document', 'eval', 'foo' } + +When a variable assignment that 'aliases' an identifier from the list is found, +the assignment will be tracked, and subsequent occurrences of the variable will be +replaced with the stored value. This substitution will follow JavaScript variable scope +limits. + +For example: + + var a = console.log + a("hello") // will be substituted with 'console.log("hello")' + +For class names and constructors in the list, when the class is used with the +keyword 'new', the created object will be tracked, and its properties will be kept intact. +The identifier of the object itself, however, will be brought to the unified form. + +For example: + + var o = new Array() // normalized to 'var var_0000=new Array()' + o.push(10) // normalized to 'var_0000.push(10)' + +The default list of ignored identifiers is present in "snort_defaults.lua". + +Unescape function names should remain intact in the output. They ought to be +included in the ignore list. If for some reason the user wants to disable unescape-related +features, then removing the function's name from the ignore list does the trick. + +===== prop_ignore + +prop_ignore = {} is an option of the enhanced +JavaScript normalizer that defines a list of object properties and methods that +will be kept intact during identifier normalization. This list should include +methods and properties of objects that will not be tracked by the assignment substitution +functionality, for example, those that can be created implicitly. + +Subsequent accessors, after a dot, in square brackets or after a function call, will not be +normalized either. 
+ +For example: + + js_norm.prop_ignore = { 'split' } + + in: "string".toUpperCase().split("").reverse().join(""); + out: "string".var_0000().split("").reverse().join(""); + +The default list of ignored properties is present in "snort_defaults.lua". + +==== Detection rules + +The Enhanced JavaScript Normalizer follows a JIT approach, which requires rules with the +'js_data' IPS option to be loaded. +An example rule: + + alert tcp any any -> any any (msg:"JavaScript"; js_data; content:"var var_0000=1;"; sid:1;) + +===== js_data + +The js_data IPS option contains normalized JavaScript text collected from the whole PDU. +It requires the Enhanced JavaScript Normalizer to be configured. + +==== Trace messages + +When a user needs help sorting out what is going on inside the Enhanced JavaScript Normalizer, +the Trace module comes in handy. + + $ snort --help-module trace | grep js_norm + +Messages for the enhanced JavaScript Normalizer follow +(more verbosity available in debug build): + +===== trace.module.js_norm.proc + +Messages from script processing flow and their verbosity levels: + +1. Script opening tag location. + +2. Attributes of the detected script. + +3. Return codes from Normalizer. + +===== trace.module.js_norm.dump + +JavaScript data dump and verbosity levels: + +1. js_data buffer as it is passed to detection. + +2. (no messages available currently) + +3. Current script as it is passed to Normalizer. +
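As a rough sketch of how the js_norm options documented in the new js_norm.txt fit together, the following snort.lua fragment sets each parameter explicitly. The numeric values are the defaults and ranges stated in the text. The two ignore lists are shortened copies of the examples in the text, used for illustration only; they are assumptions and not the lists shipped in snort_defaults.lua.

```lua
-- snort.lua fragment: an illustrative sketch, not a recommended configuration.
js_norm =
{
    -- Input bytes to normalize per script; -1 (documented default) means unlimited.
    bytes_depth = -1,

    -- Unique identifiers to normalize per response; 65536 is the documented
    -- default and maximum (names range from var_0000 to var_ffff).
    identifier_depth = 65536,

    -- Nesting limits that bound the memory used for tracking (documented defaults).
    max_tmpl_nest = 32,
    max_bracket_depth = 256,
    max_scope_depth = 256,

    -- Identifiers kept intact; entries must be simple identifiers.
    -- Shortened list taken from the example in the text (assumption).
    ident_ignore = { 'console', 'document', 'eval' },

    -- Object properties and methods kept intact.
    -- Single entry taken from the example in the text (assumption).
    prop_ignore = { 'split' },
}
```

Where the stock ignore lists are wanted, the text points to `js_norm = default_js_norm` instead. In either case, normalization only takes place when a loaded rule evaluates the js_data option.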