From: Mike Stepanek (mstepane)
Date: Tue, 30 Nov 2021 21:49:59 +0000 (+0000)
Subject: Pull request #3163: JavaScript scope tracking
X-Git-Tag: 3.1.18.0~2
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=00ba08546977f0383b066966f988cec02c0b279d;p=thirdparty%2Fsnort3.git

Pull request #3163: JavaScript scope tracking

Merge in SNORT/snort3 from ~OSERHIIE/snort3:js_vars to master

Squashed commit of the following:

commit 7931ba587607cd89ae2efee2c53403d04ab21bef
Author: Oleksandr Serhiienko
Date:   Thu Nov 11 20:06:58 2021 +0200

    doc: update user/http_inspect.txt with http_inspect.js_norm_max_scope_depth option description

commit 3d8c9c1e4a577196366a847998ef717b8db03fe9
Author: Oleksandr Serhiienko
Date:   Thu Nov 11 20:05:56 2021 +0200

    doc: update builtin_subs.txt with EVENT_JS_SCOPE_NEST_OVERFLOW alert

commit 178e5b656222c0f3e72589344950cc4886a130d3
Author: Oleksandr Serhiienko
Date:   Thu Nov 11 20:04:27 2021 +0200

    http_inspect: update dev_notes.txt

commit 0d103f24002233f51c4aa9cbba18a1b0b5483509
Author: Oleksandr Serhiienko
Date:   Mon Oct 25 11:43:25 2021 +0300

    utils: (JSNormalizer) add program scope tracking and alias resolution

    Add JavaScript program scope tracking.
    The scope term includes all JavaScript program scope types: GLOBAL, FUNCTION, BLOCK, OBJECT.
    Every scope is represented by a separate object on a stack with its own identifiers
    mapping hash table, connected together in a list.

    Add variable definition type identification.

    Add support for alias names resolution with respect to the current program scope.
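The scope stack and alias resolution described above can be pictured with a minimal, standalone sketch. It is not the code from this commit: `ScopeStackSketch` and `ScopeKind` are hypothetical names chosen for illustration, and the sketch assumes only what the message states (one hash table per scope, scopes kept in a list, lookup from the innermost scope outwards, a configurable nesting limit).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical scope kinds for illustration; not the enum used by Snort.
enum class ScopeKind { GLOBAL, FUNCTION, BLOCK };

// Simplified stand-in for a program scope stack with per-scope alias tables.
class ScopeStackSketch
{
public:
    explicit ScopeStackSketch(std::size_t max_depth) : max_depth(max_depth)
    { scopes.emplace_back(ScopeKind::GLOBAL); }

    // Opens a nested scope; returns false when the nesting limit is reached.
    bool push(ScopeKind kind)
    {
        if (scopes.size() >= max_depth)
            return false;
        scopes.emplace_back(kind);
        return true;
    }

    // Closes the innermost scope; returns false on a kind mismatch
    // or on an attempt to close the global scope.
    bool pop(ScopeKind kind)
    {
        if (scopes.size() == 1 || scopes.back().kind != kind)
            return false;
        scopes.pop_back();
        return true;
    }

    // Records an alias in the innermost scope only.
    void add_alias(const std::string& alias, const std::string& value)
    {
        assert(!scopes.empty());
        scopes.back().aliases[alias] = value;
    }

    // Resolves an alias from the innermost scope outwards; nullptr if unknown.
    const std::string* lookup(const std::string& alias) const
    {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it)
        {
            auto found = it->aliases.find(alias);
            if (found != it->aliases.end())
                return &found->second;
        }
        return nullptr;
    }

    std::size_t depth() const
    { return scopes.size(); }

private:
    struct Scope
    {
        explicit Scope(ScopeKind k) : kind(k) {}
        ScopeKind kind;
        std::unordered_map<std::string, std::string> aliases;
    };

    std::list<Scope> scopes;
    std::size_t max_depth;
};
```

In this sketch an alias defined inside a function scope shadows a global alias of the same name until that scope is popped, and a `push` beyond `max_depth` fails, which is the condition the new scope nesting overflow alert is meant to report.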
    Add trace messages for scope tracking

    Add two config options:
      http_inspect.js_norm_max_bracket_depth - bracket scope nesting limit
      http_inspect.js_norm_max_scope_depth - program scope nesting limit

    Add two built-in alerts:
      119:271 - bracket nesting overflow
      119:274 - scope nesting overflow

    Add unit tests coverage:
      scope tracking
      alias resolution
      split over multiple PDUs
      error handling

commit aef1de2489928f47af8c4345d745378c340ed8f1
Author: Oleksandr Serhiienko
Date:   Mon Nov 8 11:19:36 2021 +0200

    utils: (JSNormalizer) rework the split over multiple chunks behavior

    Avoid normalization of the input bytes that were already normalized
    Update unit test cases due to rework in the split over chunks behavior
    Add unit tests coverage for combined output after several normalizations
---

diff --git a/doc/reference/builtin_stubs.txt b/doc/reference/builtin_stubs.txt
index 0e3e3f14b..a745668b9 100644
--- a/doc/reference/builtin_stubs.txt
+++ b/doc/reference/builtin_stubs.txt
@@ -1255,9 +1255,9 @@ network traffic and may be an indication that an attacker is trying to exhaust r
 
 In JavaScript, template literals can have substitutions, that in turn can have nested
 template literals, which requires a stack to track for proper whitespace normalization.
-Also, the normalization tracks the current scope, which requires a stack as well.
+Also, the normalization tracks the current bracket scope, which requires a stack as well.
 When the depth of nesting exceeds limit set in http_inspect.js_norm_max_tmpl_nest or in
-http_inspect.js_norm_max_scope_depth, this alert is raised. This alert is not expected
+http_inspect.js_norm_max_bracket_depth, this alert is raised. This alert is not expected
 for typical network traffic and may be an indication that an attacker is trying to exhaust
 resources.
 
@@ -1276,6 +1276,15 @@ match file_data FP search and JavaScript normalization won't be executed for the
 The normalization of the following PDUs for inline/external scripts will be stopped for current
 request within the flow.
 
+119:274
+
+In JavaScript, a program is split into several scopes such as a global scope, function scope,
+if block, block of code, object, etc. The scope has a nesting nature which requires a stack
+to track it for proper normalization of JavaScript identifiers. When the depth of nesting
+exceeds limit set in http_inspect.js_norm_max_scope_depth, this alert is raised. This alert is
+not expected for typical network traffic and may be an indication that an attacker is trying to
+exhaust resources.
+
 121:1
 
 Invalid flag set on HTTP/2 frame header
diff --git a/doc/user/http_inspect.txt b/doc/user/http_inspect.txt
index c2d848c5c..9be37d266 100755
--- a/doc/user/http_inspect.txt
+++ b/doc/user/http_inspect.txt
@@ -83,9 +83,9 @@ validates the syntax concerning ECMA-262 Standard, including scope tracking, and
 checks for restrictions for contents of script elements (since it is HTML-embedded
 JavaScript). For more information on how additionally configure Enhanced Normalizer check
 the following http_inspect options: js_normalization_depth,
-js_norm_identifier_depth, js_norm_max_tmpl_nest, js_norm_max_scope_depth,
-js_norm_built_in_ident. Eventually Enhanced Normalizer will completely replace
-Legacy Normalizer.
+js_norm_identifier_depth, js_norm_max_tmpl_nest, js_norm_max_bracket_depth,
+js_norm_max_scope_depth, js_norm_built_in_ident. Eventually Enhanced Normalizer will
+completely replace Legacy Normalizer.
 
 ==== Configuration
 
@@ -225,13 +225,21 @@ that will be evaluated and inserted into the string. Such substitutions can be
 nested, and require keeping track of every layer for proper normalization. This option
 is present to limit the amount of memory dedicated to this tracking.
 
-===== js_norm_max_scope_depth
+===== js_norm_max_bracket_depth
 
-js_norm_max_scope_depth = N {0 : 65535} (default 256) is an option of the enhanced
-JavaScript normalizer that determines the deepest level of nested scope. The scope
+js_norm_max_bracket_depth = N {1 : 65535} (default 256) is an option of the enhanced
+JavaScript normalizer that determines the deepest level of nested bracket scope. The scope
 term includes code sections("{}"), parentheses("()") and brackets("[]"). This option
 is present to limit the amount of memory dedicated to this tracking.
 
+===== js_norm_max_scope_depth
+
+js_norm_max_scope_depth = N {1 : 65535} (default 256) is an option of the enhanced
+JavaScript normalizer that determines the deepest level of nested scope. The scope
+term includes any type of JavaScript program scope such as the global one, function scope,
+if block, loops, code block, object scope, etc. This option is present to limit the amount
+of memory dedicated to this tracking.
+
 ===== js_norm_built_in_ident
 
 js_norm_built_in_ident = {}.
diff --git a/src/service_inspectors/http_inspect/dev_notes.txt b/src/service_inspectors/http_inspect/dev_notes.txt
index 89cebf294..7d254edc9 100755
--- a/src/service_inspectors/http_inspect/dev_notes.txt
+++ b/src/service_inspectors/http_inspect/dev_notes.txt
@@ -256,8 +256,8 @@ For example:
  * http_inspect.js_norm_built_in_ident = { 'console', 'document', 'eval', 'foo' }
 
 Additionally, Normalizer validates the syntax with respect to ECMA-262 Standard, including
-scope tracking, and checks for restrictions for contents of script elements (since, it
-is HTML-embedded JavaScript).
+bracket scope tracking, program scope tracking, and checks for restrictions for contents of
+script elements (since, it is HTML-embedded JavaScript).
 
 The following rules applied:
  * no nesting tags allowed, i.e.
two opening tags in a row diff --git a/src/service_inspectors/http_inspect/http_enum.h b/src/service_inspectors/http_inspect/http_enum.h index 065339076..11ac86059 100755 --- a/src/service_inspectors/http_inspect/http_enum.h +++ b/src/service_inspectors/http_inspect/http_enum.h @@ -280,11 +280,12 @@ enum Infraction INF_JS_CODE_IN_EXTERNAL = 124, INF_JS_SHORTENED_TAG = 125, INF_JS_IDENTIFIER_OVERFLOW = 126, - INF_JS_SCOPE_NEST_OVFLOW = 127, + INF_JS_BRACKET_NEST_OVERFLOW = 127, INF_CHUNK_OVER_MAXIMUM = 128, INF_LONG_HOST_VALUE = 129, INF_ACCEPT_ENCODING_CONSECUTIVE_COMMAS = 130, INF_JS_PDU_MISS = 131, + INF_JS_SCOPE_NEST_OVERFLOW = 132, INF__MAX_VALUE }; @@ -414,9 +415,10 @@ enum EventSid EVENT_JS_CODE_IN_EXTERNAL = 268, EVENT_JS_SHORTENED_TAG = 269, EVENT_JS_IDENTIFIER_OVERFLOW = 270, - EVENT_JS_SCOPE_NEST_OVFLOW = 271, + EVENT_JS_BRACKET_NEST_OVERFLOW = 271, EVENT_ACCEPT_ENCODING_CONSECUTIVE_COMMAS = 272, EVENT_JS_PDU_MISS = 273, + EVENT_JS_SCOPE_NEST_OVERFLOW = 274, EVENT__MAX_VALUE }; diff --git a/src/service_inspectors/http_inspect/http_flow_data.cc b/src/service_inspectors/http_inspect/http_flow_data.cc index 059fbbc6a..d058c56e1 100644 --- a/src/service_inspectors/http_inspect/http_flow_data.cc +++ b/src/service_inspectors/http_inspect/http_flow_data.cc @@ -259,7 +259,7 @@ void HttpFlowData::reset_js_ident_ctx() } snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t ident_depth, size_t norm_depth, - uint8_t max_template_nesting, uint32_t max_scope_depth, + uint8_t max_template_nesting, uint32_t max_bracket_depth, uint32_t max_scope_depth, const std::unordered_set& built_in_ident) { if (js_normalizer) @@ -267,7 +267,7 @@ snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t ident_depth, size_t no if (!js_ident_ctx) { - js_ident_ctx = new JSIdentifierCtx(ident_depth, built_in_ident); + js_ident_ctx = new JSIdentifierCtx(ident_depth, max_scope_depth, built_in_ident); update_allocations(js_ident_ctx->size()); debug_logf(4, http_trace, TRACE_JS_PROC, 
nullptr, @@ -275,7 +275,7 @@ snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t ident_depth, size_t no } js_normalizer = new JSNormalizer(*js_ident_ctx, norm_depth, - max_template_nesting, max_scope_depth); + max_template_nesting, max_bracket_depth); update_allocations(JSNormalizer::size()); debug_logf(4, http_trace, TRACE_JS_PROC, nullptr, @@ -308,7 +308,7 @@ void HttpFlowData::release_js_ctx() } #else void HttpFlowData::reset_js_ident_ctx() {} -snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t, size_t, uint8_t, uint32_t, +snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t, size_t, uint8_t, uint32_t, uint32_t, const std::unordered_set&) { return *js_normalizer; } void HttpFlowData::release_js_ctx() {} diff --git a/src/service_inspectors/http_inspect/http_flow_data.h b/src/service_inspectors/http_inspect/http_flow_data.h index 87f7976a2..7adef8158 100644 --- a/src/service_inspectors/http_inspect/http_flow_data.h +++ b/src/service_inspectors/http_inspect/http_flow_data.h @@ -220,7 +220,7 @@ private: void reset_js_pdu_idx(); void reset_js_ident_ctx(); snort::JSNormalizer& acquire_js_ctx(int32_t ident_depth, size_t norm_depth, - uint8_t max_template_nesting, uint32_t max_scope_depth, + uint8_t max_template_nesting, uint32_t max_bracket_depth, uint32_t max_scope_depth, const std::unordered_set& built_in_ident); void release_js_ctx(); bool is_pdu_missed(); diff --git a/src/service_inspectors/http_inspect/http_inspect.cc b/src/service_inspectors/http_inspect/http_inspect.cc index ddf444ca0..f5e36c7f1 100755 --- a/src/service_inspectors/http_inspect/http_inspect.cc +++ b/src/service_inspectors/http_inspect/http_inspect.cc @@ -165,6 +165,7 @@ void HttpInspect::show(const SnortConfig*) const ConfigLogger::log_value("js_normalization_depth", params->js_norm_param.js_normalization_depth); ConfigLogger::log_value("js_norm_identifier_depth", params->js_norm_param.js_identifier_depth); ConfigLogger::log_value("js_norm_max_tmpl_nest", 
params->js_norm_param.max_template_nesting); + ConfigLogger::log_value("js_norm_max_bracket_depth", params->js_norm_param.max_bracket_depth); ConfigLogger::log_value("js_norm_max_scope_depth", params->js_norm_param.max_scope_depth); if (!js_built_in_ident.empty()) ConfigLogger::log_list("js_norm_built_in_ident", js_built_in_ident.c_str()); diff --git a/src/service_inspectors/http_inspect/http_js_norm.cc b/src/service_inspectors/http_inspect/http_js_norm.cc index a8c0b1801..edf54a067 100644 --- a/src/service_inspectors/http_inspect/http_js_norm.cc +++ b/src/service_inspectors/http_inspect/http_js_norm.cc @@ -44,6 +44,7 @@ static const char* jsret_codes[] = "bad token", "identifier overflow", "template nesting overflow", + "bracket nesting overflow", "scope nesting overflow", "wrong closing symbol", "ended in inner scope", @@ -80,13 +81,14 @@ static inline JSTokenizer::JSRet js_normalize(JSNormalizer& ctx, const char* con } HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t normalization_depth_, - int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_scope_depth_, - const std::unordered_set& built_in_ident_) : + int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_, + uint32_t max_scope_depth_, const std::unordered_set& built_in_ident_) : uri_param(uri_param_), detection_depth(UINT64_MAX), normalization_depth(normalization_depth_), identifier_depth(identifier_depth_), max_template_nesting(max_template_nesting_), + max_bracket_depth(max_bracket_depth_), max_scope_depth(max_scope_depth_), built_in_ident(built_in_ident_), mpse_otag(nullptr), @@ -158,7 +160,7 @@ void HttpJsNorm::do_external(const Field& input, Field& output, "script continues\n"); auto& js_ctx = ssn->acquire_js_ctx(identifier_depth, normalization_depth, max_template_nesting, - max_scope_depth, built_in_ident); + max_bracket_depth, max_scope_depth, built_in_ident); while (ptr < end) { @@ -197,9 +199,14 @@ void 
HttpJsNorm::do_external(const Field& input, Field& output, ssn->js_built_in_event = true; break; case JSTokenizer::TEMPLATE_NESTING_OVERFLOW: + case JSTokenizer::BRACKET_NESTING_OVERFLOW: + *infractions += INF_JS_BRACKET_NEST_OVERFLOW; + events->create_event(EVENT_JS_BRACKET_NEST_OVERFLOW); + ssn->js_built_in_event = true; + break; case JSTokenizer::SCOPE_NESTING_OVERFLOW: - *infractions += INF_JS_SCOPE_NEST_OVFLOW; - events->create_event(EVENT_JS_SCOPE_NEST_OVFLOW); + *infractions += INF_JS_SCOPE_NEST_OVERFLOW; + events->create_event(EVENT_JS_SCOPE_NEST_OVERFLOW); ssn->js_built_in_event = true; break; default: @@ -288,7 +295,7 @@ void HttpJsNorm::do_inline(const Field& input, Field& output, } auto& js_ctx = ssn->acquire_js_ctx(identifier_depth, normalization_depth, - max_template_nesting, max_scope_depth, built_in_ident); + max_template_nesting, max_bracket_depth, max_scope_depth, built_in_ident); auto output_size_before = js_ctx.script_size(); auto ret = js_normalize(js_ctx, end, ptr); @@ -322,9 +329,13 @@ void HttpJsNorm::do_inline(const Field& input, Field& output, events->create_event(EVENT_JS_IDENTIFIER_OVERFLOW); break; case JSTokenizer::TEMPLATE_NESTING_OVERFLOW: + case JSTokenizer::BRACKET_NESTING_OVERFLOW: + *infractions += INF_JS_BRACKET_NEST_OVERFLOW; + events->create_event(EVENT_JS_BRACKET_NEST_OVERFLOW); + break; case JSTokenizer::SCOPE_NESTING_OVERFLOW: - *infractions += INF_JS_SCOPE_NEST_OVFLOW; - events->create_event(EVENT_JS_SCOPE_NEST_OVFLOW); + *infractions += INF_JS_SCOPE_NEST_OVERFLOW; + events->create_event(EVENT_JS_SCOPE_NEST_OVERFLOW); break; default: assert(false); diff --git a/src/service_inspectors/http_inspect/http_js_norm.h b/src/service_inspectors/http_inspect/http_js_norm.h index 076f1689e..73e2deb20 100644 --- a/src/service_inspectors/http_inspect/http_js_norm.h +++ b/src/service_inspectors/http_inspect/http_js_norm.h @@ -37,8 +37,8 @@ class HttpJsNorm { public: HttpJsNorm(const HttpParaList::UriParam&, int64_t normalization_depth, 
- int32_t identifier_depth, uint8_t max_template_nesting, uint32_t max_scope_depth, - const std::unordered_set& built_in_ident); + int32_t identifier_depth, uint8_t max_template_nesting, uint32_t max_bracket_depth, + uint32_t max_scope_depth, const std::unordered_set& built_in_ident); ~HttpJsNorm(); void set_detection_depth(size_t depth) @@ -67,6 +67,7 @@ private: int64_t normalization_depth; int32_t identifier_depth; uint8_t max_template_nesting; + uint32_t max_bracket_depth; uint32_t max_scope_depth; const std::unordered_set& built_in_ident; bool configure_once = false; diff --git a/src/service_inspectors/http_inspect/http_module.cc b/src/service_inspectors/http_inspect/http_module.cc index a9794f68d..41585ab35 100755 --- a/src/service_inspectors/http_inspect/http_module.cc +++ b/src/service_inspectors/http_inspect/http_module.cc @@ -100,7 +100,10 @@ const Parameter HttpModule::http_params[] = "maximum depth of template literal nesting that enhanced javascript normalizer " "will process" }, - { "js_norm_max_scope_depth", Parameter::PT_INT, "0:65535", "256", + { "js_norm_max_bracket_depth", Parameter::PT_INT, "1:65535", "256", + "maximum depth of bracket nesting that enhanced JavaScript normalizer will process" }, + + { "js_norm_max_scope_depth", Parameter::PT_INT, "1:65535", "256", "maximum depth of scope nesting that enhanced JavaScript normalizer will process" }, { "js_norm_built_in_ident", Parameter::PT_LIST, js_built_in_ident_param, nullptr, @@ -278,9 +281,13 @@ bool HttpModule::set(const char*, Value& val, SnortConfig*) { params->js_norm_param.max_template_nesting = val.get_uint8(); } + else if (val.is("js_norm_max_bracket_depth")) + { + params->js_norm_param.max_bracket_depth = val.get_uint32(); + } else if (val.is("js_norm_max_scope_depth")) { - params->js_norm_param.max_scope_depth = val.get_int32(); + params->js_norm_param.max_scope_depth = val.get_uint32(); } else if (val.is("ident_name")) { @@ -469,8 +476,8 @@ bool HttpModule::end(const char* fqn, int, 
SnortConfig*) params->js_norm_param.js_norm = new HttpJsNorm(params->uri_param, params->js_norm_param.js_normalization_depth, params->js_norm_param.js_identifier_depth, - params->js_norm_param.max_template_nesting, params->js_norm_param.max_scope_depth, - params->js_norm_param.built_in_ident); + params->js_norm_param.max_template_nesting, params->js_norm_param.max_bracket_depth, + params->js_norm_param.max_scope_depth, params->js_norm_param.built_in_ident); params->script_detection_handle = script_detection_handle; diff --git a/src/service_inspectors/http_inspect/http_module.h b/src/service_inspectors/http_inspect/http_module.h index 3703f45ac..79160ea40 100755 --- a/src/service_inspectors/http_inspect/http_module.h +++ b/src/service_inspectors/http_inspect/http_module.h @@ -69,6 +69,7 @@ public: int64_t js_normalization_depth = -1; int32_t js_identifier_depth = 0; uint8_t max_template_nesting = 32; + uint32_t max_bracket_depth = 256; uint32_t max_scope_depth = 256; std::unordered_set built_in_ident; int max_javascript_whitespaces = 200; diff --git a/src/service_inspectors/http_inspect/http_tables.cc b/src/service_inspectors/http_inspect/http_tables.cc index 2dec40dc0..1ff388588 100755 --- a/src/service_inspectors/http_inspect/http_tables.cc +++ b/src/service_inspectors/http_inspect/http_tables.cc @@ -330,10 +330,11 @@ const RuleMap HttpModule::http_events[] = { EVENT_JS_CODE_IN_EXTERNAL, "JavaScript code under the external script tags" }, { EVENT_JS_SHORTENED_TAG, "script opening tag in a short form" }, { EVENT_JS_IDENTIFIER_OVERFLOW, "max number of unique JavaScript identifiers reached" }, - { EVENT_JS_SCOPE_NEST_OVFLOW, "JavaScript scope nesting is over capacity" }, + { EVENT_JS_BRACKET_NEST_OVERFLOW, "JavaScript bracket nesting is over capacity" }, { EVENT_ACCEPT_ENCODING_CONSECUTIVE_COMMAS, "Consecutive commas in HTTP Accept-Encoding " "header" }, { EVENT_JS_PDU_MISS, "missed PDUs during JavaScript normalization" }, + { EVENT_JS_SCOPE_NEST_OVERFLOW, 
"JavaScript scope nesting is over capacity" }, { 0, nullptr } }; diff --git a/src/service_inspectors/http_inspect/test/http_module_test.cc b/src/service_inspectors/http_inspect/test/http_module_test.cc index 72a54d655..587bf9ce5 100755 --- a/src/service_inspectors/http_inspect/test/http_module_test.cc +++ b/src/service_inspectors/http_inspect/test/http_module_test.cc @@ -65,12 +65,12 @@ long HttpTestManager::print_amount {}; bool HttpTestManager::print_hex {}; HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t normalization_depth_, - int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_scope_depth_, - const std::unordered_set& built_in_ident_) : + int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_, + uint32_t max_scope_depth_, const std::unordered_set& built_in_ident_) : uri_param(uri_param_), normalization_depth(normalization_depth_), identifier_depth(identifier_depth_), max_template_nesting(max_template_nesting_), - max_scope_depth(max_scope_depth_), built_in_ident(built_in_ident_), - mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {} + max_bracket_depth(max_bracket_depth_), max_scope_depth(max_scope_depth_), + built_in_ident(built_in_ident_), mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {} HttpJsNorm::~HttpJsNorm() = default; void HttpJsNorm::configure(){} int64_t Parameter::get_int(char const*) { return 0; } diff --git a/src/service_inspectors/http_inspect/test/http_uri_norm_test.cc b/src/service_inspectors/http_inspect/test/http_uri_norm_test.cc index 1c95abcf2..0e58086bf 100755 --- a/src/service_inspectors/http_inspect/test/http_uri_norm_test.cc +++ b/src/service_inspectors/http_inspect/test/http_uri_norm_test.cc @@ -54,12 +54,12 @@ void show_stats(PegCount*, const PegInfo*, unsigned, const char*) { } void show_stats(PegCount*, const PegInfo*, const IndexVec&, const char*, FILE*) { } HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t 
normalization_depth_, - int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_scope_depth_, - const std::unordered_set& built_in_ident_) : + int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_, + uint32_t max_scope_depth_, const std::unordered_set& built_in_ident_) : uri_param(uri_param_), normalization_depth(normalization_depth_), identifier_depth(identifier_depth_), max_template_nesting(max_template_nesting_), - max_scope_depth(max_scope_depth_), built_in_ident(built_in_ident_), - mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {} + max_bracket_depth(max_bracket_depth_), max_scope_depth(max_scope_depth_), + built_in_ident(built_in_ident_), mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {} HttpJsNorm::~HttpJsNorm() = default; void HttpJsNorm::configure() {} int64_t Parameter::get_int(char const*) { return 0; } diff --git a/src/utils/js_identifier_ctx.cc b/src/utils/js_identifier_ctx.cc index 277f320a5..5dff0b085 100644 --- a/src/utils/js_identifier_ctx.cc +++ b/src/utils/js_identifier_ctx.cc @@ -23,6 +23,8 @@ #include "js_identifier_ctx.h" +#include + #if !defined(CATCH_TEST_BUILD) && !defined(BENCHMARK_TEST) #include "service_inspectors/http_inspect/http_enum.h" #include "service_inspectors/http_inspect/http_module.h" @@ -43,7 +45,7 @@ public: #endif // CATCH_TEST_BUILD #define MAX_LAST_NAME 65535 -#define HEX_DIGIT_MASK 15 +#define HEX_DIGIT_MASK 15 static const char hex_digits[] = { @@ -62,6 +64,13 @@ static inline std::string format_name(int32_t num) return name; } +JSIdentifierCtx::JSIdentifierCtx(int32_t depth, uint32_t max_scope_depth, + const std::unordered_set& ident_built_in) + : ident_built_in(ident_built_in), depth(depth), max_scope_depth(max_scope_depth) +{ + scopes.emplace_back(JSProgramScopeType::GLOBAL); +} + const char* JSIdentifierCtx::substitute(const char* identifier) { const auto it = ident_names.find(identifier); @@ -81,9 +90,113 @@ bool JSIdentifierCtx::built_in(const 
char* identifier) const return ident_built_in.count(identifier); } +bool JSIdentifierCtx::scope_push(JSProgramScopeType t) +{ + assert(t != JSProgramScopeType::GLOBAL && t != JSProgramScopeType::PROG_SCOPE_TYPE_MAX); + + if (scopes.size() >= max_scope_depth) + return false; + + scopes.emplace_back(t); + return true; +} + +bool JSIdentifierCtx::scope_pop(JSProgramScopeType t) +{ + assert(t != JSProgramScopeType::GLOBAL && t != JSProgramScopeType::PROG_SCOPE_TYPE_MAX); + + if (scopes.back().type() != t) + return false; + + assert(scopes.size() != 1); + scopes.pop_back(); + return true; +} + void JSIdentifierCtx::reset() { ident_last_name = 0; + ident_names.clear(); + scopes.clear(); + scopes.emplace_back(JSProgramScopeType::GLOBAL); +} + +void JSIdentifierCtx::ProgramScope::add_alias(const char* alias, const std::string& value) +{ + assert(alias); + aliases[alias] = value; } +const char* JSIdentifierCtx::ProgramScope::get_alias_value(const char* alias) const +{ + assert(alias); + + const auto it = aliases.find(alias); + if (it != aliases.end()) + return it->second.c_str(); + else + return nullptr; +} + +// advanced program scope access for testing + +#ifdef CATCH_TEST_BUILD + +void JSIdentifierCtx::add_alias(const char* alias, const std::string& value) +{ + assert(alias); + assert(!scopes.empty()); + scopes.back().add_alias(alias, value); +} + +const char* JSIdentifierCtx::alias_lookup(const char* alias) const +{ + assert(alias); + + for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) + { + if (const char* value = it->get_alias_value(alias)) + return value; + } + return nullptr; +} + +bool JSIdentifierCtx::scope_check(const std::list& compare) const +{ + if (scopes.size() != compare.size()) + return false; + + auto cmp = compare.begin(); + for (auto it = scopes.begin(); it != scopes.end(); ++it, ++cmp) + { + if (it->type() != *cmp) + return false; + } + return true; +} + +const std::list JSIdentifierCtx::get_types() const +{ + std::list return_list; + 
for(const auto& scope:scopes) + { + return_list.push_back(scope.type()); + } + return return_list; +} + +bool JSIdentifierCtx::scope_contains(size_t pos, const char* alias) const +{ + size_t offset = 0; + for (auto it = scopes.begin(); it != scopes.end(); ++it, ++offset) + { + if (offset == pos) + return it->get_alias_value(alias); + } + assert(false); + return false; +} + +#endif // CATCH_TEST_BUILD + diff --git a/src/utils/js_identifier_ctx.h b/src/utils/js_identifier_ctx.h index c9824b573..b692e466c 100644 --- a/src/utils/js_identifier_ctx.h +++ b/src/utils/js_identifier_ctx.h @@ -20,10 +20,19 @@ #ifndef JS_IDENTIFIER_CTX #define JS_IDENTIFIER_CTX +#include #include #include #include +enum JSProgramScopeType : unsigned int +{ + GLOBAL = 0, // the global scope (the initial one) + FUNCTION, // function declaration + BLOCK, // block of code and object declaration + PROG_SCOPE_TYPE_MAX +}; + class JSIdentifierCtxBase { public: @@ -31,31 +40,70 @@ public: virtual const char* substitute(const char* identifier) = 0; virtual bool built_in(const char* identifier) const = 0; + + virtual bool scope_push(JSProgramScopeType) = 0; + virtual bool scope_pop(JSProgramScopeType) = 0; + virtual void reset() = 0; + virtual size_t size() const = 0; }; class JSIdentifierCtx : public JSIdentifierCtxBase { public: - JSIdentifierCtx(int32_t depth, const std::unordered_set& ident_built_in) - : depth(depth), ident_built_in(ident_built_in) - {} + JSIdentifierCtx(int32_t depth, uint32_t max_scope_depth, + const std::unordered_set& ident_built_in); - const char* substitute(const char* identifier) override; - bool built_in(const char* identifier) const override; - void reset() override; + virtual const char* substitute(const char* identifier) override; + virtual bool built_in(const char* identifier) const override; - // approximated to 500 unique mappings insertions - size_t size() const override - { return (sizeof(JSIdentifierCtx) + (sizeof(std::string) * 2 * 500)); } + virtual bool 
scope_push(JSProgramScopeType) override; + virtual bool scope_pop(JSProgramScopeType) override; + + virtual void reset() override; + // approximated to 500 unique mappings insertions + // approximated to 3 program scopes in the list + virtual size_t size() const override + { return (sizeof(JSIdentifierCtx) + (sizeof(std::string) * 2 * 500) + + (sizeof(ProgramScope) * 3)); } private: - int32_t ident_last_name = 0; - int32_t depth; + class ProgramScope + { + public: + ProgramScope(JSProgramScopeType t) : t(t) {} + + void add_alias(const char* alias, const std::string& value); + const char* get_alias_value(const char* alias) const; + + JSProgramScopeType type() const + { return t; } + private: + std::unordered_map aliases; + JSProgramScopeType t; + }; + std::list scopes; std::unordered_map ident_names; const std::unordered_set& ident_built_in; + + int32_t ident_last_name = 0; + int32_t depth; + uint32_t max_scope_depth; + +// advanced program scope access for testing +#ifdef CATCH_TEST_BUILD +public: + // alias tracking + void add_alias(const char* alias, const std::string& value); + const char* alias_lookup(const char* alias) const; + + // compare scope list with the passed pattern + bool scope_check(const std::list& compare) const; + const std::list get_types() const; + bool scope_contains(size_t pos, const char* alias) const; +#endif // CATCH_TEST_BUILD }; #endif // JS_IDENTIFIER_CTX diff --git a/src/utils/js_normalizer.cc b/src/utils/js_normalizer.cc index 639ee7b19..571670c5b 100644 --- a/src/utils/js_normalizer.cc +++ b/src/utils/js_normalizer.cc @@ -29,7 +29,7 @@ using namespace snort; using namespace std; JSNormalizer::JSNormalizer(JSIdentifierCtxBase& js_ident_ctx, size_t norm_depth, - uint8_t max_template_nesting, uint32_t max_scope_depth, int tmp_cap_size) + uint8_t max_template_nesting, uint32_t max_bracket_depth, int tmp_cap_size) : depth(norm_depth), rem_bytes(norm_depth), unlim(norm_depth == static_cast(-1)), @@ -38,7 +38,7 @@ 
JSNormalizer::JSNormalizer(JSIdentifierCtxBase& js_ident_ctx, size_t norm_depth, tmp_buf_size(0), in(&in_buf), out(&out_buf), - tokenizer(in, out, js_ident_ctx, max_template_nesting, max_scope_depth, tmp_buf, tmp_buf_size, tmp_cap_size) + tokenizer(in, out, js_ident_ctx, max_template_nesting, max_bracket_depth, tmp_buf, tmp_buf_size, tmp_cap_size) { } @@ -79,13 +79,13 @@ JSTokenizer::JSRet JSNormalizer::normalize(const char* src, size_t src_len) ->pubsetbuf(const_cast(src), len); out_buf.reserve(src_len * BUFF_EXP_FACTOR); - size_t t_bytes = in_buf.last_chunk_offset(); - tokenizer.pre_yylex(t_bytes != 0); + tokenizer.pre_yylex(); JSTokenizer::JSRet ret = static_cast(tokenizer.yylex()); in.clear(); out.clear(); + size_t t_bytes = in_buf.last_chunk_offset(); size_t r_bytes = tokenizer.get_bytes_read(); r_bytes = max(r_bytes, t_bytes) - t_bytes; diff --git a/src/utils/js_normalizer.h b/src/utils/js_normalizer.h index 508380d9d..2aea26f03 100644 --- a/src/utils/js_normalizer.h +++ b/src/utils/js_normalizer.h @@ -34,7 +34,7 @@ class JSNormalizer { public: JSNormalizer(JSIdentifierCtxBase& js_ident_ctx, size_t depth, - uint8_t max_template_nesting, uint32_t max_scope_depth, + uint8_t max_template_nesting, uint32_t max_bracket_depth, int tmp_cap_size = JSTOKENIZER_BUF_MAX_SIZE); ~JSNormalizer(); diff --git a/src/utils/js_tokenizer.h b/src/utils/js_tokenizer.h index fbfb22173..476dd3955 100644 --- a/src/utils/js_tokenizer.h +++ b/src/utils/js_tokenizer.h @@ -39,6 +39,8 @@ extern THREAD_LOCAL const snort::Trace* http_trace; // To hold potentially long identifiers #define JSTOKENIZER_BUF_MAX_SIZE 256 +enum JSProgramScopeType : unsigned int; + class JSIdentifierCtxBase; class JSTokenizer : public yyFlexLexer @@ -54,25 +56,47 @@ private: LITERAL, DIRECTIVE, DOT, - CLOSING_BRACKET + COLON, + CLOSING_BRACKET, + KEYWORD_VAR_DECL, // var, let, const + KEYWORD_FUNCTION, + KEYWORD_BLOCK, // for all block-definition keywords e.g. if, else, for, etc. 
+ KEYWORD_CLASS, + OPERATOR_ASSIGNMENT, + OPERATOR_COMPLEX_ASSIGNMENT, + OPERATOR_COMPARISON, + OPERATOR_LOGICAL, + OPERATOR_SHIFT }; enum ScopeType { - GLOBAL = 0, + GLOBAL = 0, // not in the brackets (the initial one) BRACES, // {} PARENTHESES, // () BRACKETS // [] }; + enum ScopeMetaType + { + NOT_SET = 0, + FUNCTION, // function, arrow function + BLOCK, // if, else, for, while, do, with, switch, try, catch, finally, block of code + OBJECT, // object definition, class definition + SCOPE_META_TYPE_MAX + }; struct Scope { - Scope(ScopeType t) - : type(t), ident_norm(true), func_call(false) + Scope(ScopeType t) : + type(t), meta_type(ScopeMetaType::NOT_SET), ident_norm(true), func_call(false), + block_param(false), do_loop(false) {} ScopeType type; + ScopeMetaType meta_type; bool ident_norm; bool func_call; + bool block_param; + bool do_loop; }; enum ASIGroup @@ -104,6 +128,7 @@ public: BAD_TOKEN, IDENTIFIER_OVERFLOW, TEMPLATE_NESTING_OVERFLOW, + BRACKET_NESTING_OVERFLOW, SCOPE_NESTING_OVERFLOW, WRONG_CLOSING_SYMBOL, ENDED_IN_INNER_SCOPE, @@ -112,12 +137,12 @@ public: JSTokenizer() = delete; explicit JSTokenizer(std::istream& in, std::ostream& out, JSIdentifierCtxBase& ident_ctx, - uint8_t max_template_nesting, uint32_t max_scope_depth, char*& buf, size_t& buf_size, + uint8_t max_template_nesting, uint32_t max_bracket_depth, char*& buf, size_t& buf_size, int cap_size = JSTOKENIZER_BUF_MAX_SIZE); ~JSTokenizer() override; // internal actions before calling main loop - void pre_yylex(bool adjust_output = false); + void pre_yylex(); // returns JSRet int yylex() override; @@ -134,30 +159,47 @@ private: void switch_to_temporal(const std::string& data); JSRet eval_eof(); JSRet do_spacing(JSToken cur_token); - JSRet do_operator_spacing(JSToken cur_token); - void do_semicolon_insertion(ASIGroup current); + JSRet do_operator_spacing(); + JSRet do_semicolon_insertion(ASIGroup current); JSRet do_identifier_substitution(const char* lexeme, bool id_part); bool unescape(const 
char* lexeme); - void process_punctuator(); + void process_punctuator(JSToken tok = PUNCTUATOR); void process_closing_brace(); JSRet process_subst_open(); - void states_push(); + bool states_process(); void states_correct(int); void states_reset(); void states_over(); + void states_adjust(); // scope stack servicing JSRet scope_push(ScopeType); JSRet scope_pop(ScopeType); Scope& scope_cur(); + // program scope stack servicing + JSRet p_scope_push(ScopeMetaType); + JSRet p_scope_pop(ScopeMetaType); + // interactions with the current scope bool global_scope(); + void set_meta_type(ScopeMetaType); + ScopeMetaType meta_type(); void set_ident_norm(bool); bool ident_norm(); void set_func_call(bool); bool func_call(); + void set_block_param(bool); + bool block_param(); + void set_do_loop(bool); + bool do_loop(); + + static JSProgramScopeType m2p(ScopeMetaType); + static const char* m2str(ScopeMetaType); + static bool is_operator(JSToken); + + static const char* p_scope_codes[]; void* cur_buffer; void* tmp_buffer = nullptr; @@ -175,14 +217,17 @@ private: JSToken token = UNDEFINED; // the token before int orig_len = 0; // current token original length int norm_len = 0; // normalized length of previous tokens - int sc = 0; // current Starting Condition + int sc = 0; // current Starting Condition (0 means NOT_SET) } states[JSTOKENIZER_MAX_STATES]; int sp = 0; // points to the top of states + int eof_sp = 0; // points to the last state before the EOF + JSToken eof_token = UNDEFINED; // the last token before the EOF + int eof_sc = 0; // the last Starting Condition before the EOF + int bytes_skip = 0; // num of bytes to skip of processing in the next chunk char*& tmp_buf; size_t& tmp_buf_size; const int tmp_cap_size; - int output_steps_back; bool newline_found = false; constexpr static bool insert_semicolon[ASI_GROUP_MAX][ASI_GROUP_MAX] @@ -200,7 +245,7 @@ private: {false, false, false, false, false, false, false, false, false, false, false,} }; - const uint32_t max_scope_depth; 
+ const uint32_t max_bracket_depth; std::stack<Scope> scope_stack; }; diff --git a/src/utils/js_tokenizer.l b/src/utils/js_tokenizer.l index 8191d3dde..4159e6617 100644 --- a/src/utils/js_tokenizer.l +++ b/src/utils/js_tokenizer.l @@ -52,7 +52,8 @@ debug_logf(5, http_trace, TRACE_JS_DUMP, nullptr, \ "text '%s'\n", YYText()); \ \ - states_push(); \ + if (!states_process()) \ + break; \ } #define RETURN(r) \ @@ -906,28 +907,49 @@ USE_STRICT_DIRECTIVE_SC "\"use strict\"";*|"\'use strict\'";* /* keywords */ /* according to https://ecma-international.org/ecma-262/5.1/#sec-7.6.1.1 */ /* keywords that can appear at the begining or the end of Statement*/ -KEYWORD_BA break|continue|debugger|return +KEYWORD_BA break|continue|debugger|return /* keywords that can appear at the beginning of Statement*/ -KEYWORD_B delete|do|for|function|if|new|switch|throw|try|typeof|var|void|while|with +KEYWORD_VAR_DECL var|let|const +KEYWORD_FUNCTION function +KEYWORD_IF if +KEYWORD_FOR for +KEYWORD_WHILE while +KEYWORD_DO do +KEYWORD_WITH with +KEYWORD_SWITCH switch +KEYWORD_TRY try +KEYWORD_B delete|new|throw|typeof|void /* keywords that can not appear at the beginning or the end of Statement*/ -KEYWORD_OTHER case|catch|class|const|default|else|enum|export|extends|finally|implements|import|in|instanceof|interface|let|package|private|protected|public|static|super|yield +KEYWORD_ELSE else +KEYWORD_CATCH catch +KEYWORD_FINALLY finally +KEYWORD_CLASS class +KEYWORD_OTHER case|default|enum|export|extends|implements|import|in|instanceof|interface|package|private|protected|public|static|super|yield /* punctuators */ /* according to https://ecma-international.org/ecma-262/5.1/#sec-7.7 */ -OPEN_BRACE "{" -CLOSE_BRACE "}" -OPEN_PARENTHESIS "(" -CLOSE_PARENTHESIS ")" -OPEN_BRACKET "[" -CLOSE_BRACKET "]" -DOT_ACCESSOR "." -PUNCTUATOR_PREFIX "~"|"!"
-PUNCTUATOR ">="|"=="|"!="|"==="|"!=="|";"|","|"<"|">"|"<="|"<<"|">>"|">>>"|"&"|"|"|"^"|"&&"|"||"|"?"|":"|"="|"+="|"-="|"*="|"%="|"<<="|">>="|">>>="|"&="|"|="|"^=" -OPERATOR_PREFIX "+"|"-" -OPERATOR_INCR_DECR "--"|"++" -OPERATOR "*"|"%" -DIV_OPERATOR "/" -DIV_ASSIGNMENT_OPERATOR "/=" +OPEN_BRACE "{" +CLOSE_BRACE "}" +OPEN_PARENTHESIS "(" +CLOSE_PARENTHESIS ")" +OPEN_BRACKET "[" +CLOSE_BRACKET "]" +DOT_ACCESSOR "." +PUNCTUATOR_PREFIX "~"|"!" +PUNCTUATOR_SEMICOLON ";" +PUNCTUATOR_COLON ":" +PUNCTUATOR_COMMA "," +OPERATOR_COMPARISON ">="|"=="|"!="|"==="|"!=="|"<"|">"|"<=" +OPERATOR_COMPLEX_ASSIGNMENT "+="|"-="|"*="|"%="|"<<="|">>="|">>>="|"&="|"|="|"^=" +OPERATOR_ASSIGNMENT "=" +OPERATOR_LOGICAL "?"|"&"|"|"|"^"|"&&"|"||" +OPERATOR_SHIFT "<<"|">>"|">>>" +OPERATOR_PREFIX "+"|"-" +OPERATOR_INCR_DECR "--"|"++" +OPERATOR "*"|"%" +DIV_OPERATOR "/" +DIV_ASSIGNMENT_OPERATOR "/=" +PUNCTUATOR_ARROW "=>" /* identifiers */ /* according to https://ecma-international.org/ecma-262/5.1/#sec-7.6 */ @@ -1025,7 +1047,7 @@ ALL_UNICODE [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|(\xE0[\xA0-\xBF]|[\xE1-\xEF][\x8 {BLOCK_COMMENT_SKIP} { } <> { RETURN(SCRIPT_CONTINUE) } - {LITERAL_DQ_STRING_START} { do_semicolon_insertion(ASI_GROUP_7); EXEC(do_spacing(LITERAL)) ECHO; BEGIN(dqstr); set_ident_norm(true); } + {LITERAL_DQ_STRING_START} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) EXEC(do_spacing(LITERAL)) ECHO; BEGIN(dqstr); set_ident_norm(true); } {LITERAL_DQ_STRING_END} { ECHO; BEGIN(divop); } {HTML_TAG_SCRIPT_CLOSE} { BEGIN(regst); RETURN(CLOSING_TAG) } \\{CR}{LF} { } @@ -1036,7 +1058,7 @@ ALL_UNICODE [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|(\xE0[\xA0-\xBF]|[\xE1-\xEF][\x8 {LITERAL_DQ_STRING_TEXT} { ECHO; } <> { RETURN(SCRIPT_CONTINUE) } - {LITERAL_SQ_STRING_START} { do_semicolon_insertion(ASI_GROUP_7); EXEC(do_spacing(LITERAL)) ECHO; BEGIN(sqstr); set_ident_norm(true); } + {LITERAL_SQ_STRING_START} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) EXEC(do_spacing(LITERAL)) ECHO; BEGIN(sqstr); set_ident_norm(true); } 
{LITERAL_SQ_STRING_END} { ECHO; BEGIN(divop); } {HTML_TAG_SCRIPT_CLOSE} { BEGIN(regst); RETURN(CLOSING_TAG) } \\{CR}{LF} { } @@ -1047,7 +1069,7 @@ ALL_UNICODE [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|(\xE0[\xA0-\xBF]|[\xE1-\xEF][\x8 {LITERAL_SQ_STRING_TEXT} { ECHO; } <> { RETURN(SCRIPT_CONTINUE) } - {LITERAL_TEMPLATE_START} { do_semicolon_insertion(ASI_GROUP_7); EXEC(do_spacing(LITERAL)) ECHO; BEGIN(tmpll); set_ident_norm(true); } + {LITERAL_TEMPLATE_START} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) EXEC(do_spacing(LITERAL)) ECHO; BEGIN(tmpll); set_ident_norm(true); } (\\\\)*{LITERAL_TEMPLATE_END} { ECHO; BEGIN(divop); } (\\\\)*{LITERAL_TEMPLATE_SUBST_START} { EXEC(process_subst_open()) } {HTML_TAG_SCRIPT_CLOSE} { BEGIN(regst); RETURN(CLOSING_TAG) } @@ -1056,7 +1078,7 @@ ALL_UNICODE [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|(\xE0[\xA0-\xBF]|[\xE1-\xEF][\x8 {LITERAL_TEMPLATE_OTHER} { ECHO; } <> { RETURN(SCRIPT_CONTINUE) } -{LITERAL_REGEX_START} { do_semicolon_insertion(ASI_GROUP_7); EXEC(do_spacing(LITERAL)) yyout << '/'; states_correct(1); yyless(1); BEGIN(regex); set_ident_norm(true); } +{LITERAL_REGEX_START} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) EXEC(do_spacing(LITERAL)) yyout << '/'; states_correct(1); yyless(1); BEGIN(regex); set_ident_norm(true); } {LITERAL_REGEX_END} { ECHO; BEGIN(divop); } {HTML_TAG_SCRIPT_CLOSE} { BEGIN(regst); RETURN(CLOSING_TAG) } {LITERAL_REGEX_SKIP} { ECHO; } @@ -1069,29 +1091,50 @@ ALL_UNICODE [\0-\x7F]|[\xC2-\xDF][\x80-\xBF]|(\xE0[\xA0-\xBF]|[\xE1-\xEF][\x8 {DIV_OPERATOR} | {DIV_ASSIGNMENT_OPERATOR} { previous_group = ASI_OTHER; ECHO; token = PUNCTUATOR; BEGIN(INITIAL); set_ident_norm(true); } -{OPEN_BRACE} { do_semicolon_insertion(ASI_GROUP_1); EXEC(scope_push(BRACES)) if (!brace_depth.empty()) brace_depth.top()++; process_punctuator(); } -{CLOSE_BRACE} { do_semicolon_insertion(ASI_GROUP_2); EXEC(scope_pop(BRACES)) process_closing_brace(); set_ident_norm(true); } -{OPEN_PARENTHESIS} { do_semicolon_insertion(ASI_GROUP_3); 
EXEC(scope_push(PARENTHESES)) if (token == IDENTIFIER || token == CLOSING_BRACKET || token == KEYWORD) set_func_call(true); process_punctuator(); } -{CLOSE_PARENTHESIS} { do_semicolon_insertion(ASI_GROUP_5); bool f_call = func_call(); bool id_norm = ident_norm(); EXEC(scope_pop(PARENTHESES)) if (!f_call) set_ident_norm(id_norm); ECHO; token = PUNCTUATOR; BEGIN(divop); } -{OPEN_BRACKET} { do_semicolon_insertion(ASI_GROUP_3); do_semicolon_insertion(ASI_GROUP_4); EXEC(scope_push(BRACKETS)) process_punctuator(); } -{CLOSE_BRACKET} { do_semicolon_insertion(ASI_GROUP_4); EXEC(scope_pop(BRACKETS)) ECHO; token = CLOSING_BRACKET; BEGIN(divop); } +{OPEN_BRACE} { EXEC(do_semicolon_insertion(ASI_GROUP_1)) if (meta_type() == ScopeMetaType::NOT_SET) { if (is_operator(token) || token == COLON || func_call()) set_meta_type(ScopeMetaType::OBJECT); else { set_meta_type(ScopeMetaType::BLOCK); EXEC(p_scope_push(meta_type())) } } EXEC(scope_push(BRACES)) if (!brace_depth.empty()) brace_depth.top()++; process_punctuator(); } +{CLOSE_BRACE} { EXEC(do_semicolon_insertion(ASI_GROUP_2)) if (meta_type() != ScopeMetaType::NOT_SET) EXEC(p_scope_pop(meta_type())) EXEC(scope_pop(BRACES)) process_closing_brace(); set_ident_norm(true); } +{OPEN_PARENTHESIS} { EXEC(do_semicolon_insertion(ASI_GROUP_3)) EXEC(scope_push(PARENTHESES)) if (token == IDENTIFIER || token == CLOSING_BRACKET || token == KEYWORD) set_func_call(true); process_punctuator(); } +{CLOSE_PARENTHESIS} { bool f_call = func_call(); bool id_norm = ident_norm(); if (meta_type() != ScopeMetaType::NOT_SET) EXEC(p_scope_pop(meta_type())) EXEC(scope_pop(PARENTHESES)) if (!f_call) set_ident_norm(id_norm); if (block_param()) { previous_group = ASI_OTHER; set_block_param(false); } else { EXEC(do_semicolon_insertion(ASI_GROUP_5)) } ECHO; token = PUNCTUATOR; BEGIN(divop); } +{OPEN_BRACKET} { EXEC(do_semicolon_insertion(ASI_GROUP_3)) EXEC(do_semicolon_insertion(ASI_GROUP_4)) EXEC(scope_push(BRACKETS)) process_punctuator(); } +{CLOSE_BRACKET} { 
EXEC(do_semicolon_insertion(ASI_GROUP_4)) EXEC(scope_pop(BRACKETS)) ECHO; token = CLOSING_BRACKET; BEGIN(divop); } -{PUNCTUATOR_PREFIX} { do_semicolon_insertion(ASI_GROUP_10); process_punctuator(); set_ident_norm(true); } +{PUNCTUATOR_PREFIX} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) process_punctuator(); set_ident_norm(true); } {DOT_ACCESSOR} { previous_group = ASI_OTHER; ECHO; token = DOT; BEGIN(regst); } -{PUNCTUATOR} { previous_group = ASI_OTHER; process_punctuator(); set_ident_norm(true); } +{PUNCTUATOR_ARROW} { previous_group = ASI_OTHER; process_punctuator(); set_ident_norm(true); if (meta_type() == ScopeMetaType::NOT_SET) { set_meta_type(ScopeMetaType::FUNCTION); EXEC(p_scope_push(meta_type())) } } +{PUNCTUATOR_SEMICOLON} { previous_group = ASI_OTHER; process_punctuator(); set_ident_norm(true); if (meta_type() != ScopeMetaType::NOT_SET) { EXEC(p_scope_pop(meta_type())) set_meta_type(ScopeMetaType::NOT_SET); } } +{PUNCTUATOR_COLON} { previous_group = ASI_OTHER; process_punctuator(COLON); set_ident_norm(true); } +{OPERATOR_COMPARISON} { previous_group = ASI_OTHER; process_punctuator(OPERATOR_COMPARISON); set_ident_norm(true); } +{OPERATOR_COMPLEX_ASSIGNMENT} { previous_group = ASI_OTHER; process_punctuator(OPERATOR_COMPLEX_ASSIGNMENT); set_ident_norm(true); } +{OPERATOR_LOGICAL} { previous_group = ASI_OTHER; process_punctuator(OPERATOR_LOGICAL); set_ident_norm(true); } +{OPERATOR_SHIFT} { previous_group = ASI_OTHER; process_punctuator(OPERATOR_SHIFT); set_ident_norm(true); } +{PUNCTUATOR_COMMA} { previous_group = ASI_OTHER; process_punctuator(); set_ident_norm(true); } {USE_STRICT_DIRECTIVE} { previous_group = ASI_OTHER; EXEC(do_spacing(DIRECTIVE)) ECHO; BEGIN(INITIAL); yyout << ';'; set_ident_norm(true); } {USE_STRICT_DIRECTIVE_SC} { previous_group = ASI_OTHER; EXEC(do_spacing(DIRECTIVE)) ECHO; BEGIN(INITIAL); set_ident_norm(true); } -{KEYWORD_B} { do_semicolon_insertion(ASI_GROUP_10); if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD)) 
ECHO; BEGIN(regst); } -{KEYWORD_BA} { do_semicolon_insertion(ASI_GROUP_9); if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD)) ECHO; BEGIN(regst); } +{KEYWORD_VAR_DECL} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_VAR_DECL)) ECHO; BEGIN(regst); } +{KEYWORD_FUNCTION} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_FUNCTION)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) set_meta_type(ScopeMetaType::FUNCTION); } +{KEYWORD_IF} | +{KEYWORD_FOR} | +{KEYWORD_WITH} | +{KEYWORD_SWITCH} | +{KEYWORD_CATCH} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_BLOCK)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) { set_meta_type(ScopeMetaType::BLOCK); EXEC(p_scope_push(meta_type())) } set_block_param(true); } +{KEYWORD_WHILE} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_BLOCK)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) { set_meta_type(ScopeMetaType::BLOCK); EXEC(p_scope_push(meta_type())) } if (do_loop()) set_do_loop(false); else set_block_param(true); } +{KEYWORD_B} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD)) ECHO; BEGIN(regst); } +{KEYWORD_BA} { EXEC(do_semicolon_insertion(ASI_GROUP_9)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD)) ECHO; BEGIN(regst); } +{KEYWORD_TRY} | +{KEYWORD_ELSE} | +{KEYWORD_FINALLY} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_BLOCK)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) { set_meta_type(ScopeMetaType::BLOCK); EXEC(p_scope_push(meta_type())) } } +{KEYWORD_DO} { EXEC(do_semicolon_insertion(ASI_GROUP_10)) if (token != DOT) set_ident_norm(true); 
EXEC(do_spacing(KEYWORD_BLOCK)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) { set_meta_type(ScopeMetaType::BLOCK); EXEC(p_scope_push(meta_type())) } set_do_loop(true); } +{KEYWORD_CLASS} { previous_group = ASI_OTHER; if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD_CLASS)) ECHO; BEGIN(regst); if (meta_type() == ScopeMetaType::NOT_SET) set_meta_type(ScopeMetaType::OBJECT); } {KEYWORD_OTHER} { previous_group = ASI_OTHER; if (token != DOT) set_ident_norm(true); EXEC(do_spacing(KEYWORD)) ECHO; BEGIN(regst); } -{OPERATOR_PREFIX} { do_semicolon_insertion(ASI_GROUP_6); EXEC(do_operator_spacing(OPERATOR)) ECHO; BEGIN(divop); set_ident_norm(true); } -{OPERATOR_INCR_DECR} { do_semicolon_insertion(ASI_GROUP_8); EXEC(do_operator_spacing(OPERATOR)) ECHO; BEGIN(divop); set_ident_norm(true); } -{OPERATOR} { previous_group = ASI_OTHER; EXEC(do_operator_spacing(OPERATOR)) ECHO; BEGIN(divop); set_ident_norm(true); } -{LITERAL} { do_semicolon_insertion(ASI_GROUP_7); EXEC(do_spacing(LITERAL)) ECHO; BEGIN(divop); set_ident_norm(true); } -{IDENTIFIER} { do_semicolon_insertion(ASI_GROUP_7); if (unescape(YYText())) { bool id_part = (token == DOT); EXEC(do_spacing(IDENTIFIER)) EXEC(do_identifier_substitution(YYText(), id_part)) } BEGIN(divop); } +{OPERATOR_ASSIGNMENT} { previous_group = ASI_OTHER; process_punctuator(OPERATOR_ASSIGNMENT); set_ident_norm(true); } +{OPERATOR_PREFIX} { EXEC(do_semicolon_insertion(ASI_GROUP_6)) EXEC(do_operator_spacing()) ECHO; BEGIN(divop); set_ident_norm(true); } +{OPERATOR_INCR_DECR} { EXEC(do_semicolon_insertion(ASI_GROUP_8)) EXEC(do_operator_spacing()) ECHO; BEGIN(divop); set_ident_norm(true); } +{OPERATOR} { previous_group = ASI_OTHER; EXEC(do_operator_spacing()) ECHO; BEGIN(divop); set_ident_norm(true); } +{LITERAL} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) EXEC(do_spacing(LITERAL)) ECHO; BEGIN(divop); set_ident_norm(true); } +{IDENTIFIER} { EXEC(do_semicolon_insertion(ASI_GROUP_7)) if (unescape(YYText())) { bool 
id_part = (token == DOT); EXEC(do_spacing(IDENTIFIER)) EXEC(do_identifier_substitution(YYText(), id_part)) } BEGIN(divop); } .|{ALL_UNICODE} { previous_group = ASI_OTHER; ECHO; token = UNDEFINED; BEGIN(INITIAL); set_ident_norm(true); } <> { EEOF(eval_eof()) } @@ -1174,9 +1217,18 @@ static std::string unescape_unicode(const char* lexeme) // JSTokenizer members +const char* JSTokenizer::p_scope_codes[] = +{ + "invalid", + "function", + "block", + "object", + "unknown" +}; + JSTokenizer::JSTokenizer(std::istream& in, std::ostream& out, JSIdentifierCtxBase& mapper, uint8_t max_template_nesting, - uint32_t max_scope_depth, char*& buf, size_t& buf_size, int cap_size) + uint32_t max_bracket_depth, char*& buf, size_t& buf_size, int cap_size) : yyFlexLexer(in, out), max_template_nesting(max_template_nesting), ident_ctx(mapper), @@ -1184,10 +1236,9 @@ JSTokenizer::JSTokenizer(std::istream& in, std::ostream& out, tmp_buf(buf), tmp_buf_size(buf_size), tmp_cap_size(cap_size), - output_steps_back(0), - max_scope_depth(max_scope_depth) + max_bracket_depth(max_bracket_depth) { - scope_push(GLOBAL); + scope_stack.emplace(GLOBAL); BEGIN(regst); } @@ -1199,13 +1250,8 @@ JSTokenizer::~JSTokenizer() tmp_buf_size = 0; } -void JSTokenizer::pre_yylex(bool adjust_output) +void JSTokenizer::pre_yylex() { - assert(output_steps_back >= 0); - - if (adjust_output) - yyout.seekp(-output_steps_back, std::ios_base::cur); - yy_flush_buffer(YY_CURRENT_BUFFER); } @@ -1251,6 +1297,12 @@ JSTokenizer::JSRet JSTokenizer::do_spacing(JSToken cur_token) switch (token) { case PUNCTUATOR: + case COLON: + case OPERATOR_ASSIGNMENT: + case OPERATOR_COMPLEX_ASSIGNMENT: + case OPERATOR_COMPARISON: + case OPERATOR_LOGICAL: + case OPERATOR_SHIFT: case OPERATOR: case DIRECTIVE: case DOT: @@ -1261,10 +1313,25 @@ JSTokenizer::JSRet JSTokenizer::do_spacing(JSToken cur_token) case IDENTIFIER: case KEYWORD: + case KEYWORD_FUNCTION: + case KEYWORD_BLOCK: + case KEYWORD_CLASS: case LITERAL: yyout << ' '; token = cur_token; 
return EOS; + + case KEYWORD_VAR_DECL: + { + if (cur_token == IDENTIFIER || cur_token == DOT) + { + yyout << ' '; + token = cur_token; + return EOS; + } + else + return BAD_TOKEN; + } } assert(false); @@ -1272,25 +1339,37 @@ JSTokenizer::JSRet JSTokenizer::do_spacing(JSToken cur_token) return BAD_TOKEN; } -JSTokenizer::JSRet JSTokenizer::do_operator_spacing(JSToken cur_token) +JSTokenizer::JSRet JSTokenizer::do_operator_spacing() { switch (token) { case IDENTIFIER: case KEYWORD: + case KEYWORD_FUNCTION: + case KEYWORD_BLOCK: + case KEYWORD_CLASS: case PUNCTUATOR: + case COLON: + case OPERATOR_ASSIGNMENT: + case OPERATOR_COMPLEX_ASSIGNMENT: + case OPERATOR_COMPARISON: + case OPERATOR_LOGICAL: + case OPERATOR_SHIFT: case LITERAL: case DIRECTIVE: case DOT: case CLOSING_BRACKET: case UNDEFINED: - token = cur_token; + token = OPERATOR; return EOS; case OPERATOR: yyout << ' '; - token = cur_token; + token = OPERATOR; return EOS; + + case KEYWORD_VAR_DECL: + return BAD_TOKEN; } assert(false); @@ -1331,7 +1410,7 @@ JSTokenizer::JSRet JSTokenizer::do_identifier_substitution(const char* lexeme, b return EOS; } -void JSTokenizer::do_semicolon_insertion(ASIGroup current) +JSTokenizer::JSRet JSTokenizer::do_semicolon_insertion(ASIGroup current) { assert(current >= 0 and current < ASI_GROUP_MAX); if (newline_found) @@ -1340,12 +1419,22 @@ void JSTokenizer::do_semicolon_insertion(ASIGroup current) if (insert_semicolon[previous_group][current]) { yyout << ';'; + previous_group = ASI_OTHER; token = PUNCTUATOR; - return; + JSRet ret = EOS; + + if (meta_type() != ScopeMetaType::NOT_SET) + { + ret = p_scope_pop(meta_type()); + set_meta_type(ScopeMetaType::NOT_SET); + } + + return ret; } } previous_group = current; + return EOS; } bool JSTokenizer::unescape(const char* lexeme) @@ -1359,10 +1448,10 @@ bool JSTokenizer::unescape(const char* lexeme) return true; } -void JSTokenizer::process_punctuator() +void JSTokenizer::process_punctuator(JSToken tok) { ECHO; - token = PUNCTUATOR; + 
token = tok; BEGIN(regst); } @@ -1403,37 +1492,20 @@ void JSTokenizer::states_reset() token = UNDEFINED; previous_group = ASI_OTHER; + bytes_skip = 0; memset(&states, 0, sizeof(states)); delete[] tmp_buf; tmp_buf = nullptr; tmp_buf_size = 0; - output_steps_back = 0; newline_found = false; - scope_stack = {}; - scope_push(GLOBAL); + scope_stack = {}; + scope_stack.emplace(GLOBAL); BEGIN(regst); } -void JSTokenizer::states_push() -{ - if (!yyleng) - return; - - bytes_read += yyleng; - - sp++; - sp %= JSTOKENIZER_MAX_STATES; - auto& state = states[sp]; - - state.token = token; - state.orig_len = yyleng; - state.norm_len = yyout.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::out); - state.sc = yy_start; -} - void JSTokenizer::states_correct(int take_off) { auto delta = yyleng - take_off; @@ -1445,31 +1517,37 @@ void JSTokenizer::states_correct(int take_off) void JSTokenizer::states_over() { + int sp_idx = 0; int tail_size = 0; - int outbuf_pos = yyout.tellp(); - int outbuf_back = outbuf_pos; + // Store the last state before EOF + eof_sp = sp; + eof_token = token; + eof_sc = yy_start; + + // Evaluate a tail to renormalize and shift the current state for (int i = JSTOKENIZER_MAX_STATES; i > 0 && tail_size < tmp_cap_size; --i) { auto idx = sp + i; idx %= JSTOKENIZER_MAX_STATES; auto& state = states[idx]; - outbuf_back = state.norm_len; - - if (state.orig_len == 0) + // Continue if NOT_SET + if (state.sc == 0) continue; token = state.token; yy_start = state.sc; + sp_idx = idx; tail_size += state.orig_len; tail_size = tail_size < tmp_cap_size ? 
tail_size : tmp_cap_size; } - output_steps_back = outbuf_pos - outbuf_back; + // Number of already normalized bytes to be skipped + bytes_skip = tail_size; - for (int i = 0; i < JSTOKENIZER_MAX_STATES; ++i) - states[i].orig_len = 0; + // Set state pointer to the first state to be skipped + sp = sp_idx; char* buf = new char[tail_size]; @@ -1480,40 +1558,141 @@ void JSTokenizer::states_over() delete[] tmp_buf; tmp_buf = buf; tmp_buf_size = tail_size; +} + +bool JSTokenizer::states_process() +{ + if (!yyleng) + return true; - // Reverse traversal over buffer to adjust scope stack before the next PDU buffer starts - bool is_tmpl = false; - const char* c = tmp_buf + tmp_buf_size; - const char* const s = tmp_buf; - while (c-- > s) + bytes_read += yyleng; + + // Fulfillment goes after this check only in case of split over several input scripts. + // Otherwise, new state is pushed. + if (bytes_skip == 0) { - switch (*c) - { - case '{': scope_pop(BRACES); if (is_tmpl) brace_depth.pop(); break; - case '}': scope_push(BRACES); if (is_tmpl) brace_depth.push(0); break; - case '(': scope_pop(PARENTHESES); break; - case ')': - { - bool id_norm = ident_norm(); - scope_push(PARENTHESES); - if (!id_norm) - set_func_call(true); - break; - } - case '[': scope_pop(BRACKETS); break; - case ']': scope_push(BRACKETS); break; - case '`': is_tmpl = !is_tmpl; break; - } + sp++; + sp %= JSTOKENIZER_MAX_STATES; + auto& state = states[sp]; + + state.token = token; + state.orig_len = yyleng; + state.norm_len = yyout.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::out); + state.sc = yy_start; + + return true; + } + + bytes_skip = bytes_skip - yyleng; + + // Ignore normalization till all the already normalized bytes are skipped or mismatch found. + // If mismatch found, adjust normalization state and renormalize from the mismatch point. 
+ if (bytes_skip < 0) + { + bytes_skip = 0; + states_adjust(); + + // Push new state + sp++; + sp %= JSTOKENIZER_MAX_STATES; + auto& state = states[sp]; + + state.token = token; + state.orig_len = yyleng; + state.norm_len = yyout.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::out); + state.sc = yy_start; + + return true; + } + // Otherwise, continue normalization from the last state without any changes + else if (bytes_skip == 0) + { + token = eof_token; + yy_start = eof_sc; } + // Meanwhile, update parsing state every match + else + { + do { ++sp; sp %= JSTOKENIZER_MAX_STATES; } + while (states[sp].sc == 0); + + auto& state = states[sp]; + token = state.token; + yy_start = state.sc; + } + + return false; +} + +void JSTokenizer::states_adjust() +{ + int outbuf_pos = yyout.rdbuf()->pubseekoff(0, std::ios_base::cur, std::ios_base::out); + assert(outbuf_pos >= 0); + + // Adjust output buffer if it was not cleaned up + if (outbuf_pos > 0) + { + // A valid state always here + auto& state = states[sp]; + assert(state.sc != 0); + + int ignore_norm_len = outbuf_pos - state.norm_len; + assert(ignore_norm_len >= 0); + + yyout.seekp(-ignore_norm_len, std::ios_base::cur); + } + + // Adjust normalization state based on specific tokens + switch (eof_token) + { + case KEYWORD_FUNCTION: set_meta_type(ScopeMetaType::NOT_SET); break; + case KEYWORD_BLOCK: p_scope_pop(meta_type()); set_meta_type(ScopeMetaType::NOT_SET); break; + case KEYWORD_CLASS: set_meta_type(ScopeMetaType::NOT_SET); break; + default: break; + } + + assert((eof_sp >= 0 && eof_sp < JSTOKENIZER_MAX_STATES)); + + // Reset all the states after the current state till the state before EOF + if (sp <= eof_sp) + memset((void*)(states + sp), 0, sizeof(states[0]) * (eof_sp - sp)); + else + { + memset((void*)(states + sp), 0, sizeof(states[0]) * (JSTOKENIZER_MAX_STATES - sp)); + memset(&states, 0, sizeof(states[0]) * eof_sp); + } + --sp; } JSTokenizer::JSRet JSTokenizer::scope_push(ScopeType t) { - if 
(scope_stack.size() > max_scope_depth) - return SCOPE_NESTING_OVERFLOW; + if (scope_stack.size() >= max_bracket_depth) + return BRACKET_NESTING_OVERFLOW; + + JSRet ret = EOS; + switch (meta_type()) + { + case ScopeMetaType::FUNCTION: + { + if (t == PARENTHESES) + ret = p_scope_push(meta_type()); + + break; + } + case ScopeMetaType::OBJECT: + { + if (t == BRACES) + ret = p_scope_push(meta_type()); + + break; + } + case ScopeMetaType::BLOCK: break; + case ScopeMetaType::NOT_SET: break; + default: assert(false); return BAD_TOKEN; + } scope_stack.emplace(t); - return EOS; + return ret; } JSTokenizer::JSRet JSTokenizer::scope_pop(ScopeType t) @@ -1522,7 +1701,16 @@ JSTokenizer::JSRet JSTokenizer::scope_pop(ScopeType t) return WRONG_CLOSING_SYMBOL; scope_stack.pop(); - return EOS; + + JSRet ret = EOS; + + if (t == BRACES && meta_type() != ScopeMetaType::NOT_SET) + { + ret = p_scope_pop(meta_type()); + set_meta_type(ScopeMetaType::NOT_SET); + } + + return ret; } JSTokenizer::Scope& JSTokenizer::scope_cur() @@ -1536,6 +1724,16 @@ bool JSTokenizer::global_scope() return scope_cur().type == GLOBAL; } +void JSTokenizer::set_meta_type(ScopeMetaType t) +{ + scope_cur().meta_type = t; +} + +JSTokenizer::ScopeMetaType JSTokenizer::meta_type() +{ + return scope_cur().meta_type; +} + void JSTokenizer::set_ident_norm(bool f) { scope_cur().ident_norm = f; @@ -1555,3 +1753,82 @@ bool JSTokenizer::func_call() { return scope_cur().func_call; } + +void JSTokenizer::set_block_param(bool f) +{ + scope_cur().block_param = f; +} + +bool JSTokenizer::block_param() +{ + return scope_cur().block_param; +} + +void JSTokenizer::set_do_loop(bool f) +{ + scope_cur().do_loop = f; +} + +bool JSTokenizer::do_loop() +{ + return scope_cur().do_loop; +} + +JSTokenizer::JSRet JSTokenizer::p_scope_push(ScopeMetaType t) +{ + if (!ident_ctx.scope_push(m2p(t))) + return SCOPE_NESTING_OVERFLOW; + + debug_logf(5, http_trace, TRACE_JS_PROC, nullptr, "scope pushed: '%s'\n", m2str(t)); + + return EOS; +} + 
+JSTokenizer::JSRet JSTokenizer::p_scope_pop(ScopeMetaType t) +{ + if (!ident_ctx.scope_pop(m2p(t))) + return WRONG_CLOSING_SYMBOL; + + debug_logf(5, http_trace, TRACE_JS_PROC, nullptr, "scope popped: '%s'\n", m2str(t)); + + return EOS; +} + +JSProgramScopeType JSTokenizer::m2p(ScopeMetaType mt) +{ + switch (mt) + { + case ScopeMetaType::FUNCTION: + return JSProgramScopeType::FUNCTION; + case ScopeMetaType::BLOCK: + case ScopeMetaType::OBJECT: + return JSProgramScopeType::BLOCK; + case ScopeMetaType::NOT_SET: + default: + assert(false); + return JSProgramScopeType::PROG_SCOPE_TYPE_MAX; + } +} + +const char* JSTokenizer::m2str(ScopeMetaType mt) +{ + mt = mt < ScopeMetaType::SCOPE_META_TYPE_MAX ? mt : ScopeMetaType::SCOPE_META_TYPE_MAX; + return p_scope_codes[mt]; +} + +bool JSTokenizer::is_operator(JSToken tok) +{ + switch (tok) + { + case OPERATOR: + case OPERATOR_ASSIGNMENT: + case OPERATOR_COMPLEX_ASSIGNMENT: + case OPERATOR_COMPARISON: + case OPERATOR_LOGICAL: + case OPERATOR_SHIFT: + return true; + default: + return false; + } +} + diff --git a/src/utils/test/js_identifier_ctx_test.cc b/src/utils/test/js_identifier_ctx_test.cc index 2b37036d1..be30c8fca 100644 --- a/src/utils/test/js_identifier_ctx_test.cc +++ b/src/utils/test/js_identifier_ctx_test.cc @@ -31,6 +31,7 @@ #include "utils/js_identifier_ctx.h" #define DEPTH 65536 +#define SCOPE_DEPTH 256 static const std::unordered_set s_ident_built_in { "console" }; @@ -38,14 +39,14 @@ TEST_CASE("JSIdentifierCtx::substitute()", "[JSIdentifierCtx]") { SECTION("same name") { - JSIdentifierCtx ident_ctx(DEPTH, s_ident_built_in); + JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ident_built_in); CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000")); CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000")); } SECTION("different names") { - JSIdentifierCtx ident_ctx(DEPTH, s_ident_built_in); + JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ident_built_in); CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000")); 
CHECK(!strcmp(ident_ctx.substitute("b"), "var_0001")); @@ -53,7 +54,7 @@ TEST_CASE("JSIdentifierCtx::substitute()", "[JSIdentifierCtx]") } SECTION("depth reached") { - JSIdentifierCtx ident_ctx(2, s_ident_built_in); + JSIdentifierCtx ident_ctx(2, SCOPE_DEPTH, s_ident_built_in); CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000")); CHECK(!strcmp(ident_ctx.substitute("b"), "var_0001")); @@ -63,7 +64,7 @@ TEST_CASE("JSIdentifierCtx::substitute()", "[JSIdentifierCtx]") } SECTION("max names") { - JSIdentifierCtx ident_ctx(DEPTH + 2, s_ident_built_in); + JSIdentifierCtx ident_ctx(DEPTH + 2, SCOPE_DEPTH, s_ident_built_in); std::vector n, e; n.reserve(DEPTH + 2); @@ -90,9 +91,91 @@ TEST_CASE("JSIdentifierCtx::substitute()", "[JSIdentifierCtx]") TEST_CASE("JSIdentifierCtx::built_in()", "[JSIdentifierCtx]") { - JSIdentifierCtx ident_ctx(DEPTH, s_ident_built_in); + JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ident_built_in); - SECTION("match") { CHECK(ident_ctx.built_in("console") == true); } - SECTION("no match") { CHECK(ident_ctx.built_in("foo") == false); } + CHECK(ident_ctx.built_in("console") == true); + CHECK(ident_ctx.built_in("foo") == false); +} + +TEST_CASE("JSIdentifierCtx::scopes", "[JSIdentifierCtx]") +{ + JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ident_built_in); + + SECTION("scope stack") + { + CHECK(ident_ctx.scope_check({GLOBAL})); + + ident_ctx.scope_push(JSProgramScopeType::FUNCTION); + ident_ctx.scope_push(JSProgramScopeType::BLOCK); + ident_ctx.scope_push(JSProgramScopeType::BLOCK); + CHECK(ident_ctx.scope_check({GLOBAL, FUNCTION, BLOCK, BLOCK})); + + CHECK(ident_ctx.scope_pop(JSProgramScopeType::BLOCK)); + CHECK(ident_ctx.scope_check({GLOBAL, FUNCTION, BLOCK})); + + ident_ctx.reset(); + CHECK(ident_ctx.scope_check({GLOBAL})); + } + SECTION("aliases") + { + ident_ctx.add_alias("a", "console.log"); + ident_ctx.add_alias("b", "document"); + CHECK(ident_ctx.scope_contains(0, "a")); + CHECK(ident_ctx.scope_contains(0, "b")); + 
CHECK(!strcmp(ident_ctx.alias_lookup("a"), "console.log")); + CHECK(!strcmp(ident_ctx.alias_lookup("b"), "document")); + + REQUIRE(ident_ctx.scope_push(JSProgramScopeType::FUNCTION)); + ident_ctx.add_alias("a", "document"); + CHECK(ident_ctx.scope_contains(1, "a")); + CHECK(!ident_ctx.scope_contains(1, "b")); + CHECK(!strcmp(ident_ctx.alias_lookup("a"), "document")); + CHECK(!strcmp(ident_ctx.alias_lookup("b"), "document")); + + REQUIRE(ident_ctx.scope_push(JSProgramScopeType::BLOCK)); + ident_ctx.add_alias("b", "console.log"); + CHECK(ident_ctx.scope_contains(2, "b")); + CHECK(!ident_ctx.scope_contains(2, "a")); + CHECK(!strcmp(ident_ctx.alias_lookup("b"), "console.log")); + CHECK(!strcmp(ident_ctx.alias_lookup("a"), "document")); + + REQUIRE(ident_ctx.scope_pop(JSProgramScopeType::BLOCK)); + REQUIRE(ident_ctx.scope_pop(JSProgramScopeType::FUNCTION)); + ident_ctx.add_alias("a", "eval"); + CHECK(ident_ctx.scope_contains(0, "a")); + CHECK(ident_ctx.scope_contains(0, "b")); + CHECK(!strcmp(ident_ctx.alias_lookup("a"), "eval")); + CHECK(!strcmp(ident_ctx.alias_lookup("b"), "document")); + + CHECK(ident_ctx.alias_lookup("c") == nullptr); + } + SECTION("scope mismatch") + { + CHECK(!ident_ctx.scope_pop(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx.scope_check({GLOBAL})); + CHECK(!ident_ctx.scope_check({FUNCTION})); + + CHECK(ident_ctx.scope_push(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx.scope_check({GLOBAL, FUNCTION})); + CHECK(!ident_ctx.scope_pop(JSProgramScopeType::BLOCK)); + CHECK(ident_ctx.scope_check({GLOBAL, FUNCTION})); + CHECK(!ident_ctx.scope_check({GLOBAL})); + } + SECTION("scope max nesting") + { + JSIdentifierCtx ident_ctx_limited(DEPTH, 2, s_ident_built_in); + + CHECK(ident_ctx_limited.scope_push(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx_limited.scope_check({GLOBAL, FUNCTION})); + + CHECK(!ident_ctx_limited.scope_push(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx_limited.scope_check({GLOBAL, FUNCTION})); + 
CHECK(!ident_ctx_limited.scope_push(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx_limited.scope_check({GLOBAL, FUNCTION})); + + CHECK(ident_ctx_limited.scope_pop(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx_limited.scope_push(JSProgramScopeType::FUNCTION)); + CHECK(ident_ctx_limited.scope_check({GLOBAL, FUNCTION})); + } } diff --git a/src/utils/test/js_normalizer_test.cc b/src/utils/test/js_normalizer_test.cc index 77a606f2c..4ac4d1444 100644 --- a/src/utils/test/js_normalizer_test.cc +++ b/src/utils/test/js_normalizer_test.cc @@ -24,6 +24,7 @@ #include "catch/catch.hpp" #include +#include #include "utils/js_identifier_ctx.h" #include "utils/js_normalizer.h" @@ -41,15 +42,17 @@ void TraceApi::filter(const Packet&) {} THREAD_LOCAL const snort::Trace* http_trace = nullptr; -class JSIdentifierCtxTest : public JSIdentifierCtxBase +class JSIdentifierCtxStub : public JSIdentifierCtxBase { public: - JSIdentifierCtxTest() = default; + JSIdentifierCtxStub() = default; const char* substitute(const char* identifier) override { return identifier; } bool built_in(const char*) const override { return false; } + bool scope_push(JSProgramScopeType) override { return true; } + bool scope_pop(JSProgramScopeType) override { return true; } void reset() override {} size_t size() const override { return 0; } }; @@ -60,6 +63,7 @@ using namespace snort; #define DEPTH 65535 #define MAX_TEMPLATE_NESTING 4 +#define MAX_BRACKET_DEPTH 256 #define MAX_SCOPE_DEPTH 256 static const std::unordered_set s_ident_built_in { "console", "eval", "document" }; @@ -71,8 +75,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval #define DST_SIZE 512 #define NORMALIZE(src) \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ auto ret = norm.normalize(src, sizeof(src)); \ const char* ptr = 
norm.get_src_next(); \ int act_len = norm.script_size(); \ @@ -95,8 +99,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval #define NORMALIZE_L(src, src_len, dst, dst_len, depth, ret, ptr, len) \ { \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, depth, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, depth, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ ret = norm.normalize(src, src_len); \ ptr = norm.get_src_next(); \ len = norm.script_size(); \ @@ -139,8 +143,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval { \ char dst1[sizeof(exp1)]; \ \ - JSIdentifierCtx ident_ctx(DEPTH, s_ident_built_in); \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in); \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -153,8 +157,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval char dst1[sizeof(exp1)]; \ char dst2[sizeof(exp2)]; \ \ - JSIdentifierCtx ident_ctx(DEPTH, s_ident_built_in); \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in); \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -169,8 +173,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval { \ char dst1[sizeof(exp1)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, 
sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -183,8 +187,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval char dst1[sizeof(exp1)]; \ char dst2[sizeof(exp2)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -201,8 +205,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval char dst2[sizeof(exp2)]; \ char dst3[sizeof(exp3)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -220,8 +224,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval { \ char dst1[sizeof(exp1)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ TRY(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1, code); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -232,8 +236,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval char dst1[sizeof(exp1)]; \ char dst2[sizeof(exp2)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -248,8 +252,8 @@ static 
const std::unordered_set s_ident_built_in { "console", "eval char dst2[sizeof(exp2)]; \ char dst3[sizeof(exp3)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -266,8 +270,8 @@ static const std::unordered_set s_ident_built_in { "console", "eval char dst1[sizeof(exp1)]; \ char dst2[sizeof(exp2)]; \ \ - JSIdentifierCtxTest ident_ctx; \ - JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH, limit); \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH, limit); \ \ DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \ CHECK(!memcmp(exp1, dst1, sizeof(exp1) - 1)); \ @@ -278,6 +282,136 @@ static const std::unordered_set s_ident_built_in { "console", "eval CLOSE(); \ } +#define NORM_COMBINED_2(src1, src2, exp) \ + { \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ + \ + auto ret = norm.normalize(src1, sizeof(src1) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + const char end[] = ""; \ + ret = norm.normalize(end, sizeof(end) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_ENDED); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + +#define NORM_COMBINED_3(src1, src2, src3, exp) \ + { \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ + \ + auto ret = norm.normalize(src1, sizeof(src1) - 1); \ + REQUIRE(ret == 
JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src3, sizeof(src3) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + const char end[] = ""; \ + ret = norm.normalize(end, sizeof(end) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_ENDED); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + +#define NORM_COMBINED_BAD_2(src1, src2, exp, eret) \ + { \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ + \ + auto ret = norm.normalize(src1, sizeof(src1) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == eret); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + +#define NORM_COMBINED_BAD_3(src1, src2, src3, exp, eret) \ + { \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ + \ + auto ret = norm.normalize(src1, sizeof(src1) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src3, sizeof(src3) - 1); \ + REQUIRE(ret == eret); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + +#define NORM_COMBINED_LIMITED_2(limit, src1, src2, exp) \ + { \ + JSIdentifierCtxStub ident_ctx; \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH, limit); \ + \ + auto ret = norm.normalize(src1, 
sizeof(src1) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + const char end[] = ""; \ + ret = norm.normalize(end, sizeof(end) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_ENDED); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + +#define NORM_COMBINED_S_2(src1, src2, exp) \ + { \ + JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in); \ + JSNormalizer norm(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); \ + \ + auto ret = norm.normalize(src1, sizeof(src1) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + ret = norm.normalize(src2, sizeof(src2) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_CONTINUE); \ + \ + const char end[] = ""; \ + ret = norm.normalize(end, sizeof(end) - 1); \ + REQUIRE(ret == JSTokenizer::SCRIPT_ENDED); \ + \ + size_t act_len = norm.script_size(); \ + REQUIRE(act_len == sizeof(exp) - 1); \ + \ + const char* dst = norm.get_script(); \ + CHECK(!memcmp(exp, dst, sizeof(exp) - 1)); \ + } + // ClamAV test vectors from: https://github.com/Cisco-Talos/clamav/blob/main/unit_tests/check_jsnorm.c static const char clamav_buf0[] = "function foo(a, b) {\n" @@ -508,14 +642,14 @@ static const char all_patterns_expected2[] = static const char all_patterns_buf3[] = "break case debugger in import protected do else function try " - "implements static instanceof new this class let typeof var with enum private catch " - "continue default extends public finally for if super yield return switch throw const " + "implements static instanceof new this class let a typeof var a with enum private catch " + "continue default extends public finally for if super yield return switch throw const a " "interface void while delete export package"; static const char all_patterns_expected3[] 
= "break case debugger in import protected do else function try " - "implements static instanceof new this class let typeof var with enum private catch " - "continue default extends public finally for if super yield return switch throw const " + "implements static instanceof new this class let a typeof var a with enum private catch " + "continue default extends public finally for if super yield return switch throw const a " "interface void while delete export package"; static const char all_patterns_buf4[] = @@ -1444,8 +1578,8 @@ TEST_CASE("endings", "[JSNormalizer]") const char* ptr; int ret; - JSIdentifierCtxTest ident_ctx; - JSNormalizer norm(ident_ctx, 7, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH); + JSIdentifierCtxStub ident_ctx; + JSNormalizer norm(ident_ctx, 7, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); ret = norm.normalize(src, sizeof(src)); ptr = norm.get_src_next(); int act_len1 = norm.script_size(); @@ -1809,18 +1943,22 @@ TEST_CASE("split between tokens", "[JSNormalizer]") const char dat1[] = "var s = "; const char dat2[] = "'string';"; const char exp1[] = "var s="; - const char exp2[] = "var s='string';"; + const char exp2[] = "'string';"; + const char exp[] = "var s='string';"; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("operator number") { const char dat1[] = "a = 5 +"; const char dat2[] = "b + c;"; const char exp1[] = "a=5+"; - const char exp2[] = "a=5+b+c;"; + const char exp2[] = "b+c;"; + const char exp[] = "a=5+b+c;"; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("comment function") { @@ -1828,8 +1966,10 @@ TEST_CASE("split between tokens", "[JSNormalizer]") const char dat2[] = "foo(bar, baz);"; const char exp1[] = ""; const char exp2[] = "foo(bar,baz);"; + const char exp[] = "foo(bar,baz);"; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("operator identifier") { @@ -1837,10 +1977,12 @@ TEST_CASE("split between tokens", "[JSNormalizer]") 
const char dat2[] = "a = "; const char dat3[] = "b ;"; const char exp1[] = "var"; - const char exp2[] = "var a="; - const char exp3[] = "var a=b;"; + const char exp2[] = " a="; + const char exp3[] = "b;"; + const char exp[] = "var a=b;"; NORMALIZE_3(dat1, dat2, dat3, exp1, exp2, exp3); + NORM_COMBINED_3(dat1, dat2, dat3, exp); } } @@ -1852,8 +1994,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "/comment\n"; const char exp1[] = "/"; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("/ / msg") { @@ -1861,8 +2005,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "comment\n"; const char exp1[] = ""; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("/ / LF") { @@ -1870,8 +2016,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "\n"; const char exp1[] = ""; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("/ *") @@ -1880,8 +2028,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "* comment */"; const char exp1[] = "/"; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("/ * msg") { @@ -1889,8 +2039,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "ext */"; const char exp1[] = ""; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("* /") { @@ -1898,8 +2050,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "/"; const char exp1[] = ""; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION("/ * msg * /") { @@ -1909,8 +2063,10 @@ TEST_CASE("split 
in comments", "[JSNormalizer]") const char exp1[] = "/"; const char exp2[] = ""; const char exp3[] = ""; + const char exp[] = ""; NORMALIZE_3(dat1, dat2, dat3, exp1, exp2, exp3); + NORM_COMBINED_3(dat1, dat2, dat3, exp); } SECTION("< !--") @@ -1919,8 +2075,10 @@ TEST_CASE("split in comments", "[JSNormalizer]") const char dat2[] = "!-- comment\n"; const char exp1[] = "<"; const char exp2[] = ""; + const char exp[] = ""; NORMALIZE_2(dat1, dat2, exp1, exp2); + NORM_COMBINED_2(dat1, dat2, exp); } SECTION(""; const char exp1[] = "<"; const char exp2[] = ""; + const char exp[] = ""; NORM_BAD_2(dat1, dat2, exp1, exp2, JSTokenizer::SCRIPT_ENDED); + NORM_COMBINED_BAD_2(dat1, dat2, exp, JSTokenizer::SCRIPT_ENDED); } SECTION("") { @@ -2032,26 +2210,32 @@ TEST_CASE("split in closing tag", "[JSNormalizer]") const char dat2[] = ">"; const char exp1[] = "") { @@ -2059,10 +2243,12 @@ TEST_CASE("split in closing tag", "[JSNormalizer]") const char dat2[] = "scr"; const char dat3[] = "ipt>"; const char exp1[] = "'") { @@ -2070,10 +2256,25 @@ TEST_CASE("split in closing tag", "[JSNormalizer]") const char dat2[] = "rip"; const char dat3[] = "t>\";"; const char exp1[] = "var str=\"") + { + const char dat1[] = ":::: stack) +{ + std::string buf(context); + buf += ""; + JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in); + JSNormalizer normalizer(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); + normalizer.normalize(buf.c_str(), buf.size()); + CHECK(ident_ctx.get_types() == stack); +} + +TEST_CASE("Scope tracking - basic","[JSNormalizer]") +{ + SECTION("Global only") + test_scope("",{GLOBAL}); + + SECTION("Function scope - named function") + test_scope("function f(){",{GLOBAL,FUNCTION}); + + SECTION("Function scope - anonymous function") + test_scope("var f = function(){",{GLOBAL,FUNCTION}); + + SECTION("Function scope - arrow function") + test_scope("var f = (a,b)=>{",{GLOBAL,FUNCTION}); + + SECTION("Function scope - arrow function without scope") + 
test_scope("var f = (a,b)=> a",{GLOBAL,FUNCTION}); + + SECTION("Function scope - method in object initialization") + test_scope("var o = { f(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - method in object operation") + test_scope("+{ f(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - method in object as a function parameter") + test_scope("call({ f(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - keyword name method") + test_scope("var o = { let(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - 'get' name method") + test_scope("var o = { get(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - expression method") + test_scope("var o = { [a + 12](){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - method as anonymous function") + test_scope("var o = { f: function(){",{GLOBAL,BLOCK,FUNCTION}); + + SECTION("Function scope - keyword name method as anonymous function") + test_scope("var o = { let: function(){",{GLOBAL,BLOCK,FUNCTION}); + + SECTION("Function scope - 'get' name method as anonymous function") + test_scope("var o = { get: function(){",{GLOBAL,BLOCK,FUNCTION}); + + SECTION("Function scope - expression method as anonymous function") + test_scope("var o = { [a + 12]: function(){",{GLOBAL,BLOCK,FUNCTION}); + + SECTION("Function scope - getter") + test_scope("var o = { get f(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - parametric getter") + test_scope("var o = { get [a + 12](){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - setter") + test_scope("var o = { set f(){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Function scope - parametric setter") + test_scope("var o = { set [a + 12](){",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Block scope - regular block") + test_scope("{",{GLOBAL,BLOCK}); + + SECTION("Block scope - object initializer") + test_scope("o = {",{GLOBAL,BLOCK}); + + SECTION("Block scope - class") + test_scope("class C{",{GLOBAL,BLOCK}); + + SECTION("Block scope - class with extends") + test_scope("class 
C extends A{",{GLOBAL,BLOCK}); + + SECTION("Block scope - if") + test_scope("if(true){",{GLOBAL,BLOCK}); + + SECTION("Block scope - single statement if") + test_scope("if(true) func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - nested multiple single statement ifs") + test_scope("if(a) if(b) if(c) if(d) func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - nested multiple single statement ifs with newline") + test_scope("if(a)\nif(b)\nif(c)\nif(d)\nfunc()",{GLOBAL,BLOCK}); + + SECTION("Block scope - else") + test_scope("if(true);else{",{GLOBAL,BLOCK}); + + SECTION("Block scope - single statement else") + test_scope("if(true);else func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - for loop") + test_scope("for(;;){",{GLOBAL,BLOCK}); + + SECTION("Block scope - for loop in range") + test_scope("for(i in range()){",{GLOBAL,BLOCK}); + + SECTION("Block scope - for loop of iterable") + test_scope("for(i of o){",{GLOBAL,BLOCK}); + + SECTION("Block scope - for await loop") + test_scope("for await(i of o){",{GLOBAL,BLOCK}); + + SECTION("Block scope - inside for statement") + test_scope("for(",{GLOBAL,BLOCK}); + + SECTION("Block scope - inside for statement, after semicolon") + test_scope("for(;",{GLOBAL,BLOCK}); + + SECTION("Block scope - single statement for") + test_scope("for(;;) func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - for nested in single line conditional") + test_scope("if(true) for(;;) a++",{GLOBAL,BLOCK}); + + SECTION("Block scope - while") + test_scope("while(true){",{GLOBAL,BLOCK}); + + SECTION("Block scope - single statement while") + test_scope("while(true) func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - do-while") + test_scope("do{",{GLOBAL,BLOCK}); + + SECTION("Block scope - single statement do-while") + test_scope("do func()",{GLOBAL,BLOCK}); + + SECTION("Block scope - try") + test_scope("try{",{GLOBAL,BLOCK}); + + SECTION("Block scope - catch") + test_scope("try{}catch(e){",{GLOBAL,BLOCK}); + + SECTION("Block scope - catch exception 
declaration") + test_scope("try{}catch(",{GLOBAL,BLOCK}); + + SECTION("Block scope - finally") + test_scope("try{}finally{",{GLOBAL,BLOCK}); + + SECTION("Block scope - nested object - named") + test_scope("var o = {s:{",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Block scope - nested object - keyword named") + test_scope("var o = {let:{",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Block scope - nested object - 'get' named") + test_scope("var o = {get:{",{GLOBAL,BLOCK,BLOCK}); + + SECTION("Block scope - nested object - parametric") + test_scope("var o = {[a+12]:{",{GLOBAL,BLOCK,BLOCK}); +} + +TEST_CASE("Scope tracking - closing","[JSNormalizer]") +{ + + SECTION("Function scope - named function") + test_scope("function f(){}",{GLOBAL}); + + SECTION("Function scope - anonymous function") + test_scope("var f = function(){}",{GLOBAL}); + + SECTION("Function scope - arrow function") + test_scope("var f = (a,b)=>{}",{GLOBAL}); + + SECTION("Function scope - arrow function without scope") + test_scope("var f = (a,b)=>a;",{GLOBAL}); + + SECTION("Function scope - arrow function as a function parameter") + test_scope("console.log(a=>c)",{GLOBAL}); + + SECTION("Function scope - method") + test_scope("var o = { f(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - keyword name method") + test_scope("var o = { let(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - expression method") + test_scope("var o = { [a + 12](){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - method as anonymous function") + test_scope("var o = { f: function(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - keyword name method as anonymous function") + test_scope("var o = { let: function(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - expression method as anonymous function") + test_scope("var o = { [a + 12]: function(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - getter") + test_scope("var o = { get f(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - parametric getter") + test_scope("var o = { get [a + 
12](){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - setter") + test_scope("var o = { set f(){}",{GLOBAL,BLOCK}); + + SECTION("Function scope - parametric setter") + test_scope("var o = { set [a + 12](){}",{GLOBAL,BLOCK}); + + SECTION("Block scope - regular block") + test_scope("{}",{GLOBAL}); + + SECTION("Block scope - object initializer") + test_scope("o = {}",{GLOBAL}); + + SECTION("Block scope - class") + test_scope("class C{}",{GLOBAL}); + + SECTION("Block scope - class with extends") + test_scope("class C extends A{}",{GLOBAL}); + + SECTION("Block scope - if") + test_scope("if(true){}",{GLOBAL}); + + SECTION("Block scope - single statement if") + test_scope("if(true);",{GLOBAL}); + + SECTION("Block scope - single statement if, semicolon group terminated") + test_scope("if(true)\na++\nreturn",{GLOBAL}); + + SECTION("Block scope - nested multiple single statement ifs") + test_scope("if(a) if(b) if(c) if(d) func();",{GLOBAL}); + + SECTION("Block scope - nested multiple single statement ifs with newline") + test_scope("if(a)\nif(b)\nif(c)\nif(d)\nfunc()\nfunc()",{GLOBAL}); + + SECTION("Block scope - else") + test_scope("if(true);else{}",{GLOBAL}); + + SECTION("Block scope - single statement else") + test_scope("if(true);else;",{GLOBAL}); + + SECTION("Block scope - for loop") + test_scope("for(;;){}",{GLOBAL}); + + SECTION("Block scope - for loop in range") + test_scope("for(i in range()){}",{GLOBAL}); + + SECTION("Block scope - for loop of iterable") + test_scope("for(i of o){}",{GLOBAL}); + + SECTION("Block scope - for await loop") + test_scope("for await(i of o){}",{GLOBAL}); + + SECTION("Block scope - single statement for") + test_scope("for(;;);",{GLOBAL}); + + SECTION("Block scope - while") + test_scope("while(true){}",{GLOBAL}); + + SECTION("Block scope - single statement while") + test_scope("while(true);",{GLOBAL}); + + SECTION("Block scope - do-while") + test_scope("do{}while(",{GLOBAL, BLOCK}); + + SECTION("Block scope - single statement do-while") + 
test_scope("do;while(",{GLOBAL, BLOCK}); + + SECTION("Block scope - try") + test_scope("try{}",{GLOBAL}); + + SECTION("Block scope - catch") + test_scope("try{}catch(e){}",{GLOBAL}); + + SECTION("Block scope - finally") + test_scope("try{}finally{}",{GLOBAL}); + + SECTION("Block scope - nested object - named") + test_scope("var o = {s:{}",{GLOBAL,BLOCK}); + + SECTION("Block scope - nested object - keyword named") + test_scope("var o = {let:{}",{GLOBAL,BLOCK}); + + SECTION("Block scope - nested object - parametric") + test_scope("var o = {[a+12]:{}",{GLOBAL,BLOCK}); + + SECTION("Block scope - advanced automatic semicolon insertion") + test_scope( + "var\na\n=\n0\n\n" // var a=0; + "for\n(\nlet\na\n=\n0\na\n<\n5\n++\na\n)\na\n+=\n2\n\n" // for (let a = 0;a<5;++a) a+=2; + "do\nlet\na\n=\n0\nwhile\n(\na\n<\n5\n)\n\n" // do let a=0; while (a < 5); + "++\na\n\n" // ++a; + "while\n(a\n<\n5\n)\na\n+=\n2\n\n" // while (a<5) a+=2; + "if\n(\ntrue\n)\nlet\na\n=\n0\n\n" // if (true) let a=0; + "else\nlet\na\n=\n0\n\na;", // else let a=0;a; + {GLOBAL} + ); + + SECTION("Block scope - inline block in the end of outer scope") + test_scope("function() { if (true)\nfor ( ; ; ) a = 2 }", {GLOBAL}); +} + +typedef std::tuple> PduCase; +static void test_normalization(std::list pdus) +{ + JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in); + JSNormalizer normalizer(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH); + for(auto pdu:pdus) + { + const char* source; + const char* expected; + std::list stack; + std::tie(source,expected,stack) = pdu; + normalizer.normalize(source, strlen(source)); + std::string result_buf(normalizer.get_script(), normalizer.script_size()); + CHECK(ident_ctx.get_types() == stack); + CHECK(result_buf == expected); + } +} + +TEST_CASE("Scope tracking - over multiple PDU","[JSNormalizer]") +{ + // Every line represents a PDU. 
Each pdu has input buffer, expected script + // and expected scope stack, written in that order + SECTION("general - variable extension") + test_normalization({ + {"long_", "var_0000", {GLOBAL}}, + {"variable", "var_0001", {GLOBAL}} + //FIXIT-E: if variable index will be preserved across PDUs, second pdu expected + // will be "var_0000" + }); + + SECTION("general - variable extension: builtin to identifier") + test_normalization({ + {"console", "console", {GLOBAL}}, + {"Writer", "var_0000", {GLOBAL}} + }); + + SECTION("general - variable extension: identifier to builtin") + test_normalization({ + {"con", "var_0000", {GLOBAL}}, + {"sole", "console", {GLOBAL}} + }); + + SECTION("general - variable extension that overwrites existing variable") + test_normalization({ + {"a, b, an", "var_0000,var_0001,var_0002", {GLOBAL}}, + {"other = a", "var_0000,var_0001,var_0003=var_0000", {GLOBAL}} + }); + + SECTION("general - variable extension that overwrites existing variable inside inner scope") + test_normalization({ + {"f(a, x=>{var an", "var_0000(var_0001,var_0002=>{var var_0003", {GLOBAL,FUNCTION}}, + {"other = a})", "var_0000(var_0001,var_0002=>{var var_0004=var_0001})", {GLOBAL}} + }); + + SECTION("block scope - basic open") + test_normalization({ + {"{", "{", {GLOBAL, BLOCK}}, + {"var", "{var", {GLOBAL, BLOCK}} + }); + + SECTION("block scope - basic close") + test_normalization({ + {"{", "{", {GLOBAL, BLOCK}}, + {"}", "{}", {GLOBAL}} + }); + + SECTION("block scope - open outside cross-PDU states") + test_normalization({ + {"{[1,2,3,4,5,6,7,8]", "{[1,2,3,4,5,6,7,8]", {GLOBAL, BLOCK}}, + {"}", "{[1,2,3,4,5,6,7,8]}", {GLOBAL}} + }); + + SECTION("block scope - closing brace in a string") + test_normalization({ + {"{[1,2,3,4,5,6,7,'}']", "{[1,2,3,4,5,6,7,'}']", {GLOBAL, BLOCK}}, + {"}", "{[1,2,3,4,5,6,7,'}']}", {GLOBAL}} + }); + + SECTION("block scope - for keyword split") + test_normalization({ + {"fin", "var_0000", {GLOBAL}}, + {"ally {", "finally{", {GLOBAL, BLOCK}} + }); 
+
+    SECTION("block scope - between 'for' and '('")
+        test_normalization({
+            {"for", "for", {GLOBAL, BLOCK}},
+            {"(", "for(", {GLOBAL, BLOCK}}
+        });
+
+    SECTION("block scope - fake 'for'")
+        test_normalization({
+            {"for", "for", {GLOBAL, BLOCK}},
+            {"k", "var_0000", {GLOBAL}}
+        });
+
+    SECTION("block scope - inside for-loop parentheses")
+        test_normalization({
+            {"for(;;", "for(;;", {GLOBAL, BLOCK}},
+            {");", "for(;;);", {GLOBAL}}
+        });
+
+    SECTION("block scope - between for-loop parentheses and code block")
+        test_normalization({
+            {"for(;;)", "for(;;)", {GLOBAL, BLOCK}},
+            {"{}", "for(;;){}", {GLOBAL}}
+        });
+
+    SECTION("function scope: split in 'function'")
+        test_normalization({
+            {"func", "var_0000", {GLOBAL}},
+            {"tion(", "function(", {GLOBAL,FUNCTION}}
+        });
+
+    SECTION("function scope: fake function")
+        test_normalization({
+            {"function", "function", {GLOBAL}},
+            {"al(", "var_0000(", {GLOBAL}}
+        });
+
+    SECTION("function scope: split inside string literal")
+        test_normalization({
+            {"`$$$$$$$$function", "`$$$$$$$$function", {GLOBAL}},
+            {"(){a = 0", "`$$$$$$$$function(){a = 0", {GLOBAL}}
+        });
+
+    SECTION("function scope: inside parameters")
+        test_normalization({
+            {"function(", "function(", {GLOBAL, FUNCTION}},
+            {")", "function()", {GLOBAL,FUNCTION}}
+        });
+
+    SECTION("function scope: between parameters and body")
+        test_normalization({
+            {"function()", "function()", {GLOBAL, FUNCTION}},
+            {"{", "function(){", {GLOBAL,FUNCTION}}
+        });
+
+    SECTION("function scope: inside code")
+        test_normalization({
+            {"function(){", "function(){", {GLOBAL, FUNCTION}},
+            {"}", "function(){}", {GLOBAL}}
+        });
+
+    SECTION("object initializer: basic")
+        test_normalization({
+            {"var o = {", "var var_0000={", {GLOBAL, BLOCK}},
+            {"}", "var var_0000={}", {GLOBAL}}
+        });
+
+    SECTION("false var keyword")
+        test_normalization({
+            {"var var_a; function(){ var", "var var_0000;function(){var", {GLOBAL, FUNCTION}},
+            {"_a; }", "var var_0000;function(){var_0000;}", {GLOBAL}}
+        });
+
+    SECTION("false let keyword")
+        test_normalization({
+            {"var let_a; function(){ let", "var var_0000;function(){let", {GLOBAL, FUNCTION}},
+            {"_a; }", "var var_0000;function(){var_0000;}", {GLOBAL}}
+        });
+
+    SECTION("false const keyword")
+        test_normalization({
+            {"var const_a; function(){ const", "var var_0000;function(){const", {GLOBAL, FUNCTION}},
+            {"_a; }", "var var_0000;function(){var_0000;}", {GLOBAL}}
+        });
+
+    SECTION("false class keyword")
+        test_normalization({
+            {"var a; class", "var var_0000;class", {GLOBAL}},
+            {"_a; { a }", "var var_0000;var_0001;{var_0000}", {GLOBAL}}
+        });
+}
+
+static void test_normalization_bad(const char* source, const char* expected,
+    JSTokenizer::JSRet eret)
+{
+    JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, s_ident_built_in);
+    JSNormalizer normalizer(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
+    auto ret = normalizer.normalize(source, strlen(source));
+    std::string result_buf(normalizer.get_script(), normalizer.script_size());
+    CHECK(eret == ret);
+    CHECK(result_buf == expected);
+}
+
+TEST_CASE("Scope tracking - error handling", "[JSNormalizer]")
+{
+    SECTION("not identifier after var keyword")
+        test_normalization_bad(
+            "var +;",
+            "var",
+            JSTokenizer::BAD_TOKEN
+        );
+
+    SECTION("not identifier after let keyword")
+        test_normalization_bad(
+            "let class;",
+            "let",
+            JSTokenizer::BAD_TOKEN
+        );
+
+    SECTION("not identifier after const keyword")
+        test_normalization_bad(
+            "const 1;",
+            "const",
+            JSTokenizer::BAD_TOKEN
+        );
+
+    SECTION("scope mismatch")
+        test_normalization_bad(
+            "function f() { if (true) } }",
+            "function var_0000(){if(true)}",
+            JSTokenizer::WRONG_CLOSING_SYMBOL
+        );
+
+    SECTION("scope mismatch with code block")
+        test_normalization_bad(
+            "{ { function } }",
+            "{{function",
+            JSTokenizer::WRONG_CLOSING_SYMBOL
+        );
+
+    SECTION("scope nesting overflow")
+    {
+        const char src[] = "function() { if (true) { } }";
+        const char exp[] = "function(){if";
+        uint32_t scope_depth = 2;
+
+        JSIdentifierCtx ident_ctx(DEPTH, scope_depth, s_ident_built_in);
+        JSNormalizer normalizer(ident_ctx, DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
+        auto ret = normalizer.normalize(src, strlen(src));
+        std::string dst(normalizer.get_script(), normalizer.script_size());
+
+        CHECK(ret == JSTokenizer::SCOPE_NESTING_OVERFLOW);
+        CHECK(dst == exp);
     }
 }
@@ -3360,8 +4253,8 @@ static JSTokenizer::JSRet norm_ret(JSNormalizer& normalizer, const std::string&
 
 TEST_CASE("JS Normalizer, literals by 8 K", "[JSNormalizer]")
 {
-    JSIdentifierCtxTest ident_ctx;
-    JSNormalizer normalizer(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH);
+    JSIdentifierCtxStub ident_ctx;
+    JSNormalizer normalizer(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
     char dst[DEPTH];
 
     constexpr size_t size = 1 << 13;
@@ -3400,7 +4293,7 @@ TEST_CASE("JS Normalizer, literals by 8 K", "[JSNormalizer]")
 
 TEST_CASE("JS Normalizer, literals by 64 K", "[JSNormalizer]")
 {
-    JSIdentifierCtxTest ident_ctx;
+    JSIdentifierCtxStub ident_ctx;
     JSNormalizer normalizer(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH);
     char dst[DEPTH];
 
@@ -3448,9 +4341,9 @@ TEST_CASE("JS Normalizer, id normalization", "[JSNormalizer]")
     input.resize(DEPTH - strlen(s_closing_tag));
     input.append(s_closing_tag, strlen(s_closing_tag));
 
-    JSIdentifierCtxTest ident_ctx_mock;
+    JSIdentifierCtxStub ident_ctx_mock;
     JSNormalizer normalizer_wo_ident(ident_ctx_mock, UNLIM_DEPTH,
-        MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH);
+        MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
 
     REQUIRE(norm_ret(normalizer_wo_ident, input) == JSTokenizer::SCRIPT_ENDED);
     BENCHMARK("without substitution")
@@ -3460,8 +4353,8 @@ TEST_CASE("JS Normalizer, id normalization", "[JSNormalizer]")
     };
 
     const std::unordered_set ids{};
-    JSIdentifierCtx ident_ctx(DEPTH, ids);
-    JSNormalizer normalizer_w_ident(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH);
+    JSIdentifierCtx ident_ctx(DEPTH, MAX_SCOPE_DEPTH, ids);
+    JSNormalizer normalizer_w_ident(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
 
     REQUIRE(norm_ret(normalizer_w_ident, input) == JSTokenizer::SCRIPT_ENDED);
     BENCHMARK("with substitution")
@@ -3471,9 +4364,9 @@ TEST_CASE("JS Normalizer, id normalization", "[JSNormalizer]")
     };
 
     const std::unordered_set ids_n { "n" };
-    JSIdentifierCtx ident_ctx_ids_n(DEPTH, ids_n);
+    JSIdentifierCtx ident_ctx_ids_n(DEPTH, MAX_SCOPE_DEPTH, ids_n);
     JSNormalizer normalizer_built_ins(ident_ctx_ids_n, UNLIM_DEPTH,
-        MAX_TEMPLATE_NESTING, MAX_SCOPE_DEPTH);
+        MAX_TEMPLATE_NESTING, MAX_BRACKET_DEPTH);
 
     REQUIRE(norm_ret(normalizer_built_ins, input) == JSTokenizer::SCRIPT_ENDED);
     BENCHMARK("with built-ins")
@@ -3486,7 +4379,7 @@ TEST_CASE("JS Normalizer, id normalization", "[JSNormalizer]")
 TEST_CASE("JS Normalizer, scope tracking", "[JSNormalizer]")
 {
     constexpr uint32_t depth = 65535;
-    JSIdentifierCtxTest ident_ctx;
+    JSIdentifierCtxStub ident_ctx;
     JSNormalizer normalizer(ident_ctx, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, depth);
 
     auto src_ws = make_input("", " ", "", depth);
@@ -3531,7 +4424,7 @@ TEST_CASE("JS Normalizer, automatic semicolon", "[JSNormalizer]")
     const char* src_wo_semicolons = wo_semicolons.c_str();
     size_t src_len = w_semicolons.size();
 
-    JSIdentifierCtxTest ident_ctx_mock;
+    JSIdentifierCtxStub ident_ctx_mock;
     JSNormalizer normalizer_wo_ident(ident_ctx_mock, UNLIM_DEPTH, MAX_TEMPLATE_NESTING, DEPTH);
 
     REQUIRE(norm_ret(normalizer_wo_ident, w_semicolons) == JSTokenizer::SCRIPT_ENDED);