and identifiers normalizer. Normalizer concatenates string literals whenever
it's possible to do. This also works with any other normalizations that result
in string literals. All JavaScript identifier names, except those from
-the ignore list, will be substituted with unified names in the following
+the ignore lists, will be substituted with unified names in the following
format: var_0000 -> var_ffff. But the unescape-like function names will be removed
from the normalized data. The Normalizer tries to expand an escaped text,
so it will appear in a usual form in the output. Moreover, Normalizer validates
for script elements. For more information on how additionally configure
Enhanced Normalizer check with the following configuration options:
js_norm_bytes_depth, js_norm_identifier_depth, js_norm_max_tmpl_nest,
-js_norm_max_bracket_depth, js_norm_max_scope_depth, js_norm_ident_ignore.
+js_norm_max_bracket_depth, js_norm_max_scope_depth, js_norm_ident_ignore,
+js_norm_prop_ignore.
Eventually Enhanced Normalizer will completely replace Legacy Normalizer.
==== Configuration
included in the ignore list. If for some reason the user wants to disable unescape
related features, then removing function's name from the ignore list does the trick.
+===== js_norm_prop_ignore
+
+js_norm_prop_ignore = {<list of ignored properties>} is an option of the enhanced
+JavaScript normalizer that defines a list of object properties and methods that
+are kept intact during identifier normalization. This list should include
+methods and properties of objects that are not tracked by the assignment substitution
+functionality, for example, those that can be created implicitly.
+
+Subsequent accessors (after a dot, in square brackets, or after a function call) are
+not normalized either.
+
+For example:
+
+ http_inspect.js_norm_prop_ignore = { 'split' }
+
+ in: "string".toUpperCase().split("").reverse().join("");
+ out: "string".var_0000().split("").reverse().join("");
+
+The default list of ignored properties is present in "snort_defaults.lua".
+
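The chain rule described for js_norm_prop_ignore can be sketched in a few lines of C++. This is an illustrative toy, not the Snort implementation: `normalize_chain` is a hypothetical helper, call parentheses and arguments are left out, and only dot accessors are modeled. It assumes the head of the chain (for example, a string literal) is emitted unchanged.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not a Snort API): renders head.prop1.prop2...
// Properties are replaced with var_XXXX names until the first property
// from prop_ignore occurs; from that point the rest of the chain is kept as is.
std::string normalize_chain(const std::string& head,
    const std::vector<std::string>& props,
    const std::unordered_set<std::string>& prop_ignore)
{
    std::unordered_map<std::string, std::string> names; // original -> unified name
    std::string out = head;
    bool keep = false; // becomes true once an ignored property has been seen

    for (const auto& p : props)
    {
        if (!keep && prop_ignore.count(p))
            keep = true;

        out += '.';
        if (keep)
        {
            out += p;
            continue;
        }

        auto it = names.find(p);
        if (it == names.end())
        {
            char buf[16];
            std::snprintf(buf, sizeof(buf), "var_%04zx", names.size());
            it = names.emplace(p, buf).first;
        }
        out += it->second;
    }
    return out;
}
```

With prop_ignore = { 'split' }, the chain toUpperCase, split, reverse, join after the literal "string" comes out as "string".var_0000.split.reverse.join, which matches the documented example apart from the omitted parentheses.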
===== xff_headers
This configuration supports defining custom x-forwarded-for type headers. In a
'CreateHTML'
}
+default_js_norm_prop_ignore =
+{
+ -- Object
+ 'constructor', 'prototype', '__proto__', '__defineGetter__', '__defineSetter__',
+ '__lookupGetter__', '__lookupSetter__', '__count__', '__noSuchMethod__', '__parent__',
+ 'hasOwnProperty', 'isPrototypeOf', 'propertyIsEnumerable', 'toLocaleString', 'toString',
+ 'toSource', 'valueOf', 'getNotifier', 'eval', 'observe', 'unobserve', 'watch', 'unwatch',
+
+ -- Function
+ 'arguments', 'arity', 'caller', 'length', 'name', 'displayName', 'apply', 'bind', 'call',
+ 'isGenerator',
+
+ -- Number
+ 'toExponential', 'toFixed', 'toPrecision',
+
+ -- String
+ 'at', 'charAt', 'charCodeAt', 'codePointAt', 'concat', 'includes', 'endsWith', 'indexOf',
+ 'lastIndexOf', 'localeCompare', 'match', 'matchAll', 'normalize', 'padEnd', 'padStart',
+ 'repeat', 'replace', 'replaceAll', 'search', 'slice', 'split', 'startsWith', 'substring',
+ 'toLocaleLowerCase', 'toLocaleUpperCase', 'toLowerCase', 'toUpperCase', 'trim', 'trimStart',
+ 'trimEnd',
+
+ -- RegExp
+ 'flags', 'dotAll', 'global', 'hasIndices', 'ignoreCase', 'multiline', 'source', 'sticky',
+ 'unicode', 'lastIndex', 'compile', 'exec', 'test', 'input', 'lastMatch', 'lastParen',
+ 'leftContext', 'rightContext',
+
+ -- Array
+ 'copyWithin', 'entries', 'every', 'fill', 'filter', 'find', 'findIndex', 'flat', 'flatMap',
+ 'forEach', 'groupBy', 'groupByToMap', 'join', 'keys', 'map', 'pop', 'push', 'reduce',
+ 'reduceRight', 'reverse', 'shift', 'unshift', 'some', 'sort', 'splice',
+
+ -- Generator
+ 'next', 'return', 'throw'
+}
+
default_http_inspect =
{
-- params not specified here get internal defaults
js_norm_ident_ignore = default_js_norm_ident_ignore,
+ js_norm_prop_ignore = default_js_norm_prop_ignore,
}
---------------------------------------------------------------------------
ip_hi_dist icmp_low_sweep icmp_med_sweep icmp_hi_sweep
default_hi_port_scan default_med_port_scan default_low_port_scan
default_variables netflow_versions default_js_norm_ident_ignore
- default_http_inspect
+ default_js_norm_prop_ignore default_http_inspect
]]
snort_whitelist_append(default_whitelist)
will be concatenated. This also works for functions that result in string
literals. Semicolons will be inserted, if not already present, according to ECMAScript
automatic semicolon insertion rules.
-All JavaScript identifier names, except those from the ignore list,
+All JavaScript identifier names, except those from the ident_ignore or prop_ignore lists,
will be substituted with unified names in the following format: var_0000 -> var_ffff.
So, the number of unique identifiers available is 65536 names per HTTP transaction.
If Normalizer overruns the configured limit, built-in alert is generated.
A config option to set the limit manually:
* http_inspect.js_norm_identifier_depth.
-Identifiers from the ignore list will be placed as is, without substitution. Starting with
+Identifiers from the ident_ignore list will be placed as is, without substitution. Starting with
the listed identifier, any chain of dot accessors, brackets and function calls will be kept
intact.
For example:
var a = console.log
a("hello") // will be substituted to 'console.log("hello")'
+For properties and methods of objects that can be created implicitly, there is a
+js_norm_prop_ignore list. Once the first property or method from the list occurs in
+a call chain, all subsequent names in that chain are left unnormalized.
+
+Note that identifiers are normalized by name, i.e. an identifier and a property with the same name
+will be normalized to the same value. However, the ignore lists act separately on identifiers
+and properties.
+
+For example:
+
+ http_inspect.js_norm_prop_ignore = { 'split' }
+
+ in: "string".toUpperCase().split("").reverse().join("");
+ out: "string".var_0000().split("").reverse().join("");
+
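The note about the two ignore lists acting separately can be illustrated with a simplified model. The sketch below is an assumption-laden toy, not the actual JSIdentifierCtx: `ToyIdentCtx` and its `Entry` struct are hypothetical names, the identifier depth limit is omitted, and std::string is used instead of the preallocated name table.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical toy model: one entry per name, with independent
// "ignored as identifier" and "ignored as property" flags.
class ToyIdentCtx
{
public:
    ToyIdentCtx(const std::unordered_set<std::string>& ignored_ids,
        const std::unordered_set<std::string>& ignored_props)
    {
        for (const auto& s : ignored_ids)
            table[s].ignored_as_id = true;
        for (const auto& s : ignored_props)
            table[s].ignored_as_prop = true;
    }

    // Returns the name unchanged if it is ignored in the given role,
    // otherwise a unified name shared by both roles.
    std::string substitute(const std::string& name, bool is_property)
    {
        Entry& e = table[name];
        if (is_property ? e.ignored_as_prop : e.ignored_as_id)
            return name;

        if (e.norm.empty())
        {
            char buf[16];
            std::snprintf(buf, sizeof(buf), "var_%04x", next++);
            e.norm = buf;
        }
        return e.norm;
    }

private:
    struct Entry
    {
        bool ignored_as_id = false;
        bool ignored_as_prop = false;
        std::string norm; // empty until a unified name is assigned
    };

    std::unordered_map<std::string, Entry> table;
    unsigned next = 0;
};
```

In this model, a name from the identifier ignore list is still normalized when it appears as a property, and the reverse holds for the property list, while a name in neither list gets a single unified name regardless of role.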
In addition to the scope tracking, JS Normalizer specifically tracks unicode unescape
functions(unescape, decodeURI, decodeURIComponent, String.fromCharCode, String.fromCodePoint).
This allows detection of unescape functions nested within other unescape functions, which is
}
}
-snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t ident_depth, size_t norm_depth,
- uint8_t max_template_nesting, uint32_t max_bracket_depth, uint32_t max_scope_depth,
- const std::unordered_set<std::string>& ignored_ids)
+snort::JSNormalizer& HttpFlowData::acquire_js_ctx(const HttpParaList::JsNormParam& js_norm_param)
{
if (js_normalizer)
return *js_normalizer;
if (!js_ident_ctx)
{
- js_ident_ctx = new JSIdentifierCtx(ident_depth, max_scope_depth, ignored_ids);
+ js_ident_ctx = new JSIdentifierCtx(js_norm_param.js_identifier_depth,
+ js_norm_param.max_scope_depth, js_norm_param.ignored_ids, js_norm_param.ignored_props);
debug_logf(4, http_trace, TRACE_JS_PROC, nullptr,
- "js_ident_ctx created (ident_depth %d)\n", ident_depth);
+ "js_ident_ctx created (ident_depth %d)\n", js_norm_param.js_identifier_depth);
}
- js_normalizer = new JSNormalizer(*js_ident_ctx, norm_depth,
- max_template_nesting, max_bracket_depth);
+ js_normalizer = new JSNormalizer(*js_ident_ctx, js_norm_param.js_norm_bytes_depth,
+ js_norm_param.max_template_nesting, js_norm_param.max_bracket_depth);
debug_logf(4, http_trace, TRACE_JS_PROC, nullptr,
"js_normalizer created (norm_depth %zd, max_template_nesting %d)\n",
- norm_depth, max_template_nesting);
+ js_norm_param.js_norm_bytes_depth, js_norm_param.max_template_nesting);
return *js_normalizer;
}
}
#else
void HttpFlowData::reset_js_ident_ctx() {}
-snort::JSNormalizer& HttpFlowData::acquire_js_ctx(int32_t, size_t, uint8_t, uint32_t, uint32_t,
- const std::unordered_set<std::string>&)
+snort::JSNormalizer& HttpFlowData::acquire_js_ctx(const HttpParaList::JsNormParam&)
{ return *js_normalizer; }
void HttpFlowData::release_js_ctx() {}
#endif
#include "http_common.h"
#include "http_enum.h"
#include "http_event.h"
+#include "http_module.h"
class HttpTransaction;
class HttpJsNorm;
void reset_js_pdu_idx();
void reset_js_ident_ctx();
- snort::JSNormalizer& acquire_js_ctx(int32_t ident_depth, size_t norm_depth,
- uint8_t max_template_nesting, uint32_t max_bracket_depth, uint32_t max_scope_depth,
- const std::unordered_set<std::string>& ignored_ids);
+ snort::JSNormalizer& acquire_js_ctx(const HttpParaList::JsNormParam& js_norm_param);
void release_js_ctx();
bool is_pdu_missed();
for (auto s : params->js_norm_param.ignored_ids)
js_norm_ident_ignore += s + " ";
+ std::string js_norm_prop_ignore;
+ for (auto s : params->js_norm_param.ignored_props)
+ js_norm_prop_ignore += s + " ";
+
ConfigLogger::log_limit("request_depth", params->request_depth, -1LL);
ConfigLogger::log_limit("response_depth", params->response_depth, -1LL);
ConfigLogger::log_flag("unzip", params->unzip);
ConfigLogger::log_value("js_norm_max_scope_depth", params->js_norm_param.max_scope_depth);
if (!js_norm_ident_ignore.empty())
ConfigLogger::log_list("js_norm_ident_ignore", js_norm_ident_ignore.c_str());
+ if (!js_norm_prop_ignore.empty())
+ ConfigLogger::log_list("js_norm_prop_ignore", js_norm_prop_ignore.c_str());
ConfigLogger::log_value("bad_characters", bad_chars.c_str());
ConfigLogger::log_value("ignore_unreserved", unreserved_chars.c_str());
ConfigLogger::log_flag("percent_u", params->uri_param.percent_u);
return ret;
}
-HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t normalization_depth_,
- int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_,
- uint32_t max_scope_depth_, const std::unordered_set<std::string>& ignored_ids_) :
+HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_,
+ const HttpParaList::JsNormParam& js_norm_param_) :
uri_param(uri_param_),
+ js_norm_param(js_norm_param_),
detection_depth(UINT64_MAX),
- normalization_depth(normalization_depth_),
- identifier_depth(identifier_depth_),
- max_template_nesting(max_template_nesting_),
- max_bracket_depth(max_bracket_depth_),
- max_scope_depth(max_scope_depth_),
- ignored_ids(ignored_ids_),
mpse_otag(nullptr),
mpse_attr(nullptr),
mpse_type(nullptr)
trace_logf(2, http_trace, TRACE_JS_PROC, current_packet,
"script continues\n");
- auto& js_ctx = ssn->acquire_js_ctx(identifier_depth, normalization_depth, max_template_nesting,
- max_bracket_depth, max_scope_depth, ignored_ids);
+ auto& js_ctx = ssn->acquire_js_ctx(js_norm_param);
while (ptr < end)
{
HttpModule::increment_peg_counts(PEG_JS_INLINE);
}
- auto& js_ctx = ssn->acquire_js_ctx(identifier_depth, normalization_depth,
- max_template_nesting, max_bracket_depth, max_scope_depth, ignored_ids);
+ auto& js_ctx = ssn->acquire_js_ctx(js_norm_param);
auto output_size_before = js_ctx.script_size();
auto ret = js_normalize(js_ctx, current_packet, end, ptr, false);
class HttpJsNorm
{
public:
- HttpJsNorm(const HttpParaList::UriParam&, int64_t normalization_depth,
- int32_t identifier_depth, uint8_t max_template_nesting, uint32_t max_bracket_depth,
- uint32_t max_scope_depth, const std::unordered_set<std::string>& ignored_ids);
+ HttpJsNorm(const HttpParaList::UriParam& uri_param_,
+ const HttpParaList::JsNormParam& js_norm_param_);
~HttpJsNorm();
void set_detection_depth(size_t depth)
};
const HttpParaList::UriParam& uri_param;
+ const HttpParaList::JsNormParam& js_norm_param;
size_t detection_depth;
- int64_t normalization_depth;
- int32_t identifier_depth;
- uint8_t max_template_nesting;
- uint32_t max_bracket_depth;
- uint32_t max_scope_depth;
- const std::unordered_set<std::string>& ignored_ids;
bool configure_once = false;
snort::SearchTool* mpse_otag;
{ nullptr, Parameter::PT_MAX, nullptr, nullptr, nullptr }
};
+static const Parameter js_norm_prop_ignore_param[] =
+{
+ { "prop_name", Parameter::PT_STRING, nullptr, nullptr, "name of the object property to ignore" },
+ { nullptr, Parameter::PT_MAX, nullptr, nullptr, nullptr }
+};
+
const Parameter HttpModule::http_params[] =
{
{ "request_depth", Parameter::PT_INT, "-1:max53", "-1",
{ "js_norm_ident_ignore", Parameter::PT_LIST, js_norm_ident_ignore_param, nullptr,
"list of JavaScript ignored identifiers which will not be normalized" },
+ { "js_norm_prop_ignore", Parameter::PT_LIST, js_norm_prop_ignore_param, nullptr,
+ "list of JavaScript ignored object properties which will not be normalized" },
+
{ "max_javascript_whitespaces", Parameter::PT_INT, "1:65535", "200",
"maximum consecutive whitespaces allowed within the JavaScript obfuscated data" },
{
params->js_norm_param.ignored_ids.insert(val.get_string());
}
+ else if (val.is("prop_name"))
+ {
+ params->js_norm_param.ignored_props.insert(val.get_string());
+ }
else if (val.is("max_javascript_whitespaces"))
{
params->js_norm_param.max_javascript_whitespaces = val.get_uint16();
params->uri_param.iis_unicode_code_page);
}
- params->js_norm_param.js_norm = new HttpJsNorm(params->uri_param,
- params->js_norm_param.js_norm_bytes_depth, params->js_norm_param.js_identifier_depth,
- params->js_norm_param.max_template_nesting, params->js_norm_param.max_bracket_depth,
- params->js_norm_param.max_scope_depth, params->js_norm_param.ignored_ids);
+ params->js_norm_param.js_norm = new HttpJsNorm(params->uri_param, params->js_norm_param);
params->script_detection_handle = script_detection_handle;
uint32_t max_bracket_depth = 256;
uint32_t max_scope_depth = 256;
std::unordered_set<std::string> ignored_ids;
+ std::unordered_set<std::string> ignored_props;
int max_javascript_whitespaces = 200;
class HttpJsNorm* js_norm = nullptr;
};
long HttpTestManager::print_amount {};
bool HttpTestManager::print_hex {};
-HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t normalization_depth_,
- int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_,
- uint32_t max_scope_depth_, const std::unordered_set<std::string>& ignored_ids_) :
- uri_param(uri_param_), normalization_depth(normalization_depth_),
- identifier_depth(identifier_depth_), max_template_nesting(max_template_nesting_),
- max_bracket_depth(max_bracket_depth_), max_scope_depth(max_scope_depth_),
- ignored_ids(ignored_ids_), mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {}
+HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_,
+ const HttpParaList::JsNormParam& js_norm_param_) :
+ uri_param(uri_param_), js_norm_param(js_norm_param_), mpse_otag(nullptr), mpse_attr(nullptr),
+ mpse_type(nullptr) {}
HttpJsNorm::~HttpJsNorm() = default;
void HttpJsNorm::configure(){}
int64_t Parameter::get_int(char const*) { return 0; }
void show_stats(PegCount*, const PegInfo*, unsigned, const char*) { }
void show_stats(PegCount*, const PegInfo*, const IndexVec&, const char*, FILE*) { }
-HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_, int64_t normalization_depth_,
- int32_t identifier_depth_, uint8_t max_template_nesting_, uint32_t max_bracket_depth_,
- uint32_t max_scope_depth_, const std::unordered_set<std::string>& ignored_ids_) :
- uri_param(uri_param_), normalization_depth(normalization_depth_),
- identifier_depth(identifier_depth_), max_template_nesting(max_template_nesting_),
- max_bracket_depth(max_bracket_depth_), max_scope_depth(max_scope_depth_),
- ignored_ids(ignored_ids_), mpse_otag(nullptr), mpse_attr(nullptr), mpse_type(nullptr) {}
+HttpJsNorm::HttpJsNorm(const HttpParaList::UriParam& uri_param_,
+ const HttpParaList::JsNormParam& js_norm_param_) :
+ uri_param(uri_param_), js_norm_param(js_norm_param_), mpse_otag(nullptr), mpse_attr(nullptr),
+ mpse_type(nullptr) {}
HttpJsNorm::~HttpJsNorm() = default;
void HttpJsNorm::configure() {}
int64_t Parameter::get_int(char const*) { return 0; }
#define NORM_NAME_SIZE 9 // size of the normalized form plus null symbol
#define NORM_NAME_CNT 65536
+#define TYPE_NORMALIZED 1
+#define TYPE_IGNORED_ID 2
+#define TYPE_IGNORED_PROP 4
+
static char norm_names[NORM_NAME_SIZE * NORM_NAME_CNT];
static void init_norm_names()
}
JSIdentifierCtx::JSIdentifierCtx(int32_t depth, uint32_t max_scope_depth,
- const std::unordered_set<std::string>& ignore_list)
- : ignore_list(ignore_list), max_scope_depth(max_scope_depth)
+ const std::unordered_set<std::string>& ignored_ids_list,
+ const std::unordered_set<std::string>& ignored_props_list)
+ : ignored_ids_list(ignored_ids_list), ignored_props_list(ignored_props_list),
+ max_scope_depth(max_scope_depth)
{
init_norm_names();
- memset(id_fast, 0, sizeof(id_fast));
norm_name = norm_names;
norm_name_end = norm_names + NORM_NAME_SIZE * std::min(depth, NORM_NAME_CNT);
scopes.emplace_back(JSProgramScopeType::GLOBAL);
- for (const auto& iid : ignore_list)
- if (iid.length() == 1)
- id_fast[(unsigned)iid[0]] = iid.c_str();
- else
- id_names[iid] = iid.c_str();
+ init_ignored_names();
}
-const char* JSIdentifierCtx::substitute(unsigned char c)
+const char* JSIdentifierCtx::substitute(unsigned char c, bool is_property)
{
auto p = id_fast[c];
- if (p)
- return p;
-
- if (norm_name >= norm_name_end)
- return nullptr;
-
- auto n = norm_name;
- norm_name += NORM_NAME_SIZE;
- HttpModule::increment_peg_counts(HttpEnums::PEG_JS_IDENTIFIER);
+ if (is_substituted(p, is_property))
+ return is_property ? p.prop_name : p.id_name;
- return id_fast[c] = n;
+ return acquire_norm_name(id_fast[c]);
}
-const char* JSIdentifierCtx::substitute(const char* id_name)
+const char* JSIdentifierCtx::substitute(const char* id_name, bool is_property)
{
assert(*id_name);
if (id_name[1] == '\0')
- return substitute(*id_name);
+ return substitute(*id_name, is_property);
const auto it = id_names.find(id_name);
- if (it != id_names.end())
- return it->second;
+ if (it != id_names.end() && is_substituted(it->second, is_property))
+ return is_property ? it->second.prop_name : it->second.id_name;
+ return acquire_norm_name(id_names[id_name]);
+}
+
+bool JSIdentifierCtx::is_ignored(const char* id_name) const
+{
+ return id_name < norm_names ||
+ id_name >= norm_names + NORM_NAME_SIZE * NORM_NAME_CNT;
+}
+
+bool JSIdentifierCtx::is_substituted(const NormId& id, bool is_property)
+{
+ return ((id.type & TYPE_NORMALIZED) != 0) ||
+ (!is_property && ((id.type & TYPE_IGNORED_ID) != 0)) ||
+ (is_property && ((id.type & TYPE_IGNORED_PROP) != 0));
+}
+
+const char* JSIdentifierCtx::acquire_norm_name(NormId& id)
+{
if (norm_name >= norm_name_end)
return nullptr;
auto n = norm_name;
norm_name += NORM_NAME_SIZE;
HttpModule::increment_peg_counts(HttpEnums::PEG_JS_IDENTIFIER);
- return id_names[id_name] = n;
+ if (id.prop_name || id.id_name)
+ {
+ id.type |= TYPE_NORMALIZED;
+ if ((id.type & TYPE_IGNORED_ID) != 0)
+ return id.prop_name = n;
+ else if ((id.type & TYPE_IGNORED_PROP) != 0)
+ return id.id_name = n;
+ }
+
+ return (id = {n, n, TYPE_NORMALIZED}).id_name;
}
-bool JSIdentifierCtx::is_ignored(const char* id_name) const
+void JSIdentifierCtx::init_ignored_names()
{
- return id_name < norm_names ||
- id_name >= norm_names + NORM_NAME_SIZE * NORM_NAME_CNT;
+ for (const auto& iid : ignored_ids_list)
+ if (iid.length() == 1)
+ id_fast[(unsigned)iid[0]] = {iid.c_str(), nullptr, TYPE_IGNORED_ID};
+ else
+ id_names[iid] = {iid.c_str(), nullptr, TYPE_IGNORED_ID};
+
+ for (const auto& iprop : ignored_props_list)
+ {
+ if (iprop.length() == 1)
+ {
+ id_fast[(unsigned)iprop[0]].prop_name = iprop.c_str();
+ id_fast[(unsigned)iprop[0]].type |= TYPE_IGNORED_PROP;
+ }
+ else
+ {
+ id_names[iprop].prop_name = iprop.c_str();
+ id_names[iprop].type |= TYPE_IGNORED_PROP;
+ }
+ }
}
bool JSIdentifierCtx::scope_push(JSProgramScopeType t)
void JSIdentifierCtx::reset()
{
- memset(id_fast, 0, sizeof(id_fast));
+ memset(&id_fast, 0, sizeof(id_fast));
norm_name = norm_names;
id_names.clear();
scopes.clear();
scopes.emplace_back(JSProgramScopeType::GLOBAL);
-
- for (const auto& iid : ignore_list)
- if (iid.length() == 1)
- id_fast[(unsigned)iid[0]] = iid.c_str();
- else
- id_names[iid] = iid.c_str();
+ init_ignored_names();
}
void JSIdentifierCtx::add_alias(const char* alias, const std::string&& value)
public:
virtual ~JSIdentifierCtxBase() = default;
- virtual const char* substitute(const char* identifier) = 0;
+ virtual const char* substitute(const char* identifier, bool is_property) = 0;
virtual void add_alias(const char* alias, const std::string&& value) = 0;
virtual const char* alias_lookup(const char* alias) const = 0;
virtual bool is_ignored(const char* identifier) const = 0;
{
public:
JSIdentifierCtx(int32_t depth, uint32_t max_scope_depth,
- const std::unordered_set<std::string>& ignore_list);
+ const std::unordered_set<std::string>& ignored_ids_list,
+ const std::unordered_set<std::string>& ignored_props_list);
- virtual const char* substitute(const char* identifier) override;
+ virtual const char* substitute(const char* identifier, bool is_property) override;
virtual void add_alias(const char* alias, const std::string&& value) override;
virtual const char* alias_lookup(const char* alias) const override;
virtual bool is_ignored(const char* identifier) const override;
(sizeof(ProgramScope) * 3)); }
private:
+
+ struct NormId
+ {
+ const char* id_name = nullptr;
+ const char* prop_name = nullptr;
+ uint8_t type = 0;
+ };
+
using Alias = std::vector<std::string>;
using AliasRef = std::list<Alias*>;
using AliasMap = std::unordered_map<std::string, Alias>;
- using NameMap = std::unordered_map<std::string, const char*>;
+ using NameMap = std::unordered_map<std::string, NormId>;
class ProgramScope
{
AliasRef to_remove{};
};
- inline const char* substitute(unsigned char c);
+ inline const char* substitute(unsigned char c, bool is_property);
+ inline bool is_substituted(const NormId& id, bool is_property);
+ inline const char* acquire_norm_name(NormId& id);
+ inline void init_ignored_names();
// do not swap next two lines, the destructor frees them in the reverse order
AliasMap aliases;
std::list<ProgramScope> scopes;
- const char* id_fast[256];
+ NormId id_fast[256];
NameMap id_names;
- const std::unordered_set<std::string>& ignore_list;
+ const std::unordered_set<std::string>& ignored_ids_list;
+ const std::unordered_set<std::string>& ignored_props_list;
const char* norm_name;
const char* norm_name_end;
set_ident_norm(true);
- const char* name = ident_ctx.substitute(lexeme);
+ const char* name = ident_ctx.substitute(lexeme, id_part);
if (!name)
{
if (ident_ctx.is_ignored(name))
{
- if (id_part)
- {
- std::string n(name);
- n.push_back('+'); // any illegal symbol as a part of ID name
- name = ident_ctx.substitute(n.c_str());
- }
- else
- {
+ if (!id_part)
ignored_id_pos = yyout.rdbuf()->pubseekoff(0, yyout.cur, std::ios_base::out);
- set_ident_norm(false);
- yyout << name;
- return EOS;
- }
+ set_ident_norm(false);
+ yyout << name;
+ return EOS;
}
const char* alias = id_part ? nullptr : ident_ctx.alias_lookup(lexeme);
{
if (!id_continue && prefix_increment && dealias_stored)
{
- ident_ctx.add_alias(last_dealiased.c_str(), std::string(ident_ctx.substitute(last_dealiased.c_str())));
+ ident_ctx.add_alias(last_dealiased.c_str(),
+ std::string(ident_ctx.substitute(last_dealiased.c_str(), false)));
}
dealias_stored = false;
prefix_increment = false;
{
if (dealias_stored)
{
- ident_ctx.add_alias(last_dealiased.c_str(), std::string(ident_ctx.substitute(last_dealiased.c_str())));
+ ident_ctx.add_alias(last_dealiased.c_str(),
+ std::string(ident_ctx.substitute(last_dealiased.c_str(), false)));
}
prefix_increment = token != IDENTIFIER && token != CLOSING_BRACKET;
dealias_stored = false;
{
auto dealias = ident_ctx.alias_lookup(lexeme);
if ((!ident_norm() && id_part) ||
- (ident_ctx.is_ignored(ident_ctx.substitute(lexeme)) && !id_part))
+ (!id_part && ident_ctx.is_ignored(ident_ctx.substitute(lexeme, false))))
aliased << YYText();
else if (dealias)
aliased << dealias;
if (complex_assignment)
{
if (ident_ctx.alias_lookup(alias.c_str()))
- ident_ctx.add_alias(alias.c_str(), std::string(ident_ctx.substitute(alias.c_str())));
+ {
+ ident_ctx.add_alias(alias.c_str(),
+ std::string(ident_ctx.substitute(alias.c_str(), false)));
+ }
alias_state = ALIAS_NONE;
}
else
{
if (alias_state == ALIAS_VALUE || alias_state == ALIAS_EQUALS)
if (ident_ctx.alias_lookup(alias.c_str()))
- ident_ctx.add_alias(alias.c_str(), std::string(ident_ctx.substitute(alias.c_str())));
+ {
+ ident_ctx.add_alias(alias.c_str(),
+ std::string(ident_ctx.substitute(alias.c_str(), false)));
+ }
alias_state = ALIAS_NONE;
}
}
#define DEPTH 65536
#define SCOPE_DEPTH 256
-static const std::unordered_set<std::string> s_ignored_ids { "console" };
+static const std::unordered_set<std::string> s_ignored_ids { "console", "v" };
+static const std::unordered_set<std::string> s_ignored_props { "watch", "w" };
TEST_CASE("JSIdentifierCtx::substitute()", "[JSIdentifierCtx]")
{
SECTION("same name")
{
- JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
}
SECTION("different names")
{
- JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
- CHECK(!strcmp(ident_ctx.substitute("b"), "var_0001"));
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("b", false), "var_0001"));
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
}
SECTION("depth reached")
{
- JSIdentifierCtx ident_ctx(2, SCOPE_DEPTH, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(2, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
- CHECK(!strcmp(ident_ctx.substitute("b"), "var_0001"));
- CHECK(ident_ctx.substitute("c") == nullptr);
- CHECK(ident_ctx.substitute("d") == nullptr);
- CHECK(!strcmp(ident_ctx.substitute("a"), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("b", false), "var_0001"));
+ CHECK(ident_ctx.substitute("c", false) == nullptr);
+ CHECK(ident_ctx.substitute("d", false) == nullptr);
+ CHECK(!strcmp(ident_ctx.substitute("a", false), "var_0000"));
}
SECTION("max names")
{
- JSIdentifierCtx ident_ctx(DEPTH + 2, SCOPE_DEPTH, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(DEPTH + 2, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
std::vector<std::string> n, e;
n.reserve(DEPTH + 2);
}
for (int it = 0; it < DEPTH; ++it)
- CHECK(!strcmp(ident_ctx.substitute(n[it].c_str()), e[it].c_str()));
+ CHECK(!strcmp(ident_ctx.substitute(n[it].c_str(), false), e[it].c_str()));
- CHECK(ident_ctx.substitute(n[DEPTH].c_str()) == nullptr);
- CHECK(ident_ctx.substitute(n[DEPTH + 1].c_str()) == nullptr);
+ CHECK(ident_ctx.substitute(n[DEPTH].c_str(), false) == nullptr);
+ CHECK(ident_ctx.substitute(n[DEPTH + 1].c_str(), false) == nullptr);
+ }
+ SECTION("ignored identifier - single char")
+ {
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
+
+ CHECK(!strcmp(ident_ctx.substitute("v", false), "v"));
+ CHECK(!strcmp(ident_ctx.substitute("v", true), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("w", false), "var_0001"));
+ CHECK(!strcmp(ident_ctx.substitute("w", true), "w"));
+ }
+ SECTION("ignored identifier - multiple chars")
+ {
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
+
+ CHECK(!strcmp(ident_ctx.substitute("console", false), "console"));
+ CHECK(!strcmp(ident_ctx.substitute("console", true), "var_0000"));
+ CHECK(!strcmp(ident_ctx.substitute("watch", false), "var_0001"));
+ CHECK(!strcmp(ident_ctx.substitute("watch", true), "watch"));
}
}
TEST_CASE("JSIdentifierCtx::is_ignored()", "[JSIdentifierCtx]")
{
- JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids);
+ SECTION("single char identifier")
+ {
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
+
+ auto v1 = ident_ctx.substitute("v", false);
+ auto v2 = ident_ctx.substitute("a", false);
+ auto v3 = ident_ctx.substitute("w", false);
+ auto v4 = ident_ctx.substitute("w", true);
+
+ CHECK(ident_ctx.is_ignored(v1) == true);
+ CHECK(ident_ctx.is_ignored(v2) == false);
+ CHECK(ident_ctx.is_ignored(v3) == false);
+ CHECK(ident_ctx.is_ignored(v4) == true);
+ }
+ SECTION("multiple chars identifier")
+ {
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
- auto v1 = ident_ctx.substitute("console");
- auto v2 = ident_ctx.substitute("foo");
+ auto v1 = ident_ctx.substitute("console", false);
+ auto v2 = ident_ctx.substitute("foo", false);
+ auto v3 = ident_ctx.substitute("watch", false);
+ auto v4 = ident_ctx.substitute("watch", true);
- CHECK(ident_ctx.is_ignored(v1) == true);
- CHECK(ident_ctx.is_ignored(v2) == false);
+ CHECK(ident_ctx.is_ignored(v1) == true);
+ CHECK(ident_ctx.is_ignored(v2) == false);
+ CHECK(ident_ctx.is_ignored(v3) == false);
+ CHECK(ident_ctx.is_ignored(v4) == true);
+ }
}
TEST_CASE("JSIdentifierCtx::scopes", "[JSIdentifierCtx]")
{
- JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(DEPTH, SCOPE_DEPTH, s_ignored_ids, s_ignored_props);
SECTION("scope stack")
{
}
SECTION("scope max nesting")
{
- JSIdentifierCtx ident_ctx_limited(DEPTH, 2, s_ignored_ids);
+ JSIdentifierCtx ident_ctx_limited(DEPTH, 2, s_ignored_ids, s_ignored_props);
CHECK(ident_ctx_limited.scope_push(JSProgramScopeType::FUNCTION));
CHECK(ident_ctx_limited.scope_check({GLOBAL, FUNCTION}));
{ \
char dst1[sizeof(exp1)]; \
\
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids); \
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props); \
JSNormalizer norm(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth); \
\
DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \
char dst1[sizeof(exp1)]; \
char dst2[sizeof(exp2)]; \
\
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids); \
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props); \
JSNormalizer norm(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth); \
\
DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1); \
#define NORM_COMBINED_S_2(src1, src2, exp) \
{ \
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids); \
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props); \
JSNormalizer norm(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth); \
\
auto ret = norm.normalize(src1, sizeof(src1) - 1); \
char dst3[sizeof(exp3)];
char dst4[sizeof(exp4)];
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer norm(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
DO(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1);
char dst2[sizeof(exp2)];
char dst3[sizeof(exp3)];
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer norm(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
TRY(src1, sizeof(src1) - 1, dst1, sizeof(dst1) - 1, JSTokenizer::SCRIPT_CONTINUE);
}
}
+TEST_CASE("ignored properties", "[JSNormalizer]")
+{
+ SECTION("basic")
+ {
+ const char dat1[] = "foo.bar ;";
+ const char dat2[] = "foo.bar() ;";
+ const char dat3[] = "foo.watch ;";
+ const char dat4[] = "foo.unwatch() ;";
+ const char dat5[] = "console.watch ;";
+ const char dat6[] = "console.unwatch() ;";
+ const char dat7[] = "console.foo.watch ;";
+ const char dat8[] = "console.foo.unwatch() ;";
+ const char dat9[] = "foo.console.watch ;";
+ const char dat10[] = "foo.console.unwatch() ;";
+
+ const char dat11[] = "foo['bar'] ;";
+ const char dat12[] = "foo[\"bar\"]() ;";
+ const char dat13[] = "foo['watch'] ;";
+ const char dat14[] = "foo[\"unwatch\"]() ;";
+ const char dat15[] = "console['watch'] ;";
+ const char dat16[] = "console[\"unwatch\"]() ;";
+ const char dat17[] = "console['foo']['watch'] ;";
+ const char dat18[] = "console[\"foo\"][\"unwatch\"]() ;";
+ const char dat19[] = "foo['console']['watch'] ;";
+ const char dat20[] = "foo[\"console\"][\"unwatch\"]() ;";
+
+ const char exp1[] = "var_0000.var_0001;";
+ const char exp2[] = "var_0000.var_0001();";
+ const char exp3[] = "var_0000.watch;";
+ const char exp4[] = "var_0000.unwatch();";
+ const char exp5[] = "console.watch;";
+ const char exp6[] = "console.unwatch();";
+ const char exp7[] = "console.foo.watch;";
+ const char exp8[] = "console.foo.unwatch();";
+ const char exp9[] = "var_0000.var_0001.watch;";
+ const char exp10[] = "var_0000.var_0001.unwatch();";
+
+ const char exp11[] = "var_0000['bar'];";
+ const char exp12[] = "var_0000[\"bar\"]();";
+ const char exp13[] = "var_0000['watch'];";
+ const char exp14[] = "var_0000[\"unwatch\"]();";
+ const char exp15[] = "console['watch'];";
+ const char exp16[] = "console[\"unwatch\"]();";
+ const char exp17[] = "console['foo']['watch'];";
+ const char exp18[] = "console[\"foo\"][\"unwatch\"]();";
+ const char exp19[] = "var_0000['console']['watch'];";
+ const char exp20[] = "var_0000[\"console\"][\"unwatch\"]();";
+
+ NORMALIZE_S(dat1, exp1);
+ NORMALIZE_S(dat2, exp2);
+ NORMALIZE_S(dat3, exp3);
+ NORMALIZE_S(dat4, exp4);
+ NORMALIZE_S(dat5, exp5);
+ NORMALIZE_S(dat6, exp6);
+ NORMALIZE_S(dat7, exp7);
+ NORMALIZE_S(dat8, exp8);
+ NORMALIZE_S(dat9, exp9);
+ NORMALIZE_S(dat10, exp10);
+
+ NORMALIZE_S(dat11, exp11);
+ NORMALIZE_S(dat12, exp12);
+ NORMALIZE_S(dat13, exp13);
+ NORMALIZE_S(dat14, exp14);
+ NORMALIZE_S(dat15, exp15);
+ NORMALIZE_S(dat16, exp16);
+ NORMALIZE_S(dat17, exp17);
+ NORMALIZE_S(dat18, exp18);
+ NORMALIZE_S(dat19, exp19);
+ NORMALIZE_S(dat20, exp20);
+ }
+
+ SECTION("chain tracking")
+ {
+ const char dat1[] = "foo.watch.bar ;";
+ const char dat2[] = "foo['watch'].bar ;";
+ const char dat3[] = "foo.bar.watch.bar ;";
+ const char dat4[] = "foo['bar'].watch['bar'] ;";
+ const char dat5[] = "foo['bar'].watch['bar'].baz ;";
+
+ const char dat6[] = "foo.unwatch().bar ;";
+ const char dat7[] = "foo['unwatch']().bar ;";
+ const char dat8[] = "foo.bar.unwatch().bar ;";
+ const char dat9[] = "foo['bar'].unwatch()['bar'] ;";
+ const char dat10[] = "foo['bar'].unwatch()['bar'].baz ;";
+
+ const char dat11[] = "foo . watch \n . bar ;";
+ const char dat12[] = "foo ['watch'] \n . bar ;";
+ const char dat13[] = "foo . /*multiline*/ watch //oneline\n . bar ;";
+
+ const char dat14[] = "foo . unwatch () \n . bar ;";
+ const char dat15[] = "foo ['unwatch'] () \n . bar ;";
+ const char dat16[] = "foo /*multiline*/ . unwatch ( ) . // oneline \n bar ;";
+
+ const char dat17[] = "foo . + watch . bar ;";
+ const char dat18[] = "foo . + ['watch'] . bar ;";
+
+ const char dat19[] = "foo . + unwatch() . bar ;";
+ const char dat20[] = "foo . + ['unwatch']() . bar ;";
+
+ // FIXIT-L: add support for proper tracking of bracket accessors.
+ // Current behavior: foo['watch'].bar -> var_0000['watch'].var_0001
+ // Expected behavior: foo['watch'].bar -> var_0000['watch'].bar
+ const char exp1[] = "var_0000.watch.bar;";
+ const char exp2[] = "var_0000['watch'].var_0001;";
+ const char exp3[] = "var_0000.var_0001.watch.bar;";
+ const char exp4[] = "var_0000['bar'].watch['bar'];";
+ const char exp5[] = "var_0000['bar'].watch['bar'].baz;";
+
+ const char exp6[] = "var_0000.unwatch().bar;";
+ const char exp7[] = "var_0000['unwatch']().var_0001;";
+ const char exp8[] = "var_0000.var_0001.unwatch().bar;";
+ const char exp9[] = "var_0000['bar'].unwatch()['bar'];";
+ const char exp10[] = "var_0000['bar'].unwatch()['bar'].baz;";
+
+ const char exp11[] = "var_0000.watch.bar;";
+ const char exp12[] = "var_0000['watch'].var_0001;";
+ const char exp13[] = "var_0000.watch.bar;";
+
+ const char exp14[] = "var_0000.unwatch().bar;";
+ const char exp15[] = "var_0000['unwatch']().var_0001;";
+ const char exp16[] = "var_0000.unwatch().bar;";
+
+ const char exp17[] = "var_0000.+var_0001.var_0002;";
+ const char exp18[] = "var_0000.+['watch'].var_0001;";
+
+ const char exp19[] = "var_0000.+var_0001().var_0002;";
+ const char exp20[] = "var_0000.+['unwatch']().var_0001;";
+
+ NORMALIZE_S(dat1, exp1);
+ NORMALIZE_S(dat2, exp2);
+ NORMALIZE_S(dat3, exp3);
+ NORMALIZE_S(dat4, exp4);
+ NORMALIZE_S(dat5, exp5);
+
+ NORMALIZE_S(dat6, exp6);
+ NORMALIZE_S(dat7, exp7);
+ NORMALIZE_S(dat8, exp8);
+ NORMALIZE_S(dat9, exp9);
+ NORMALIZE_S(dat10, exp10);
+
+ NORMALIZE_S(dat11, exp11);
+ NORMALIZE_S(dat12, exp12);
+ NORMALIZE_S(dat13, exp13);
+
+ NORMALIZE_S(dat14, exp14);
+ NORMALIZE_S(dat15, exp15);
+ NORMALIZE_S(dat16, exp16);
+
+ NORMALIZE_S(dat17, exp17);
+ NORMALIZE_S(dat18, exp18);
+
+ NORMALIZE_S(dat19, exp19);
+ NORMALIZE_S(dat20, exp20);
+ }
+
+ SECTION("scope tracking")
+ {
+ const char dat1[] = "foo.(watch).bar ;";
+ const char dat2[] = "foo(['watch']).bar ;";
+
+ const char dat3[] = "foo.bar(baz.unwatch.eval).eval ;";
+ const char dat4[] = "foo.bar(baz['unwatch'].eval).eval ;";
+
+ const char exp1[] = "var_0000.(var_0001).var_0002;";
+ const char exp2[] = "var_0000(['watch']).var_0001;";
+
+ const char exp3[] = "var_0000.var_0001(var_0002.unwatch.eval).var_0003;";
+ const char exp4[] = "var_0000.var_0001(var_0002['unwatch'].var_0003).var_0003;";
+
+ NORMALIZE_S(dat1, exp1);
+ NORMALIZE_S(dat2, exp2);
+
+ NORMALIZE_S(dat3, exp3);
+ NORMALIZE_S(dat4, exp4);
+ }
+
+ SECTION("corner cases")
+ {
+ const char dat1[] = ".watch ;";
+ const char dat2[] = ".unwatch() ;";
+
+ const char dat3[] = "'foo'.watch ;";
+ const char dat4[] = "\"foo\".unwatch() ;";
+
+ const char dat5[] = "''.split('').reverse().join('') ;";
+ const char dat6[] = "\"\".split(\"\").reverse().join(\"\") ;";
+
+ const char dat7[] = "watch () ;";
+ const char dat8[] = "watch.watch() ;";
+
+ // 'name' is present in both ignore lists
+ const char dat9[] = "name.foo ;";
+ const char dat10[] = "foo.name ;";
+ const char dat11[] = "name.name ;";
+ const char dat12[] = "name ;";
+
+ const char dat13[] = "foo.foo ;";
+ const char dat14[] = "console.console; console;";
+ const char dat15[] = "watch.watch; watch;";
+ const char dat16[] = "foo.console; console.foo; foo.watch; watch.foo ;";
+ const char dat17[] = "console.foo; foo.console; watch.foo; foo.watch ;";
+
+ const char dat18[] = "a.a ;";
+ const char dat19[] = "u.u; u;";
+ const char dat20[] = "w.w; w;";
+ const char dat21[] = "a.u; u.a; a.w; w.a ;";
+ const char dat22[] = "u.a; a.u; w.a; a.w ;";
+
+ const char exp1[] = ".watch;";
+ const char exp2[] = ".unwatch();";
+
+ const char exp3[] = "'foo'.watch;";
+ const char exp4[] = "\"foo\".unwatch();";
+
+ const char exp5[] = "''.split('').reverse().join('');";
+ const char exp6[] = "\"\".split(\"\").reverse().join(\"\");";
+
+ const char exp7[] = "var_0000();";
+ const char exp8[] = "var_0000.watch();";
+
+ const char exp9[] = "name.foo;";
+ const char exp10[] = "var_0000.name;";
+ const char exp11[] = "name.name;";
+ const char exp12[] = "name;";
+
+ const char exp13[] = "var_0000.var_0000;";
+ const char exp14[] = "console.console;console;";
+ const char exp15[] = "var_0000.watch;var_0000;";
+ const char exp16[] = "var_0000.var_0001;console.foo;var_0000.watch;var_0002.var_0000;";
+ const char exp17[] = "console.foo;var_0000.var_0001;var_0002.var_0000;var_0000.watch;";
+
+ const char exp18[] = "var_0000.var_0000;";
+ const char exp19[] = "u.u;u;";
+ const char exp20[] = "var_0000.w;var_0000;";
+ const char exp21[] = "var_0000.var_0001;u.a;var_0000.w;var_0002.var_0000;";
+ const char exp22[] = "u.a;var_0000.var_0001;var_0002.var_0000;var_0000.w;";
+
+ NORMALIZE_S(dat1, exp1);
+ NORMALIZE_S(dat2, exp2);
+
+ NORMALIZE_S(dat3, exp3);
+ NORMALIZE_S(dat4, exp4);
+
+ NORMALIZE_S(dat5, exp5);
+ NORMALIZE_S(dat6, exp6);
+
+ NORMALIZE_S(dat7, exp7);
+ NORMALIZE_S(dat8, exp8);
+
+ NORMALIZE_S(dat9, exp9);
+ NORMALIZE_S(dat10, exp10);
+ NORMALIZE_S(dat11, exp11);
+ NORMALIZE_S(dat12, exp12);
+
+ NORMALIZE_S(dat13, exp13);
+ NORMALIZE_S(dat14, exp14);
+ NORMALIZE_S(dat15, exp15);
+ NORMALIZE_S(dat16, exp16);
+ NORMALIZE_S(dat17, exp17);
+
+ NORMALIZE_S(dat18, exp18);
+ NORMALIZE_S(dat19, exp19);
+ NORMALIZE_S(dat20, exp20);
+ NORMALIZE_S(dat21, exp21);
+ NORMALIZE_S(dat22, exp22);
+ }
+}
+
TEST_CASE("ignored identifier split", "[JSNormalizer]")
{
}
}
+TEST_CASE("ignored properties split", "[JSNormalizer]")
+{
+
+#if JSTOKENIZER_MAX_STATES != 8
+#error "ignored properties split" tests are designed for 8 states depth
+#endif
+
+ SECTION("a standalone property")
+ {
+ const char dat1[] = "foo.un";
+ const char dat2[] = "watch ;";
+ const char exp1[] = "var_0000.var_0001";
+ const char exp2[] = "unwatch;";
+ const char exp_comb_1[] = "var_0000.unwatch;";
+
+ const char dat3[] = "foo. un";
+ const char dat4[] = "watch () ;";
+ const char exp3[] = "var_0000.var_0001";
+ const char exp4[] = "unwatch();";
+ const char exp_comb_2[] = "var_0000.unwatch();";
+
+ const char dat5[] = "fo";
+ const char dat6[] = "o . watch ;";
+ const char exp5[] = "var_0000";
+ const char exp6[] = "var_0001.watch;";
+ const char exp_comb_3[] = "var_0001.watch;";
+
+ const char dat7[] = "foo. ";
+ const char dat8[] = "watch ;";
+ const char exp7[] = "var_0000.";
+ const char exp8[] = "watch;";
+ const char exp_comb_4[] = "var_0000.watch;";
+
+ const char dat9[] = "foo ";
+ const char dat10[] = ". watch ;";
+ const char exp9[] = "var_0000";
+ const char exp10[] = ".watch;";
+ const char exp_comb_5[] = "var_0000.watch;";
+
+ NORMALIZE_T(dat1, dat2, exp1, exp2);
+ NORM_COMBINED_S_2(dat1, dat2, exp_comb_1);
+
+ NORMALIZE_T(dat3, dat4, exp3, exp4);
+ NORM_COMBINED_S_2(dat3, dat4, exp_comb_2);
+
+ NORMALIZE_T(dat5, dat6, exp5, exp6);
+ NORM_COMBINED_S_2(dat5, dat6, exp_comb_3);
+
+ NORMALIZE_T(dat7, dat8, exp7, exp8);
+ NORM_COMBINED_S_2(dat7, dat8, exp_comb_4);
+
+ NORMALIZE_T(dat9, dat10, exp9, exp10);
+ NORM_COMBINED_S_2(dat9, dat10, exp_comb_5);
+ }
+
+ SECTION("chain tracking")
+ {
+ const char dat1[] = "foo.un";
+ const char dat2[] = "watch.bar ;";
+ const char exp1[] = "var_0000.var_0001";
+ const char exp2[] = "unwatch.bar;";
+ const char exp_comb_1[] = "var_0000.unwatch.bar;";
+
+ const char dat3[] = "foo.un";
+ const char dat4[] = "watch().bar ;";
+ const char exp3[] = "var_0000.var_0001";
+ const char exp4[] = "unwatch().bar;";
+ const char exp_comb_2[] = "var_0000.unwatch().bar;";
+
+ const char dat5[] = "foo['un";
+ const char dat6[] = "watch'].bar ;";
+ const char exp5[] = "var_0000['un";
+ const char exp6[] = "unwatch'].var_0001;";
+ const char exp_comb_3[] = "var_0000['unwatch'].var_0001;";
+
+ const char dat7[] = "foo['un";
+ const char dat8[] = "watch']().bar ;";
+ const char exp7[] = "var_0000['un";
+ const char exp8[] = "unwatch']().var_0001;";
+ const char exp_comb_4[] = "var_0000['unwatch']().var_0001;";
+
+ const char dat9[] = "foo. /*multi";
+ const char dat10[] = "line*/ watch . bar ;";
+ const char exp9[] = "var_0000.";
+ const char exp10[] = "watch.bar;";
+ const char exp_comb_5[] = "var_0000.watch.bar;";
+
+ const char dat11[] = "foo //one";
+ const char dat12[] = "line \n . watch . bar ;";
+ const char exp11[] = "var_0000";
+ const char exp12[] = ".watch.bar;";
+ const char exp_comb_6[] = "var_0000.watch.bar;";
+
+ const char dat13[] = ".";
+ const char dat14[] = "watch ( ) . bar ;";
+ const char exp13[] = ".";
+ const char exp14[] = "watch().bar;";
+ const char exp_comb_7[] = ".watch().bar;";
+
+ const char dat15[] = ".un";
+ const char dat16[] = "watch ( ) . bar ;";
+ const char exp15[] = ".var_0000";
+ const char exp16[] = "unwatch().bar;";
+ const char exp_comb_8[] = ".unwatch().bar;";
+
+ const char dat17[] = "foo.watch ";
+ const char dat18[] = "+ bar ;";
+ const char exp17[] = "var_0000.watch";
+ const char exp18[] = "+var_0001;";
+ const char exp_comb_9[] = "var_0000.watch+var_0001;";
+
+ const char dat19[] = "foo.unwatch ( ) +";
+ const char dat20[] = "bar ;";
+ const char exp19[] = "var_0000.unwatch()+";
+ const char exp20[] = "var_0001;";
+ const char exp_comb_10[] = "var_0000.unwatch()+var_0001;";
+
+ NORMALIZE_T(dat1, dat2, exp1, exp2);
+ NORM_COMBINED_S_2(dat1, dat2, exp_comb_1);
+
+ NORMALIZE_T(dat3, dat4, exp3, exp4);
+ NORM_COMBINED_S_2(dat3, dat4, exp_comb_2);
+
+ NORMALIZE_T(dat5, dat6, exp5, exp6);
+ NORM_COMBINED_S_2(dat5, dat6, exp_comb_3);
+
+ NORMALIZE_T(dat7, dat8, exp7, exp8);
+ NORM_COMBINED_S_2(dat7, dat8, exp_comb_4);
+
+ NORMALIZE_T(dat9, dat10, exp9, exp10);
+ NORM_COMBINED_S_2(dat9, dat10, exp_comb_5);
+
+ NORMALIZE_T(dat11, dat12, exp11, exp12);
+ NORM_COMBINED_S_2(dat11, dat12, exp_comb_6);
+
+ NORMALIZE_T(dat13, dat14, exp13, exp14);
+ NORM_COMBINED_S_2(dat13, dat14, exp_comb_7);
+
+ NORMALIZE_T(dat15, dat16, exp15, exp16);
+ NORM_COMBINED_S_2(dat15, dat16, exp_comb_8);
+
+ NORMALIZE_T(dat17, dat18, exp17, exp18);
+ NORM_COMBINED_S_2(dat17, dat18, exp_comb_9);
+
+ NORMALIZE_T(dat19, dat20, exp19, exp20);
+ NORM_COMBINED_S_2(dat19, dat20, exp_comb_10);
+ }
+
+ SECTION("scope tracking")
+ {
+ const char dat1[] = "foo.(un";
+ const char dat2[] = "watch).bar ;";
+ const char exp1[] = "var_0000.(var_0001";
+ const char exp2[] = "var_0002).var_0003;";
+ const char exp_comb_1[] = "var_0000.(var_0002).var_0003;";
+
+ const char dat3[] = "foo(['un";
+ const char dat4[] = "watch']).bar ;";
+ const char exp3[] = "var_0000(['un";
+ const char exp4[] = "unwatch']).var_0001;";
+ const char exp_comb_2[] = "var_0000(['unwatch']).var_0001;";
+
+ const char dat5[] = "foo.bar(baz.un";
+ const char dat6[] = "watch() . bar ) . foo ;";
+ const char exp5[] = "var_0000.var_0001(var_0002.var_0003";
+ const char exp6[] = "unwatch().bar).var_0000;";
+ const char exp_comb_3[] = "var_0000.var_0001(var_0002.unwatch().bar).var_0000;";
+
+ const char dat7[] = "foo.bar(baz['un";
+ const char dat8[] = "watch']() . bar ) . foo ;";
+ const char exp7[] = "var_0000.var_0001(var_0002['un";
+ const char exp8[] = "unwatch']().var_0001).var_0000;";
+ const char exp_comb_4[] = "var_0000.var_0001(var_0002['unwatch']().var_0001).var_0000;";
+
+ NORMALIZE_T(dat1, dat2, exp1, exp2);
+ NORM_COMBINED_S_2(dat1, dat2, exp_comb_1);
+
+ NORMALIZE_T(dat3, dat4, exp3, exp4);
+ NORM_COMBINED_S_2(dat3, dat4, exp_comb_2);
+
+ NORMALIZE_T(dat5, dat6, exp5, exp6);
+ NORM_COMBINED_S_2(dat5, dat6, exp_comb_3);
+
+ NORMALIZE_T(dat7, dat8, exp7, exp8);
+ NORM_COMBINED_S_2(dat7, dat8, exp_comb_4);
+ }
+}
+
TEST_CASE("Scope tracking - basic","[JSNormalizer]")
{
SECTION("Global only")
const char exp[] = "function(){if";
uint32_t scope_depth = 2;
- JSIdentifierCtx ident_ctx(norm_depth, scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
auto ret = normalizer.normalize(src, strlen(src));
std::string dst(normalizer.get_script(), normalizer.script_size());
TEST_CASE("Function call tracking - basic", "[JSNormalizer]")
{
- JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, max_template_nesting,
- max_bracket_depth);
+ JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props,
+ max_template_nesting, max_bracket_depth);
using FuncType = JSTokenizerTester::FuncType;
SECTION("ignored fake defined function identifier")
{
const std::unordered_set<std::string> s_ignored_ids_fake {"fake_unescape"};
- JSTokenizerTester tester_fake(norm_depth, max_scope_depth, s_ignored_ids_fake,
- max_template_nesting, max_bracket_depth);
+ JSTokenizerTester tester_fake(norm_depth, max_scope_depth, s_ignored_ids_fake,
+ s_ignored_props, max_template_nesting, max_bracket_depth);
tester_fake.test_function_scopes({
{"fake_unescape(", "fake_unescape(", {FuncType::NOT_FUNC, FuncType::GENERAL}}
});
TEST_CASE("Function call tracking - nesting", "[JSNormalizer]")
{
- JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, max_template_nesting,
- max_bracket_depth);
+ JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props,
+ max_template_nesting, max_bracket_depth);
using FuncType = JSTokenizerTester::FuncType;
TEST_CASE("Function call tracking - over multiple PDU", "[JSNormalizer]")
{
- JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, max_template_nesting,
- max_bracket_depth);
+ JSTokenizerTester tester(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props,
+ max_template_nesting, max_bracket_depth);
using FuncType = JSTokenizerTester::FuncType;
};
const std::unordered_set<std::string> ids{};
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, ids);
+ const std::unordered_set<std::string> props{};
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, ids, props);
JSNormalizer normalizer_w_ident(ident_ctx, unlim_depth, max_template_nesting, max_bracket_depth);
REQUIRE(norm_ret(normalizer_w_ident, input) == JSTokenizer::SCRIPT_ENDED);
};
const std::unordered_set<std::string> ids_n { "n" };
- JSIdentifierCtx ident_ctx_ids_n(norm_depth, max_scope_depth, ids_n);
+ const std::unordered_set<std::string> props_n { "n" };
+ JSIdentifierCtx ident_ctx_ids_n(norm_depth, max_scope_depth, ids_n, props_n);
JSNormalizer normalizer_iids(ident_ctx_ids_n, unlim_depth,
max_template_nesting, max_bracket_depth);
const char* src_f_unescape = f_unescape.c_str();
size_t src_len = norm_depth;
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer norm(ident_ctx, unlim_depth, max_template_nesting, norm_depth);
REQUIRE(norm_ret(norm, str_unescape) == JSTokenizer::SCRIPT_ENDED);
{
std::string buf(context);
buf += "</script>";
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
normalizer.normalize(buf.c_str(), buf.size());
CHECK(ident_ctx.get_types() == stack);
void test_normalization(const char* source, const char* expected)
{
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
normalizer.normalize(source, strlen(source));
std::string result_buf(normalizer.get_script(), normalizer.script_size());
void test_normalization_bad(const char* source, const char* expected, JSTokenizer::JSRet eret)
{
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
auto ret = normalizer.normalize(source, strlen(source));
std::string result_buf(normalizer.get_script(), normalizer.script_size());
void test_normalization_mixed_encoding(const char* source, const char* expected)
{
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
auto ret = normalizer.normalize(source, strlen(source));
std::string result_buf(normalizer.get_script(), normalizer.script_size());
void test_normalization(const std::vector<PduCase>& pdus)
{
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
for (const auto& pdu : pdus)
void test_normalization(const std::list<ScopedPduCase>& pdus)
{
- JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids);
+ JSIdentifierCtx ident_ctx(norm_depth, max_scope_depth, s_ignored_ids, s_ignored_props);
JSNormalizer normalizer(ident_ctx, norm_depth, max_template_nesting, max_bracket_depth);
for (auto pdu:pdus)
{
constexpr int max_bracket_depth = 256;
constexpr int max_scope_depth = 256;
static const std::unordered_set<std::string> s_ignored_ids {
- "console", "eval", "document", "unescape", "decodeURI", "decodeURIComponent", "String"
+ "console", "eval", "document", "unescape", "decodeURI", "decodeURIComponent", "String",
+ "name", "u"
+};
+
+static const std::unordered_set<std::string> s_ignored_props {
+ "watch", "unwatch", "split", "reverse", "join", "name", "w"
};
namespace snort
public:
JSIdentifierCtxStub() = default;
- const char* substitute(const char* identifier) override
+ const char* substitute(const char* identifier, bool) override
{ return identifier; }
virtual void add_alias(const char*, const std::string&&) override {}
virtual const char* alias_lookup(const char* alias) const override
public:
JSTokenizerTester(int32_t depth, uint32_t max_scope_depth,
const std::unordered_set<std::string>& ignored_ids,
+ const std::unordered_set<std::string>& ignored_props,
uint8_t max_template_nesting, uint32_t max_bracket_depth)
:
- ident_ctx(depth, max_scope_depth, ignored_ids),
+ ident_ctx(depth, max_scope_depth, ignored_ids, ignored_props),
normalizer(ident_ctx, depth, max_template_nesting, max_bracket_depth)
{ }