From: Alex Rousskov Date: Tue, 2 Jul 2024 17:35:14 +0000 (+0000) Subject: Restrict squid.conf preprocessor space characters to SP and HT (#1858) X-Git-Tag: SQUID_7_0_1~96 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=314e43047145123e0ea018186445fd720ba6107c;p=thirdparty%2Fsquid.git Restrict squid.conf preprocessor space characters to SP and HT (#1858) Here, "preprocessor space" refers to a portion of squid.conf grammar that covers preprocessor directives (e.g., `if` and `include` statements) and separation of a directive name from directive parameters. This change replaces all known xisspace() uses in preprocessor code with code that only recognizes SP and HT as space characters. Prior to this change, Squid preprocessor also classified as space ASCII VT and FF characters as well as any other locale-specific characters classified as space by isspace(3). Squid preprocessor still delimits input lines using LF and terminates each read line at the first CR or LF character (if any), whichever comes earlier. This behavior precludes CR and LF characters from reaching individual directive line parsers. However, directive parameter values loaded by ConfigParser when honoring "quoted filename" or parameters(filename) syntax are (still) tokenized using w_space which includes an ASCII CR character. Thus, a "foo_CR_bar" 9-character sequence is still interpreted as a single "foo_" token by the preprocessor and as two tokens ("foo_" and "_bar") by ConfigParser::strtokFile(). After preprocessing, directive parameters are (still) parsed by parsers specific to that directive type. Many of those parsers are based on ConfigParser::TokenParse() that effectively uses the same "SP and HT only" logic for inlined directive parameters. However, some of those parsers define "space" differently (and may use non-space-like characters for token delimiters). For example, response_delay_pool still indirectly uses isspace(3) to parse integer parameters, and the following "until eol" directives still use xisspace() to skip space before their parameter value: debug_options, err_html_text, mail_program, sslcrtd_program, and sslcrtvalidator_program. More than 100 uses of xisspace() and indirect uses of isspace(3) remain in Squid code. Most of those use cases are in configuration parsing code. Restricting all relevant use cases one-by-one may not be practical. On the other hand, restricting all configuration input lines to prohibit VT, FF, CR, and locale-specific characters classified as space by isspace(3) will break rare configs that use those characters in places like URL paths and regexes. Due to inconsistencies highlighted above, there is no consensus regarding this change among Squid developers. This experiment may need to be reverted or further grammar changes may need to be implemented based on testing and deployment experience. Co-authored-by: Amos Jeffries --- diff --git a/src/cache_cf.cc b/src/cache_cf.cc index 2f5ad01eb2..dd7eff5dbd 100644 --- a/src/cache_cf.cc +++ b/src/cache_cf.cc @@ -22,6 +22,7 @@ #include "auth/Config.h" #include "auth/Scheme.h" #include "AuthReg.h" +#include "base/CharacterSet.h" #include "base/PackableStream.h" #include "base/RunnersRegistry.h" #include "cache_cf.h" @@ -289,10 +290,22 @@ SetConfigFilename(char const *file_name, bool is_pipe) cfg_filename = file_name; } +/// Determines whether the given squid.conf character is a token-delimiting +/// space character according to squid.conf preprocessor grammar. That grammar +/// only recognizes two space characters: ASCII SP and HT. Unlike isspace(3), +/// this function is not sensitive to locale(1) and does not classify LF, VT, +/// FF, and CR characters as token-delimiting space. However, some squid.conf +/// directive-specific parsers still define space based on isspace(3). +static bool +IsSpace(const char ch) +{ + return CharacterSet::WSP[ch]; +} + static const char* skip_ws(const char* s) { - while (xisspace(*s)) + while (IsSpace(*s)) ++s; return s; @@ -370,7 +383,7 @@ trim_trailing_ws(char* str) { assert(str != nullptr); unsigned i = strlen(str); - while ((i > 0) && xisspace(str[i - 1])) + while ((i > 0) && IsSpace(str[i - 1])) --i; str[i] = '\0'; } @@ -387,7 +400,7 @@ FindStatement(const char* line, const char* statement) str += len; if (*str == '\0') return str; - else if (xisspace(*str)) + else if (IsSpace(*str)) return skip_ws(str); } @@ -498,7 +511,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth) if (file == token) continue; /* Not a valid #line directive, may be a comment */ - while (*file && xisspace((unsigned char) *file)) + while (*file && IsSpace(*file)) ++file; if (*file) { @@ -556,7 +569,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth) fatalf("'else' without 'if'\n"); } else if (if_states.empty() || if_states.back()) { // test last if-statement meaning if present /* Handle includes here */ - if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && xisspace(tmp_line[7])) { + if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && IsSpace(tmp_line[7])) { err_count += parseManyConfigFiles(tmp_line + 8, depth + 1); } else { try { diff --git a/src/cf_gen.cc b/src/cf_gen.cc index 3ffe4da46c..2294668b68 100644 --- a/src/cf_gen.cc +++ b/src/cf_gen.cc @@ -649,7 +649,7 @@ gen_parse(const EntryList &head, std::ostream &fout) "parse_line(char *buff)\n" "{\n" "\tchar\t*token;\n" - "\tif ((token = strtok(buff, w_space)) == NULL) \n" + "\tif ((token = strtok(buff, \" \\t\")) == NULL) \n" "\t\treturn 1;\t/* ignore empty lines */\n" "\tConfigParser::SetCfgLine(strtok(nullptr, \"\"));\n"; diff --git a/test-suite/squidconf/bad-pp-space.conf b/test-suite/squidconf/bad-pp-space.conf new file mode 100644 index 0000000000..e56d3409e1 --- /dev/null +++ b/test-suite/squidconf/bad-pp-space.conf @@ -0,0 +1,28 @@ +## Copyright (C) 1996-2024 The Squid Software Foundation and contributors +## +## Squid software is distributed under GPLv2+ license and includes +## contributions from numerous individuals and organizations. +## Please see the COPYING and CONTRIBUTORS files for details. +## + +# The following line has leading and trailing SP and HT characters around +# directive name. The preprocessor is supposed to silently strip them. + memory_pools off + +# The following line has an ASCII VT character (0x0B) after a recognized +# directive name. VT character is followed by a regular SP character to +# make sure preprocessor can isolate (some) directive name. +http_access deny all + +# The following line has UTF-8 No-Break Space character (0xC2A0) after a +# recognized directive name. No SP character follows NBSP to test that +# preprocessor does not isolate recognized directive names based on their +# spelling (it should isolate the name token based on spaces instead). +debug_options ALL,1 2,3 + +# The following line has UTF-8 Wavy Low Line character (0xEFB98F) instead of +# the first ASCII underscore in what would otherwise be a recognized directive +# name. This test validates that preprocessor does not isolate unrecognized +# directive names based on their spelling (it should isolate the name token +# based on spaces instead). +forward﹏max_tries 13 diff --git a/test-suite/squidconf/bad-pp-space.conf.instructions b/test-suite/squidconf/bad-pp-space.conf.instructions new file mode 100644 index 0000000000..8a06ee7faf --- /dev/null +++ b/test-suite/squidconf/bad-pp-space.conf.instructions @@ -0,0 +1,6 @@ +expect-failure FATAL:.*Found.3.unrecognized.directive.* +expect-messages <