From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Tue, 2 Jul 2024 17:35:14 +0000 (+0000)
Subject: Restrict squid.conf preprocessor space characters to SP and HT (#1858)
X-Git-Tag: SQUID_7_0_1~96
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=314e43047145123e0ea018186445fd720ba6107c;p=thirdparty%2Fsquid.git

Restrict squid.conf preprocessor space characters to SP and HT (#1858)

Here, "preprocessor space" refers to a portion of squid.conf grammar
that covers preprocessor directives (e.g., `if` and `include`
statements) and separation of a directive name from directive
parameters. This change replaces all known xisspace() uses in
preprocessor code with code that only recognizes SP and HT as space
characters.

Prior to this change, Squid preprocessor also classified as space ASCII
VT and FF characters as well as any other locale-specific characters
classified as space by isspace(3).

Squid preprocessor still delimits input lines using LF and terminates
each read line at the first CR or LF character (if any), whichever comes
earlier. This behavior precludes CR and LF characters from reaching
individual directive line parsers. However, directive parameter values
loaded by ConfigParser when honoring "quoted filename" or
parameters(filename) syntax are (still) tokenized using w_space which
includes an ASCII CR character. Thus, a "foo_CR_bar" 9-character
sequence is still interpreted as a single "foo_" token by the
preprocessor and as two tokens ("foo_" and "_bar") by
ConfigParser::strtokFile().

After preprocessing, directive parameters are (still) parsed by parsers
specific to that directive type. Many of those parsers are based on
ConfigParser::TokenParse() that effectively uses the same "SP and HT
only" logic for inlined directive parameters. However, some of those
parsers define "space" differently (and may use non-space-like
characters for token delimiters). For example, response_delay_pool still
indirectly uses isspace(3) to parse integer parameters, and the
following "until eol" directives still use xisspace() to skip space
before their parameter value: debug_options, err_html_text,
mail_program, sslcrtd_program, and sslcrtvalidator_program.

More than 100 uses of xisspace() and indirect uses of isspace(3) remain
in Squid code. Most of those use cases are in configuration parsing
code. Restricting all relevant use cases one-by-one may not be
practical. On the other hand, restricting all configuration input lines
to prohibit VT, FF, CR, and locale-specific characters classified as
space by isspace(3) will break rare configs that use those characters in
places like URL paths and regexes.

Due to inconsistencies highlighted above, there is no consensus
regarding this change among Squid developers. This experiment may need
to be reverted or further grammar changes may need to be implemented
based on testing and deployment experience.

Co-authored-by: Amos Jeffries <squid3@treenet.co.nz>
---

diff --git a/src/cache_cf.cc b/src/cache_cf.cc
index 2f5ad01eb2..dd7eff5dbd 100644
--- a/src/cache_cf.cc
+++ b/src/cache_cf.cc
@@ -22,6 +22,7 @@
 #include "auth/Config.h"
 #include "auth/Scheme.h"
 #include "AuthReg.h"
+#include "base/CharacterSet.h"
 #include "base/PackableStream.h"
 #include "base/RunnersRegistry.h"
 #include "cache_cf.h"
@@ -289,10 +290,22 @@ SetConfigFilename(char const *file_name, bool is_pipe)
         cfg_filename = file_name;
 }
 
+/// Determines whether the given squid.conf character is a token-delimiting
+/// space character according to squid.conf preprocessor grammar. That grammar
+/// only recognizes two space characters: ASCII SP and HT. Unlike isspace(3),
+/// this function is not sensitive to locale(1) and does not classify LF, VT,
+/// FF, and CR characters as token-delimiting space. However, some squid.conf
+/// directive-specific parsers still define space based on isspace(3).
+static bool
+IsSpace(const char ch)
+{
+    return CharacterSet::WSP[ch];
+}
+
 static const char*
 skip_ws(const char* s)
 {
-    while (xisspace(*s))
+    while (IsSpace(*s))
         ++s;
 
     return s;
@@ -370,7 +383,7 @@ trim_trailing_ws(char* str)
 {
     assert(str != nullptr);
     unsigned i = strlen(str);
-    while ((i > 0) && xisspace(str[i - 1]))
+    while ((i > 0) && IsSpace(str[i - 1]))
         --i;
     str[i] = '\0';
 }
@@ -387,7 +400,7 @@ FindStatement(const char* line, const char* statement)
         str += len;
         if (*str == '\0')
             return str;
-        else if (xisspace(*str))
+        else if (IsSpace(*str))
             return skip_ws(str);
     }
 
@@ -498,7 +511,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth)
             if (file == token)
                 continue;   /* Not a valid #line directive, may be a comment */
 
-            while (*file && xisspace((unsigned char) *file))
+            while (*file && IsSpace(*file))
                 ++file;
 
             if (*file) {
@@ -556,7 +569,7 @@ parseOneConfigFile(const char *file_name, unsigned int depth)
                 fatalf("'else' without 'if'\n");
         } else if (if_states.empty() || if_states.back()) { // test last if-statement meaning if present
             /* Handle includes here */
-            if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && xisspace(tmp_line[7])) {
+            if (tmp_line_len >= 9 && strncmp(tmp_line, "include", 7) == 0 && IsSpace(tmp_line[7])) {
                 err_count += parseManyConfigFiles(tmp_line + 8, depth + 1);
             } else {
                 try {
diff --git a/src/cf_gen.cc b/src/cf_gen.cc
index 3ffe4da46c..2294668b68 100644
--- a/src/cf_gen.cc
+++ b/src/cf_gen.cc
@@ -649,7 +649,7 @@ gen_parse(const EntryList &head, std::ostream &fout)
          "parse_line(char *buff)\n"
          "{\n"
          "\tchar\t*token;\n"
-         "\tif ((token = strtok(buff, w_space)) == NULL) \n"
+         "\tif ((token = strtok(buff, \" \\t\")) == NULL) \n"
          "\t\treturn 1;\t/* ignore empty lines */\n"
          "\tConfigParser::SetCfgLine(strtok(nullptr, \"\"));\n";
 
diff --git a/test-suite/squidconf/bad-pp-space.conf b/test-suite/squidconf/bad-pp-space.conf
new file mode 100644
index 0000000000..e56d3409e1
--- /dev/null
+++ b/test-suite/squidconf/bad-pp-space.conf
@@ -0,0 +1,28 @@
+## Copyright (C) 1996-2024 The Squid Software Foundation and contributors
+##
+## Squid software is distributed under GPLv2+ license and includes
+## contributions from numerous individuals and organizations.
+## Please see the COPYING and CONTRIBUTORS files for details.
+##
+
+# The following line has leading and trailing SP and HT characters around
+# directive name. The preprocessor is supposed to silently strip them.
+	 memory_pools	 	off
+
+# The following line has an ASCII VT character (0x0B) after a recognized
+# directive name. VT character is followed by a regular SP character to
+# make sure preprocessor can isolate (some) directive name.
+http_access deny all
+
+# The following line has UTF-8 No-Break Space character (0xC2A0) after a
+# recognized directive name. No SP character follows NBSP to test that
+# preprocessor does not isolate recognized directive names based on their
+# spelling (it should isolate the name token based on spaces instead).
+debug_optionsÂ ALL,1 2,3
+
+# The following line has UTF-8 Wavy Low Line character (0xEFB98F) instead of
+# the first ASCII underscore in what would otherwise be a recognized directive
+# name. This test validates that preprocessor does not isolate unrecognized
+# directive names based on their spelling (it should isolate the name token
+# based on spaces instead).
+forwardï¹max_tries 13
diff --git a/test-suite/squidconf/bad-pp-space.conf.instructions b/test-suite/squidconf/bad-pp-space.conf.instructions
new file mode 100644
index 0000000000..8a06ee7faf
--- /dev/null
+++ b/test-suite/squidconf/bad-pp-space.conf.instructions
@@ -0,0 +1,6 @@
+expect-failure FATAL:.*Found.3.unrecognized.directive.*
+expect-messages <<END
+ERROR: unrecognized directive .*http_access
+ERROR: unrecognized directive .*debug_options
+ERROR: unrecognized directive .*forward.*max_tries
+END