From: Timo Sirainen <timo.sirainen@open-xchange.com>
Date: Sat, 28 Feb 2026 08:32:15 +0000 (+0200)
Subject: lib-charset: Increase CHARSET_MAX_PENDING_BUF_SIZE to 16 bytes
X-Git-Tag: 2.4.3~130
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=db2add1b4058fd489905ea833ea31ceb4d550070;p=thirdparty%2Fdovecot%2Fcore.git

lib-charset: Increase CHARSET_MAX_PENDING_BUF_SIZE to 16 bytes

The old 10 bytes is likely enough, but let's make it safer based on AI's
recommendation:

While the 4–8 byte rule covers most common encodings, ISO-2022 variants
(like ISO-2022-JP) are the primary reason you might need a slightly larger
buffer. Because these encodings use multi-byte "escape sequences" to switch
between character sets, iconv() may stop mid-sequence.

For standard ISO-2022 variants, a buffer of 10 to 16 bytes is generally
considered the absolute "safe" maximum for unconverted bytes.

Why 16 Bytes? While individual characters or escape sequences rarely exceed
4–6 bytes, choosing 16 bytes provides a power-of-two alignment that safely
handles even the most obscure registered ISO-IR sequences and provides a
margin for implementation-specific behavior.
---
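Note for reviewers: the sketch below illustrates the case the message
describes, where iconv() stops in the middle of a multi-byte sequence and the
unconverted tail has to be carried over to the next call. It is not the
lib-charset implementation. The names PENDING_MAX, struct pending_input and
convert_chunk() are hypothetical, and the 256-byte work buffer is an arbitrary
choice for the example. It relies only on the POSIX iconv() behaviour of
failing with EINVAL on an incomplete sequence at the end of the input and
leaving the input pointer at the start of that sequence.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CHARSET_MAX_PENDING_BUF_SIZE. */
#define PENDING_MAX 16

/* Bytes that iconv() could not convert yet because the sequence was cut. */
struct pending_input {
	char buf[PENDING_MAX];
	size_t len;
};

/* Convert one chunk of input, prepending any bytes left over from the
   previous call. Returns 0 on success (possibly with bytes saved in p),
   -1 on invalid input, full output buffer or too many leftover bytes. */
static int convert_chunk(iconv_t cd, struct pending_input *p,
			 const char *in, size_t in_len,
			 char *out, size_t out_size, size_t *out_len)
{
	char work[PENDING_MAX + 256];
	char *src = work, *dst = out;
	size_t src_left, dst_left = out_size;

	if (in_len > sizeof(work) - p->len)
		return -1;
	/* Join the saved tail and the new input into one buffer. */
	memcpy(work, p->buf, p->len);
	memcpy(work + p->len, in, in_len);
	src_left = p->len + in_len;
	p->len = 0;

	if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
		/* EINVAL: the input ended mid-sequence. Anything else
		   (EILSEQ, E2BIG) is treated as a hard error here. */
		if (errno != EINVAL || src_left > PENDING_MAX)
			return -1;
		/* This copy is what the pending buffer size must cover. */
		memcpy(p->buf, src, src_left);
		p->len = src_left;
	}
	*out_len = out_size - dst_left;
	return 0;
}
```

A caller would zero-initialize one struct pending_input per iconv_t descriptor
and pass each incoming chunk through convert_chunk(). If an encoding could
leave more than PENDING_MAX unconverted bytes, the sketch fails instead of
truncating, which is the situation the larger constant is meant to make less
likely.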

diff --git a/src/lib-charset/charset-utf8.h b/src/lib-charset/charset-utf8.h
index c17ab3053f..0b9f124b14 100644
--- a/src/lib-charset/charset-utf8.h
+++ b/src/lib-charset/charset-utf8.h
@@ -6,7 +6,7 @@
 /* Max number of bytes that iconv can require for a single character.
    UTF-8 takes max 6 bytes per character. Not sure about others, but I'd think
    10 is more than enough for everyone.. */
-#define CHARSET_MAX_PENDING_BUF_SIZE 10
+#define CHARSET_MAX_PENDING_BUF_SIZE 16
 
 struct charset_translation;
 
diff --git a/src/lib-charset/test-charset.c b/src/lib-charset/test-charset.c
index 2f9ba2b26a..434a9e3b2e 100644
--- a/src/lib-charset/test-charset.c
+++ b/src/lib-charset/test-charset.c
@@ -151,7 +151,7 @@ static void test_charset_iconv_utf7_state(void)
 	memcpy(nextbuf, "+AOQ-", 5);
 	size = sizeof(nextbuf);
 	test_assert(charset_to_utf8(trans, nextbuf, &size, str) == CHARSET_RET_OK);
-	test_assert(strcmp(str_c(str), "a\xC3\xA4???????????") == 0);
+	test_assert_strcmp(str_c(str), "a\xC3\xA4?????????????????");
 	charset_to_utf8_end(&trans);
 	test_end();
 }