From: Jiang Xin
Date: Fri, 26 Jun 2026 11:57:52 +0000 (+0800)
Subject: l10n: AGENTS.md: add quotation mark preservation guidelines
X-Git-Tag: v2.55.0~3^2~11
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=5eb25b9605bd31345e81840d428d9bf019695c4e;p=thirdparty%2Fgit.git

l10n: AGENTS.md: add quotation mark preservation guidelines

Add a "Preserving Quotation Marks" section to prevent AI-assisted
translation and review from incorrectly converting language-specific
UTF-8 curly quotes (e.g., „ U+201E, “ U+201C for Bulgarian) into ASCII
straight quotes " (U+0022), which would cause PO string truncation and
syntax errors.

Also update the "Special characters" item in the Quality checklist to
reference the new section.

Signed-off-by: Jiang Xin
---

diff --git a/po/AGENTS.md b/po/AGENTS.md
index feb90d4d24..94af128b46 100644
--- a/po/AGENTS.md
+++ b/po/AGENTS.md
@@ -127,6 +127,52 @@ etc.), and quotes exactly as in `msgid`. Only reorder placeholders with
 positional syntax when needed (see Placeholder Reordering below).
 
 
+### Preserving Quotation Marks
+
+Some languages use language-specific UTF-8 quotation marks (curly/smart
+quotes) rather than ASCII straight quotes. **Always preserve these
+characters exactly as they appear in the source.** Do **not** convert them
+to ASCII straight quotes.
+
+**Protected quotation marks** (non-exhaustive list):
+
+| Character | Unicode | Name | Languages |
+|-----------|---------|------|-----------|
+| „ | U+201E | DOUBLE LOW-9 QUOTATION MARK | Bulgarian, German, etc. |
+| “ | U+201C | LEFT DOUBLE QUOTATION MARK | Bulgarian, etc. |
+| ” | U+201D | RIGHT DOUBLE QUOTATION MARK | English, German, etc. |
+| ‘ | U+2018 | LEFT SINGLE QUOTATION MARK | English, etc. |
+| ’ | U+2019 | RIGHT SINGLE QUOTATION MARK | English, etc. |
+| « | U+00AB | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK | French, Russian, etc. |
+| » | U+00BB | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK | French, Russian, etc. |
+| ‹ | U+2039 | SINGLE LEFT-POINTING ANGLE QUOTATION MARK | French, etc. |
+| › | U+203A | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK | French, etc. |
+
+**Why this matters in PO files**: In PO file format, the ASCII straight
+double quote `"` (U+0022) is the **string delimiter**. If a translation
+contains a curly quote that is incorrectly converted to `"` (U+0022),
+the PO parser will interpret it as the end of the string, causing:
+
+1. **String truncation**: The `msgstr` value is cut short at the
+   spurious quote character.
+2. **Syntax errors**: `msgfmt --check` fails with parse errors at
+   the line where the string was prematurely terminated.
+3. **Data loss**: Content after the accidental quote delimiter is
+   misinterpreted or lost.
+
+**Rules**:
+
+- **Never** replace language-specific quotation marks with ASCII
+  straight quotes `"` (U+0022) or `'` (U+0027).
+- Apply this rule when translating PO files, PO multi-line strings,
+  and GETTEXT JSON `msgstr` array values.
+- Apply this rule when generating suggested translations
+  (`suggest_msgstr`) during review.
+- If the source `msgid` uses ASCII straight quotes, preserve them
+  as-is in the translation unless the target language convention
+  requires different quotation marks.
+
+
 ### Placeholder Reordering
 
 When reordering placeholders relative to `msgid`, use positional syntax (`%n$`)
@@ -387,7 +433,10 @@ read and write this format.
 - **Placeholders**: Preserve variables (`%s`, `{name}`, `$1`) exactly; use
   positional parameters when reordering (see "Placeholder Reordering" above).
 - **Special characters**: Preserve escape sequences (`\n`, `\"`, `\\`, `\t`),
-  placeholders exactly as in `msgid`. See "Preserving Special Characters" above.
+  placeholders exactly as in `msgid`. Preserve language-specific quotation
+  marks (curly/smart quotes like „, “, ”, ‘, ’) — do not convert them to
+  ASCII straight quotes. See "Preserving Special Characters" and
+  "Preserving Quotation Marks" above.
 - **Plurals and gender**: Correct forms and agreement.
 - **Context fit**: Suitable for UI space, tone, and use (e.g. error vs. tooltip).
 - **Cultural appropriateness**: No offensive or ambiguous content.
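
The failure mode described under "Why this matters in PO files" can be reproduced with a small Python sketch. `read_po_string` is a hypothetical, simplified reader for a single PO string literal: it only honours backslash escapes and is not gettext's own parser. It shows that once the closing “ (U+201C) of a Bulgarian „git add“ pair is replaced by an ASCII `"` (U+0022), the literal ends early and the leftover characters `."` trail it, which is the kind of line `msgfmt --check` rejects. The sample Bulgarian text is made up for illustration.

```python
"""Sketch: why a curly quote straightened to U+0022 breaks a PO string.

Assumption: read_po_string is a hypothetical, simplified reader for one
PO string literal ("..."); it handles backslash escapes only and is not
the parser used by gettext/msgfmt.
"""


def read_po_string(line: str) -> tuple[str, str]:
    """Read one double-quoted PO string that starts at line[0].

    Returns (content, rest): `content` is the text between the opening
    quote and the first unescaped ASCII double quote (U+0022), and `rest`
    is whatever follows that closing quote. A non-empty `rest` means the
    string was terminated early.
    """
    if not line.startswith('"'):
        raise ValueError("PO string must start with an ASCII double quote")
    out = []
    i = 1
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Keep escape sequences such as \" and \n verbatim.
            out.append(line[i : i + 2])
            i += 2
            continue
        if ch == '"':
            # The first unescaped U+0022 is taken as the string delimiter.
            return "".join(out), line[i + 1 :]
        # Curly quotes like „ (U+201E) and “ (U+201C) are ordinary text.
        out.append(ch)
        i += 1
    raise ValueError("unterminated PO string")


good = '"Изпълнете „git add“."'  # closing quote kept as U+201C
bad = '"Изпълнете „git add"."'   # closing U+201C wrongly turned into U+0022

print(read_po_string(good))  # ('Изпълнете „git add“.', '')
print(read_po_string(bad))   # ('Изпълнете „git add', '."')
```

The same reasoning applies to any format that uses U+0022 as its string delimiter, such as JSON strings, which is why the rules above also cover GETTEXT JSON `msgstr` values.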