From: Bruno Haible <bruno@clisp.org>
Date: Tue, 1 Oct 2024 14:42:56 +0000 (+0200)
Subject: its: Do escape handling during msgfmt merge, not during xgettext. Off by default.
X-Git-Tag: v0.23~92
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=eaf658bed81b831f84c53f27128696aacf8356ac;p=thirdparty%2Fgettext.git

its: Do escape handling during msgfmt merge, not during xgettext. Off by default.

Reported by Samy Mahmoudi <samy.mahmoudi@gmail.com>
at <https://savannah.gnu.org/bugs/?58643>.

* gettext-tools/src/its.c (its_localization_note_rule_constructor): Don't do
escaping while extracting a localization note.
(its_rule_list_extract_text): New local variable do_escape_during_extract. Don't
do escaping while extracting.
(starts_with_character_reference, _its_encode_special_chars_for_merge): New
functions.
(its_merge_context_merge_node): New local variables do_escape_during_extract,
do_escape_during_merge. Don't do escaping while extracting. Conditionally do
escaping while merging.
* gettext-tools/src/its-extensions.xsd: Mention that escape="no" is now the
default.
* gettext-tools/its/glade1.its: Add a comment.
* gettext-tools/its/glade2.its: Likewise.
* gettext-tools/its/gsettings.its: Likewise.
* gettext-tools/its/gtkbuilder.its: Likewise.
* gettext-tools/its/metainfo.its: Add a <gt:escapeRule>.
* gettext-tools/tests/xgettext-appdata-1: Add comment.
* gettext-tools/tests/xgettext-appdata-2: New file, based on
gettext-tools/tests/msgfmt-xml-1.
* gettext-tools/tests/Makefile.am (TESTS): Add it.
* gettext-tools/tests/xgettext-its-1: Update expected results.
* gettext-tools/tests/msgfmt-xml-1: Test also character references and entity
references.
* gettext-tools/tests/msgfmt-xml-2: Likewise.
* gettext-tools/doc/gettext.texi (ITS Rules): Under "Escape Special Characters",
explain that it is no longer necessary to write a rule with escape="no".
Rewrite section "Two Use-cases of Translated Strings in XML".
* NEWS: Mention the changes.
---
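
Note: the "middle-ground" escaping that this patch introduces for the msgfmt
merge step can be sketched as a small standalone C fragment. This is only an
illustration under stated assumptions, not the code in its.c: the names
starts_with_char_ref and escape_for_merge are hypothetical, and plain malloc
stands in for the XNMALLOC used in gettext. It escapes < and >, escapes the &
of a character reference, and leaves every other & (including entity
references such as &copy;) untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if S starts with an XML character reference,
   i.e. '&#' [0-9]+ ';' or '&#x' [0-9a-fA-F]+ ';'.  */
static bool
starts_with_char_ref (const char *s)
{
  if (s[0] != '&' || s[1] != '#')
    return false;
  s += 2;
  if (*s == 'x')
    {
      s++;
      if (!isxdigit ((unsigned char) *s))
        return false;
      while (isxdigit ((unsigned char) *s))
        s++;
    }
  else
    {
      if (!isdigit ((unsigned char) *s))
        return false;
      while (isdigit ((unsigned char) *s))
        s++;
    }
  return *s == ';';
}

/* Hypothetical sketch of the middle-ground escaping.  Returns a freshly
   allocated string, or NULL if memory allocation fails.  */
static char *
escape_for_merge (const char *content)
{
  assert (content != NULL);

  /* Worst case: every byte expands to "&amp;", which is 5 bytes long.  */
  char *result = malloc (5 * strlen (content) + 1);
  if (result == NULL)
    return NULL;

  char *p = result;
  for (const char *s = content; *s != '\0'; s++)
    {
      const char *replacement = NULL;

      if (*s == '&' && starts_with_char_ref (s))
        replacement = "&amp;";   /* keep a character reference literal */
      else if (*s == '<')
        replacement = "&lt;";
      else if (*s == '>')
        replacement = "&gt;";

      if (replacement != NULL)
        {
          size_t n = strlen (replacement);
          memcpy (p, replacement, n);
          p += n;
        }
      else
        *p++ = *s;               /* plain '&' and entity references pass through */
    }
  *p = '\0';
  return result;
}
```

With this sketch, escape_for_merge ("&#xa9; &copy; <b>") yields
"&amp;#xa9; &copy; &lt;b&gt;", which matches the middle-ground column of the
table in the its.c comment below. The caller is expected to free the result.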

diff --git a/NEWS b/NEWS
index 88bbdde60..f423a58e7 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,17 @@
 Version 0.23 - September 2024
 
 * Programming languages support:
-  - XML: XML schemas for .its and .loc files are now provided.
+  - XML:
+    o The escaping of characters such as & < > has been changed:
+      - No escaping is done any more by xgettext, when creating a POT file.
+      - Instead, extra escaping can be requested for the msgfmt pass, when
+        merging into an XML file.
+      - The default value of 'escape' in the <gt:escapeRule> was "yes";
+        now it is "no".
+      This means that existing translations of older POT files may no longer
+      fully apply. As a maintainer of a package that has translatable XML files,
+      you need to regenerate the POT file and pass it on to your translators.
+    o XML schemas for .its and .loc files are now provided.
   - Python:
     o xgettext now assumes source code for Python 3 rather than Python 2.
       This affects the interpretation of escape sequences in string literals.
diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi
index e70b9e57b..106174e9c 100644
--- a/gettext-tools/doc/gettext.texi
+++ b/gettext-tools/doc/gettext.texi
@@ -10656,6 +10656,11 @@ appdata-tools, appstream, libappstream-glib-dev
 @subsection Preparing Rules for XML Internationalization
 @cindex preparing rules for XML translation
 
+@c The ITS support in GNU gettext was designed so as to supersede
+@c the GNOME itstool <https://itstool.org/>.  See
+@c <https://lists.gnu.org/archive/html/bug-gettext/2015-10/msg00001.html> and
+@c <https://mail.gnome.org/archives/desktop-devel-list/2015-October/msg00013.html>.
+
 @menu
 * ITS Rules::                   Specifying ITS Rules
 * Locating Rules::              Specifying where to find the ITS Rules
@@ -10674,6 +10679,7 @@ categories:
 
 @table @samp
 @item Context
+@c Rationale: Glade 2.
 
 This data category associates @code{msgctxt} to the extracted text.  In
 the global rule, the @code{contextRule} element contains the following:
@@ -10692,23 +10698,8 @@ An optional @code{textPointer} attribute that contains a relative
 selector pointing to a node that holds the @code{msgid} value.
 @end itemize
 
-@item Escape Special Characters
-
-This data category indicates whether the special XML characters
-(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
-reference.  In the global rule, the @code{escapeRule} element contains
-the following:
-
-@itemize @bullet
-@item
-A required @code{selector} attribute.  It contains an absolute selector
-that selects the nodes to which this rule applies.
-
-@item
-A required @code{escape} attribute with the value @code{yes} or @code{no}.
-@end itemize
-
 @item Extended Preserve Space
+@c Rationale: GSettings.
 
 This data category extends the standard @samp{Preserve Space} data
 category with the additional values @samp{trim} and @samp{paragraph}.
@@ -10728,6 +10719,28 @@ A required @code{space} attribute with the value @code{default},
 @code{preserve}, @code{trim}, or @code{paragraph}.
 @end itemize
 
+@item Escape Special Characters
+
+This data category indicates whether the special XML characters
+(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
+references.  In the global rule, the @code{escapeRule} element contains
+the following:
+
+@itemize @bullet
+@item
+A required @code{selector} attribute.  It contains an absolute selector
+that selects the nodes to which this rule applies.
+
+@item
+A required @code{escape} attribute with the value @code{yes} or @code{no}.
+@end itemize
+
+@noindent
+The default value, @code{no}, should be good for most XML file types.
+A rule with @code{escape="no"},
+that was necessary with GNU gettext versions before 0.23,
+is now redundant.
+
 @end table
 
 All those extended data categories can only be expressed with global
@@ -10818,19 +10831,30 @@ from the matching XML files.
 
 @subsubsection Two Use-cases of Translated Strings in XML
 
-For XML, there are two use-cases of translated strings.  One is the case
-where the translated strings are directly consumed by programs, and the
-other is the case where the translated strings are merged back to the
-original XML document.  In the former case, special characters in the
-extracted strings shouldn't be escaped, while they should in the latter
-case.  To control whether to escape special characters, the @samp{Escape
-Special Characters} data category can be used.
-
-To merge the translations, the @samp{msgfmt} program can be used with
-the option @code{--xml}.  @xref{msgfmt Invocation}, for more details
-about how one calls the @samp{msgfmt} program.  @samp{msgfmt}'s
-@code{--xml} option doesn't perform character escaping, so translated
-strings can have arbitrary XML constructs, such as elements for markup.
+After strings have been extracted from an XML file to a POT file
+through @code{xgettext}
+and the translator has produced a PO file with translations,
+it can be used in two ways:
+
+@itemize @bullet
+@item
+The PO file (or the MO file generated from it) can be directly consumed
+by a program.
+
+@item
+Or the translated strings can be merged back to the original XML document.
+To do this use the @code{msgfmt} program with the option @code{--xml}.
+@xref{msgfmt Invocation}, for more details about how one calls
+the @samp{msgfmt} program.
+
+During this merge from a PO file into an XML file, it may happen that
+more escaping of special characters for XML is needed
+than what @code{msgfmt} does by default.
+In this case, you can enforce more escaping
+either through an @code{<escapeRule>} ITS rule,
+or through an attribute @code{gt:escape="yes"} on the particular XML element.
+
+@end itemize
 
 @c This is the template for new data formats.
 @ignore
diff --git a/gettext-tools/its/glade1.its b/gettext-tools/its/glade1.its
index 874b5e981..42f73b780 100644
--- a/gettext-tools/its/glade1.its
+++ b/gettext-tools/its/glade1.its
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
   This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
 
   This program is free software: you can redistribute it and/or modify
@@ -31,5 +31,7 @@
                      translate="yes"/>
 
   <its:preserveSpaceRule selector="/GTK-Interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
   <gt:escapeRule selector="/GTK-Interface" escape="no"/>
 </its:rules>
diff --git a/gettext-tools/its/glade2.its b/gettext-tools/its/glade2.its
index e6133ae81..48220f302 100644
--- a/gettext-tools/its/glade2.its
+++ b/gettext-tools/its/glade2.its
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
   This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
 
   This program is free software: you can redistribute it and/or modify
@@ -40,5 +40,7 @@
                   textPointer="substring-after(., '|')"/>
 
   <its:preserveSpaceRule selector="/glade-interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
   <gt:escapeRule selector="/glade-interface" escape="no"/>
 </its:rules>
diff --git a/gettext-tools/its/gsettings.its b/gettext-tools/its/gsettings.its
index 930ec4238..c69f1d2f4 100644
--- a/gettext-tools/its/gsettings.its
+++ b/gettext-tools/its/gsettings.its
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
   This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
 
   This program is free software: you can redistribute it and/or modify
@@ -28,5 +28,7 @@
   <gt:escapeRule selector="//default/@context" escape="no"/>
 
   <gt:preserveSpaceRule selector="//default" space="trim"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
   <gt:escapeRule selector="/schemalist" escape="no"/>
 </its:rules>
diff --git a/gettext-tools/its/gtkbuilder.its b/gettext-tools/its/gtkbuilder.its
index 8078e1d4f..a984511d5 100644
--- a/gettext-tools/its/gtkbuilder.its
+++ b/gettext-tools/its/gtkbuilder.its
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <!--
-  Copyright (C) 2015, 2023 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
   This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
 
   This program is free software: you can redistribute it and/or modify
@@ -35,5 +35,7 @@
   <gt:contextRule selector="/interface//*[@context]" contextPointer="@context"/>
 
   <its:preserveSpaceRule selector="/interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
   <gt:escapeRule selector="/interface" escape="no"/>
 </its:rules>
diff --git a/gettext-tools/its/metainfo.its b/gettext-tools/its/metainfo.its
index 29b31f035..466c250a3 100644
--- a/gettext-tools/its/metainfo.its
+++ b/gettext-tools/its/metainfo.its
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <!--
-  Copyright (C) 2015, 2017 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
   This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
 
   This program is free software: you can redistribute it and/or modify
@@ -17,6 +17,7 @@
   along with this program.  If not, see <https://www.gnu.org/licenses/>.
 -->
 <its:rules xmlns:its="http://www.w3.org/2005/11/its"
+           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
            version="2.0">
   <its:translateRule selector="/component" translate="no"/>
   <its:translateRule selector="/component/name |
@@ -25,4 +26,7 @@
                                /component/developer_name |
                                /component/screenshots/screenshot/caption"
                      translate="yes"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
+  <gt:escapeRule selector="/component" escape="no"/>
 </its:rules>
diff --git a/gettext-tools/src/its-extensions.xsd b/gettext-tools/src/its-extensions.xsd
index 116ef18da..4cb19e552 100644
--- a/gettext-tools/src/its-extensions.xsd
+++ b/gettext-tools/src/its-extensions.xsd
@@ -50,7 +50,7 @@ Written by Bruno Haible &lt;bruno@clisp.org&gt;, 2024.
   </complexType>
 
   <!-- If no <gt:escapeRule> is present, the default 'escape' property
-       is "yes".  -->
+       is "no".  -->
   <complexType name="EscapeRuleType">
     <attribute name="selector" type="string" use="required"></attribute>
     <attribute name="escape" use="required">
diff --git a/gettext-tools/src/its.c b/gettext-tools/src/its.c
index 4ac5283d9..a00e9ddd7 100644
--- a/gettext-tools/src/its.c
+++ b/gettext-tools/src/its.c
@@ -846,7 +846,7 @@ its_localization_note_rule_constructor (struct its_rule_ty *rule, xmlNode *node)
     {
       /* FIXME: Respect space attribute.  */
       char *content = _its_collect_text_content (n, ITS_WHITESPACE_NORMALIZE,
-                                                 true);
+                                                 false);
       its_value_list_append (&rule->values, "locNote", content);
       free (content);
     }
@@ -1771,13 +1771,34 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
       struct its_value_list_ty *values;
       const char *value;
       char *msgid = NULL, *msgctxt = NULL, *comment = NULL;
-      bool no_escape;
+      bool do_escape;
+      bool do_escape_during_extract;
       enum its_whitespace_type_ty whitespace;
       
       values = its_rule_list_eval (rules, node);
 
       value = its_value_list_get_value (values, "escape");
-      no_escape = value != NULL && strcmp (value, "no") == 0;
+      do_escape = value != NULL && strcmp (value, "yes") == 0;
+      /* Consider also a locally declared 'gt:escape' attribute.  */
+      if (node->type == XML_ELEMENT_NODE
+          && xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+        {
+          char *prop = _its_get_attribute (node, "escape", GT_NS);
+          if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+            do_escape = strcmp (prop, "yes") == 0;
+          free (prop);
+        }
+
+      do_escape_during_extract = do_escape;
+      /* But no, during message extraction (i.e. what xgettext does), we do
+         *not* want escaping to be done.  The contents of the POT file is meant
+         for translators, and
+           - the messages are not labelled as requiring XML content syntax,
+           - it is better for the translators if they can write various
+             characters such as & < > without escaping them.
+         Escaping needs to happen in the message merge phase (i.e. what msgfmt
+         does) instead.  */
+      do_escape_during_extract = false;
 
       value = its_value_list_get_value (values, "locNote");
       if (value)
@@ -1787,7 +1808,7 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
           value = its_value_list_get_value (values, "locNotePointer");
           if (value)
             comment = _its_get_content (rules, node, value, ITS_WHITESPACE_TRIM,
-                                        !no_escape);
+                                        do_escape_during_extract);
         }
 
       if (comment != NULL && *comment != '\0')
@@ -1841,17 +1862,18 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
       value = its_value_list_get_value (values, "contextPointer");
       if (value)
         msgctxt = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
-                                    !no_escape);
+                                    do_escape_during_extract);
 
       value = its_value_list_get_value (values, "textPointer");
       if (value)
         msgid = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
-                                  !no_escape);
+                                  do_escape_during_extract);
       its_value_list_destroy (values);
       free (values);
 
       if (msgid == NULL)
-        msgid = _its_collect_text_content (node, whitespace, !no_escape);
+        msgid = _its_collect_text_content (node, whitespace,
+                                           do_escape_during_extract);
       if (*msgid != '\0')
         {
           lex_pos_ty pos;
@@ -1939,6 +1961,82 @@ struct its_merge_context_ty
   struct its_node_list_ty nodes;
 };
 
+/* Returns true if S starts with a character reference.  */
+static bool
+starts_with_character_reference (const char *s)
+{
+  /* <https://www.w3.org/TR/xml/#NT-CharRef> defines
+     CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'  */
+  if (*s == '&')
+    {
+      s++;
+      if (*s == '#')
+        {
+          s++;
+          if (*s >= '0' && *s <= '9')
+            {
+              do
+                s++;
+              while (*s >= '0' && *s <= '9');
+              return *s == ';';
+            }
+          if (*s == 'x')
+            {
+              s++;
+              if ((*s >= '0' && *s <= '9')
+                  || (*s >= 'A' && *s <= 'F')
+                  || (*s >= 'a' && *s <= 'f'))
+                {
+                  do
+                    s++;
+                  while ((*s >= '0' && *s <= '9')
+                         || (*s >= 'A' && *s <= 'F')
+                         || (*s >= 'a' && *s <= 'f'));
+                  return *s == ';';
+                }
+            }
+        }
+    }
+  return false;
+}
+
+static char *
+_its_encode_special_chars_for_merge (const char *content)
+{
+  const char *str;
+  size_t amount = 0;
+  char *result, *p;
+
+  for (str = content; *str != '\0'; str++)
+    {
+      if (*str == '&' && starts_with_character_reference (str))
+        amount += sizeof ("&amp;");
+      else if (*str == '<')
+        amount += sizeof ("&lt;");
+      else if (*str == '>')
+        amount += sizeof ("&gt;");
+      else
+        amount += 1;
+    }
+
+  result = XNMALLOC (amount + 1, char);
+  *result = '\0';
+  p = result;
+  for (str = content; *str != '\0'; str++)
+    {
+      if (*str == '&' && starts_with_character_reference (str))
+        p = stpcpy (p, "&amp;");
+      else if (*str == '<')
+        p = stpcpy (p, "&lt;");
+      else if (*str == '>')
+        p = stpcpy (p, "&gt;");
+      else
+        *p++ = *str;
+    }
+  *p = '\0';
+  return result;
+}
+
 static void
 its_merge_context_merge_node (struct its_merge_context_ty *context,
                               xmlNode *node,
@@ -1950,13 +2048,29 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
       struct its_value_list_ty *values;
       const char *value;
       char *msgid = NULL, *msgctxt = NULL;
-      bool no_escape;
+      bool do_escape;
+      bool do_escape_during_extract;
+      bool do_escape_during_merge;
       enum its_whitespace_type_ty whitespace;
 
       values = its_rule_list_eval (context->rules, node);
 
       value = its_value_list_get_value (values, "escape");
-      no_escape = value != NULL && strcmp (value, "no") == 0;
+      do_escape = value != NULL && strcmp (value, "yes") == 0;
+      /* Consider also a locally declared 'gt:escape' attribute.  */
+      if (xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+        {
+          char *prop = _its_get_attribute (node, "escape", GT_NS);
+          if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+            do_escape = strcmp (prop, "yes") == 0;
+          free (prop);
+        }
+
+      do_escape_during_extract = do_escape;
+      /* Like above, in its_rule_list_extract_text.  */
+      do_escape_during_extract = false;
+
+      do_escape_during_merge = do_escape;
 
       value = its_value_list_get_value (values, "space");
       if (value && strcmp (value, "preserve") == 0)
@@ -1971,17 +2085,20 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
       value = its_value_list_get_value (values, "contextPointer");
       if (value)
         msgctxt = _its_get_content (context->rules, node, value,
-                                    ITS_WHITESPACE_PRESERVE, !no_escape);
+                                    ITS_WHITESPACE_PRESERVE,
+                                    do_escape_during_extract);
 
       value = its_value_list_get_value (values, "textPointer");
       if (value)
         msgid = _its_get_content (context->rules, node, value,
-                                  ITS_WHITESPACE_PRESERVE, !no_escape);
+                                  ITS_WHITESPACE_PRESERVE,
+                                  do_escape_during_extract);
       its_value_list_destroy (values);
       free (values);
 
       if (msgid == NULL)
-        msgid = _its_collect_text_content (node, whitespace, !no_escape);
+        msgid = _its_collect_text_content (node, whitespace,
+                                           do_escape_during_extract);
       if (*msgid != '\0')
         {
           message_ty *mp;
@@ -1994,7 +2111,50 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
               translated = xmlNewNode (node->ns, node->name);
               xmlSetProp (translated, BAD_CAST "xml:lang", BAD_CAST language);
 
-              xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+              /* libxml2 offers two functions for setting the content of an
+                 element: xmlNodeSetContent and xmlNodeAddContent.  They differ
+                 in the amount of escaping they do:
+                 - xmlNodeSetContent does no escaping, at the risk of creating
+                   malformed XML.
+                 - xmlNodeAddContent escapes all of & < >, which always produces
+                   well-formed XML but is not the right thing for entity
+                   references.
+                 We need a middle ground between both, that is adapted to what
+                 translators will usually produce.
+
+                 translated       | no escaping | middle-ground | full escaping
+                                  | SetContent  |               | AddContent
+                 -----------------+-------------+---------------+--------------
+                 &                | &           | &             | &amp;
+                 &quot;           | &quot;      | &quot;        | &amp;quot;
+                 &amp;            | &amp;       | &amp;         | &amp;amp;
+                 <                | <           | &lt;          | &lt;
+                 >                | >           | &gt;          | &gt;
+                 &lt;             | &lt;        | &lt;          | &amp;lt;
+                 &gt;             | &gt;        | &gt;          | &amp;gt;
+                 &#xa9;           | &#xa9;      | &amp;#xa9;    | &amp;#xa9;
+                 &copy;           | &copy;      | &copy;        | &amp;copy;
+                 -----------------+-------------+---------------+--------------
+
+                 The function _its_encode_special_chars_for_merge implements
+                 this middle-ground.  But we allow full escaping to be requested
+                 through a gt:escape="yes" attribute.  */
+
+              if (do_escape_during_merge)
+                {
+                  /* These three are equivalent:
+                     xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+                     xmlNodeSetContent (translated, xmlEncodeEntitiesReentrant (context->doc, BAD_CAST mp->msgstr));
+                     xmlNodeSetContent (translated, xmlEncodeSpecialChars (context->doc, BAD_CAST mp->msgstr));  */
+                  xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+                }
+              else
+                {
+                  char *middle_ground = _its_encode_special_chars_for_merge (mp->msgstr);
+                  xmlNodeSetContent (translated, BAD_CAST middle_ground);
+                  free (middle_ground);
+                }
+
               xmlAddNextSibling (node, translated);
             }
         }
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index 0bbd1c022..454160141 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -84,7 +84,7 @@ TESTS = gettext-1 gettext-2 \
 	xgettext-13 xgettext-14 xgettext-15 xgettext-16 xgettext-17 \
 	xgettext-18 \
 	xgettext-combine-1 xgettext-combine-2 xgettext-combine-3 \
-	xgettext-appdata-1 \
+	xgettext-appdata-1 xgettext-appdata-2 \
 	xgettext-awk-1 xgettext-awk-2 xgettext-awk-3 \
 	xgettext-awk-stackovfl-1 xgettext-awk-stackovfl-2 \
 	xgettext-c-2 xgettext-c-3 xgettext-c-4 xgettext-c-5 xgettext-c-6 \
diff --git a/gettext-tools/tests/msgfmt-xml-1 b/gettext-tools/tests/msgfmt-xml-1
index c7de103c7..856c030cf 100755
--- a/gettext-tools/tests/msgfmt-xml-1
+++ b/gettext-tools/tests/msgfmt-xml-1
@@ -5,7 +5,12 @@
 
 cat <<\EOF > mf.appdata.xml
 <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop" xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0">
   <id>org.gnome.Characters.desktop</id>
   <name>GNOME Characters</name>
   <summary>Character map application</summary>
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
       You can also browse characters by categories, such as
       Punctuation, Pictures, etc.
     </p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (&#xa9;, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
   </description>
   <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
   <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -61,11 +75,34 @@ msgid ""
 msgstr ""
 "Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
 "Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme &#xa9;, comme &#169; ou comme &copy; ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé &#xa9;, entités &copy; &author1;"
 EOF
 
 cat <<\EOF > mf.appdata.xml.ok
 <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
   <id>org.gnome.Characters.desktop</id>
   <name>GNOME Characters</name>
   <summary>Character map application</summary>
@@ -82,6 +119,19 @@ cat <<\EOF > mf.appdata.xml.ok
       Punctuation, Pictures, etc.
     </p>
     <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
   </description>
   <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
   <updatecontact>dueno_at_src.gnome.org</updatecontact>
diff --git a/gettext-tools/tests/msgfmt-xml-2 b/gettext-tools/tests/msgfmt-xml-2
index f8d51f164..10e136f94 100755
--- a/gettext-tools/tests/msgfmt-xml-2
+++ b/gettext-tools/tests/msgfmt-xml-2
@@ -5,7 +5,12 @@
 
 cat <<\EOF > mf.appdata.xml
 <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop" xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0">
   <id>org.gnome.Characters.desktop</id>
   <name>GNOME Characters</name>
   <summary>Character map application</summary>
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
       You can also browse characters by categories, such as
       Punctuation, Pictures, etc.
     </p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (&#xa9;, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
   </description>
   <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
   <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -63,6 +77,24 @@ msgid ""
 msgstr ""
 "Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
 "Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme &#xa9;, comme &#169; ou comme &copy; ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé &#xa9;, entités &copy; &author1;"
 EOF
 
 cat <<\EOF > po/de.po
@@ -100,11 +132,35 @@ msgid ""
 msgstr ""
 "Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder "
 "Bildern."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als "
+"&#xa9;, als &#169;, oder als &copy; "
+"geschrieben werden kann?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Geschrieben von &author1;, &author2; und &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Escape-Beispiele: Operator x&y, Standard-XML Entitäten & \" ' & < >, Zeichen "
+"©, escaptes Zeichen &#xa9;, Entitäten &copy; &author1;"
 EOF
 
 cat <<\EOF > mf.appdata.xml.ok
 <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
   <id>org.gnome.Characters.desktop</id>
   <name>GNOME Characters</name>
   <summary>Character map application</summary>
@@ -123,6 +179,23 @@ cat <<\EOF > mf.appdata.xml.ok
     </p>
     <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
     <p xml:lang="de">Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder Bildern.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p xml:lang="de">Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als &amp;#xa9;, als &amp;#169;, oder als &amp;copy; geschrieben werden kann?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p xml:lang="de">Geschrieben von &author1;, &author2; und &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p xml:lang="de">Escape-Beispiele: Operator x&y, Standard-XML Entitäten & " ' & &lt; &gt;, Zeichen ©, escaptes Zeichen &amp;#xa9;, Entitäten &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
+    <p xml:lang="de">Escape-Beispiele: Operator x&amp;y, Standard-XML Entitäten &amp; " ' &amp; &lt; &gt;, Zeichen ©, escaptes Zeichen &amp;#xa9;, Entitäten &amp;copy; &amp;author1;</p>
   </description>
   <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
   <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -131,7 +204,12 @@ EOF
 
 cat <<\EOF > mf.appdata.xml.desired.ok
 <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
   <id>org.gnome.Characters.desktop</id>
   <name>GNOME Characters</name>
   <summary>Character map application</summary>
@@ -148,6 +226,19 @@ cat <<\EOF > mf.appdata.xml.desired.ok
       Punctuation, Pictures, etc.
     </p>
     <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
   </description>
   <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
   <updatecontact>dueno_at_src.gnome.org</updatecontact>
diff --git a/gettext-tools/tests/xgettext-appdata-1 b/gettext-tools/tests/xgettext-appdata-1
index 7f68a5227..3c1ea5fff 100755
--- a/gettext-tools/tests/xgettext-appdata-1
+++ b/gettext-tools/tests/xgettext-appdata-1
@@ -1,7 +1,7 @@
 #!/bin/sh
 . "${srcdir=.}/init.sh"; path_prepend_ . ../src
 
-# Test of AppData support.
+# Test of AppData support: HTML markup.
 
 cat <<\EOF > xg-gs-1-empty.appdata.xml
 <?xml version="1.0"?>
diff --git a/gettext-tools/tests/xgettext-appdata-2 b/gettext-tools/tests/xgettext-appdata-2
new file mode 100644
index 000000000..980c4a45d
--- /dev/null
+++ b/gettext-tools/tests/xgettext-appdata-2
@@ -0,0 +1,121 @@
+#!/bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test of AppData support: escaping of XML entities.
+
+cat <<\EOF > xg-gs-2-empty.appdata.xml
+<?xml version="1.0"?>
+<component type="desktop"/>
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} -o xg-gs-2.pot xg-gs-2-empty.appdata.xml 2>/dev/null
+test $? = 0 || {
+  echo "Skipping test: xgettext was built without AppData support"
+  Exit 77
+}
+
+cat <<\EOF > xg-gs-2.appdata.xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop">
+  <id>org.gnome.Characters.desktop</id>
+  <name>GNOME Characters</name>
+  <summary>Character map application</summary>
+  <licence>CC0</licence>
+  <description>
+    <p>
+      Characters is a simple utility application to find and insert
+      unusual characters.  It allows you to quickly find the character
+      you are looking for by searching for keywords.
+    </p>
+    <p>
+      You can also browse characters by categories, such as
+      Punctuation, Pictures, etc.
+    </p>
+    <p>
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p>Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;, escaped entity reference &amp;copy;</p>
+    <p>Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;, escaped entity reference &amp;copy;</p>
+  </description>
+  <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
+  <updatecontact>dueno_at_src.gnome.org</updatecontact>
+</component>
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} --add-comments -o xg-gs-2.tmp xg-gs-2.appdata.xml || Exit 1
+func_filter_POT_Creation_Date xg-gs-2.tmp xg-gs-2.pot
+
+cat <<\EOF > xg-gs-2.ok
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the PACKAGE package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PACKAGE VERSION\n"
+"Report-Msgid-Bugs-To: \n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"Language: \n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+
+#: xg-gs-2.appdata.xml:9
+msgid "GNOME Characters"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:10
+msgid "Character map application"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:13
+msgid ""
+"Characters is a simple utility application to find and insert unusual "
+"characters. It allows you to quickly find the character you are looking for "
+"by searching for keywords."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:18
+msgid ""
+"You can also browse characters by categories, such as Punctuation, Pictures, "
+"etc."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:22
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:28
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:29 xg-gs-2.appdata.xml:30
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;, escaped entity reference &copy;"
+msgstr ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} xg-gs-2.ok xg-gs-2.pot
+result=$?
+
+exit $result
diff --git a/gettext-tools/tests/xgettext-its-1 b/gettext-tools/tests/xgettext-its-1
index 22e9163ec..523dee490 100755
--- a/gettext-tools/tests/xgettext-its-1
+++ b/gettext-tools/tests/xgettext-its-1
@@ -201,7 +201,7 @@ EOF
 cat <<\EOF >messages.ok
 #. (itstool) path: message/p
 #: messages.xml:8
-msgid "This is a test message &foo;&gt;&lt;&amp;\"\""
+msgid "This is a test message &foo;><&\"\""
 msgstr ""
 
 #. (itstool) path: message/p
@@ -214,15 +214,15 @@ msgstr ""
 #: messages.xml:17
 #, no-wrap
 msgid ""
-"  $ echo '  ' &gt;&gt; /dev/null\n"
-"  $ cat &lt; /dev/yes\n"
-"  $ sleep 10 &amp;\n"
+"  $ echo '  ' >> /dev/null\n"
+"  $ cat < /dev/yes\n"
+"  $ sleep 10 &\n"
 msgstr ""
 
 #. This is a comment
 #. (itstool) path: messages/message@comment
 #: messages.xml:22
-msgid "This is a comment &lt;&gt;&amp;&quot;"
+msgid "This is a comment <>&\""
 msgstr ""
 
 #. (itstool) path: message/p