its: Do escape handling during msgfmt merge, not during xgettext. Off by default.

author Bruno Haible <bruno@clisp.org>

Tue, 1 Oct 2024 14:42:56 +0000 (16:42 +0200)

committer Bruno Haible <bruno@clisp.org>

Tue, 1 Oct 2024 17:14:44 +0000 (19:14 +0200)
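
For readers who want to try the "middle-ground" escaping that msgfmt now
applies by default during the merge, here is a minimal standalone sketch in C.
It follows the logic of _its_encode_special_chars_for_merge from this commit,
but it is not the code from its.c: the name encode_middle_ground is
hypothetical, plain malloc stands in for XNMALLOC, and the buffer is sized for
the worst case rather than counted exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if C is a hexadecimal digit (ASCII only).  */
static bool
is_hex_digit (char c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'F')
         || (c >= 'a' && c <= 'f');
}

/* Returns true if S starts with an XML character reference:
   CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'  */
static bool
starts_with_character_reference (const char *s)
{
  if (s[0] != '&' || s[1] != '#')
    return false;
  s += 2;
  if (*s == 'x')
    {
      s++;
      if (!is_hex_digit (*s))
        return false;
      while (is_hex_digit (*s))
        s++;
    }
  else
    {
      if (!(*s >= '0' && *s <= '9'))
        return false;
      while (*s >= '0' && *s <= '9')
        s++;
    }
  return *s == ';';
}

/* Hypothetical helper, for illustration only.
   Escapes '<' and '>' always, and '&' only where it begins a character
   reference.  Entity references such as &copy; and bare '&' are left alone.
   Returns a freshly allocated string (caller frees), or NULL if malloc
   fails.  */
char *
encode_middle_ground (const char *content)
{
  assert (content != NULL);

  /* Worst case: every input byte expands to "&amp;" (5 bytes).  */
  char *result = malloc (5 * strlen (content) + 1);
  if (result == NULL)
    return NULL;

  char *p = result;
  for (const char *str = content; *str != '\0'; str++)
    {
      const char *replacement = NULL;
      if (*str == '&' && starts_with_character_reference (str))
        replacement = "&amp;";
      else if (*str == '<')
        replacement = "&lt;";
      else if (*str == '>')
        replacement = "&gt;";

      if (replacement != NULL)
        {
          size_t n = strlen (replacement);
          memcpy (p, replacement, n);
          p += n;
        }
      else
        *p++ = *str;
    }
  *p = '\0';
  return result;
}
```

Under these assumptions, encode_middle_ground ("&#xa9;") yields "&amp;#xa9;",
while "&copy;" and a bare "&" come back unchanged, which matches the
middle-ground column of the table in the its.c comment.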
diff --git a/NEWS b/NEWS

index 88bbdde6008c785c947160aeb462bcf78d4cbe82..f423a58e796aaf97905e9ba37570fcaf50e293e8 100644 (file)
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,17 @@
  Version 0.23 - September 2024
  
  * Programming languages support:
-  - XML: XML schemas for .its and .loc files are now provided.
+  - XML:
+    o The escaping of characters such as & < > has been changed:
+      - No escaping is done any more by xgettext, when creating a POT file.
+      - Instead, extra escaping can be requested for the msgfmt pass, when
+        merging into an XML file.
+      - The default value of 'escape' in the <gt:escapeRule> was "yes";
+        now it is "no".
+      This means that existing translations of older POT files may no longer
+      fully apply. As a maintainer of a package that has translatable XML files,
+      you need to regenerate the POT file and pass it on to your translators.
+    o XML schemas for .its and .loc files are now provided.
    - Python:
      o xgettext now assumes source code for Python 3 rather than Python 2.
        This affects the interpretation of escape sequences in string literals.
diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi

index e70b9e57bfb3396752f84f9c2dc49d66a5667787..106174e9ce3067a00d5a81ba65125595e339ec52 100644 (file)
--- a/gettext-tools/doc/gettext.texi
+++ b/gettext-tools/doc/gettext.texi
@@ -10656,6 +10656,11 @@ appdata-tools, appstream, libappstream-glib-dev
  @subsection Preparing Rules for XML Internationalization
  @cindex preparing rules for XML translation
  
+@c The ITS support in GNU gettext was designed so as to supersede
+@c the GNOME itstool <https://itstool.org/>.  See
+@c <https://lists.gnu.org/archive/html/bug-gettext/2015-10/msg00001.html> and
+@c <https://mail.gnome.org/archives/desktop-devel-list/2015-October/msg00013.html>.
+
  @menu
  * ITS Rules::                   Specifying ITS Rules
  * Locating Rules::              Specifying where to find the ITS Rules
@@ -10674,6 +10679,7 @@ categories:
  
  @table @samp
  @item Context
+@c Rationale: Glade 2.
  
  This data category associates @code{msgctxt} to the extracted text.  In
  the global rule, the @code{contextRule} element contains the following:
@@ -10692,23 +10698,8 @@ An optional @code{textPointer} attribute that contains a relative
  selector pointing to a node that holds the @code{msgid} value.
  @end itemize
  
-@item Escape Special Characters
-
-This data category indicates whether the special XML characters
-(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
-reference.  In the global rule, the @code{escapeRule} element contains
-the following:
-
-@itemize @bullet
-@item
-A required @code{selector} attribute.  It contains an absolute selector
-that selects the nodes to which this rule applies.
-
-@item
-A required @code{escape} attribute with the value @code{yes} or @code{no}.
-@end itemize
-
  @item Extended Preserve Space
+@c Rationale: GSettings.
  
  This data category extends the standard @samp{Preserve Space} data
  category with the additional values @samp{trim} and @samp{paragraph}.
@@ -10728,6 +10719,28 @@ A required @code{space} attribute with the value @code{default},
  @code{preserve}, @code{trim}, or @code{paragraph}.
  @end itemize
  
+@item Escape Special Characters
+
+This data category indicates whether the special XML characters
+(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
+references.  In the global rule, the @code{escapeRule} element contains
+the following:
+
+@itemize @bullet
+@item
+A required @code{selector} attribute.  It contains an absolute selector
+that selects the nodes to which this rule applies.
+
+@item
+A required @code{escape} attribute with the value @code{yes} or @code{no}.
+@end itemize
+
+@noindent
+The default value, @code{no}, should be good for most XML file types.
+A rule with @code{escape="no"},
+that was necessary with GNU gettext versions before 0.23,
+is now redundant.
+
  @end table
  
  All those extended data categories can only be expressed with global
@@ -10818,19 +10831,30 @@ from the matching XML files.
  
  @subsubsection Two Use-cases of Translated Strings in XML
  
-For XML, there are two use-cases of translated strings.  One is the case
-where the translated strings are directly consumed by programs, and the
-other is the case where the translated strings are merged back to the
-original XML document.  In the former case, special characters in the
-extracted strings shouldn't be escaped, while they should in the latter
-case.  To control whether to escape special characters, the @samp{Escape
-Special Characters} data category can be used.
-
-To merge the translations, the @samp{msgfmt} program can be used with
-the option @code{--xml}.  @xref{msgfmt Invocation}, for more details
-about how one calls the @samp{msgfmt} program.  @samp{msgfmt}'s
-@code{--xml} option doesn't perform character escaping, so translated
-strings can have arbitrary XML constructs, such as elements for markup.
+After strings have been extracted from an XML file to a POT file
+through @code{xgettext}
+and the translator has produced a PO file with translations,
+it can be used in two ways:
+
+@itemize @bullet
+@item
+The PO file (or the MO file generated from it) can be directly consumed
+by a program.
+
+@item
+Or the translated strings can be merged back to the original XML document.
+To do this, use the @code{msgfmt} program with the option @code{--xml}.
+@xref{msgfmt Invocation}, for more details about how one calls
+the @samp{msgfmt} program.
+
+During this merge from a PO file into an XML file, it may happen that
+more escaping of special characters for XML is needed
+than what @code{msgfmt} does by default.
+In this case, you can enforce more escaping
+either through an @code{<escapeRule>} ITS rule,
+or through an attribute @code{gt:escape="yes"} on the particular XML element.
+
+@end itemize
  
  @c This is the template for new data formats.
  @ignore
diff --git a/gettext-tools/its/glade1.its b/gettext-tools/its/glade1.its

index 874b5e98187a5fe106f219791b9799419ca9b1f8..42f73b78031d0d0ac6aa48998587a3959aa88173 100644 (file)
--- a/gettext-tools/its/glade1.its
+++ b/gettext-tools/its/glade1.its
@@ -1,6 +1,6 @@
  <?xml version="1.0"?>
  <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
    This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
  
    This program is free software: you can redistribute it and/or modify
@@ -31,5 +31,7 @@
                       translate="yes"/>
  
    <its:preserveSpaceRule selector="/GTK-Interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
    <gt:escapeRule selector="/GTK-Interface" escape="no"/>
  </its:rules>
diff --git a/gettext-tools/its/glade2.its b/gettext-tools/its/glade2.its

index e6133ae8185471f93f6665537cb88e87ebac954a..48220f302683939e616ece6c8a970c137c885794 100644 (file)
--- a/gettext-tools/its/glade2.its
+++ b/gettext-tools/its/glade2.its
@@ -1,6 +1,6 @@
  <?xml version="1.0"?>
  <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
    This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
  
    This program is free software: you can redistribute it and/or modify
@@ -40,5 +40,7 @@
                    textPointer="substring-after(., '|')"/>
  
    <its:preserveSpaceRule selector="/glade-interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
    <gt:escapeRule selector="/glade-interface" escape="no"/>
  </its:rules>
diff --git a/gettext-tools/its/gsettings.its b/gettext-tools/its/gsettings.its

index 930ec42381ea79fdf1dc718d06e740af5cb9a910..c69f1d2f477827514453ce5356e321dbf9bb921a 100644 (file)
--- a/gettext-tools/its/gsettings.its
+++ b/gettext-tools/its/gsettings.its
@@ -1,6 +1,6 @@
  <?xml version="1.0"?>
  <!--
-  Copyright (C) 2015 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
    This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
  
    This program is free software: you can redistribute it and/or modify
@@ -28,5 +28,7 @@
    <gt:escapeRule selector="//default/@context" escape="no"/>
  
    <gt:preserveSpaceRule selector="//default" space="trim"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
    <gt:escapeRule selector="/schemalist" escape="no"/>
  </its:rules>
diff --git a/gettext-tools/its/gtkbuilder.its b/gettext-tools/its/gtkbuilder.its

index 8078e1d4ffe76feed72f1f4e6ee4e097815d5a2f..a984511d5f96792d91c0c00bd477fd7cd8d19eb5 100644 (file)
--- a/gettext-tools/its/gtkbuilder.its
+++ b/gettext-tools/its/gtkbuilder.its
@@ -1,6 +1,6 @@
  <?xml version="1.0"?>
  <!--
-  Copyright (C) 2015, 2023 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
    This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
  
    This program is free software: you can redistribute it and/or modify
@@ -35,5 +35,7 @@
    <gt:contextRule selector="/interface//*[@context]" contextPointer="@context"/>
  
    <its:preserveSpaceRule selector="/interface" space="preserve"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
    <gt:escapeRule selector="/interface" escape="no"/>
  </its:rules>
diff --git a/gettext-tools/its/metainfo.its b/gettext-tools/its/metainfo.its

index 29b31f035b3d7ea6d8e7c7e8dead81edbc5d85b8..466c250a3736748836c89f858fce63c2f55ca532 100644 (file)
--- a/gettext-tools/its/metainfo.its
+++ b/gettext-tools/its/metainfo.its
@@ -1,6 +1,6 @@
  <?xml version="1.0"?>
  <!--
-  Copyright (C) 2015, 2017 Free Software Foundation, Inc.
+  Copyright (C) 2015-2024 Free Software Foundation, Inc.
    This file was written by Daiki Ueno <ueno@gnu.org>, 2015.
  
    This program is free software: you can redistribute it and/or modify
@@ -17,6 +17,7 @@
    along with this program.  If not, see <https://www.gnu.org/licenses/>.
  -->
  <its:rules xmlns:its="http://www.w3.org/2005/11/its"
+           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
             version="2.0">
    <its:translateRule selector="/component" translate="no"/>
    <its:translateRule selector="/component/name |
@@ -25,4 +26,7 @@
                                 /component/developer_name |
                                 /component/screenshots/screenshot/caption"
                       translate="yes"/>
+
+  <!-- This rule is redundant since gettext 0.23.  -->
+  <gt:escapeRule selector="/component" escape="no"/>
  </its:rules>
diff --git a/gettext-tools/src/its-extensions.xsd b/gettext-tools/src/its-extensions.xsd

index 116ef18daf348af22854b7053ccf4b47e1a3ed25..4cb19e552a4f6c80c67c43582d0e8315b9eac601 100644 (file)
--- a/gettext-tools/src/its-extensions.xsd
+++ b/gettext-tools/src/its-extensions.xsd
@@ -50,7 +50,7 @@ Written by Bruno Haible &lt;bruno@clisp.org&gt;, 2024.
    </complexType>
  
    <!-- If no <gt:escapeRule> is present, the default 'escape' property
-       is "yes".  -->
+       is "no".  -->
    <complexType name="EscapeRuleType">
      <attribute name="selector" type="string" use="required"></attribute>
      <attribute name="escape" use="required">
diff --git a/gettext-tools/src/its.c b/gettext-tools/src/its.c

index 4ac5283d935a0a669458dee48c50a12057580567..a00e9ddd769b8d86a2b0a4d8f5a7b51291b1a810 100644 (file)
--- a/gettext-tools/src/its.c
+++ b/gettext-tools/src/its.c
@@ -846,7 +846,7 @@ its_localization_note_rule_constructor (struct its_rule_ty *rule, xmlNode *node)
      {
        /* FIXME: Respect space attribute.  */
        char *content = _its_collect_text_content (n, ITS_WHITESPACE_NORMALIZE,
-                                                 true);
+                                                 false);
        its_value_list_append (&rule->values, "locNote", content);
        free (content);
      }
@@ -1771,13 +1771,34 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
        struct its_value_list_ty *values;
        const char *value;
        char *msgid = NULL, *msgctxt = NULL, *comment = NULL;
-      bool no_escape;
+      bool do_escape;
+      bool do_escape_during_extract;
        enum its_whitespace_type_ty whitespace;
        
        values = its_rule_list_eval (rules, node);
  
        value = its_value_list_get_value (values, "escape");
-      no_escape = value != NULL && strcmp (value, "no") == 0;
+      do_escape = value != NULL && strcmp (value, "yes") == 0;
+      /* Consider also a locally declared 'gt:escape' attribute.  */
+      if (node->type == XML_ELEMENT_NODE
+          && xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+        {
+          char *prop = _its_get_attribute (node, "escape", GT_NS);
+          if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+            do_escape = strcmp (prop, "yes") == 0;
+          free (prop);
+        }
+
+      do_escape_during_extract = do_escape;
+      /* But no, during message extraction (i.e. what xgettext does), we do
+         *not* want escaping to be done.  The contents of the POT file are meant
+         for translators, and
+           - the messages are not labelled as requiring XML content syntax,
+           - it is better for the translators if they can write various
+             characters such as & < > without escaping them.
+         Escaping needs to happen in the message merge phase (i.e. what msgfmt
+         does) instead.  */
+      do_escape_during_extract = false;
  
        value = its_value_list_get_value (values, "locNote");
        if (value)
@@ -1787,7 +1808,7 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
            value = its_value_list_get_value (values, "locNotePointer");
            if (value)
              comment = _its_get_content (rules, node, value, ITS_WHITESPACE_TRIM,
-                                        !no_escape);
+                                        do_escape_during_extract);
          }
  
        if (comment != NULL && *comment != '\0')
@@ -1841,17 +1862,18 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
        value = its_value_list_get_value (values, "contextPointer");
        if (value)
          msgctxt = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
-                                    !no_escape);
+                                    do_escape_during_extract);
  
        value = its_value_list_get_value (values, "textPointer");
        if (value)
          msgid = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
-                                  !no_escape);
+                                  do_escape_during_extract);
        its_value_list_destroy (values);
        free (values);
  
        if (msgid == NULL)
-        msgid = _its_collect_text_content (node, whitespace, !no_escape);
+        msgid = _its_collect_text_content (node, whitespace,
+                                           do_escape_during_extract);
        if (*msgid != '\0')
          {
            lex_pos_ty pos;
@@ -1939,6 +1961,82 @@ struct its_merge_context_ty
    struct its_node_list_ty nodes;
  };
  
+/* Returns true if S starts with a character reference.  */
+static bool
+starts_with_character_reference (const char *s)
+{
+  /* <https://www.w3.org/TR/xml/#NT-CharRef> defines
+     CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'  */
+  if (*s == '&')
+    {
+      s++;
+      if (*s == '#')
+        {
+          s++;
+          if (*s >= '0' && *s <= '9')
+            {
+              do
+                s++;
+              while (*s >= '0' && *s <= '9');
+              return *s == ';';
+            }
+          if (*s == 'x')
+            {
+              s++;
+              if ((*s >= '0' && *s <= '9')
+                  || (*s >= 'A' && *s <= 'F')
+                  || (*s >= 'a' && *s <= 'f'))
+                {
+                  do
+                    s++;
+                  while ((*s >= '0' && *s <= '9')
+                         || (*s >= 'A' && *s <= 'F')
+                         || (*s >= 'a' && *s <= 'f'));
+                  return *s == ';';
+                }
+            }
+        }
+    }
+  return false;
+}
+
+static char *
+_its_encode_special_chars_for_merge (const char *content)
+{
+  const char *str;
+  size_t amount = 0;
+  char *result, *p;
+
+  for (str = content; *str != '\0'; str++)
+    {
+      if (*str == '&' && starts_with_character_reference (str))
+        amount += sizeof ("&amp;");
+      else if (*str == '<')
+        amount += sizeof ("&lt;");
+      else if (*str == '>')
+        amount += sizeof ("&gt;");
+      else
+        amount += 1;
+    }
+
+  result = XNMALLOC (amount + 1, char);
+  *result = '\0';
+  p = result;
+  for (str = content; *str != '\0'; str++)
+    {
+      if (*str == '&' && starts_with_character_reference (str))
+        p = stpcpy (p, "&amp;");
+      else if (*str == '<')
+        p = stpcpy (p, "&lt;");
+      else if (*str == '>')
+        p = stpcpy (p, "&gt;");
+      else
+        *p++ = *str;
+    }
+  *p = '\0';
+  return result;
+}
+
  static void
  its_merge_context_merge_node (struct its_merge_context_ty *context,
                                xmlNode *node,
@@ -1950,13 +2048,29 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
        struct its_value_list_ty *values;
        const char *value;
        char *msgid = NULL, *msgctxt = NULL;
-      bool no_escape;
+      bool do_escape;
+      bool do_escape_during_extract;
+      bool do_escape_during_merge;
        enum its_whitespace_type_ty whitespace;
  
        values = its_rule_list_eval (context->rules, node);
  
        value = its_value_list_get_value (values, "escape");
-      no_escape = value != NULL && strcmp (value, "no") == 0;
+      do_escape = value != NULL && strcmp (value, "yes") == 0;
+      /* Consider also a locally declared 'gt:escape' attribute.  */
+      if (xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+        {
+          char *prop = _its_get_attribute (node, "escape", GT_NS);
+          if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+            do_escape = strcmp (prop, "yes") == 0;
+          free (prop);
+        }
+
+      do_escape_during_extract = do_escape;
+      /* Like above, in its_rule_list_extract_text.  */
+      do_escape_during_extract = false;
+
+      do_escape_during_merge = do_escape;
  
        value = its_value_list_get_value (values, "space");
        if (value && strcmp (value, "preserve") == 0)
@@ -1971,17 +2085,20 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
        value = its_value_list_get_value (values, "contextPointer");
        if (value)
          msgctxt = _its_get_content (context->rules, node, value,
-                                    ITS_WHITESPACE_PRESERVE, !no_escape);
+                                    ITS_WHITESPACE_PRESERVE,
+                                    do_escape_during_extract);
  
        value = its_value_list_get_value (values, "textPointer");
        if (value)
          msgid = _its_get_content (context->rules, node, value,
-                                  ITS_WHITESPACE_PRESERVE, !no_escape);
+                                  ITS_WHITESPACE_PRESERVE,
+                                  do_escape_during_extract);
        its_value_list_destroy (values);
        free (values);
  
        if (msgid == NULL)
-        msgid = _its_collect_text_content (node, whitespace, !no_escape);
+        msgid = _its_collect_text_content (node, whitespace,
+                                           do_escape_during_extract);
        if (*msgid != '\0')
          {
            message_ty *mp;
@@ -1994,7 +2111,50 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
                translated = xmlNewNode (node->ns, node->name);
                xmlSetProp (translated, BAD_CAST "xml:lang", BAD_CAST language);
  
-              xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+              /* libxml2 offers two functions for setting the content of an
+                 element: xmlNodeSetContent and xmlNodeAddContent.  They differ
+                 in the amount of escaping they do:
+                 - xmlNodeSetContent does no escaping, at the risk of creating
+                   malformed XML.
+                 - xmlNodeAddContent escapes all of & < >, which always produces
+                   well-formed XML but is not the right thing for entity
+                   references.
+                 We need a middle ground between both, that is adapted to what
+                 translators will usually produce.
+
+                 translated       | no escaping | middle-ground | full escaping
+                                  | SetContent  |               | AddContent
+                 -----------------+-------------+---------------+--------------
+                 &                | &           | &             | &amp;
+                 &quot;           | &quot;      | &quot;        | &amp;quot;
+                 &amp;            | &amp;       | &amp;         | &amp;amp;
+                 <                | <           | &lt;          | &lt;
+                 >                | >           | &gt;          | &gt;
+                 &lt;             | &lt;        | &lt;          | &amp;lt;
+                 &gt;             | &gt;        | &gt;          | &amp;gt;
+                 &#xa9;           | &#xa9;      | &amp;#xa9;    | &amp;#xa9;
+                 &copy;           | &copy;      | &copy;        | &amp;copy;
+                 -----------------+-------------+---------------+--------------
+
+                 The function _its_encode_special_chars_for_merge implements
+                 this middle-ground.  But we allow full escaping to be requested
+                 through a gt:escape="yes" attribute.  */
+
+              if (do_escape_during_merge)
+                {
+                  /* These three are equivalent:
+                     xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+                     xmlNodeSetContent (translated, xmlEncodeEntitiesReentrant (context->doc, BAD_CAST mp->msgstr));
+                     xmlNodeSetContent (translated, xmlEncodeSpecialChars (context->doc, BAD_CAST mp->msgstr));  */
+                  xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+                }
+              else
+                {
+                  char *middle_ground = _its_encode_special_chars_for_merge (mp->msgstr);
+                  xmlNodeSetContent (translated, BAD_CAST middle_ground);
+                  free (middle_ground);
+                }
+
                xmlAddNextSibling (node, translated);
              }
          }
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am

index 0bbd1c022a31e93c93b6d6b31000a5366b332036..45416014148d4b4217ba9ee97533a0ad175f86ef 100644 (file)
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -84,7 +84,7 @@ TESTS = gettext-1 gettext-2 \
         xgettext-13 xgettext-14 xgettext-15 xgettext-16 xgettext-17 \
         xgettext-18 \
         xgettext-combine-1 xgettext-combine-2 xgettext-combine-3 \
-       xgettext-appdata-1 \
+       xgettext-appdata-1 xgettext-appdata-2 \
         xgettext-awk-1 xgettext-awk-2 xgettext-awk-3 \
         xgettext-awk-stackovfl-1 xgettext-awk-stackovfl-2 \
         xgettext-c-2 xgettext-c-3 xgettext-c-4 xgettext-c-5 xgettext-c-6 \
diff --git a/gettext-tools/tests/msgfmt-xml-1 b/gettext-tools/tests/msgfmt-xml-1

index c7de103c765d98b1371a695cf783870d383d3e17..856c030cf2a5adffb7bd213e04c448f00443e96e 100755 (executable)
--- a/gettext-tools/tests/msgfmt-xml-1
+++ b/gettext-tools/tests/msgfmt-xml-1
@@ -5,7 +5,12 @@
  
  cat <<\EOF > mf.appdata.xml
  <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop" xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0">
    <id>org.gnome.Characters.desktop</id>
    <name>GNOME Characters</name>
    <summary>Character map application</summary>
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
        You can also browse characters by categories, such as
        Punctuation, Pictures, etc.
      </p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (&#xa9;, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
    </description>
    <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
    <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -61,11 +75,34 @@ msgid ""
  msgstr ""
  "Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
  "Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme &#xa9;, comme &#169; ou comme &copy; ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé &#xa9;, entités &copy; &author1;"
  EOF
  
  cat <<\EOF > mf.appdata.xml.ok
  <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
    <id>org.gnome.Characters.desktop</id>
    <name>GNOME Characters</name>
    <summary>Character map application</summary>
@@ -82,6 +119,19 @@ cat <<\EOF > mf.appdata.xml.ok
        Punctuation, Pictures, etc.
      </p>
      <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
    </description>
    <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
    <updatecontact>dueno_at_src.gnome.org</updatecontact>
diff --git a/gettext-tools/tests/msgfmt-xml-2 b/gettext-tools/tests/msgfmt-xml-2

index f8d51f16478fc422d1a032b80fe6f4fbb9a9ee76..10e136f94ba3116b9a5bf43fe3c2fc74966095df 100755 (executable)
--- a/gettext-tools/tests/msgfmt-xml-2
+++ b/gettext-tools/tests/msgfmt-xml-2
@@ -5,7 +5,12 @@
  
  cat <<\EOF > mf.appdata.xml
  <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop" xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0">
    <id>org.gnome.Characters.desktop</id>
    <name>GNOME Characters</name>
    <summary>Character map application</summary>
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
        You can also browse characters by categories, such as
        Punctuation, Pictures, etc.
      </p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (&#xa9;, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
    </description>
    <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
    <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -63,6 +77,24 @@ msgid ""
  msgstr ""
  "Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
  "Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme &#xa9;, comme &#169; ou comme &copy; ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé &#xa9;, entités &copy; &author1;"
  EOF
  
  cat <<\EOF > po/de.po
@@ -100,11 +132,35 @@ msgid ""
  msgstr ""
  "Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder "
  "Bildern."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+"Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als "
+"&#xa9;, als &#169;, oder als &copy; "
+"geschrieben werden kann?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Geschrieben von &author1;, &author2; und &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;"
+msgstr ""
+"Escape-Beispiele: Operator x&y, Standard-XML Entitäten & \" ' & < >, Zeichen "
+"©, escaptes Zeichen &#xa9;, Entitäten &copy; &author1;"
  EOF
  
  cat <<\EOF > mf.appdata.xml.ok
  <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
    <id>org.gnome.Characters.desktop</id>
    <name>GNOME Characters</name>
    <summary>Character map application</summary>
@@ -123,6 +179,23 @@ cat <<\EOF > mf.appdata.xml.ok
      </p>
      <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
      <p xml:lang="de">Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder Bildern.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p xml:lang="de">Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als &amp;#xa9;, als &amp;#169;, oder als &amp;copy; geschrieben werden kann?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p xml:lang="de">Geschrieben von &author1;, &author2; und &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p xml:lang="de">Escape-Beispiele: Operator x&y, Standard-XML Entitäten & " ' & &lt; &gt;, Zeichen ©, escaptes Zeichen &amp;#xa9;, Entitäten &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
+    <p xml:lang="de">Escape-Beispiele: Operator x&amp;y, Standard-XML Entitäten &amp; " ' &amp; &lt; &gt;, Zeichen ©, escaptes Zeichen &amp;#xa9;, Entitäten &amp;copy; &amp;author1;</p>
    </description>
    <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
    <updatecontact>dueno_at_src.gnome.org</updatecontact>
@@ -131,7 +204,12 @@ EOF
  
  cat <<\EOF > mf.appdata.xml.desired.ok
  <?xml version="1.0" encoding="UTF-8"?>
-<component type="desktop">
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0" type="desktop">
    <id>org.gnome.Characters.desktop</id>
    <name>GNOME Characters</name>
    <summary>Character map application</summary>
@@ -148,6 +226,19 @@ cat <<\EOF > mf.appdata.xml.desired.ok
        Punctuation, Pictures, etc.
      </p>
      <p xml:lang="fr">Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.</p>
+    <p gt:escape="yes">
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p xml:lang="fr">Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme &amp;#xa9;, comme &amp;#169; ou comme &amp;copy; ?</p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p xml:lang="fr">Écrit par &author1;, &author2;, et &author3;.</p>
+    <p gt:escape="no">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&y, entités XML standard & " ' & &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &copy; &author1;</p>
+    <p gt:escape="yes">Escape gallery: operator x&amp;y, standard XML entities &amp; " ' &amp; &lt; &gt;, character reference ©, escaped character reference &amp;#xa9;, entity references &copy; &author1;</p>
+    <p xml:lang="fr">Exposition d'échappements: operateur x&amp;y, entités XML standard &amp; " ' &amp; &lt; &gt;, caractère ©, caractère échappé &amp;#xa9;, entités &amp;copy; &amp;author1;</p>
    </description>
    <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
    <updatecontact>dueno_at_src.gnome.org</updatecontact>
diff --git a/gettext-tools/tests/xgettext-appdata-1 b/gettext-tools/tests/xgettext-appdata-1

index 7f68a522739ccc727830946adb2d8b8e643da769..3c1ea5fff428a2670db022c7680ac31e7037f808 100755 (executable)
--- a/gettext-tools/tests/xgettext-appdata-1
+++ b/gettext-tools/tests/xgettext-appdata-1
@@ -1,7 +1,7 @@
  #!/bin/sh
  . "${srcdir=.}/init.sh"; path_prepend_ . ../src
  
-# Test of AppData support.
+# Test of AppData support: HTML markup.
  
  cat <<\EOF > xg-gs-1-empty.appdata.xml
  <?xml version="1.0"?>
diff --git a/gettext-tools/tests/xgettext-appdata-2 b/gettext-tools/tests/xgettext-appdata-2

new file mode 100644 (file)

index 0000000..980c4a4
--- /dev/null
+++ b/gettext-tools/tests/xgettext-appdata-2
@@ -0,0 +1,121 @@
+#!/bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test of AppData support: escaping of XML entities.
+
+cat <<\EOF > xg-gs-2-empty.appdata.xml
+<?xml version="1.0"?>
+<component type="desktop"/>
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} -o xg-gs-2.pot xg-gs-2-empty.appdata.xml 2>/dev/null
+test $? = 0 || {
+  echo "Skipping test: xgettext was built without AppData support"
+  Exit 77
+}
+
+cat <<\EOF > xg-gs-2.appdata.xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE component PUBLIC "" "" [
+<!ENTITY author1 "Giovanni Campagna">
+<!ENTITY author2 "Daiki Ueno">
+<!ENTITY author3 "Bilal Elmoussaoui">
+]>
+<component type="desktop">
+  <id>org.gnome.Characters.desktop</id>
+  <name>GNOME Characters</name>
+  <summary>Character map application</summary>
+  <licence>CC0</licence>
+  <description>
+    <p>
+      Characters is a simple utility application to find and insert
+      unusual characters.  It allows you to quickly find the character
+      you are looking for by searching for keywords.
+    </p>
+    <p>
+      You can also browse characters by categories, such as
+      Punctuation, Pictures, etc.
+    </p>
+    <p>
+      Did you know that the copyright sign (©, U+00A9) can be written in HTML
+      as &amp;#xa9;,
+      as &amp;#169;,
+      or as &amp;copy;?
+    </p>
+    <p>Written by &author1;, &author2;, and &author3;.</p>
+    <p>Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;, escaped entity reference &amp;copy;</p>
+    <p>Escape gallery: operator x&amp;y, standard XML entities &amp; &quot; &apos; &amp; &lt; &gt;, character reference &#xa9;, escaped character reference &amp;#xa9;, entity references &copy; &author1;, escaped entity reference &amp;copy;</p>
+  </description>
+  <url type="homepage">https://wiki.gnome.org/Design/Apps/CharacterMap</url>
+  <updatecontact>dueno_at_src.gnome.org</updatecontact>
+</component>
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} --add-comments -o xg-gs-2.tmp xg-gs-2.appdata.xml || Exit 1
+func_filter_POT_Creation_Date xg-gs-2.tmp xg-gs-2.pot
+
+cat <<\EOF > xg-gs-2.ok
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the PACKAGE package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PACKAGE VERSION\n"
+"Report-Msgid-Bugs-To: \n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"Language: \n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+
+#: xg-gs-2.appdata.xml:9
+msgid "GNOME Characters"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:10
+msgid "Character map application"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:13
+msgid ""
+"Characters is a simple utility application to find and insert unusual "
+"characters. It allows you to quickly find the character you are looking for "
+"by searching for keywords."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:18
+msgid ""
+"You can also browse characters by categories, such as Punctuation, Pictures, "
+"etc."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:22
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"&#xa9;, as &#169;, or as &copy;?"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:28
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:29 xg-gs-2.appdata.xml:30
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference &#xa9;, entity references &copy; "
+"&author1;, escaped entity reference &copy;"
+msgstr ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} xg-gs-2.ok xg-gs-2.pot
+result=$?
+
+exit $result
diff --git a/gettext-tools/tests/xgettext-its-1 b/gettext-tools/tests/xgettext-its-1

index 22e9163ecbdae87e3fa39f54784d7a639e3842af..523dee490a3d4dbf15c8071f261688938d4c5f02 100755 (executable)
--- a/gettext-tools/tests/xgettext-its-1
+++ b/gettext-tools/tests/xgettext-its-1
@@ -201,7 +201,7 @@ EOF
  cat <<\EOF >messages.ok
  #. (itstool) path: message/p
  #: messages.xml:8
-msgid "This is a test message &foo;&gt;&lt;&amp;\"\""
+msgid "This is a test message &foo;><&\"\""
  msgstr ""
  
  #. (itstool) path: message/p
@@ -214,15 +214,15 @@ msgstr ""
  #: messages.xml:17
  #, no-wrap
  msgid ""
-"  $ echo '  ' &gt;&gt; /dev/null\n"
-"  $ cat &lt; /dev/yes\n"
-"  $ sleep 10 &amp;\n"
+"  $ echo '  ' >> /dev/null\n"
+"  $ cat < /dev/yes\n"
+"  $ sleep 10 &\n"
  msgstr ""
  
  #. This is a comment
  #. (itstool) path: messages/message@comment
  #: messages.xml:22
-msgid "This is a comment &lt;&gt;&amp;&quot;"
+msgid "This is a comment <>&\""
  msgstr ""
  
  #. (itstool) path: message/p