From: Bruno Haible
Date: Tue, 1 Oct 2024 14:42:56 +0000 (+0200)
Subject: its: Do escape handling during msgfmt merge, not during xgettext. Off by default.
X-Git-Tag: v0.23~92
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=eaf658bed81b831f84c53f27128696aacf8356ac;p=thirdparty%2Fgettext.git
its: Do escape handling during msgfmt merge, not during xgettext. Off by default.
Reported by Samy Mahmoudi
at .
* gettext-tools/src/its.c (its_localization_note_rule_constructor): Don't do
escaping while extracting a localization note.
(its_rule_list_extract_text): New local variable do_escape_during_extract. Don't
do escaping while extracting.
(starts_with_character_reference, _its_encode_special_chars_for_merge): New
functions.
(its_merge_context_merge_node): New local variables do_escape_during_extract,
do_escape_during_merge. Don't do escaping while extracting. Conditionally do
escaping while merging.
* gettext-tools/src/its-extensions.xsd: Mention that escape="no" is now the
default.
* gettext-tools/its/glade1.its: Add a comment.
* gettext-tools/its/glade2.its: Likewise.
* gettext-tools/its/gsettings.its: Likewise.
* gettext-tools/its/gtkbuilder.its: Likewise.
* gettext-tools/its/metainfo.its: Add a .
* gettext-tools/tests/xgettext-appdata-1: Add comment.
* gettext-tools/tests/xgettext-appdata-2: New file, based on
gettext-tools/tests/msgfmt-xml-1.
* gettext-tools/tests/Makefile.am (TESTS): Add it.
* gettext-tools/tests/xgettext-its-1: Update expected results.
* gettext-tools/tests/msgfmt-xml-1: Test also character references and entity
references.
* gettext-tools/tests/msgfmt-xml-2: Likewise.
* gettext-tools/doc/gettext.texi (ITS Rules): Under "Escape Special Characters",
explain that it is no longer necessary to write a rule with escape="no".
Rewrite section "Two Use-cases of Translated Strings in XML".
* NEWS: Mention the changes.
---
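Note for reviewers (below the "---" line, so not part of the commit message): the "middle-ground" escaping that msgfmt applies by default during the merge can be sketched in isolation as follows. This is a minimal standalone sketch, not the code from its.c: the names is_char_ref and escape_for_merge are hypothetical, and plain malloc is assumed in place of gnulib's XNMALLOC. It escapes '<' and '>', and escapes '&' only where it begins a character reference, so that entity references written by the translator pass through unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if S starts with an XML character reference:
   "&#" decimal-digits ";"  or  "&#x" hex-digits ";".
   (Hypothetical helper; it mirrors the CharRef production of XML 1.0.)  */
static bool
is_char_ref (const char *s)
{
  if (s[0] != '&' || s[1] != '#')
    return false;
  s += 2;
  if (*s == 'x')
    {
      s++;
      if (!isxdigit ((unsigned char) *s))
        return false;
      while (isxdigit ((unsigned char) *s))
        s++;
    }
  else
    {
      if (!isdigit ((unsigned char) *s))
        return false;
      while (isdigit ((unsigned char) *s))
        s++;
    }
  return *s == ';';
}

/* Returns a freshly allocated copy of CONTENT in which '<' and '>' are
   escaped, and '&' is escaped only when it starts a character reference.
   Returns NULL if memory allocation fails.  The caller frees the result.
   (Hypothetical helper, assuming plain malloc.)  */
static char *
escape_for_merge (const char *content)
{
  assert (content != NULL);

  /* Worst case: every byte expands to "&amp;", which is 5 bytes.  */
  char *result = malloc (5 * strlen (content) + 1);
  if (result == NULL)
    return NULL;

  char *p = result;
  for (const char *s = content; *s != '\0'; s++)
    {
      const char *replacement = NULL;
      if (*s == '&' && is_char_ref (s))
        replacement = "&amp;";
      else if (*s == '<')
        replacement = "&lt;";
      else if (*s == '>')
        replacement = "&gt;";

      if (replacement != NULL)
        {
          size_t n = strlen (replacement);
          memcpy (p, replacement, n);
          p += n;
        }
      else
        *p++ = *s;
    }
  *p = '\0';
  return result;
}
```

Under these assumptions, a translation containing a literal "<" or a literal "&#xa9;" is written to the XML file in escaped form, whereas "&copy;" or "&author1;" stays an entity reference. Requesting gt:escape="yes" corresponds to escaping every '&' as well.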
diff --git a/NEWS b/NEWS
index 88bbdde60..f423a58e7 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,17 @@
Version 0.23 - September 2024
* Programming languages support:
- - XML: XML schemas for .its and .loc files are now provided.
+ - XML:
+ o The escaping of characters such as & < > has been changed:
+ - No escaping is done any more by xgettext, when creating a POT file.
+ - Instead, extra escaping can be requested for the msgfmt pass, when
+ merging into an XML file.
+ - The default value of 'escape' in the 'escapeRule' element was "yes";
+ now it is "no".
+ This means that existing translations of older POT files may no longer
+ fully apply. As a maintainer of a package that has translatable XML files,
+ you need to regenerate the POT file and pass it on to your translators.
+ o XML schemas for .its and .loc files are now provided.
- Python:
o xgettext now assumes source code for Python 3 rather than Python 2.
This affects the interpretation of escape sequences in string literals.
diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi
index e70b9e57b..106174e9c 100644
--- a/gettext-tools/doc/gettext.texi
+++ b/gettext-tools/doc/gettext.texi
@@ -10656,6 +10656,11 @@ appdata-tools, appstream, libappstream-glib-dev
@subsection Preparing Rules for XML Internationalization
@cindex preparing rules for XML translation
+@c The ITS support in GNU gettext was designed so as to supersede
+@c the GNOME itstool . See
+@c and
+@c .
+
@menu
* ITS Rules:: Specifying ITS Rules
* Locating Rules:: Specifying where to find the ITS Rules
@@ -10674,6 +10679,7 @@ categories:
@table @samp
@item Context
+@c Rationale: Glade 2.
This data category associates @code{msgctxt} to the extracted text. In
the global rule, the @code{contextRule} element contains the following:
@@ -10692,23 +10698,8 @@ An optional @code{textPointer} attribute that contains a relative
selector pointing to a node that holds the @code{msgid} value.
@end itemize
-@item Escape Special Characters
-
-This data category indicates whether the special XML characters
-(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
-reference. In the global rule, the @code{escapeRule} element contains
-the following:
-
-@itemize @bullet
-@item
-A required @code{selector} attribute. It contains an absolute selector
-that selects the nodes to which this rule applies.
-
-@item
-A required @code{escape} attribute with the value @code{yes} or @code{no}.
-@end itemize
-
@item Extended Preserve Space
+@c Rationale: GSettings.
This data category extends the standard @samp{Preserve Space} data
category with the additional values @samp{trim} and @samp{paragraph}.
@@ -10728,6 +10719,28 @@ A required @code{space} attribute with the value @code{default},
@code{preserve}, @code{trim}, or @code{paragraph}.
@end itemize
+@item Escape Special Characters
+
+This data category indicates whether the special XML characters
+(@code{<}, @code{>}, @code{&}, @code{"}) are escaped with entity
+references. In the global rule, the @code{escapeRule} element contains
+the following:
+
+@itemize @bullet
+@item
+A required @code{selector} attribute. It contains an absolute selector
+that selects the nodes to which this rule applies.
+
+@item
+A required @code{escape} attribute with the value @code{yes} or @code{no}.
+@end itemize
+
+@noindent
+The default value, @code{no}, should be good for most XML file types.
+A rule with @code{escape="no"},
+that was necessary with GNU gettext versions before 0.23,
+is now redundant.
+
@end table
All those extended data categories can only be expressed with global
@@ -10818,19 +10831,30 @@ from the matching XML files.
@subsubsection Two Use-cases of Translated Strings in XML
-For XML, there are two use-cases of translated strings. One is the case
-where the translated strings are directly consumed by programs, and the
-other is the case where the translated strings are merged back to the
-original XML document. In the former case, special characters in the
-extracted strings shouldn't be escaped, while they should in the latter
-case. To control whether to escape special characters, the @samp{Escape
-Special Characters} data category can be used.
-
-To merge the translations, the @samp{msgfmt} program can be used with
-the option @code{--xml}. @xref{msgfmt Invocation}, for more details
-about how one calls the @samp{msgfmt} program. @samp{msgfmt}'s
-@code{--xml} option doesn't perform character escaping, so translated
-strings can have arbitrary XML constructs, such as elements for markup.
+After strings have been extracted from an XML file to a POT file
+through @code{xgettext}
+and the translator has produced a PO file with translations,
+it can be used in two ways:
+
+@itemize @bullet
+@item
+The PO file (or the MO file generated from it) can be directly consumed
+by a program.
+
+@item
+Or the translated strings can be merged back to the original XML document.
+To do this, use the @code{msgfmt} program with the option @code{--xml}.
+@xref{msgfmt Invocation}, for more details about how one calls
+the @samp{msgfmt} program.
+
+During this merge from a PO file into an XML file, it may happen that
+more escaping of special characters for XML is needed
+than what @code{msgfmt} does by default.
+In this case, you can enforce more escaping
+either through an @code{escapeRule} ITS rule,
+or through an attribute @code{gt:escape="yes"} on the particular XML element.
+
+@end itemize
@c This is the template for new data formats.
@ignore
diff --git a/gettext-tools/its/glade1.its b/gettext-tools/its/glade1.its
index 874b5e981..42f73b780 100644
--- a/gettext-tools/its/glade1.its
+++ b/gettext-tools/its/glade1.its
@@ -1,6 +1,6 @@
diff --git a/gettext-tools/its/glade2.its b/gettext-tools/its/glade2.its
index e6133ae81..48220f302 100644
--- a/gettext-tools/its/glade2.its
+++ b/gettext-tools/its/glade2.its
@@ -1,6 +1,6 @@
diff --git a/gettext-tools/its/gsettings.its b/gettext-tools/its/gsettings.its
index 930ec4238..c69f1d2f4 100644
--- a/gettext-tools/its/gsettings.its
+++ b/gettext-tools/its/gsettings.its
@@ -1,6 +1,6 @@
diff --git a/gettext-tools/its/gtkbuilder.its b/gettext-tools/its/gtkbuilder.its
index 8078e1d4f..a984511d5 100644
--- a/gettext-tools/its/gtkbuilder.its
+++ b/gettext-tools/its/gtkbuilder.its
@@ -1,6 +1,6 @@
diff --git a/gettext-tools/its/metainfo.its b/gettext-tools/its/metainfo.its
index 29b31f035..466c250a3 100644
--- a/gettext-tools/its/metainfo.its
+++ b/gettext-tools/its/metainfo.its
@@ -1,6 +1,6 @@
+
+
+
diff --git a/gettext-tools/src/its-extensions.xsd b/gettext-tools/src/its-extensions.xsd
index 116ef18da..4cb19e552 100644
--- a/gettext-tools/src/its-extensions.xsd
+++ b/gettext-tools/src/its-extensions.xsd
@@ -50,7 +50,7 @@ Written by Bruno Haible <bruno@clisp.org>, 2024.
+ is "no". -->
diff --git a/gettext-tools/src/its.c b/gettext-tools/src/its.c
index 4ac5283d9..a00e9ddd7 100644
--- a/gettext-tools/src/its.c
+++ b/gettext-tools/src/its.c
@@ -846,7 +846,7 @@ its_localization_note_rule_constructor (struct its_rule_ty *rule, xmlNode *node)
{
/* FIXME: Respect space attribute. */
char *content = _its_collect_text_content (n, ITS_WHITESPACE_NORMALIZE,
- true);
+ false);
its_value_list_append (&rule->values, "locNote", content);
free (content);
}
@@ -1771,13 +1771,34 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
struct its_value_list_ty *values;
const char *value;
char *msgid = NULL, *msgctxt = NULL, *comment = NULL;
- bool no_escape;
+ bool do_escape;
+ bool do_escape_during_extract;
enum its_whitespace_type_ty whitespace;
values = its_rule_list_eval (rules, node);
value = its_value_list_get_value (values, "escape");
- no_escape = value != NULL && strcmp (value, "no") == 0;
+ do_escape = value != NULL && strcmp (value, "yes") == 0;
+ /* Consider also a locally declared 'gt:escape' attribute. */
+ if (node->type == XML_ELEMENT_NODE
+ && xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+ {
+ char *prop = _its_get_attribute (node, "escape", GT_NS);
+ if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+ do_escape = strcmp (prop, "yes") == 0;
+ free (prop);
+ }
+
+ do_escape_during_extract = do_escape;
+ /* But no, during message extraction (i.e. what xgettext does), we do
+ *not* want escaping to be done. The contents of the POT file are meant
+ for translators, and
+ - the messages are not labelled as requiring XML content syntax,
+ - it is better for the translators if they can write various
+ characters such as & < > without escaping them.
+ Escaping needs to happen in the message merge phase (i.e. what msgfmt
+ does) instead. */
+ do_escape_during_extract = false;
value = its_value_list_get_value (values, "locNote");
if (value)
@@ -1787,7 +1808,7 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
value = its_value_list_get_value (values, "locNotePointer");
if (value)
comment = _its_get_content (rules, node, value, ITS_WHITESPACE_TRIM,
- !no_escape);
+ do_escape_during_extract);
}
if (comment != NULL && *comment != '\0')
@@ -1841,17 +1862,18 @@ its_rule_list_extract_text (its_rule_list_ty *rules,
value = its_value_list_get_value (values, "contextPointer");
if (value)
msgctxt = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
- !no_escape);
+ do_escape_during_extract);
value = its_value_list_get_value (values, "textPointer");
if (value)
msgid = _its_get_content (rules, node, value, ITS_WHITESPACE_PRESERVE,
- !no_escape);
+ do_escape_during_extract);
its_value_list_destroy (values);
free (values);
if (msgid == NULL)
- msgid = _its_collect_text_content (node, whitespace, !no_escape);
+ msgid = _its_collect_text_content (node, whitespace,
+ do_escape_during_extract);
if (*msgid != '\0')
{
lex_pos_ty pos;
@@ -1939,6 +1961,82 @@ struct its_merge_context_ty
struct its_node_list_ty nodes;
};
+/* Returns true if S starts with a character reference. */
+static bool
+starts_with_character_reference (const char *s)
+{
+ /* The XML specification defines
+ CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' */
+ if (*s == '&')
+ {
+ s++;
+ if (*s == '#')
+ {
+ s++;
+ if (*s >= '0' && *s <= '9')
+ {
+ do
+ s++;
+ while (*s >= '0' && *s <= '9');
+ return *s == ';';
+ }
+ if (*s == 'x')
+ {
+ s++;
+ if ((*s >= '0' && *s <= '9')
+ || (*s >= 'A' && *s <= 'F')
+ || (*s >= 'a' && *s <= 'f'))
+ {
+ do
+ s++;
+ while ((*s >= '0' && *s <= '9')
+ || (*s >= 'A' && *s <= 'F')
+ || (*s >= 'a' && *s <= 'f'));
+ return *s == ';';
+ }
+ }
+ }
+ }
+ return false;
+}
+
+static char *
+_its_encode_special_chars_for_merge (const char *content)
+{
+ const char *str;
+ size_t amount = 0;
+ char *result, *p;
+
+ for (str = content; *str != '\0'; str++)
+ {
+ if (*str == '&' && starts_with_character_reference (str))
+ amount += sizeof ("&amp;");
+ else if (*str == '<')
+ amount += sizeof ("&lt;");
+ else if (*str == '>')
+ amount += sizeof ("&gt;");
+ else
+ amount += 1;
+ }
+
+ result = XNMALLOC (amount + 1, char);
+ *result = '\0';
+ p = result;
+ for (str = content; *str != '\0'; str++)
+ {
+ if (*str == '&' && starts_with_character_reference (str))
+ p = stpcpy (p, "&amp;");
+ else if (*str == '<')
+ p = stpcpy (p, "&lt;");
+ else if (*str == '>')
+ p = stpcpy (p, "&gt;");
+ else
+ *p++ = *str;
+ }
+ *p = '\0';
+ return result;
+}
+
static void
its_merge_context_merge_node (struct its_merge_context_ty *context,
xmlNode *node,
@@ -1950,13 +2048,29 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
struct its_value_list_ty *values;
const char *value;
char *msgid = NULL, *msgctxt = NULL;
- bool no_escape;
+ bool do_escape;
+ bool do_escape_during_extract;
+ bool do_escape_during_merge;
enum its_whitespace_type_ty whitespace;
values = its_rule_list_eval (context->rules, node);
value = its_value_list_get_value (values, "escape");
- no_escape = value != NULL && strcmp (value, "no") == 0;
+ do_escape = value != NULL && strcmp (value, "yes") == 0;
+ /* Consider also a locally declared 'gt:escape' attribute. */
+ if (xmlHasNsProp (node, BAD_CAST "escape", BAD_CAST GT_NS))
+ {
+ char *prop = _its_get_attribute (node, "escape", GT_NS);
+ if (strcmp (prop, "yes") == 0 || strcmp (prop, "no") == 0)
+ do_escape = strcmp (prop, "yes") == 0;
+ free (prop);
+ }
+
+ do_escape_during_extract = do_escape;
+ /* Like above, in its_rule_list_extract_text. */
+ do_escape_during_extract = false;
+
+ do_escape_during_merge = do_escape;
value = its_value_list_get_value (values, "space");
if (value && strcmp (value, "preserve") == 0)
@@ -1971,17 +2085,20 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
value = its_value_list_get_value (values, "contextPointer");
if (value)
msgctxt = _its_get_content (context->rules, node, value,
- ITS_WHITESPACE_PRESERVE, !no_escape);
+ ITS_WHITESPACE_PRESERVE,
+ do_escape_during_extract);
value = its_value_list_get_value (values, "textPointer");
if (value)
msgid = _its_get_content (context->rules, node, value,
- ITS_WHITESPACE_PRESERVE, !no_escape);
+ ITS_WHITESPACE_PRESERVE,
+ do_escape_during_extract);
its_value_list_destroy (values);
free (values);
if (msgid == NULL)
- msgid = _its_collect_text_content (node, whitespace, !no_escape);
+ msgid = _its_collect_text_content (node, whitespace,
+ do_escape_during_extract);
if (*msgid != '\0')
{
message_ty *mp;
@@ -1994,7 +2111,50 @@ its_merge_context_merge_node (struct its_merge_context_ty *context,
translated = xmlNewNode (node->ns, node->name);
xmlSetProp (translated, BAD_CAST "xml:lang", BAD_CAST language);
- xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+ /* libxml2 offers two functions for setting the content of an
+ element: xmlNodeSetContent and xmlNodeAddContent. They differ
+ in the amount of escaping they do:
+ - xmlNodeSetContent does no escaping, at the risk of creating
+ malformed XML.
+ - xmlNodeAddContent escapes all of & < >, which always produces
+ well-formed XML but is not the right thing for entity
+ references.
+ We need a middle ground between both, that is adapted to what
+ translators will usually produce.
+
+ translated       | no escaping | middle-ground | full escaping
+                  | SetContent  |               | AddContent
+ -----------------+-------------+---------------+--------------
+ &                | &           | &             | &amp;
+ &quot;           | &quot;      | &quot;        | &amp;quot;
+ &amp;            | &amp;       | &amp;         | &amp;amp;
+ <                | <           | &lt;          | &lt;
+ >                | >           | &gt;          | &gt;
+ &lt;             | &lt;        | &lt;          | &amp;lt;
+ &gt;             | &gt;        | &gt;          | &amp;gt;
+ &#xa9;           | &#xa9;      | &amp;#xa9;    | &amp;#xa9;
+ &copy;           | &copy;      | &copy;        | &amp;copy;
+ -----------------+-------------+---------------+--------------
+
+ The function _its_encode_special_chars_for_merge implements
+ this middle-ground. But we allow full escaping to be requested
+ through a gt:escape="yes" attribute. */
+
+ if (do_escape_during_merge)
+ {
+ /* These three are equivalent:
+ xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+ xmlNodeSetContent (translated, xmlEncodeEntitiesReentrant (context->doc, BAD_CAST mp->msgstr));
+ xmlNodeSetContent (translated, xmlEncodeSpecialChars (context->doc, BAD_CAST mp->msgstr)); */
+ xmlNodeAddContent (translated, BAD_CAST mp->msgstr);
+ }
+ else
+ {
+ char *middle_ground = _its_encode_special_chars_for_merge (mp->msgstr);
+ xmlNodeSetContent (translated, BAD_CAST middle_ground);
+ free (middle_ground);
+ }
+
xmlAddNextSibling (node, translated);
}
}
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am
index 0bbd1c022..454160141 100644
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -84,7 +84,7 @@ TESTS = gettext-1 gettext-2 \
xgettext-13 xgettext-14 xgettext-15 xgettext-16 xgettext-17 \
xgettext-18 \
xgettext-combine-1 xgettext-combine-2 xgettext-combine-3 \
- xgettext-appdata-1 \
+ xgettext-appdata-1 xgettext-appdata-2 \
xgettext-awk-1 xgettext-awk-2 xgettext-awk-3 \
xgettext-awk-stackovfl-1 xgettext-awk-stackovfl-2 \
xgettext-c-2 xgettext-c-3 xgettext-c-4 xgettext-c-5 xgettext-c-6 \
diff --git a/gettext-tools/tests/msgfmt-xml-1 b/gettext-tools/tests/msgfmt-xml-1
index c7de103c7..856c030cf 100755
--- a/gettext-tools/tests/msgfmt-xml-1
+++ b/gettext-tools/tests/msgfmt-xml-1
@@ -5,7 +5,12 @@
cat <<\EOF > mf.appdata.xml
-
+
+
+
+]>
+
org.gnome.Characters.desktop
GNOME Characters
Character map application
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
You can also browse characters by categories, such as
Punctuation, Pictures, etc.
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Written by &author1;, &author2;, and &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
https://wiki.gnome.org/Design/Apps/CharacterMap
dueno_at_src.gnome.org
@@ -61,11 +75,34 @@ msgid ""
msgstr ""
"Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
"Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"©, as ©, or as ©?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme ©, comme © ou comme © ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference ©, entity references © "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé ©, entités © &author1;"
EOF
cat <<\EOF > mf.appdata.xml.ok
-
+
+
+
+]>
+
org.gnome.Characters.desktop
GNOME Characters
Character map application
@@ -82,6 +119,19 @@ cat <<\EOF > mf.appdata.xml.ok
Punctuation, Pictures, etc.
Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme ©, comme © ou comme © ?
+ Written by &author1;, &author2;, and &author3;.
+ Écrit par &author1;, &author2;, et &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
https://wiki.gnome.org/Design/Apps/CharacterMap
dueno_at_src.gnome.org
diff --git a/gettext-tools/tests/msgfmt-xml-2 b/gettext-tools/tests/msgfmt-xml-2
index f8d51f164..10e136f94 100755
--- a/gettext-tools/tests/msgfmt-xml-2
+++ b/gettext-tools/tests/msgfmt-xml-2
@@ -5,7 +5,12 @@
cat <<\EOF > mf.appdata.xml
-
+
+
+
+]>
+
org.gnome.Characters.desktop
GNOME Characters
Character map application
@@ -20,6 +25,15 @@ cat <<\EOF > mf.appdata.xml
You can also browse characters by categories, such as
Punctuation, Pictures, etc.
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Written by &author1;, &author2;, and &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
https://wiki.gnome.org/Design/Apps/CharacterMap
dueno_at_src.gnome.org
@@ -63,6 +77,24 @@ msgid ""
msgstr ""
"Vous pouvez aussi naviguer dans les caractères par catégories, comme par "
"Ponctuation, Images, etc."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"©, as ©, or as ©?"
+msgstr ""
+"Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML "
+"comme ©, comme © ou comme © ?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Écrit par &author1;, &author2;, et &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference ©, entity references © "
+"&author1;"
+msgstr ""
+"Exposition d'échappements: operateur x&y, entités XML standard & \" ' & < >, "
+"caractère ©, caractère échappé ©, entités © &author1;"
EOF
cat <<\EOF > po/de.po
@@ -100,11 +132,35 @@ msgid ""
msgstr ""
"Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder "
"Bildern."
+
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"©, as ©, or as ©?"
+msgstr ""
+"Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als "
+"©, als ©, oder als © "
+"geschrieben werden kann?"
+
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr "Geschrieben von &author1;, &author2; und &author3;."
+
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference ©, entity references © "
+"&author1;"
+msgstr ""
+"Escape-Beispiele: Operator x&y, Standard-XML Entitäten & \" ' & < >, Zeichen "
+"©, escaptes Zeichen ©, Entitäten © &author1;"
EOF
cat <<\EOF > mf.appdata.xml.ok
-
+
+
+
+]>
+
org.gnome.Characters.desktop
GNOME Characters
Character map application
@@ -123,6 +179,23 @@ cat <<\EOF > mf.appdata.xml.ok
Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.
Sie können ebenfalls nach Kategorie suchen, wie z.B. nach Zeichensetzung oder Bildern.
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme ©, comme © ou comme © ?
+ Wussten Sie, dass das Copyright-Zeichen (©, U+00A9) in HTML als ©, als ©, oder als © geschrieben werden kann?
+ Written by &author1;, &author2;, and &author3;.
+ Écrit par &author1;, &author2;, et &author3;.
+ Geschrieben von &author1;, &author2; und &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
+ Escape-Beispiele: Operator x&y, Standard-XML Entitäten & " ' & < >, Zeichen ©, escaptes Zeichen ©, Entitäten © &author1;
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
+ Escape-Beispiele: Operator x&y, Standard-XML Entitäten & " ' & < >, Zeichen ©, escaptes Zeichen ©, Entitäten © &author1;
https://wiki.gnome.org/Design/Apps/CharacterMap
dueno_at_src.gnome.org
@@ -131,7 +204,12 @@ EOF
cat <<\EOF > mf.appdata.xml.desired.ok
-
+
+
+
+]>
+
org.gnome.Characters.desktop
GNOME Characters
Character map application
@@ -148,6 +226,19 @@ cat <<\EOF > mf.appdata.xml.desired.ok
Punctuation, Pictures, etc.
Vous pouvez aussi naviguer dans les caractères par catégories, comme par Ponctuation, Images, etc.
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Saviez-vous que le signe de copyright (©, U+00A9) peut être écrit en HTML comme ©, comme © ou comme © ?
+ Written by &author1;, &author2;, and &author3;.
+ Écrit par &author1;, &author2;, et &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;
+ Exposition d'échappements: operateur x&y, entités XML standard & " ' & < >, caractère ©, caractère échappé ©, entités © &author1;
https://wiki.gnome.org/Design/Apps/CharacterMap
dueno_at_src.gnome.org
diff --git a/gettext-tools/tests/xgettext-appdata-1 b/gettext-tools/tests/xgettext-appdata-1
index 7f68a5227..3c1ea5fff 100755
--- a/gettext-tools/tests/xgettext-appdata-1
+++ b/gettext-tools/tests/xgettext-appdata-1
@@ -1,7 +1,7 @@
#!/bin/sh
. "${srcdir=.}/init.sh"; path_prepend_ . ../src
-# Test of AppData support.
+# Test of AppData support: HTML markup.
cat <<\EOF > xg-gs-1-empty.appdata.xml
diff --git a/gettext-tools/tests/xgettext-appdata-2 b/gettext-tools/tests/xgettext-appdata-2
new file mode 100644
index 000000000..980c4a45d
--- /dev/null
+++ b/gettext-tools/tests/xgettext-appdata-2
@@ -0,0 +1,121 @@
+#!/bin/sh
+. "${srcdir=.}/init.sh"; path_prepend_ . ../src
+
+# Test of AppData support: escaping of XML entities.
+
+cat <<\EOF > xg-gs-2-empty.appdata.xml
+
+
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} -o xg-gs-2.pot xg-gs-2-empty.appdata.xml 2>/dev/null
+test $? = 0 || {
+ echo "Skipping test: xgettext was built without AppData support"
+ Exit 77
+}
+
+cat <<\EOF > xg-gs-2.appdata.xml
+
+
+
+
+]>
+
+ org.gnome.Characters.desktop
+ GNOME Characters
+ Character map application
+ CC0
+
+
+ Characters is a simple utility application to find and insert
+ unusual characters. It allows you to quickly find the character
+ you are looking for by searching for keywords.
+
+
+ You can also browse characters by categories, such as
+ Punctuation, Pictures, etc.
+
+
+ Did you know that the copyright sign (©, U+00A9) can be written in HTML
+ as ©,
+ as ©,
+ or as ©?
+
+ Written by &author1;, &author2;, and &author3;.
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;, escaped entity reference ©
+ Escape gallery: operator x&y, standard XML entities & " ' & < >, character reference ©, escaped character reference ©, entity references © &author1;, escaped entity reference ©
+
+ https://wiki.gnome.org/Design/Apps/CharacterMap
+ dueno_at_src.gnome.org
+
+EOF
+
+: ${XGETTEXT=xgettext}
+${XGETTEXT} --add-comments -o xg-gs-2.tmp xg-gs-2.appdata.xml || Exit 1
+func_filter_POT_Creation_Date xg-gs-2.tmp xg-gs-2.pot
+
+cat <<\EOF > xg-gs-2.ok
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the PACKAGE package.
+# FIRST AUTHOR , YEAR.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PACKAGE VERSION\n"
+"Report-Msgid-Bugs-To: \n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME \n"
+"Language-Team: LANGUAGE \n"
+"Language: \n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=UTF-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+
+#: xg-gs-2.appdata.xml:9
+msgid "GNOME Characters"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:10
+msgid "Character map application"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:13
+msgid ""
+"Characters is a simple utility application to find and insert unusual "
+"characters. It allows you to quickly find the character you are looking for "
+"by searching for keywords."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:18
+msgid ""
+"You can also browse characters by categories, such as Punctuation, Pictures, "
+"etc."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:22
+msgid ""
+"Did you know that the copyright sign (©, U+00A9) can be written in HTML as "
+"©, as ©, or as ©?"
+msgstr ""
+
+#: xg-gs-2.appdata.xml:28
+msgid "Written by &author1;, &author2;, and &author3;."
+msgstr ""
+
+#: xg-gs-2.appdata.xml:29 xg-gs-2.appdata.xml:30
+msgid ""
+"Escape gallery: operator x&y, standard XML entities & \" ' & < >, character "
+"reference ©, escaped character reference ©, entity references © "
+"&author1;, escaped entity reference ©"
+msgstr ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} xg-gs-2.ok xg-gs-2.pot
+result=$?
+
+exit $result
diff --git a/gettext-tools/tests/xgettext-its-1 b/gettext-tools/tests/xgettext-its-1
index 22e9163ec..523dee490 100755
--- a/gettext-tools/tests/xgettext-its-1
+++ b/gettext-tools/tests/xgettext-its-1
@@ -201,7 +201,7 @@ EOF
cat <<\EOF >messages.ok
#. (itstool) path: message/p
#: messages.xml:8
-msgid "This is a test message &foo;&gt;&lt;&amp;\"\""
+msgid "This is a test message &foo;><&\"\""
msgstr ""
#. (itstool) path: message/p
@@ -214,15 +214,15 @@ msgstr ""
#: messages.xml:17
#, no-wrap
msgid ""
-" $ echo ' ' >> /dev/null\n"
-" $ cat < /dev/yes\n"
-" $ sleep 10 &\n"
+" $ echo ' ' >> /dev/null\n"
+" $ cat < /dev/yes\n"
+" $ sleep 10 &\n"
msgstr ""
#. This is a comment
#. (itstool) path: messages/message@comment
#: messages.xml:22
-msgid "This is a comment &lt;&gt;&amp;&quot;"
+msgid "This is a comment <>&\""
msgstr ""
#. (itstool) path: message/p