From 4df2df7213075afefe4c28132dcc1f97b6e99aa5 Mon Sep 17 00:00:00 2001 From: Bruno Haible Date: Thu, 19 Sep 2024 23:26:28 +0200 Subject: [PATCH] doc: Expand section about preparing strings. * gettext-tools/doc/gettext.texi (Triggering): Mention a few more Gnulib modules. (Preparing Strings): Turn subheadings into subsections. (No string concatenation): Mention string concatenation operators and strings with embedded expressions in various programming languages. * NEWS: Mention it. --- NEWS | 4 + gettext-tools/doc/gettext.texi | 293 ++++++++++++++++++++++++++++----- 2 files changed, 252 insertions(+), 45 deletions(-) diff --git a/NEWS b/NEWS index 1c55ea408..bee082e1e 100644 --- a/NEWS +++ b/NEWS @@ -37,6 +37,10 @@ Version 0.23 - September 2024 now return the msgid untranslated. This is relevant for GNU systems, Linux with musl libc, FreeBSD, NetBSD, OpenBSD, Cygwin, and Android. +* Documentation: + - The section "Preparing Strings" now gives more advice how to deal with + string concatenation and strings with embedded expressions. + * xgettext: - Most of the diagnostics emitted by xgettext are now labelled as "warning" or "error". diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi index baf5ca610..b28aae902 100644 --- a/gettext-tools/doc/gettext.texi +++ b/gettext-tools/doc/gettext.texi @@ -2021,9 +2021,14 @@ declared in the @code{} and @code{} standard headers. If this is not desirable in your application (for example in a compiler's parser), you can use a set of substitute functions which hardwire the C locale, -such as found in the modules @samp{c-ctype}, @samp{c-strcase}, -@samp{c-strcasestr}, @samp{c-strtod}, @samp{c-strtold} in the GNU gnulib -source distribution. +such as found in the modules +@samp{c-ctype}, +@samp{c-strcase}, +@samp{c-strcasestr}, +@samp{c-snprintf}, +@samp{c-strtod}, @samp{c-strtold}, +@samp{c-dtoastr}, @samp{c-ldtoastr} +in the GNU gnulib source distribution. 
It is also possible to switch the locale forth and back between the environment dependent locale and the C locale, but this approach is @@ -2069,7 +2074,18 @@ Avoid unusual markup and unusual control characters. @noindent Let's look at some examples of these guidelines. -@subheading Decent English style +@menu +* Decent English style:: +* Entire sentences:: +* Split at paragraphs:: +* No string concatenation:: +* No embedded URLs:: +* No custom format directives:: +* No unusual markup:: +@end menu + +@node Decent English style +@subsection Decent English style @cindex style Translatable strings should be in good English style. If slang language @@ -2098,7 +2114,8 @@ of the objects"? In both cases, adding more words to the message will help both the translator and the English speaking user. -@subheading Entire sentences +@node Entire sentences +@subsection Entire sentences @cindex sentences Translatable strings should be entire sentences. It is often not possible @@ -2167,9 +2184,10 @@ makes it easier for the translator to understand and translate both. On the other hand, if one of the two messages is a stereotypic one, occurring in other places as well, you will do a favour to the translator by not merging the two. (Identical messages occurring in several places are -combined by xgettext, so the translator has to handle them once only.) +combined by @code{xgettext}, so the translator has to handle them once only.) -@subheading Split at paragraphs +@node Split at paragraphs +@subsection Split at paragraphs @cindex paragraphs Translatable strings should be limited to one paragraph; don't let a @@ -2189,7 +2207,8 @@ such as the input options, the output options, and the informative output options. This will help every user to find the option he is looking for. 
-@subheading No string concatenation +@node No string concatenation +@subsection No string concatenation @cindex string concatenation @cindex concatenation of strings @@ -2214,6 +2233,221 @@ to use a format string: sprintf (s, "Replace %s with %s?", object1, object2); @end example +@subheading String concatenation operator + +In many programming languages, +a particular operator denotes string concatenation +at runtime (or possibly at compile time, if the compiler supports that). + +@cindex Shell, string concatenation +@cindex Python, string concatenation +@cindex Smalltalk, string concatenation +@cindex Java, string concatenation +@cindex C#, string concatenation +@cindex awk, string concatenation +@cindex Perl, string concatenation +@cindex PHP, string concatenation +@cindex Ruby, string concatenation +@cindex Lua, string concatenation +@cindex JavaScript, string concatenation +@cindex Vala, string concatenation +@itemize @bullet +@item +In C++, string concatenation of @code{std::string} objects +is denoted by the @samp{+} operator. +@c Reference: https://en.cppreference.com/w/cpp/string/basic_string/operator%2B +@item +In Shell, string concatenation is denoted by mere juxtaposition of strings. +@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html +@item +In Python, string concatenation is denoted by the @samp{+} operator. +@c Reference: https://docs.python.org/3.12/reference/expressions.html#binary-arithmetic-operations +@item +In Smalltalk, string concatenation is denoted by the @samp{,} operator. +@c Reference: https://rmod-files.lille.inria.fr/FreeBooks/ByExample/14%20-%20Chapter%2012%20-%20Strings.pdf +@item +In Java, string concatenation is denoted by the @samp{+} operator. +@c Reference: https://docs.oracle.com/javase/specs/jls/se21/html/jls-15.html#jls-15.18.1 +@item +In C#, string concatenation is denoted by the @samp{+} operator. 
+@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings +@item +In awk, string concatenation is denoted by mere juxtaposition of strings. +@c Reference: https://www.gnu.org/software/gawk/manual/html_node/Concatenation.html +@item +In Perl, string concatenation is denoted by the @samp{.} operator. +@c Reference: https://perldoc.perl.org/perlop#Additive-Operators +@item +In PHP, string concatenation is denoted by the @samp{.} operator. +@c Reference: https://www.php.net/manual/en/language.operators.string.php +@item +In Ruby, string concatenation is denoted by the @samp{+} operator. +@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Operators +@c (Ignore ruby-doc.org! It is hopelessly outdated.) +@item +In Lua, string concatenation is denoted by the @samp{..} operator. +@c Reference: https://www.lua.org/pil/3.4.html +@item +In JavaScript, string concatenation is denoted by the @samp{+} operator. +@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Addition +@item +In Vala, string concatenation is denoted by the @samp{+} operator. 
+@c Reference: https://docs.vala.dev/tutorials/programming-language/main/02-00-basics/02-05-operators.html +@end itemize + +So, for example, in Java, you would change + +@example +System.out.println("Replace "+object1+" with "+object2+"?"); +@end example + +@noindent +into a statement involving a format string: + +@example +System.out.println( + MessageFormat.format("Replace @{0@} with @{1@}?", + new Object[] @{ object1, object2 @})); +@end example + +@noindent +Similarly, in C#, you would change + +@example +Console.WriteLine("Replace "+object1+" with "+object2+"?"); +@end example + +@noindent +into a statement involving a format string: + +@example +Console.WriteLine( + String.Format("Replace @{0@} with @{1@}?", object1, object2)); +@end example + +@subheading Strings with embedded expressions + +In some programming languages, +it is possible to have strings with embedded expressions. +The expressions can refer to variables of the program. +The value of such an expression is converted to a string +and inserted in place of the expression; +but no formatting function is called. + +@cindex Shell, strings with embedded expressions +@cindex Python, strings with embedded expressions +@cindex C#, strings with embedded expressions +@cindex Tcl, strings with embedded expressions +@cindex Perl, strings with embedded expressions +@cindex PHP, strings with embedded expressions +@cindex Ruby, strings with embedded expressions +@cindex JavaScript, strings with embedded expressions +@itemize @bullet +@item +In Shell language, double-quoted strings can contain +references to variables, along with default values and string operations. +Such as @code{"Hello, $name!"} or @code{"Hello, $@{name@}!"}. +@c Reference: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V3_chap02.html#tag_19_02_03 +@item +In Python, @emph{f-strings} can contain expressions. +Such as @code{f"Hello, @{name@}!"}. 
+@c Reference: https://docs.python.org/3.12/reference/lexical_analysis.html#formatted-string-literals +@c @item +@c In Java, since Java 21, @emph{string templates} can contain expressions. +@c Such as @code{STR."Hello, \@{name\@}!"}. +@c Reference: https://openjdk.org/jeps/430 https://openjdk.org/jeps/459 +@c Withdrawn: https://mail.openjdk.org/pipermail/amber-spec-experts/2024-April/004106.html +@item +In C#, since C# 6.0, @emph{interpolated strings} can contain expressions. +Such as @code{$"Hello, @{name@}!"}. +@c Reference: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings +@item +In Tcl, strings are subject to @emph{variable substitution}. +Such as @code{"Hello, $name!"}. +@c Reference: https://wiki.tcl-lang.org/page/Dodekalogue +@c Reference: https://wiki.tcl-lang.org/page/Variable+Substitution +@item +In Perl, @emph{interpolated strings} can contain expressions. +Such as @code{"Hello, $name!"}. +@c Reference: https://perldoc.perl.org/perlintro#Basic-syntax-overview +@item +In PHP, string literals are subject to @emph{variable parsing}. +Such as @code{"Hello, $name!"}. +@c Reference: https://www.php.net/manual/en/language.variables.basics.php +@item +In Ruby, @emph{interpolated strings} can contain expressions. +Such as @code{"Hello, #@{name@}!"}. +@c Reference: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#Interpolation +@c (Ignore ruby-doc.org! It is hopelessly outdated.) +@item +In JavaScript, since ES6, @emph{template literals} can contain expressions. +Such as @code{`Hello, $@{name@}!`}. +@c Reference: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals +@end itemize + +These cases are effectively string concatenation as well, +just with a different syntax. 
+ +So, for example, in Python, you would change + +@example +print (f'Replace @{object1.name@} with @{object2.name@}?') +@end example + +@noindent +into a statement involving a format string: + +@example +print ('Replace %(name1)s with %(name2)s?' + % @{ 'name1': object1.name, 'name2': object2.name @}) +@end example + +@noindent +or equivalently +@example +print ('Replace @{name1@} with @{name2@}?' + .format(name1 = object1.name, name2 = object2.name)) +@end example + +And in JavaScript, you would change + +@example +print (`Replace $@{object1.name@} with $@{object2.name@}?`) +@end example + +@noindent +into a statement involving a format string: + +@example +print ('Replace %s with %s?'.format(object1.name, object2.name)) +@end example + +@subheading Format strings with embedded named references + +Format strings with embedded named references are different: +They are suitable for internationalization, because it is possible +to insert a call to the @code{gettext} function (that will return a +translated format string) @emph{before} the argument values are +inserted in place of the placeholders. + +The format string types that allow embedded named references are: + +@itemize @bullet +@item +@ref{sh-format, Shell format strings}. +@item +In Python, those @ref{python-format, Python format strings} +that take a dictionary as argument, +and the @ref{python-format, Python brace format strings}. +@item +In Ruby, those @ref{ruby-format, Ruby format strings} +that take a hash table as argument. +@item +In Perl, the @ref{perl-format, Perl brace format strings}. +@end itemize + +@subheading The @code{<inttypes.h>} macros + @cindex @code{inttypes.h} A similar case is compile time concatenation of strings. 
The ISO C 99 include file @code{<inttypes.h>} contains a macro @code{PRId64} that @@ -2257,41 +2491,8 @@ of 100 is safe, because all available hardware integer types are limited to 128 bits, and to print a 128 bit integer one needs at most 54 characters, regardless whether in decimal, octal or hexadecimal. -@cindex Java, string concatenation -@cindex C#, string concatenation -All this applies to other programming languages as well. For example, in -Java and C#, string concatenation is very frequently used, because it is a -compiler built-in operator. Like in C, in Java, you would change - -@example -System.out.println("Replace "+object1+" with "+object2+"?"); -@end example - -@noindent -into a statement involving a format string: - -@example -System.out.println( - MessageFormat.format("Replace @{0@} with @{1@}?", - new Object[] @{ object1, object2 @})); -@end example - -@noindent -Similarly, in C#, you would change - -@example -Console.WriteLine("Replace "+object1+" with "+object2+"?"); -@end example - -@noindent -into a statement involving a format string: - -@example -Console.WriteLine( - String.Format("Replace @{0@} with @{1@}?", object1, object2)); -@end example - -@subheading No embedded URLs +@node No embedded URLs +@subsection No embedded URLs It is good to not embed URLs in translatable strings, for several reasons: @itemize @bullet @@ -2322,7 +2523,8 @@ fprintf (stream, _("GNU GPL version 3 <%s>\n"), "https://gnu.org/licenses/gpl.html"); @end smallexample -@subheading No programmer-defined format string directives +@node No custom format directives +@subsection No programmer-defined format string directives The GNU C Library's @code{<printf.h>} facility and the C++ standard library's @code{<format>} header file make it possible for the programmer to define their own format string directives. 
However, such format directives cannot be used in translatable strings, for two reasons: @itemize @bullet @@ -2365,7 +2567,8 @@ string tmp = format ("@{:#$#@}", data); cout << format (_("The contents is: @{@}"), tmp); @end smallexample -@subheading No unusual markup +@node No unusual markup +@subsection No unusual markup @cindex markup @cindex control characters -- 2.47.3