From: Bruno Haible Date: Mon, 23 Jun 2003 09:36:09 +0000 (+0000) Subject: Support for Perl format strings (both printf and bracketed format strings). X-Git-Tag: v0.13~422 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=2ddf58b2c324d403f00bbe57924ffbb981fc0c9a;p=thirdparty%2Fgettext.git Support for Perl format strings (both printf and bracketed format strings). --- diff --git a/gettext-tools/doc/ChangeLog b/gettext-tools/doc/ChangeLog index 2bba4138e..7fc161868 100644 --- a/gettext-tools/doc/ChangeLog +++ b/gettext-tools/doc/ChangeLog @@ -1,3 +1,9 @@ +2003-06-21 Guido Flohr + Bruno Haible + + * gettext.texi (perl-format): Describe two kinds of format strings. + (Perl): Add more info. + 2003-06-19 Bruno Haible * gettext.texi (Sources): Recommend to use also for printf. diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi index 07a1099a9..2b63384fd 100644 --- a/gettext-tools/doc/gettext.texi +++ b/gettext-tools/doc/gettext.texi @@ -7323,9 +7323,19 @@ Tcl format strings are described in the @file{format.n} manual page, @node perl-format, php-format, tcl-format, Translators for other Languages @subsection Perl Format Strings -Perl format strings are described in the @file{Locale::TextDomain} -manual page, or in -@uref{http://let.imperia.org/howto/en/perl-i18n/}. +There are two kinds format strings in Perl: those acceptable to the +Perl built-in function @code{printf}, labelled as @samp{perl-format}, +and those acceptable to the @code{libintl-perl} function @code{__x}, +labelled as @samp{perl-bracket-format}. + +Perl @code{printf} format strings are described in the @code{sprintf} +section of @samp{man perlfunc}. + +Perl bracketed format strings are described in the +@file{Locale::TextDomain(3pm)} manual page of the CPAN package +libintl-perl. In brief, Perl format uses placeholders put between +brackets (@samp{[} and @samp{]}). The placeholder must have the syntax +of simple identifiers. @node php-format, , perl-format, Translators for other Languages @subsection PHP Format Strings @@ -7366,7 +7376,7 @@ that language, and to combine the resulting files using @code{msgcat}. @c on SourceForge/Freshmeat, as of February 2002. Those supported by gettext @c are marked with a star. @c C 3580 * -@c Perl 1911 +@c Perl 1911 * @c C++ 1379 * @c Java 1200 * @c PHP 1051 * @@ -8343,14 +8353,16 @@ http://www.cpan.org/). emulate @item Extractor -@code{xgettext -k__ -k\$__ -k%__ -k__x -kN__ -k} +@code{xgettext -k__ -k\$__ -k%__ -k__x -k__n -k__nx -k__xn -kN__ -k} @item Formatting with positions -@code{__x("error opening '[file]': [err]", file => $file, err => $!);} +Both kinds of format strings support formatting with positions. +@*@code{printf "%2\$d %1\$d", ...} (requires Perl 5.8.0 or newer) +@*@code{__expand("[new] replaces [old]", old => $oldvalue, new => $newvalue)} @item Portability -The @code{libintl-perl} package is platform-independent but does -not belong to the Perl core. The programmer is responsible for +The @code{libintl-perl} package is platform independent but is not +part of the Perl core. The programmer is responsible for providing a dummy implementation of the required functions if the package is not installed on the target system. @@ -8358,8 +8370,8 @@ package is not installed on the target system. --- @item Documentation -In-depth documentation can be found at -http://let.imperia.org/howto/en/perl-18n/. +Included in @code{libintl-perl}, available on CPAN +(http://www.cpan.org/). @end table @@ -8369,7 +8381,7 @@ The @code{xgettext} parser backend for Perl differs significantly from the parser backends for other programming languages, just as Perl itself differs significantly from other programming languages. The Perl parser backend offers many more string marking facilities than -the other backends but it also has some Perl-specific limitations, the +the other backends but it also has some Perl specific limitations, the worst probably being its imperfectness. @menu @@ -8409,7 +8421,7 @@ In this context, the string @code{gettext} looks more like a file handle. But not necessarily: @example -use Locale::Messages; +use Locale::Messages qw (:libintl_h); open gettext ">testfile" or die; print gettext "Hello world!"; @end example @@ -8485,8 +8497,9 @@ probably parsed incorrectly. If you find that @code{xgettext} fails to extract strings from portions of your sources, you should therefore look out for slashes and/or question marks preceding these sections. You may have come -across a bug in the @code{xgettext}'s Perl parser (and of course you -should report that bug). +across a bug in @code{xgettext}'s Perl parser (and of course you +should report that bug). In the meantime you should consider to +reformulate your code in a manner less challenging to @code{xgettext}. @node Default Keywords, Special Keywords, General Problems, Perl @subsubsection Which keywords will xgettext look for? @@ -8567,7 +8580,7 @@ print $gettext@{Error@}; The exact rule is: You can omit the surrounding quotes, when the hash key is a valid C (!) identifier, i. e. when it starts with an -underscore or a ASCII letter and is followed by an arbitrary number +underscore or an ASCII letter and is followed by an arbitrary number of underscores, ASCII letters or digits. Other Unicode characters are @emph{not} allowed, regardless of the @code{use utf8} pragma. @@ -8625,8 +8638,11 @@ The operator @code{qx} is fully supported. You can use arbitrary delimiters, including the four bracketing delimiters (round, angle, square, curly) that nest. -Use this feature with care! If you use it, you lose control of the -external code executed from within your Perl script to your translator! +The example is actually a useless use of @code{gettext}. It will +invoke the @code{gettext} function on the output of the command +specified with the @code{qx} operator. The feature was included +in order to make the interface consistent (the parser will extract +all strings and quote-like expressions). @item @strong{here documents} @group @@ -8674,7 +8690,7 @@ executed). The interpolation is performed by Perl before the string argument is passed to @code{gettext()} and will therefore depend on the name of the script which can only be determined at runtime. Consequently, it is almost impossible that a translation can be looked -up at runtime (except if by accident the interpolated string is found +up at runtime (except if, by accident, the interpolated string is found in the message catalog). The @code{xgettext} program will therefore terminate parsing with a fatal @@ -8781,9 +8797,12 @@ for (i = 0; i < 100; ++i) @{ EOF @end example -If you exaggerate with constructs like the above, you will run the -risk that the translators of your package will look out for a less -challenging project. You should consider an alternative expression: +The parser will extract the entire here document, and it will appear +entirely in the resulting PO file, including the JavaScript snippet +embedded in the HTML code. If you exaggerate with constructs like +the above, you will run the risk that the translators of your package +will look out for a less challenging project. You should consider an +alternative expression here: @example print <@{"the earth"@}@}; @end example -The quoted string is recognized as an argument to @code{xgettext} in +The @code{qq}-quoted string is recognized as an argument to @code{xgettext} in the first place, and checked for invalid variable interpolation. The -dollar sign will therefore terminate the parser with an ``invalid -interpolation'' error. +dollar sign of hash-dereferencing will therefore terminate the parser +with an ``invalid interpolation'' error. It is valid to interpolate hash lookups in regular expressions: @@ -8889,6 +8908,11 @@ into one long string at compile time, and so is @code{xgettext}. You will only find one long message in the resulting POT file. +Note that the future Perl 6 will probably use the underscore +(@samp{_}) as the string concatenation operator, and the dot +(@samp{.}) for dereferencing. This new syntax is not yet supported by +@code{xgettext}. + If embedded newline characters are not an issue, or even desired, you may also insert newline characters inside quoted strings wherever you feel like it: @@ -8986,6 +9010,54 @@ Maybe some of these missing features will be implemented in future versions, but since you can always make do without them at minimal effort, these todos have very low priority. +A nasty problem are bracketed format strings that already contain brackets +as part of the normal text, for example the usage strings typically +encountered in programs: + +@example +die "usage: $0 [OPTIONS] FILENAME...\n"; +@end example + +If you want to internationalize this code with Perl bracketed format strings, +you will run into a problem: + +@example +die __x ("usage: [program] [OPTIONS] FILENAME...\n", program => $0); +@end example + +Whereas @code{@samp{[program]}} is a placeholder, @code{@samp{[OPTIONS]}} +is not and should probably be translated. Yet, there is no way to teach +the Perl parser in @code{xgettext} to recognize the first one, and leave +the other one alone. + +There are two possible work-arounds for this problem. If you are +sure that your program will run under Perl 5.8.0 or newer (these +Perl versions handle positional parameters in @code{printf()}) or +if you are sure that the translator will not have to reorder the arguments +in her translation -- for example if you have only one bracketed placeholder +in your string, or if it describes a syntax, like in this one --, you can +mark the string as @code{no-perl-bracket-format} and use @code{printf()}: + +@example +# xgettext: no-perl-bracket-format +die sprintf ("usage: %s [OPTIONS] FILENAME...\n", $0); +@end example + +If you want to use the more portable Perl bracket format, you will have to do +put placeholders in place of the literal brackets: + +@example +die __x ("usage: [program] [@{]OPTIONS[@}] FILENAME...\n", + program => $0, '@{' => '[', '@}' => ']'); +@end example + +Perl bracketed format strings know no escaping mechanism. No matter how this +escaping mechanism looked like, it would either give the programmer a +hard time, make translating Perl bracketed format strings heavy-going, or +result in a performance penalty at runtime, when the format directives +get executed. Most of the time you will happily get along with +@code{printf()} for this special case. + @node PHP, Pike, Perl, List of Programming Languages @subsection PHP Hypertext Preprocessor @cindex PHP diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog index c5c939f8f..118a08e50 100644 --- a/gettext-tools/src/ChangeLog +++ b/gettext-tools/src/ChangeLog @@ -1,3 +1,25 @@ +2003-06-21 Bruno Haible + + * format-perl.c: Complete rewrite. + * format-perl-bracket.c: New file. + * message.h (format_perl_bracket): New enum value. + (NFORMATS): Increment. + * format.h (formatstring_perl_bracket): New declaration. + * format.c (formatstring_parsers): Add entry for perl_bracket. + * message.c (format_language, format_language_pretty): Likewise. + * x-perl.h (SCANNERS_PERL): Use formatstring_perl and + formatstring_perl_bracket. + * Makefile.am (FORMAT_SOURCE): Add format-perl-bracket.c. + * Makefile.msvc (OBJECTS): Add format-perl.obj,format-perl-bracket.obj. + (xgettext_OBJECTS): Add x-perl.obj. + (format-perl.obj, format-perl-bracket.obj, x-perl.obj): New rules. + * Makefile.vms (OBJECTS): Add format-perl.obj, format-perl-bracket.obj. + (xgettext_OBJECTS): Add x-perl.obj. + (format-perl.obj, format-perl-bracket.obj, x-perl.obj): New rules. + + * x-perl.c (extract_quotelike_pass3): Don't give an invalid + interpolation error for a backslashed dollar sign. + 2003-06-21 Bruno Haible * x-perl.c (extract_quotelike_pass3): Fix handling of double backslash. diff --git a/gettext-tools/src/Makefile.am b/gettext-tools/src/Makefile.am index d3a3fe327..8ef0272eb 100644 --- a/gettext-tools/src/Makefile.am +++ b/gettext-tools/src/Makefile.am @@ -93,7 +93,7 @@ read-properties.c open-po.c dir-list.c str-list.c FORMAT_SOURCE = format.c format-invalid.h \ format-c.c format-python.c format-lisp.c format-elisp.c format-librep.c \ format-java.c format-awk.c format-pascal.c format-ycp.c format-tcl.c \ -format-perl.c format-php.c +format-perl.c format-perl-bracket.c format-php.c # libgettextsrc contains all code that is needed by at least two programs. libgettextsrc_la_SOURCES = \ diff --git a/gettext-tools/src/Makefile.msvc b/gettext-tools/src/Makefile.msvc index a1cf5bc28..8f5749b9a 100644 --- a/gettext-tools/src/Makefile.msvc +++ b/gettext-tools/src/Makefile.msvc @@ -139,13 +139,15 @@ OBJECTS = \ format-pascal.obj \ format-ycp.obj \ format-tcl.obj \ + format-perl.obj \ + format-perl-bracket.obj \ format-php.obj msgcmp_OBJECTS = msgcmp.obj msgfmt_OBJECTS = msgfmt.obj write-mo.obj write-java.obj write-tcl.obj plural-eval.obj msgmerge_OBJECTS = msgmerge.obj msgunfmt_OBJECTS = msgunfmt.obj read-mo.obj read-java.obj read-tcl.obj -xgettext_OBJECTS = xgettext.obj x-c.obj x-po.obj x-python.obj x-lisp.obj x-elisp.obj x-librep.obj x-smalltalk.obj x-java.obj x-awk.obj x-ycp.obj x-tcl.obj x-php.obj x-rst.obj x-glade.obj +xgettext_OBJECTS = xgettext.obj x-c.obj x-po.obj x-python.obj x-lisp.obj x-elisp.obj x-librep.obj x-smalltalk.obj x-java.obj x-awk.obj x-ycp.obj x-tcl.obj x-perl.obj x-php.obj x-rst.obj x-glade.obj msgattrib_OBJECTS = msgattrib.obj msgcat_OBJECTS = msgcat.obj msgcomm_OBJECTS = msgcomm.obj @@ -266,6 +268,12 @@ format-ycp.obj : format-ycp.c format-tcl.obj : format-tcl.c $(CC) $(INCLUDES) $(CFLAGS) $(PICFLAGS) -c format-tcl.c +format-perl.obj : format-perl.c + $(CC) $(INCLUDES) $(CFLAGS) $(PICFLAGS) -c format-perl.c + +format-perl-bracket.obj : format-perl-bracket.c + $(CC) $(INCLUDES) $(CFLAGS) $(PICFLAGS) -c format-perl-bracket.c + format-php.obj : format-php.c $(CC) $(INCLUDES) $(CFLAGS) $(PICFLAGS) -c format-php.c @@ -357,6 +365,9 @@ x-ycp.obj : x-ycp.c x-tcl.obj : x-tcl.c $(CC) $(INCLUDES) $(CFLAGS) -c x-tcl.c +x-perl.obj : x-perl.c + $(CC) $(INCLUDES) $(CFLAGS) -c x-perl.c + x-php.obj : x-php.c $(CC) $(INCLUDES) $(CFLAGS) -c x-php.c diff --git a/gettext-tools/src/Makefile.vms b/gettext-tools/src/Makefile.vms index 88500b962..97a0caad8 100644 --- a/gettext-tools/src/Makefile.vms +++ b/gettext-tools/src/Makefile.vms @@ -85,13 +85,15 @@ OBJECTS = \ format-pascal.obj, \ format-ycp.obj, \ format-tcl.obj, \ + format-perl.obj, \ + format-perl-bracket.obj, \ format-php.obj msgcmp_OBJECTS = msgcmp.obj msgfmt_OBJECTS = msgfmt.obj, write-mo.obj, write-java.obj, write-tcl.obj, plural-eval.obj msgmerge_OBJECTS = msgmerge.obj msgunfmt_OBJECTS = msgunfmt.obj, read-mo.obj, read-java.obj, read-tcl.obj -xgettext_OBJECTS = xgettext.obj, x-c.obj, x-po.obj, x-python.obj, x-lisp.obj, x-elisp.obj, x-librep.obj, x-smalltalk.obj, x-java.obj, x-awk.obj, x-ycp.obj, x-tcl.obj, x-php.obj, x-rst.obj, x-glade.obj +xgettext_OBJECTS = xgettext.obj, x-c.obj, x-po.obj, x-python.obj, x-lisp.obj, x-elisp.obj, x-librep.obj, x-smalltalk.obj, x-java.obj, x-awk.obj, x-ycp.obj, x-tcl.obj, x-perl.obj, x-php.obj, x-rst.obj, x-glade.obj msgattrib_OBJECTS = msgattrib.obj msgcat_OBJECTS = msgcat.obj msgcomm_OBJECTS = msgcomm.obj @@ -210,6 +212,12 @@ format-ycp.obj : format-ycp.c format-tcl.obj : format-tcl.c $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) format-tcl.c +format-perl.obj : format-perl.c + $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) format-perl.c + +format-perl-bracket.obj : format-perl-bracket.c + $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) format-perl-bracket.c + format-php.obj : format-php.c $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) format-php.c @@ -287,6 +295,9 @@ x-ycp.obj : x-ycp.c x-tcl.obj : x-tcl.c $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) x-tcl.c +x-perl.obj : x-perl.c + $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) x-perl.c + x-php.obj : x-php.c $(CC) $(INCLUDES) $(CFLAGS) /define=($(DEFS)) x-php.c diff --git a/gettext-tools/src/format-perl-bracket.c b/gettext-tools/src/format-perl-bracket.c new file mode 100644 index 000000000..8c3991148 --- /dev/null +++ b/gettext-tools/src/format-perl-bracket.c @@ -0,0 +1,310 @@ +/* Perl bracketed format strings. + Copyright (C) 2003 Free Software Foundation, Inc. + Written by Bruno Haible , 2003. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ + +#ifdef HAVE_CONFIG_H +# include +#endif + +#include +#include +#include + +#include "format.h" +#include "xmalloc.h" +#include "error.h" +#include "progname.h" +#include "gettext.h" + +#define _(str) gettext (str) + +/* Perl bracketed format strings are supported by Guido Flohr's libintl-perl + package, more precisely by the __expand and __x functions therein. + A format string directive here consists of + - an opening bracket '[', + - an identifier [_A-Za-z][_0-9A-Za-z]*, + - a closing bracket ']'. + */ + +struct named_arg +{ + char *name; +}; + +struct spec +{ + unsigned int directives; + unsigned int named_arg_count; + unsigned int allocated; + struct named_arg *named; +}; + +/* Locale independent test for a decimal digit. + Argument can be 'char' or 'unsigned char'. (Whereas the argument of + isdigit must be an 'unsigned char'.) */ +#undef isdigit +#define isdigit(c) ((unsigned int) ((c) - '0') < 10) + + +static int +named_arg_compare (const void *p1, const void *p2) +{ + return strcmp (((const struct named_arg *) p1)->name, + ((const struct named_arg *) p2)->name); +} + +static void * +format_parse (const char *format, char **invalid_reason) +{ + struct spec spec; + struct spec *result; + + spec.directives = 0; + spec.named_arg_count = 0; + spec.allocated = 0; + spec.named = NULL; + + for (; *format != '\0';) + if (*format++ == '[') + { + const char *f = format; + char c; + + c = *f; + if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_') + { + do + c = *++f; + while ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' + || (c >= '0' && c <= '9')); + if (c == ']') + { + /* A directive. */ + char *name; + const char *name_start = format; + const char *name_end = f; + size_t n = name_end - name_start; + + name = (char *) xmalloc (n + 1); + memcpy (name, name_start, n); + name[n] = '\0'; + + spec.directives++; + + if (spec.allocated == spec.named_arg_count) + { + spec.allocated = 2 * spec.allocated + 1; + spec.named = (struct named_arg *) xrealloc (spec.named, spec.allocated * sizeof (struct named_arg)); + } + spec.named[spec.named_arg_count].name = name; + spec.named_arg_count++; + + format = ++f; + } + } + } + + /* Sort the named argument array, and eliminate duplicates. */ + if (spec.named_arg_count > 1) + { + unsigned int i, j; + + qsort (spec.named, spec.named_arg_count, sizeof (struct named_arg), + named_arg_compare); + + /* Remove duplicates: Copy from i to j, keeping 0 <= j <= i. */ + for (i = j = 0; i < spec.named_arg_count; i++) + if (j > 0 && strcmp (spec.named[i].name, spec.named[j-1].name) == 0) + free (spec.named[i].name); + else + { + if (j < i) + spec.named[j].name = spec.named[i].name; + j++; + } + spec.named_arg_count = j; + } + + result = (struct spec *) xmalloc (sizeof (struct spec)); + *result = spec; + return result; +} + +static void +format_free (void *descr) +{ + struct spec *spec = (struct spec *) descr; + + if (spec->named != NULL) + { + unsigned int i; + for (i = 0; i < spec->named_arg_count; i++) + free (spec->named[i].name); + free (spec->named); + } + free (spec); +} + +static int +format_get_number_of_directives (void *descr) +{ + struct spec *spec = (struct spec *) descr; + + return spec->directives; +} + +static bool +format_check (const lex_pos_ty *pos, void *msgid_descr, void *msgstr_descr, + bool equality, bool noisy, const char *pretty_msgstr) +{ + struct spec *spec1 = (struct spec *) msgid_descr; + struct spec *spec2 = (struct spec *) msgstr_descr; + bool err = false; + + if (spec1->named_arg_count + spec2->named_arg_count > 0) + { + unsigned int i, j; + unsigned int n1 = spec1->named_arg_count; + unsigned int n2 = spec2->named_arg_count; + + /* Check the argument names are the same. + Both arrays are sorted. We search for the first difference. */ + for (i = 0, j = 0; i < n1 || j < n2; ) + { + int cmp = (i >= n1 ? 1 : + j >= n2 ? -1 : + strcmp (spec1->named[i].name, spec2->named[j].name)); + + if (cmp > 0) + { + if (noisy) + { + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("a format specification for argument '%s', as in '%s', doesn't exist in 'msgid'"), + spec2->named[j].name, pretty_msgstr); + error_with_progname = true; + } + err = true; + break; + } + else if (cmp < 0) + { + if (equality) + { + if (noisy) + { + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("a format specification for argument '%s' doesn't exist in '%s'"), + spec1->named[i].name, pretty_msgstr); + error_with_progname = true; + } + err = true; + break; + } + else + i++; + } + else + j++, i++; + } + } + + return err; +} + + +struct formatstring_parser formatstring_perl_bracket = +{ + format_parse, + format_free, + format_get_number_of_directives, + format_check +}; + + +#ifdef TEST + +/* Test program: Print the argument list specification returned by + format_parse for strings read from standard input. */ + +#include +#include "getline.h" + +static void +format_print (void *descr) +{ + struct spec *spec = (struct spec *) descr; + unsigned int i; + + if (spec == NULL) + { + printf ("INVALID"); + return; + } + + printf ("{"); + for (i = 0; i < spec->named_arg_count; i++) + { + if (i > 0) + printf (", "); + printf ("'%s'", spec->named[i].name); + } + printf ("}"); +} + +int +main () +{ + for (;;) + { + char *line = NULL; + size_t line_size = 0; + int line_len; + char *invalid_reason; + void *descr; + + line_len = getline (&line, &line_size, stdin); + if (line_len < 0) + break; + if (line_len > 0 && line[line_len - 1] == '\n') + line[--line_len] = '\0'; + + invalid_reason = NULL; + descr = format_parse (line, &invalid_reason); + + format_print (descr); + printf ("\n"); + if (descr == NULL) + printf ("%s\n", invalid_reason); + + free (invalid_reason); + free (line); + } + + return 0; +} + +/* + * For Emacs M-x compile + * Local Variables: + * compile-command: "/bin/sh ../libtool --mode=link gcc -o a.out -static -O -g -Wall -I.. -I../lib -I../intl -DHAVE_CONFIG_H -DTEST format-perl-bracket.c ../lib/libgettextlib.la" + * End: + */ + +#endif /* TEST */ diff --git a/gettext-tools/src/format-perl.c b/gettext-tools/src/format-perl.c index ecdaf7a0a..6ed1ea577 100644 --- a/gettext-tools/src/format-perl.c +++ b/gettext-tools/src/format-perl.c @@ -1,6 +1,6 @@ /* Perl format strings. - Copyright (C) 2002-2003 Free Software Foundation, Inc. - Written by Guido Flohr , 2003. + Copyright (C) 2003 Free Software Foundation, Inc. + Written by Bruno Haible , 2003. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -19,133 +19,523 @@ #ifdef HAVE_CONFIG_H # include #endif -#include #include #include -#include #include "format.h" +#include "c-ctype.h" #include "xmalloc.h" +#include "xerror.h" +#include "format-invalid.h" #include "error.h" #include "progname.h" -#include "hash.h" #include "gettext.h" #define _(str) gettext (str) -/* Perl format strings are currently quite simple. They consist of - place-holders embedded in the string. - - messageFormatPattern := string ("[" messageFormatElement "]" string)* +/* Perl format strings are implemented in function Perl_sv_vcatpvfn in + perl-5.8.0/sv.c. + A directive + - starts with '%' or '%m$' where m is a positive integer starting with a + nonzero digit, + - is optionally followed by any of the characters '#', '0', '-', ' ', '+', + each of which acts as a flag, + - is optionally followed by a vector specification: 'v' or '*v' (reads an + argument) or '*m$v' where m is a positive integer starting with a nonzero + digit, + - is optionally followed by a width specification: '*' (reads an argument) + or '*m$' where m is a positive integer starting with a nonzero digit or + a nonempty digit sequence starting with a nonzero digit, + - is optionally followed by '.' and a precision specification: '*' (reads + an argument) or '*m$' where m is a positive integer starting with a + nonzero digit or a digit sequence, + - is optionally followed by a size specifier, one of 'h' 'l' 'll' 'L' 'q' + 'V' 'I32' 'I64' 'I', + - is finished by a specifier + - '%', that needs no argument, + - 'c', that needs a small integer argument, + - 's', that needs a string argument, + - '_', that needs a scalar vector argument, + - 'p', that needs a pointer argument, + - 'i', 'd', 'D', that need an integer argument, + - 'u', 'U', 'b', 'o', 'O', 'x', 'X', that need an unsigned integer + argument, + - 'e', 'E', 'f', 'F', 'g', 'G', that need a floating-point argument, + - 'n', that needs a pointer to integer. + So there can be numbered argument specifications: + - '%m$' for the format string, + - '*m$v' for the vector, + - '*m$' for the width, + - '.*m$' for the precision. + Numbered and unnumbered argument specifications can be used in the same + string. The effect of '%m$' is to take argument number m, without affecting + the current argument number. The current argument number is incremented + after processing a directive with an unnumbered argument specification. + */ - messageFormatElement := [_A-Za-z][_0-9A-Za-z]+ +enum format_arg_type +{ + FAT_NONE = 0, + /* Basic types */ + FAT_INTEGER = 1, + FAT_DOUBLE = 2, + FAT_CHAR = 3, + FAT_STRING = 4, + FAT_SCALAR_VECTOR = 5, + FAT_POINTER = 6, + FAT_COUNT_POINTER = 7, + /* Flags */ + FAT_UNSIGNED = 1 << 3, + FAT_SIZE_SHORT = 1 << 4, + FAT_SIZE_V = 2 << 4, + FAT_SIZE_PTR = 3 << 4, + FAT_SIZE_LONG = 4 << 4, + FAT_SIZE_LONGLONG = 5 << 4, + /* Bitmasks */ + FAT_SIZE_MASK = (FAT_SIZE_SHORT | FAT_SIZE_V | FAT_SIZE_PTR + | FAT_SIZE_LONG | FAT_SIZE_LONGLONG) +}; - However, C format strings are also allowed and used. The following - functions are therefore decorators for the C format checker, and - will only fall back to Perl format if the C check is negative. */ +struct numbered_arg +{ + unsigned int number; + enum format_arg_type type; +}; struct spec { unsigned int directives; - hash_table hash; - void *c_format; + unsigned int numbered_arg_count; + unsigned int allocated; + struct numbered_arg *numbered; }; -static void * -format_parse (const char *string, char **invalid_reason) -{ - char *last_pos = (char *) string; - char *pos; - struct spec *spec; - void *c_format = formatstring_c.parse (string, invalid_reason); - - if (c_format == NULL) - return NULL; - - spec = (struct spec *) xmalloc (sizeof (struct spec)); +/* Locale independent test for a decimal digit. + Argument can be 'char' or 'unsigned char'. (Whereas the argument of + isdigit must be an 'unsigned char'.) */ +#undef isdigit +#define isdigit(c) ((unsigned int) ((c) - '0') < 10) - spec->c_format = c_format; +/* Locale independent test for a nonzero decimal digit. */ +#define isnonzerodigit(c) ((unsigned int) ((c) - '1') < 9) - init_hash (&spec->hash, 13); - while ((pos = strchr (last_pos, '[')) != NULL) - { - char *start = pos + 1; - - last_pos = start; - - if (*last_pos == '_' - || (*last_pos >= 'A' && *last_pos <= 'Z') - || (*last_pos >= 'a' && *last_pos <= 'z')) - { - ++last_pos; - - while (*last_pos == '_' - || (*last_pos >= '0' && *last_pos <= '9') - || (*last_pos >= 'A' && *last_pos <= 'Z') - || (*last_pos >= 'a' && *last_pos <= 'z')) - ++last_pos; - if (*last_pos == ']') - { - size_t len = last_pos - start; - int *hits; - - if (find_entry (&spec->hash, start, len, (void **) &hits) == 0) - { - ++(*hits); - } - else - { - hits = (int *) xmalloc (sizeof (int)); - *hits = 1; - insert_entry (&spec->hash, start, len, hits); - } - ++last_pos; - ++spec->directives; - } - } - } +static int +numbered_arg_compare (const void *p1, const void *p2) +{ + unsigned int n1 = ((const struct numbered_arg *) p1)->number; + unsigned int n2 = ((const struct numbered_arg *) p2)->number; - return spec; + return (n1 > n2 ? 1 : n1 < n2 ? -1 : 0); } -static void -format_free (void *description) +static void * +format_parse (const char *format, char **invalid_reason) { - struct spec *spec = (struct spec *) description; - - if (spec != NULL) + unsigned int directives; + unsigned int numbered_arg_count; + unsigned int allocated; + struct numbered_arg *numbered; + unsigned int unnumbered_arg_count; + struct spec *result; + + directives = 0; + numbered_arg_count = 0; + unnumbered_arg_count = 0; + allocated = 0; + numbered = NULL; + + for (; *format != '\0';) + if (*format++ == '%') + { + /* A directive. */ + unsigned int number = 0; + bool vectorize = false; + enum format_arg_type type; + enum format_arg_type size; + + directives++; + + if (isnonzerodigit (*format)) + { + const char *f = format; + unsigned int m = 0; + + do + { + m = 10 * m + (*f - '0'); + f++; + } + while (isdigit (*f)); + + if (*f == '$') + { + number = m; + format = ++f; + } + } + + /* Parse flags. */ + while (*format == ' ' || *format == '+' || *format == '-' + || *format == '#' || *format == '0') + format++; + + /* Parse vector. */ + if (*format == 'v') + { + format++; + vectorize = true; + } + else if (*format == '*') + { + const char *f = format; + + f++; + if (*f == 'v') + { + format = ++f; + vectorize = true; + + /* Unnumbered argument. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = ++unnumbered_arg_count; + numbered[numbered_arg_count].type = FAT_SCALAR_VECTOR; /* or FAT_STRING? */ + numbered_arg_count++; + } + else if (isnonzerodigit (*f)) + { + unsigned int m = 0; + + do + { + m = 10 * m + (*f - '0'); + f++; + } + while (isdigit (*f)); + + if (*f == '$') + { + f++; + if (*f == 'v') + { + unsigned int vector_number = m; + + format = ++f; + vectorize = true; + + /* Numbered argument. */ + /* Note: As of perl-5.8.0, this is not correctly + implemented in perl's sv.c. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = vector_number; + numbered[numbered_arg_count].type = FAT_SCALAR_VECTOR; /* or FAT_STRING? */ + numbered_arg_count++; + } + } + } + } + + if (vectorize) + { + /* Numbered or unnumbered argument. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = (number ? number : ++unnumbered_arg_count); + numbered[numbered_arg_count].type = FAT_SCALAR_VECTOR; + numbered_arg_count++; + } + + /* Parse width. */ + if (*format == '*') + { + unsigned int width_number = 0; + + format++; + + if (isnonzerodigit (*format)) + { + const char *f = format; + unsigned int m = 0; + + do + { + m = 10 * m + (*f - '0'); + f++; + } + while (isdigit (*f)); + + if (*f == '$') + { + width_number = m; + format = ++f; + } + } + + /* Numbered or unnumbered argument. */ + /* Note: As of perl-5.8.0, this is not correctly + implemented in perl's sv.c. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = (width_number ? width_number : ++unnumbered_arg_count); + numbered[numbered_arg_count].type = FAT_INTEGER; + numbered_arg_count++; + } + else if (isnonzerodigit (*format)) + { + do format++; while (isdigit (*format)); + } + + /* Parse precision. */ + if (*format == '.') + { + format++; + + if (*format == '*') + { + unsigned int precision_number = 0; + + format++; + + if (isnonzerodigit (*format)) + { + const char *f = format; + unsigned int m = 0; + + do + { + m = 10 * m + (*f - '0'); + f++; + } + while (isdigit (*f)); + + if (*f == '$') + { + precision_number = m; + format = ++f; + } + } + + /* Numbered or unnumbered argument. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = (precision_number ? precision_number : ++unnumbered_arg_count); + numbered[numbered_arg_count].type = FAT_INTEGER; + numbered_arg_count++; + } + else + { + while (isdigit (*format)) format++; + } + } + + /* Parse size. */ + size = 0; + if (*format == 'h') + { + size = FAT_SIZE_SHORT; + format++; + } + else if (*format == 'l') + { + if (format[1] == 'l') + { + size = FAT_SIZE_LONGLONG; + format += 2; + } + else + { + size = FAT_SIZE_LONG; + format++; + } + } + else if (*format == 'L' || *format == 'q') + { + size = FAT_SIZE_LONGLONG; + format++; + } + else if (*format == 'V') + { + size = FAT_SIZE_V; + format++; + } + else if (*format == 'I') + { + if (format[1] == '6' && format[2] == '4') + { + size = FAT_SIZE_LONGLONG; + format += 3; + } + else if (format[1] == '3' && format[2] == '2') + { + size = 0; /* FAT_SIZE_INT */ + format += 3; + } + else + { + size = FAT_SIZE_PTR; + format++; + } + } + + switch (*format) + { + case '%': + type = FAT_NONE; + break; + case 'c': + type = FAT_CHAR; + break; + case 's': + type = FAT_STRING; + break; + case '_': + type = FAT_SCALAR_VECTOR; + break; + case 'D': + type = FAT_INTEGER | FAT_SIZE_V; + break; + case 'i': case 'd': + type = FAT_INTEGER | size; + break; + case 'U': case 'O': + type = FAT_INTEGER | FAT_UNSIGNED | FAT_SIZE_V; + break; + case 'u': case 'b': case 'o': case 'x': case 'X': + type = FAT_INTEGER | FAT_UNSIGNED | size; + break; + case 'e': case 'E': case 'f': case 'F': case 'g': case 'G': + if (size == FAT_SIZE_SHORT || size == FAT_SIZE_LONG) + { + *invalid_reason = + xasprintf (_("In the directive number %u, the size specifier is incompatible with the conversion specifier '%c'."), directives, *format); + goto bad_format; + } + type = FAT_DOUBLE | size; + break; + case 'p': + type = FAT_POINTER; + break; + case 'n': + type = FAT_COUNT_POINTER | size; + break; + default: + *invalid_reason = + (*format == '\0' + ? INVALID_UNTERMINATED_DIRECTIVE () + : INVALID_CONVERSION_SPECIFIER (directives, *format)); + goto bad_format; + } + + if (type != FAT_NONE && !vectorize) + { + /* Numbered or unnumbered argument. */ + if (allocated == numbered_arg_count) + { + allocated = 2 * allocated + 1; + numbered = (struct numbered_arg *) xrealloc (numbered, allocated * sizeof (struct numbered_arg)); + } + numbered[numbered_arg_count].number = (number ? number : ++unnumbered_arg_count); + numbered[numbered_arg_count].type = type; + numbered_arg_count++; + } + + format++; + } + + /* Sort the numbered argument array, and eliminate duplicates. */ + if (numbered_arg_count > 1) { - void *ptr = NULL; - const void *key; - size_t keylen; - void *data; + unsigned int i, j; + bool err; + + qsort (numbered, numbered_arg_count, + sizeof (struct numbered_arg), numbered_arg_compare); + + /* Remove duplicates: Copy from i to j, keeping 0 <= j <= i. */ + err = false; + for (i = j = 0; i < numbered_arg_count; i++) + if (j > 0 && numbered[i].number == numbered[j-1].number) + { + enum format_arg_type type1 = numbered[i].type; + enum format_arg_type type2 = numbered[j-1].type; + enum format_arg_type type_both; + + if (type1 == type2) + type_both = type1; + else + { + /* Incompatible types. */ + type_both = FAT_NONE; + if (!err) + *invalid_reason = + INVALID_INCOMPATIBLE_ARG_TYPES (numbered[i].number); + err = true; + } + + numbered[j-1].type = type_both; + } + else + { + if (j < i) + { + numbered[j].number = numbered[i].number; + numbered[j].type = numbered[i].type; + } + j++; + } + numbered_arg_count = j; + if (err) + /* *invalid_reason has already been set above. */ + goto bad_format; + } - while (iterate_table (&spec->hash, &ptr, &key, &keylen, &data) == 0) - free (data); + result = (struct spec *) xmalloc (sizeof (struct spec)); + result->directives = directives; + result->numbered_arg_count = numbered_arg_count; + result->allocated = allocated; + result->numbered = numbered; + return result; - delete_hash (&spec->hash); + bad_format: + if (numbered != NULL) + free (numbered); + return NULL; +} - if (spec->c_format) - formatstring_c.free (spec->c_format); +static void +format_free (void *descr) +{ + struct spec *spec = (struct spec *) descr; - free (spec); - } + if (spec->numbered != NULL) + free (spec->numbered); + free (spec); } static int -format_get_number_of_directives (void *description) +format_get_number_of_directives (void *descr) { - struct spec *spec = (struct spec *) description; - int c_directives; - - if (spec->c_format) - c_directives = formatstring_c.get_number_of_directives (spec->c_format); - else - c_directives = 0; + struct spec *spec = (struct spec *) descr; - return c_directives + spec->directives; + return spec->directives; } static bool @@ -154,107 +544,89 @@ format_check (const lex_pos_ty *pos, void *msgid_descr, void *msgstr_descr, { struct spec *spec1 = (struct spec *) msgid_descr; struct spec *spec2 = (struct spec *) msgstr_descr; - bool result = false; - void *ptr = NULL; - const void *key; - size_t keylen; - int *hits1; - int *hits2; - - /* First check the perl arguments. We are probably faster than format-c. */ - if (equality) + bool err = false; + + if (spec1->numbered_arg_count + spec2->numbered_arg_count > 0) { - /* Pass 1: Check that every format specification in msgid has its - counterpart in msgstr. This is only necessary for equality. */ - while (iterate_table (&spec1->hash, &ptr, &key, &keylen, - (void **) &hits1) == 0) + unsigned int i, j; + unsigned int n1 = spec1->numbered_arg_count; + unsigned int n2 = spec2->numbered_arg_count; + + /* Check the argument names are the same. + Both arrays are sorted. We search for the first difference. */ + for (i = 0, j = 0; i < n1 || j < n2; ) { - if (find_entry (&spec2->hash, key, keylen, (void **) &hits2) == 0) + int cmp = (i >= n1 ? 1 : + j >= n2 ? -1 : + spec1->numbered[i].number > spec2->numbered[j].number ? 1 : + spec1->numbered[i].number < spec2->numbered[j].number ? -1 : + 0); + + if (cmp > 0) + { + if (noisy) + { + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("a format specification for argument %u, as in '%s', doesn't exist in 'msgid'"), + spec2->numbered[j].number, pretty_msgstr); + error_with_progname = true; + } + err = true; + break; + } + else if (cmp < 0) { - if (*hits1 != *hits2) + if (equality) { - result = true; if (noisy) { - char *argname = (char *) alloca (keylen + 1); - memcpy (argname, key, keylen); - argname[keylen] = '\0'; - - /* The next message shows the limitations of ngettext. - It is not smart enough to say "once", "twice", - "thrice/%d times", "%d times", ..., and it cannot - grok with two variable arguments. We have to - work around the problem. */ error_with_progname = false; error_at_line (0, 0, pos->file_name, pos->line_number, - _("\ -appearances of named argument '[%s]' do not match \ -(%d in original string, %d in '%s')"), - argname, *hits1, *hits2, pretty_msgstr); + _("a format specification for argument %u doesn't exist in '%s'"), + spec1->numbered[i].number, pretty_msgstr); error_with_progname = true; } - else - return true; - } - } - else - { - result = true; - if (noisy) - { - char *argname = (char *) alloca (keylen + 1); - memcpy (argname, key, keylen); - argname[keylen] = '\0'; - - error_with_progname = false; - error_at_line (0, 0, pos->file_name, pos->line_number, - _("\ -named argument '[%s]' appears in original string but not in '%s'"), - argname, pretty_msgstr); - error_with_progname = true; + err = true; + break; } else - return true; - } - } - } - - /* Pass 2: Check that the number of format specifications in msgstr - does not exceed the number of appearances in msgid. */ - ptr = NULL; - while (iterate_table (&spec2->hash, &ptr, &key, &keylen, (void**) &hits2) - == 0) - { - if (find_entry (&spec1->hash, key, keylen, (void**) &hits1) != 0) - { - result = true; - if (noisy) - { - char *argname = (char *) alloca (keylen + 1); - memcpy (argname, key, keylen); - argname[keylen] = '\0'; - - error_with_progname = false; - error_at_line (0, 0, pos->file_name, pos->line_number, - _("\ -named argument '[%s]' appears only in '%s' but not in the original string"), - argname, pretty_msgstr); - error_with_progname = true; + i++; } else - return true; + j++, i++; } + /* Check the argument types are the same. */ + if (!err) + for (i = 0, j = 0; j < n2; ) + { + if (spec1->numbered[i].number == spec2->numbered[j].number) + { + if (spec1->numbered[i].type != spec2->numbered[j].type) + { + if (noisy) + { + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("format specifications in 'msgid' and '%s' for argument %u are not the same"), + pretty_msgstr, + spec2->numbered[j].number); + error_with_progname = true; + } + err = true; + break; + } + j++, i++; + } + else + i++; + } } - if (spec1->c_format && spec2->c_format) - { - result |= formatstring_c.check (pos, spec1->c_format, spec2->c_format, - equality, noisy, pretty_msgstr); - } - - return result; + return err; } + struct formatstring_parser formatstring_perl = { format_parse, @@ -273,15 +645,11 @@ struct formatstring_parser formatstring_perl = #include "getline.h" static void -format_print (void *descr, const char *line) +format_print (void *descr) { struct spec *spec = (struct spec *) descr; - void *ptr = NULL; - const void *key; - size_t keylen; - int *data; - - printf ("%s=> ", line); + unsigned int last; + unsigned int i; if (spec == NULL) { @@ -289,14 +657,71 @@ format_print (void *descr, const char *line) return; } - while (iterate_table (&spec->hash, &ptr, &key, &keylen, (void**) &data) == 0) + printf ("("); + last = 1; + for (i = 0; i < spec->numbered_arg_count; i++) { - char *argname = (char *) alloca (keylen + 1); - memcpy (argname, key, keylen); - argname[keylen] = '\0'; - - printf (">>>[%s]<<< ", argname); + unsigned int number = spec->numbered[i].number; + + if (i > 0) + printf (" "); + if (number < last) + abort (); + for (; last < number; last++) + printf ("_ "); + if (spec->numbered[i].type & FAT_UNSIGNED) + printf ("[unsigned]"); + switch (spec->numbered[i].type & FAT_SIZE_MASK) + { + case 0: + break; + case FAT_SIZE_SHORT: + printf ("[short]"); + break; + case FAT_SIZE_V: + printf ("[IV]"); + break; + case FAT_SIZE_PTR: + printf ("[PTR]"); + break; + case FAT_SIZE_LONG: + printf ("[long]"); + break; + case FAT_SIZE_LONGLONG: + printf ("[long long]"); + break; + default: + abort (); + } + switch (spec->numbered[i].type & ~(FAT_UNSIGNED | FAT_SIZE_MASK)) + { + case FAT_INTEGER: + printf ("i"); + break; + case FAT_DOUBLE: + printf ("f"); + break; + case FAT_CHAR: + printf ("c"); + break; + case FAT_STRING: + printf ("s"); + break; + case FAT_SCALAR_VECTOR: + printf ("sv"); + break; + case FAT_POINTER: + printf ("p"); + break; + case FAT_COUNT_POINTER: + printf ("n"); + break; + default: + abort (); + } + last = number + 1; } + printf (")"); } int @@ -319,12 +744,11 @@ main () invalid_reason = NULL; descr = format_parse (line, &invalid_reason); - format_print (descr, line); + format_print (descr); printf ("\n"); if (descr == NULL) printf ("%s\n", invalid_reason); - format_free (descr); free (invalid_reason); free (line); } diff --git a/gettext-tools/src/format.c b/gettext-tools/src/format.c index 45e772999..0dc216fd5 100644 --- a/gettext-tools/src/format.c +++ b/gettext-tools/src/format.c @@ -38,5 +38,6 @@ struct formatstring_parser *formatstring_parsers[NFORMATS] = /* format_ycp */ &formatstring_ycp, /* format_tcl */ &formatstring_tcl, /* format_perl */ &formatstring_perl, + /* format_perl_bracket */ &formatstring_perl_bracket, /* format_php */ &formatstring_php }; diff --git a/gettext-tools/src/format.h b/gettext-tools/src/format.h index f633e1884..ca0d61c1b 100644 --- a/gettext-tools/src/format.h +++ b/gettext-tools/src/format.h @@ -67,6 +67,7 @@ extern struct formatstring_parser formatstring_pascal; extern struct formatstring_parser formatstring_ycp; extern struct formatstring_parser formatstring_tcl; extern struct formatstring_parser formatstring_perl; +extern struct formatstring_parser formatstring_perl_bracket; extern struct formatstring_parser formatstring_php; /* Table of all format string parsers. */ diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c index b6dee1915..e72957451 100644 --- a/gettext-tools/src/message.c +++ b/gettext-tools/src/message.c @@ -46,6 +46,7 @@ const char *const format_language[NFORMATS] = /* format_ycp */ "ycp", /* format_tcl */ "tcl", /* format_perl */ "perl", + /* format_perl_bracket */ "perl-bracket", /* format_php */ "php" }; @@ -63,6 +64,7 @@ const char *const format_language_pretty[NFORMATS] = /* format_ycp */ "YCP", /* format_tcl */ "Tcl", /* format_perl */ "Perl", + /* format_perl_bracket */ "Perl bracket", /* format_php */ "PHP" }; diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h index 323715b5a..2926ca2a7 100644 --- a/gettext-tools/src/message.h +++ b/gettext-tools/src/message.h @@ -46,9 +46,10 @@ enum format_type format_ycp, format_tcl, format_perl, + format_perl_bracket, format_php }; -#define NFORMATS 13 /* Number of format_type enum values. */ +#define NFORMATS 14 /* Number of format_type enum values. */ extern const char *const format_language[NFORMATS]; extern const char *const format_language_pretty[NFORMATS]; diff --git a/gettext-tools/src/x-perl.c b/gettext-tools/src/x-perl.c index 8401289c8..7e89b7d3e 100644 --- a/gettext-tools/src/x-perl.c +++ b/gettext-tools/src/x-perl.c @@ -811,6 +811,8 @@ extract_quotelike_pass3 (token_ty *tp, int error_level) quotemeta = false; while (*crs) { + bool backslashed; + /* Ensure room for 6 bytes. */ if (bufpos + 6 > bufmax) { @@ -1032,10 +1034,15 @@ extract_quotelike_pass3 (token_ty *tp, int error_level) buffer[bufpos++] = *crs; ++crs; continue; + default: + backslashed = true; + break; } } + else + backslashed = false; - if (*crs == '$' || *crs == '@') + if (!backslashed && (*crs == '$' || *crs == '@')) { error_with_progname = false; error (error_level, 0, _("\ diff --git a/gettext-tools/src/x-perl.h b/gettext-tools/src/x-perl.h index 28f3e4ebb..0f67b0329 100644 --- a/gettext-tools/src/x-perl.h +++ b/gettext-tools/src/x-perl.h @@ -24,7 +24,7 @@ { "cgi", "perl" }, \ #define SCANNERS_PERL \ - { "perl", extract_perl, &formatstring_c, &formatstring_perl }, \ + { "perl", extract_perl, &formatstring_perl, &formatstring_perl_bracket }, \ /* Scan a Perl file and add its translatable strings to mdlp. */ extern void extract_perl (FILE *fp, const char *real_filename, diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog index 1550e1643..af72d7132 100644 --- a/gettext-tools/tests/ChangeLog +++ b/gettext-tools/tests/ChangeLog @@ -1,3 +1,20 @@ +2003-06-21 Bruno Haible + + * xgettext-26: Add more test cases, some from Guido Flohr. + * format-perl-1: Complete rewrite, based on format-c-1. + * format-perl-2: Complete rewrite, based on format-c-2. + * format-perl-bracket-1: New file, based on format-python-1, partially + from Guido Flohr. + * format-perl-bracket-2: New file, based on format-python-2, partially + from Guido Flohr. + * format-perl-mixed-1: New file, from Guido Flohr. + * format-perl-mixed-2: New file, from Guido Flohr. + * lang-perl-1: Renamed from lang-perl. + * lang-perl-2: New file, from Guido Flohr, based on lang-perl. + * Makefile.am (TESTS): Add format-perl-bracket-1, + format-perl-bracket-2, format-perl-mixed-1, format-perl-mixed-2, + lang-perl-1, lang-perl-2. Remove lang-perl. + 2003-06-19 Bruno Haible * Makefile.am (TESTS_ENVIRONMENT): Also set LOCALE_FR. diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am index 95731bf2c..118de205c 100644 --- a/gettext-tools/tests/Makefile.am +++ b/gettext-tools/tests/Makefile.am @@ -60,12 +60,14 @@ TESTS = gettext-1 gettext-2 \ format-python-1 format-python-2 \ format-pascal-1 format-pascal-2 \ format-perl-1 format-perl-2 \ + format-perl-bracket-1 format-perl-bracket-2 \ + format-perl-mixed-1 format-perl-mixed-2 \ format-tcl-1 format-tcl-2 \ format-ycp-1 format-ycp-2 \ plural-1 plural-2 \ lang-c lang-c++ lang-objc lang-python-1 lang-python-2 lang-clisp \ lang-elisp lang-librep lang-smalltalk lang-java lang-gawk lang-pascal \ - lang-ycp lang-tcl lang-perl lang-php lang-po \ + lang-ycp lang-tcl lang-perl-1 lang-perl-2 lang-php lang-po \ lang-rst EXTRA_DIST += $(TESTS) \ diff --git a/gettext-tools/tests/format-perl-1 b/gettext-tools/tests/format-perl-1 index 801a5b5cb..878a7ed5a 100755 --- a/gettext-tools/tests/format-perl-1 +++ b/gettext-tools/tests/format-perl-1 @@ -8,123 +8,127 @@ trap 'rm -fr $tmpfiles' 1 2 3 15 tmpfiles="$tmpfiles f-pl-1.data" cat <<\EOF > f-pl-1.data # Valid: no argument -'abc%%' -# Valid: one character argument -'abc%c' +"abc%%" +# Valid: one small integer argument +"abc%c" # Valid: one string argument -'abc%s' +"abc%s" +# Valid: one scalar vector argument +"abc%_" +# Valid: one pointer argument +"abc%p" # Valid: one integer argument -'abc%i' +"abc%i" # Valid: one integer argument -'abc%d' +"abc%d" # Valid: one integer argument -'abc%o' +"abc%D" # Valid: one integer argument -'abc%u' +"abc%u" # Valid: one integer argument -'abc%x' +"abc%U" # Valid: one integer argument -'abc%X' -# Valid: one floating-point argument -'abc%e' -# Valid: one floating-point argument -'abc%E' +"abc%b" +# Valid: one integer argument +"abc%o" +# Valid: one integer argument +"abc%O" +# Valid: one integer argument +"abc%x" +# Valid: one integer argument +"abc%X" # Valid: one floating-point argument -'abc%f' +"abc%e" # Valid: one floating-point argument -'abc%F' +"abc%E" # Valid: one floating-point argument -'abc%g' +"abc%f" # Valid: one floating-point argument -'abc%G' +"abc%F" # Valid: one floating-point argument -'abc%a' +"abc%g" # Valid: one floating-point argument -'abc%A' -# Valid: one pointer argument -'abc%p' +"abc%G" # Valid: one argument with flags -'abc%0#g' +"abc%0#g" # Valid: one argument with width -'abc%2g' +"abc%2g" # Valid: one argument with width -'abc%*g' +"abc%*g" # Valid: one argument with precision -'abc%.4g' +"abc%.4g" # Valid: one argument with precision -'abc%.*g' +"abc%.*g" # Valid: one argument with width and precision -'abc%14.4g' +"abc%14.4g" # Valid: one argument with width and precision -'abc%14.*g' +"abc%14.*g" # Valid: one argument with width and precision -'abc%*.4g' +"abc%*.4g" # Valid: one argument with width and precision -'abc%*.*g' +"abc%*.*g" # Valid: one argument with size specifier -'abc%hhi' +"abc%hi" # Valid: one argument with size specifier -'abc%hi' +"abc%li" # Valid: one argument with size specifier -'abc%li' +"abc%lli" # Valid: one argument with size specifier -'abc%lli' +"abc%Lg" # Valid: one argument with size specifier -'abc%Lg' +"abc%qi" # Valid: one argument with size specifier -'abc%qi' +"abc%Vi" # Valid: one argument with size specifier -'abc%ji' +"abc%I32i" # Valid: one argument with size specifier -'abc%zi' +"abc%I64i" # Valid: one argument with size specifier -'abc%ti' +"abc%Ii" # Invalid: unterminated -'abc%' +"abc%" # Invalid: unknown format specifier -'abc%y' +"abc%y" # Invalid: flags after width -'abc%*0g' +"abc%*0g" # Invalid: twice precision -'abc%.4.2g' +"abc%.4.2g" # Valid: three arguments -'abc%d%u%u' +"abc%d%u%u" # Valid: a numbered argument -'abc%1$d' +"abc%1$d" # Invalid: zero -'abc%0$d' +"abc%0$d" # Valid: two-digit numbered arguments -'abc%11$def%10$dgh%9$dij%8$dkl%7$dmn%6$dop%5$dqr%4$dst%3$duv%2$dwx%1$dyz' +"abc%11$def%10$dgh%9$dij%8$dkl%7$dmn%6$dop%5$dqr%4$dst%3$duv%2$dwx%1$dyz" # Invalid: unterminated number -'abc%1' +"abc%1" # Invalid: flags before number -'abc%+1$d' +"abc%+1$d" # Valid: three arguments, two with same number -'abc%1$4x,%2$c,%1$u' +"abc%1$4x,%2$c,%1$u" # Invalid: argument with conflicting types -'abc%1$4x,%2$c,%1$s' +"abc%1$4x,%2$c,%1$s" # Valid: no conflict -'abc%1$4x,%2$c,%1$u' -# Invalid: mixing of numbered and unnumbered arguments -'abc%d%2$x' +"abc%1$4x,%2$c,%1$u" +# Valid: mixing of numbered and unnumbered arguments +"abc%d%2$x" # Valid: numbered argument with constant precision -'abc%1$.9x' -# Invalid: mixing of numbered and unnumbered arguments -'abc%1$.*x' -# Invalid: missing non-final argument -'abc%2$x%3$s' +"abc%1$.9x" +# Valid: mixing of numbered and unnumbered arguments +"abc%2$.*x" +# Valid: missing non-final argument +"abc%2$x%3$s" # Valid: permutation -'abc%2$ddef%1$d' +"abc%2$ddef%1$d" # Valid: multiple uses of same argument -'abc%2$xdef%1$pghi%2$x' +"abc%2$xdef%1$pghi%2$x" # Valid: one argument with width -'abc%2$#*1$g' +"abc%2$#*1$g" # Valid: one argument with width and precision -'abc%3$*2$.*1$g' +"abc%3$*2$.*1$g" # Invalid: zero -'abc%2$*0$.*1$g' -# Valid: named arguments -'[foo] [bar] [baz]' +"abc%2$*0$.*1$g" EOF : ${XGETTEXT=xgettext} @@ -133,10 +137,10 @@ while read comment; do read string n=`expr $n + 1` tmpfiles="$tmpfiles f-pl-1-$n.in f-pl-1-$n.po" - cat < f-pl-1-$n.in + sed -e 's,\$,\\$,g' < f-pl-1-$n.in gettext(${string}); EOF - ${XGETTEXT} -L perl -o f-pl-1-$n.po f-pl-1-$n.in || exit 1 + ${XGETTEXT} -L Perl -o f-pl-1-$n.po f-pl-1-$n.in || exit 1 test -f f-pl-1-$n.po || exit 1 fail= if echo "$comment" | grep 'Valid:' > /dev/null; then @@ -159,6 +163,7 @@ EOF cat f-pl-1-$n.po 1>&2 exit 1 fi + rm -f f-pl-1-$n.in f-pl-1-$n.po done < f-pl-1.data rm -fr $tmpfiles diff --git a/gettext-tools/tests/format-perl-2 b/gettext-tools/tests/format-perl-2 index 929628aff..9382fd7c0 100755 --- a/gettext-tools/tests/format-perl-2 +++ b/gettext-tools/tests/format-perl-2 @@ -50,6 +50,9 @@ msgstr "xyz%1$uvw%2$c" msgid "abc%i" msgstr "xyz%d" # Valid: type compatibility +msgid "abc%b" +msgstr "xyz%o" +# Valid: type compatibility msgid "abc%o" msgstr "xyz%u" # Valid: type compatibility @@ -58,6 +61,9 @@ msgstr "xyz%x" # Valid: type compatibility msgid "abc%u" msgstr "xyz%X" +# Valid: type and size compatibility +msgid "abc%x" +msgstr "xyz%X" # Valid: type compatibility msgid "abc%e" msgstr "xyz%E" @@ -66,6 +72,9 @@ msgid "abc%e" msgstr "xyz%f" # Valid: type compatibility msgid "abc%e" +msgstr "xyz%F" +# Valid: type compatibility +msgid "abc%e" msgstr "xyz%g" # Valid: type compatibility msgid "abc%e" @@ -75,49 +84,127 @@ msgid "abc%c" msgstr "xyz%s" # Invalid: type incompatibility msgid "abc%c" +msgstr "xyz%_" +# Invalid: type incompatibility +msgid "abc%c" msgstr "xyz%i" # Invalid: type incompatibility msgid "abc%c" -msgstr "xyz%o" +msgstr "xyz%u" # Invalid: type incompatibility msgid "abc%c" msgstr "xyz%e" # Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%n" +# Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%_" +# Invalid: type incompatibility msgid "abc%s" msgstr "xyz%i" # Invalid: type incompatibility msgid "abc%s" -msgstr "xyz%o" +msgstr "xyz%u" # Invalid: type incompatibility msgid "abc%s" msgstr "xyz%e" # Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%n" +# Invalid: type incompatibility +msgid "abc%_" +msgstr "xyz%i" +# Invalid: type incompatibility +msgid "abc%_" +msgstr "xyz%u" +# Invalid: type incompatibility +msgid "abc%_" +msgstr "xyz%e" +# Invalid: type incompatibility +msgid "abc%_" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%_" +msgstr "xyz%n" +# Invalid: type incompatibility msgid "abc%i" -msgstr "xyz%o" +msgstr "xyz%u" # Invalid: type incompatibility msgid "abc%i" msgstr "xyz%e" # Invalid: type incompatibility +msgid "abc%i" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%i" +msgstr "xyz%n" +# Invalid: type incompatibility msgid "abc%u" msgstr "xyz%e" +# Invalid: type incompatibility +msgid "abc%u" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%u" +msgstr "xyz%n" +# Invalid: type incompatibility +msgid "abc%e" +msgstr "xyz%p" +# Invalid: type incompatibility +msgid "abc%e" +msgstr "xyz%n" +# Invalid: type incompatibility +msgid "abc%p" +msgstr "xyz%n" +# Invalid: size incompatibility +msgid "abc%hd" +msgstr "xyz%d" +# Invalid: size incompatibility +msgid "abc%hd" +msgstr "xyz%ld" +# Invalid: size incompatibility +msgid "abc%hd" +msgstr "xyz%Vd" +# Invalid: size incompatibility +msgid "abc%hd" +msgstr "xyz%qd" +# Invalid: size incompatibility +msgid "abc%d" +msgstr "xyz%ld" +# Invalid: size incompatibility +msgid "abc%d" +msgstr "xyz%Vd" +# Invalid: size incompatibility +msgid "abc%d" +msgstr "xyz%qd" +# Invalid: size incompatibility +msgid "abc%ld" +msgstr "xyz%Vd" +# Invalid: size incompatibility +msgid "abc%ld" +msgstr "xyz%qd" +# Invalid: size incompatibility +msgid "abc%Vd" +msgstr "xyz%qd" +# Invalid: size incompatibility +msgid "abc%d" +msgstr "xyz%D" +# Invalid: size incompatibility +msgid "abc%u" +msgstr "xyz%U" +# Invalid: size incompatibility +msgid "abc%o" +msgstr "xyz%O" # Invalid: type incompatibility for width msgid "abc%g%*g" msgstr "xyz%*g%g" -# Valid: named arguments in arbitrary order -msgid "[foo] [bar] [baz]: 1" -msgstr "[baz] [bar] [foo]: 1" -# Invalid: missing named argument in msgstr -msgid "[foo] [bar] [baz]: 2" -msgstr "[baz] [foo]: 2" -# Invalid: missing named arguments in msgid -msgid "[foo] [baz]: 3" -msgstr "[baz] [bar] [foo]: 3" -# Invalid: number of named arguments in msgstr exceeds msgid -msgid "[foo] [bar] [baz]: 4" -msgstr "[baz] [bar] [foo] [bar]: 4" -# Invalid: number of named arguments in msgid exceeds msgstr -msgid "[baz] [bar] [foo] [bar]: 5" -msgstr "[foo] [bar] [baz]: 5" EOF : ${MSGFMT=msgfmt} @@ -152,6 +239,7 @@ EOF cat f-pl-2-$n.po 1>&2 exit 1 fi + rm -f f-pl-2-$n.po f-pl-2-$n.mo done < f-pl-2.data rm -fr $tmpfiles diff --git a/gettext-tools/tests/format-perl-bracket-1 b/gettext-tools/tests/format-perl-bracket-1 new file mode 100755 index 000000000..16ce6842f --- /dev/null +++ b/gettext-tools/tests/format-perl-bracket-1 @@ -0,0 +1,71 @@ +#! /bin/sh + +# Test recognition of Perl bracket format strings. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-pb-1.data" +cat <<\EOF > f-pb-1.data +# Invalid: no argument +"abc" +# Valid: a named argument +"abc[value]" +# Invalid: an empty name +"abc[]" +# Invalid: unterminated name +"abc[value" +# Valid: three arguments, two with equal names +"abc[addr],[char],[addr]" +# Invalid: place-holder contains a space. +"[foo bar]" +# Invalid: missing right angle bracket. +"[foo bar" +# Valid: not nested, but one single place-holder. +"[foo[bar]baz]" +# Valid: no nesting error, but one single place-holder. +"[foo[bar]baz" +# Valid: place-holder with spaces must be ignored, but still one remaining. +"[foo bar] [baz]" +# Invalid: percent sign not allowed. +"[foo%bar]" +EOF + +: ${XGETTEXT=xgettext} +n=0 +while read comment; do + read string + n=`expr $n + 1` + tmpfiles="$tmpfiles f-pb-1-$n.in f-pb-1-$n.po" + cat < f-pb-1-$n.in +gettext(${string}); +EOF + ${XGETTEXT} -L Perl -o f-pb-1-$n.po f-pb-1-$n.in || exit 1 + test -f f-pb-1-$n.po || exit 1 + fail= + if echo "$comment" | grep 'Valid:' > /dev/null; then + if grep perl-bracket-format f-pb-1-$n.po > /dev/null; then + : + else + fail=yes + fi + else + if grep perl-bracket-format f-pb-1-$n.po > /dev/null; then + fail=yes + else + : + fi + fi + if test -n "$fail"; then + echo "Format string recognition error:" 1>&2 + cat f-pb-1-$n.in 1>&2 + echo "Got:" 1>&2 + cat f-pb-1-$n.po 1>&2 + exit 1 + fi + rm -f f-pb-1-$n.in f-pb-1-$n.po +done < f-pb-1.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/format-perl-bracket-2 b/gettext-tools/tests/format-perl-bracket-2 new file mode 100755 index 000000000..5c565d814 --- /dev/null +++ b/gettext-tools/tests/format-perl-bracket-2 @@ -0,0 +1,70 @@ +#! /bin/sh + +# Test checking of Perl bracket format strings. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-pb-2.data" +cat <<\EOF > f-pb-2.data +# Valid: same named arguments +msgid "abc[date][time]" +msgstr "xyz[date][time]" +# Valid: permutation +msgid "abc[x3][x1][x2]def" +msgstr "xyz[x2][x1][x3]" +# Invalid: missing argument +msgid "abc[x2]def[x1]" +msgstr "xyz[x1]" +# Invalid: missing argument +msgid "abc[x1]def[x2]" +msgstr "xyz[x2]" +# Invalid: added argument +msgid "abc[foo]def" +msgstr "xyz[foo]uvw[zoo]" +# Valid: multiple reuse of same argument +msgid "[foo] [bar] [baz]" +msgstr "[baz] [bar] [foo] [bar]" +# Valid: single reuse of same argument +msgid "[baz] [bar] [foo] [bar]" +msgstr "[foo] [bar] [baz]" +EOF + +: ${MSGFMT=msgfmt} +n=0 +while read comment; do + read msgid_line + read msgstr_line + n=`expr $n + 1` + tmpfiles="$tmpfiles f-pb-2-$n.po f-pb-2-$n.mo" + cat < f-pb-2-$n.po +#, perl-bracket-format +${msgid_line} +${msgstr_line} +EOF + fail= + if echo "$comment" | grep 'Valid:' > /dev/null; then + if ${MSGFMT} --check-format -o f-pb-2-$n.mo f-pb-2-$n.po; then + : + else + fail=yes + fi + else + ${MSGFMT} --check-format -o f-pb-2-$n.mo f-pb-2-$n.po 2> /dev/null + if test $? = 1; then + : + else + fail=yes + fi + fi + if test -n "$fail"; then + echo "Format string checking error:" 1>&2 + cat f-pb-2-$n.po 1>&2 + exit 1 + fi + rm -f f-pb-2-$n.po f-pb-2-$n.mo +done < f-pb-2.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/format-perl-mixed-1 b/gettext-tools/tests/format-perl-mixed-1 new file mode 100755 index 000000000..83da4a6f6 --- /dev/null +++ b/gettext-tools/tests/format-perl-mixed-1 @@ -0,0 +1,71 @@ +#! /bin/sh + +# Test recognition of Perl format strings of both kinds (printf and bracketed). +# This test is for the combination of both kinds. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-pm-1.data" + +cat <<\EOF > f-pm-1.data +# Both formats. +#, perl-format, perl-bracket-format +"[foo] %c [bar] %d [baz]" +# printf format only. +#, perl-format +"%c %d" +# printf format only, because '%' is not allowed in identifier. +#, perl-format +"[foo%cbar]" +# Valid bracketed format because there is still one valid identifier. +#, perl-format, perl-bracket-format +"[foo%cbar] [baz]" +# Bracketed format only, because %l is not recognized in printf format. +#, perl-bracket-format +"[foo] %l [bar]" +# Neither format recognized here. + +"[foo bar %l" +EOF + +: ${XGETTEXT=xgettext} +n=0 +while read comment; do + read formats + read string + n=`expr $n + 1` + tmpfiles="$tmpfiles f-pm-1-$n.in f-pm-1-$n.po" + cat < f-pm-1-$n.in +gettext(${string}); +EOF + ${XGETTEXT} -L perl --omit-header --no-location -o f-pm-1-$n.po f-pm-1-$n.in || exit 1 + test -f f-pm-1-$n.po || exit 1 + fail= + if test -n "${formats}"; then + # Verify that the first line contains the expected #, comment. + if sed 1q < f-pm-1-$n.po | grep '^'"${formats}"'$' > /dev/null; then + : + else + fail=yes + fi + else + # Verify that there is no #, comment. + if sed 1q < f-pm-1-$n.po | grep '^msgid' > /dev/null; then + : + else + fail=yes + fi + fi + if test -n "$fail"; then + echo "Format string recognition error:" 1>&2 + cat f-pm-1-$n.in 1>&2 + echo "Got:" 1>&2 + cat f-pm-1-$n.po 1>&2 + exit 1 + fi +done < f-pm-1.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/format-perl-mixed-2 b/gettext-tools/tests/format-perl-mixed-2 new file mode 100755 index 000000000..696315cd9 --- /dev/null +++ b/gettext-tools/tests/format-perl-mixed-2 @@ -0,0 +1,78 @@ +#! /bin/sh + +# Test checking of Perl format strings. +# This test is for the combination of printf and bracketed format strings. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-pm-2.data" +cat <<\EOF > f-pm-2.data +# Valid: normal case. +#, perl-format, perl-bracket-format +msgid "[foo] %d [bar] %s" +msgstr "[bar] [foo] %d %s" +# Invalid: missing argument. +#, perl-format, perl-bracket-format +msgid "[foo] %d [bar] %s" +msgstr "[bar] %d %s" +# Valid: missing argument but checking disabled. +#, perl-format, no-perl-bracket-format +msgid "[foo] %d [bar] %s" +msgstr "[bar] %d %s" +# Invalid: printf format reordered without position specifiers %1$, %2$. +#, perl-format, perl-bracket-format +msgid "[foo] %d [bar] %s" +msgstr "[bar] %s [foo] %d" +# Valid: same thing but checking disabled. +#, no-perl-format, perl-bracket-format +msgid "[foo] %d [bar] %s" +msgstr "[bar] %s [foo] %d" +# Invalid: unnamed vs. named arguments +#, perl-format +msgid "abc%sdef" +msgstr "xyz[value]" +# Invalid: named vs. unnamed arguments +#, perl-bracket-format +msgid "abc[value]def" +msgstr "xyz%s" +EOF + +: ${MSGFMT=msgfmt} +n=0 +while read comment; do + read formats + read msgid_line + read msgstr_line + n=`expr $n + 1` + tmpfiles="$tmpfiles f-pm-2-$n.po f-pm-2-$n.mo" + cat < f-pm-2-$n.po +${formats} +${msgid_line} +${msgstr_line} +EOF + fail= + if echo "$comment" | grep 'Valid:' > /dev/null; then + if ${MSGFMT} --check-format -o f-pm-2-$n.mo f-pm-2-$n.po; then + : + else + fail=yes + fi + else + ${MSGFMT} --check-format -o f-pm-2-$n.mo f-pm-2-$n.po 2> /dev/null + if test $? = 1; then + : + else + fail=yes + fi + fi + if test -n "$fail"; then + echo "Format string checking error:" 1>&2 + cat f-pm-2-$n.po 1>&2 + exit 1 + fi +done < f-pm-2.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/lang-perl-1 b/gettext-tools/tests/lang-perl-1 new file mode 100755 index 000000000..ed015e2db --- /dev/null +++ b/gettext-tools/tests/lang-perl-1 @@ -0,0 +1,100 @@ +#! /bin/sh + +# Test of gettext facilities in the Perl language, +# using printf format strings. +# Assumes an fr_FR locale is installed. +# Assumes the following packages are installed: perl. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles program.pl" +cat <<\EOF > program.pl +use Locale::Messages; +textdomain "prog"; +bindtextdomain ("./"); +my $n = 2; +print _"'Your command, please?', asked the waiter."; +printf ngettext ("a piece of cake", "%d pieces of cake", $n), $n; +printf _"%s is replaced by %s.", "FF", "EUR"; +EOF + +tmpfiles="$tmpfiles prog.pot" +: ${XGETTEXT=xgettext} +${XGETTEXT} -k_ -o prog.pot --omit-header --no-location program.pl + +tmpfiles="$tmpfiles prog.ok" +cat < prog.ok +msgid "'Your command, please?', asked the waiter." +msgstr "" + +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "" +msgstr[1] "" + +#, perl-format +msgid "%s is replaced by %s." +msgstr "" +EOF + +: ${DIFF=diff} +${DIFF} prog.ok prog.pot || exit 1 + +tmpfiles="$tmpfiles fr.po" +cat <<\EOF > fr.po +msgid "" +msgstr "" +"Content-Type: text/plain; charset=ISO-8859-1\n" +"Plural-Forms: nplurals=2; plural=(n > 1);\n" + +msgid "'Your command, please?', asked the waiter." +msgstr "«Votre commande, s'il vous plait», dit le garçon." + +# Les gateaux allemands sont les meilleurs du monde. +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "un morceau de gateau" +msgstr[1] "%d morceaux de gateau" + +# Reverse the arguments. +#, perl-format +msgid "%s is replaced by %s." +msgstr "%2$s remplace %1$s." +EOF + +tmpfiles="$tmpfiles fr.po.new" +: ${MSGMERGE=msgmerge} +${MSGMERGE} -q -o fr.po.new fr.po prog.pot + +: ${DIFF=diff} +${DIFF} fr.po fr.po.new || exit 1 + +tmpfiles="$tmpfiles fr" +test -d fr || mkdir fr +test -d fr/LC_MESSAGES || mkdir fr/LC_MESSAGES + +: ${MSGFMT=msgfmt} +${MSGFMT} -o fr/LC_MESSAGES/prog.mo fr.po + +tmpfiles="$tmpfiles prog.ok prog.out" +: ${DIFF=diff} +cat <<\EOF > prog.ok +«Votre commande, s'il vous plait», dit le garçon. +2 morceaux de gateau +EUR remplace FF. +EOF + +# Test for perl with Locale::Messages package. +perl -e 'use Locale::Messages;' 2>/dev/null \ + || { rm -fr $tmpfiles; exit 77; } + +: ${LOCALE_FR=fr_FR} +LANGUAGE= LANG=$LOCALE_FR LC_MESSAGES= LC_ALL= perl program.pl > prog.out || exit 1 +${DIFF} prog.ok prog.out || exit 1 + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/lang-perl-2 b/gettext-tools/tests/lang-perl-2 new file mode 100755 index 000000000..7b7ebccda --- /dev/null +++ b/gettext-tools/tests/lang-perl-2 @@ -0,0 +1,101 @@ +#! /bin/sh + +# Test of gettext facilities in the Perl language, +# using bracket format strings. +# Assumes an fr_FR locale is installed. +# Assumes the following packages are installed: perl, libintl-perl. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles program.pl" +cat <<\EOF > program.pl +use Locale::TextDomain (prog => './'); +my $n = 2; +print __"'Your command, please?', asked the waiter."; +print "\n" +printf __n ("a piece of cake", "%d pieces of cake", $n), $n; +print "\n" +printf __x ("[old] is replaced by [new].", old => "FF", new => "EUR"); +print "\n" +EOF + +tmpfiles="$tmpfiles prog.pot" +: ${XGETTEXT=xgettext} +${XGETTEXT} -k__ -k__n:1,2 -k__x -o prog.pot --omit-header --no-location program.pl + +tmpfiles="$tmpfiles prog.ok" +cat < prog.ok +msgid "'Your command, please?', asked the waiter." +msgstr "" + +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "" +msgstr[1] "" + +#, perl-bracket-format +msgid "[old] is replaced by [new]." +msgstr "" +EOF + +: ${DIFF=diff} +${DIFF} prog.ok prog.pot || exit 1 + +tmpfiles="$tmpfiles fr.po" +cat <<\EOF > fr.po +msgid "" +msgstr "" +"Content-Type: text/plain; charset=ISO-8859-1\n" +"Plural-Forms: nplurals=2; plural=(n > 1);\n" + +msgid "'Your command, please?', asked the waiter." +msgstr "«Votre commande, s'il vous plait», dit le garçon." + +# Les gateaux allemands sont les meilleurs du monde. +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "un morceau de gateau" +msgstr[1] "%d morceaux de gateau" + +# Reverse the arguments. +#, perl-bracket-format +msgid "[old] is replaced by [new]." +msgstr "[new] remplace [old]." +EOF + +tmpfiles="$tmpfiles fr.po.new" +: ${MSGMERGE=msgmerge} +${MSGMERGE} -q -o fr.po.new fr.po prog.pot + +: ${DIFF=diff} +${DIFF} fr.po fr.po.new || exit 1 + +tmpfiles="$tmpfiles fr" +test -d fr || mkdir fr +test -d fr/LC_MESSAGES || mkdir fr/LC_MESSAGES + +: ${MSGFMT=msgfmt} +${MSGFMT} -o fr/LC_MESSAGES/prog.mo fr.po + +tmpfiles="$tmpfiles prog.ok prog.out" +: ${DIFF=diff} +cat <<\EOF > prog.ok +«Votre commande, s'il vous plait», dit le garçon. +2 morceaux de gateau +EUR remplace FF. +EOF + +# Test for perl with libintl-perl package. +perl -M'Locale::TextDomain' -e '' 2>/dev/null \ + || { rm -fr $tmpfiles; exit 77; } + +: ${LOCALE_FR=fr_FR} +LANGUAGE= LANG=$LOCALE_FR LC_MESSAGES= LC_ALL= perl program.pl > prog.out || exit 1 +${DIFF} prog.ok prog.out || exit 1 + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/xgettext-26 b/gettext-tools/tests/xgettext-26 index 13f68e012..1c5f6a412 100755 --- a/gettext-tools/tests/xgettext-26 +++ b/gettext-tools/tests/xgettext-26 @@ -60,6 +60,71 @@ First here document. PERL Second here document. PERL + +# These are not invalid interpolations, because the dollar is backslashed. +printf "%s\n", gettext "abc\$def"; +printf "%s\n", gettext "abc\\\$def"; + +# These are not interpolations. +printf "%s\n", gettext 'abc$defg'; +printf "%s\n", gettext 'abc\$defg'; +printf "%s\n", gettext 'abc\\$defg'; +printf "%s\n", gettext 'abc\\\$defg'; + +# Two consecutive backslashes count as one inside single-quote strings. +printf "%s\n", gettext 'ecs\tasy'; +printf "%s\n", gettext 'ecs\\tasy'; +printf "%s\n", gettext 'ecs\\\tasy'; +printf "%s\n", gettext 'ecs\\\\tasy'; +printf "%s\n", gettext 'ecs\\\\\tasy'; +printf "%s\n", gettext q(ecs\tasy); +printf "%s\n", gettext q(ecs\\tasy); +printf "%s\n", gettext q(ecs\\\tasy); +printf "%s\n", gettext q(ecs\\\\tasy); +printf "%s\n", gettext q(ecs\\\\\tasy); + +# Similarly, inside double-quote strings, two consecutive backslashes count +# as one, but the last backslash of a sequence is combined with the following +# character if possible. +printf "%s\n", gettext "ecs\tasy"; +printf "%s\n", gettext "ecs\\tasy"; +printf "%s\n", gettext "ecs\\\tasy"; +printf "%s\n", gettext "ecs\\\\tasy"; +printf "%s\n", gettext "ecs\\\\\tasy"; +printf "%s\n", gettext qq(ecs\tasy); +printf "%s\n", gettext qq(ecs\\tasy); +printf "%s\n", gettext qq(ecs\\\tasy); +printf "%s\n", gettext qq(ecs\\\\tasy); +printf "%s\n", gettext qq(ecs\\\\\tasy); +printf "%s\n", gettext "mari\huana"; +printf "%s\n", gettext "mari\\huana"; +printf "%s\n", gettext "mari\\\huana"; +printf "%s\n", gettext "mari\\\\huana"; +printf "%s\n", gettext "mari\\\\\huana"; +printf "%s\n", gettext qq(mari\huana); +printf "%s\n", gettext qq(mari\\huana); +printf "%s\n", gettext qq(mari\\\huana); +printf "%s\n", gettext qq(mari\\\\huana); +printf "%s\n", gettext qq(mari\\\\\huana); + +# Recognition of format strings. +gettext "This is [only] a bracketed formatstring."; +gettext "This is %s [mixed]."; +gettext "This is only %c."; +gettext "This is nothing at all."; +gettext "And this is %l also no format at all."; + +# xgettext: no-perl-format, perl-bracket-format +gettext "The function '[func]' expects '%c' here."; + +# This is a contradictory case: The same string three times, +# with different xgettext comments. +# xgettext: perl-bracket-format, no-perl-format +gettext "Left as an %exercise to [maintainer]."; +# xgettext: no-perl-bracket-format, perl-format +gettext "Left as an %exercise to [maintainer]."; +# No xgettext comment this time. +gettext "Left as an %exercise to [maintainer]."; __END__ gettext "Discarded!"; EOF @@ -70,7 +135,7 @@ ${XGETTEXT} --omit-header -n -k_ -k%__ -k\$__ xg-test26.pl -d xg-test26 test $? = 0 || { rm -fr $tmpfiles; exit 1; } tmpfiles="$tmpfiles xg-test26.ok" -cat < xg-test26.ok +cat <<\EOF > xg-test26.ok #: xg-test26.pl:9 msgid "'Your command, please?', asked the waiter." msgstr "" @@ -137,6 +202,101 @@ msgstr "" #: xg-test26.pl:52 msgid "Second here document.\n" msgstr "" + +#: xg-test26.pl:56 +msgid "abc$def" +msgstr "" + +#: xg-test26.pl:57 +msgid "abc\\$def" +msgstr "" + +#: xg-test26.pl:60 +msgid "abc$defg" +msgstr "" + +#: xg-test26.pl:61 +msgid "abc\\$defg" +msgstr "" + +#: xg-test26.pl:62 +msgid "abc\\\\$defg" +msgstr "" + +#: xg-test26.pl:63 +msgid "abc\\\\\\$defg" +msgstr "" + +#: xg-test26.pl:66 xg-test26.pl:67 xg-test26.pl:71 xg-test26.pl:72 +#: xg-test26.pl:81 xg-test26.pl:86 +msgid "ecs\\tasy" +msgstr "" + +#: xg-test26.pl:68 xg-test26.pl:69 xg-test26.pl:73 xg-test26.pl:74 +#: xg-test26.pl:83 xg-test26.pl:88 +msgid "ecs\\\\tasy" +msgstr "" + +#: xg-test26.pl:70 xg-test26.pl:75 +msgid "ecs\\\\\\tasy" +msgstr "" + +#: xg-test26.pl:80 xg-test26.pl:85 +msgid "ecs\tasy" +msgstr "" + +#: xg-test26.pl:82 xg-test26.pl:87 +msgid "ecs\\\tasy" +msgstr "" + +#: xg-test26.pl:84 xg-test26.pl:89 +msgid "ecs\\\\\tasy" +msgstr "" + +#: xg-test26.pl:90 xg-test26.pl:95 +msgid "marihuana" +msgstr "" + +#: xg-test26.pl:91 xg-test26.pl:92 xg-test26.pl:96 xg-test26.pl:97 +msgid "mari\\huana" +msgstr "" + +#: xg-test26.pl:93 xg-test26.pl:94 xg-test26.pl:98 xg-test26.pl:99 +msgid "mari\\\\huana" +msgstr "" + +#: xg-test26.pl:102 +#, perl-bracket-format +msgid "This is [only] a bracketed formatstring." +msgstr "" + +#: xg-test26.pl:103 +#, perl-format, perl-bracket-format +msgid "This is %s [mixed]." +msgstr "" + +#: xg-test26.pl:104 +#, perl-format +msgid "This is only %c." +msgstr "" + +#: xg-test26.pl:105 +msgid "This is nothing at all." +msgstr "" + +#: xg-test26.pl:106 +msgid "And this is %l also no format at all." +msgstr "" + +#: xg-test26.pl:109 +#, no-perl-format, perl-bracket-format +msgid "The function '[func]' expects '%c' here." +msgstr "" + +#: xg-test26.pl:114 xg-test26.pl:116 xg-test26.pl:118 +#, perl-format, no-perl-bracket-format +msgid "Left as an %exercise to [maintainer]." +msgstr "" EOF : ${DIFF=diff}