From: Bruno Haible Date: Thu, 12 Jun 2003 12:09:38 +0000 (+0000) Subject: xgettext support for Perl. Contributed by Guido Flohr. X-Git-Tag: v0.13~457 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=12e548eacbf642e25a8a6a060cccffc8e9eaecf3;p=thirdparty%2Fgettext.git xgettext support for Perl. Contributed by Guido Flohr. --- diff --git a/gettext-tools/doc/ChangeLog b/gettext-tools/doc/ChangeLog index 1d87cd228..b74d4a973 100644 --- a/gettext-tools/doc/ChangeLog +++ b/gettext-tools/doc/ChangeLog @@ -1,3 +1,8 @@ +2003-06-11 Guido Flohr + + * gettext.texi (Perl): Extend and update. + * xgettext.texi: Mention language Perl. + 2003-06-10 Stepan Kasal * gettext.texi: Fix a few typos. diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi index 69237bd89..08de720b7 100644 --- a/gettext-tools/doc/gettext.texi +++ b/gettext-tools/doc/gettext.texi @@ -338,6 +338,7 @@ The Translator's View * ycp-format:: YCP Format Strings * tcl-format:: Tcl Format Strings * php-format:: PHP Format Strings +* perl-format:: Perl Format Strings Individual Programming Languages @@ -7196,6 +7197,7 @@ strings. * ycp-format:: YCP Format Strings * tcl-format:: Tcl Format Strings * php-format:: PHP Format Strings +* perl-format:: Perl Format Strings @end menu @node c-format, python-format, Translators for other Languages, Translators for other Languages @@ -7288,13 +7290,20 @@ or a nonzero digit (@samp{1} to @samp{9}). Tcl format strings are described in the @file{format.n} manual page, @uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}. -@node php-format, , tcl-format, Translators for other Languages +@node php-format, perl-format, tcl-format, Translators for other Languages @subsection PHP Format Strings PHP format strings are described in the documentation of the PHP function @code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or @uref{http://www.php.net/manual/en/function.sprintf.php}. +@node perl-format, , php-format, Translators for other Languages +@subsection Perl Format Strings + +Perl format strings are described in the @file{Locale::TextDomain} +manual page, or in +@uref{http://let.imperia.org/howto/en/perl-i18n/}. + @node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages @section The Maintainer's View @@ -8241,19 +8250,46 @@ argument is given. @table @asis @item RPMs -perl, perl-gettext +--- @item File extension -@code{pl}, @code{PL} +@code{pl}, @code{PL}, @code{pm}, @code{cgi} @item String syntax -@code{"abc"} +@itemize @bullet + +@item @code{"abc"} + +@item @code{'abc'} + +@item @code{qq (abc)} + +@item @code{q (abc)} + +@item @code{qr /abc/} + +@item @code{qx (/bin/date)} + +@item @code{/pattern match/} + +@item @code{?pattern match?} + +@item @code{s/substitution/operators/} + +@item @code{$tied_hash@{"message"@}} + +@item @code{$tied_hash_reference->@{"message"@}} + +@item etc., issue the command @code{perldoc perlsyn} for details + +@end itemize @item gettext shorthand ---- +@code{__} (double underscore) @item gettext/ngettext functions -@code{gettext}, @code{dgettext}, @code{dcgettext} +@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext}, +@code{dngettext}, @code{dcngettext} @item textdomain @code{textdomain} function @@ -8261,29 +8297,665 @@ perl, perl-gettext @item bindtextdomain @code{bindtextdomain} function +@item bind_textdomain_codeset +@code{bind_textdomain_codeset} function + @item setlocale Use @code{setlocale (LC_ALL, "");} @item Prerequisite @code{use POSIX;} -@*@code{use Locale::gettext;} +@*@code{use Locale::TextDomain;} (included in the package libintl-perl +which is available on the Comprehensive Perl Authors Network CPAN, +http://www.cpan.org/). @item Use or emulate GNU gettext -use +emulate @item Extractor -? +@code{xgettext -k__ -k\$__ -k%__ -k__x -kN__ -k} @item Formatting with positions ---- +@code{__x("error opening '[file]': [err]", file => $file, err => $!);} @item Portability -? +The @code{libintl-perl} package is platform-independent but does +not belong to the Perl core. The programmer is responsible for +providing a dummy implementation of the require functions if the +package is not installed on the target system. @item po-mode marking ---- +yes + +@item Documentation +In-depth documentation can be found at +http://let.imperia.org/howto/en/perl-18n/. + @end table +@cindex marking Perl sources + +The @code{xgettext} parser backend for Perl differs significantly from +the parser backends for other programming languages, just as Perl +itself differs significantly from other programming languages. The +Perl parser backend offers many more string marking facilities than +the other backends but it also has some Perl-specific limitations, the +worst probably being its imperfectness. + +@menu +* General Problems:: General Problems Parsing Perl Code +* Default Keywords:: Which Keywords Will xgettext Look For? +* Special Keywords:: How to Extract Hash Keys? +* Quote-like Expressions:: What are Strings And Quote-like Expressions? +* Interpolation I:: Illegal String Interpolation +* Interpolation II:: Legal String Interpolation +* Parentheses:: When To Use Parentheses +* Long Lines:: How To Grok with Long Lines +* Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work +@end menu + +@node General Problems, Default Keywords, , Perl +@subsubsection General Problems Parsing Perl Code + +It is often heard that only Perl can parse Perl. This is not true. +Perl cannot be @emph{parsed} at all, it can only be @emph{executed}. +Perl has various built-in ambiguities that can only be resolved at runtime. + +The following example may illustrate one common problem: + +@example +print gettext "Hello World!"; +@end example + +Although this example looks like a bullet-proof case of a function +invocation, it is not: + +@example +open gettext, ">testfile" or die; +print gettext "Hello world!" +@end example + +In this context, the string @code{gettext} looks more like a +file handle. But not necessarily: + +@example +use Locale::Messages; +open gettext ">testfile" or die; +print gettext "Hello world!"; +@end example + +Now, the file is probably syntactically incorrect, provided that the module +@code{Locale::Messages} found first in the Perl include path exports a +function @code{gettext}. But what if the module +@code{Locale::Messages} really looks like this? + +@example +use vars qw (*gettext); + +1; +@end example + +In this case, the string @code{gettext} will be interpreted as a file +handle again, and the above example will create a file @file{testfile} +and write the string ``Hello world!'' into it. Even advanced +control flow analysis will not really help: + +@example +if (0.5 < rand) @{ + eval "use Sane"; +@} else @{ + eval "use InSane"; +@} +print gettext "Hello world!"; +@end example + +If the module @code{Sane} exports a function @code{gettext} that does +what we expect, and the module @code{InSane} opens a file for writing +and associates the @emph{handle} @code{gettext} with this output +stream, we are clueless again about what will happen at runtime. It is +completely unpredictable. The truth is that Perl has so many ways to +fill its symbol table at runtime that it is impossible to interpret a +particular piece of code without executing it. + +Of course, @code{xgettext} will not execute your Perl sources while +scanning for translatable strings, but rather use heuristics in order +to guess what you meant. + +Another problem is the ambiguity of the slash and the question mark. +Their interpretation depends on the context: + +@example +# A pattern match. +print "OK\n" if /foobar/; + +# A division. +print 1 / 2; + +# Another pattern match. +print "OK\n" if ?foobar?; + +# Conditional. +print $x ? "foo" : "bar"; +@end example + +The slash may either act as the division operator or introduce a +pattern match, whereas the question mark may act as the ternary +question mark operator or as a pattern match, too. Other programming +languages like awk arise similar problems, but the consequences of a +misinterpretation are particularly nasty with Perl sources. In awk +for instance, a statement can never exceed one line and the parser +can recover from a parsing error at the next newline and interpret +the rest of the input stream correctly. Perl is different, as a +pattern match is terminated by the next appearance of the delimiter +(the slash or the question mark) in the input stream, regardless of +the semantic context. If a slash is really a division sign but +mis-interpreted as a pattern match, the rest of the input file is most +probably parsed incorrectly. + +If you find that @code{xgettext} fails to extract strings from +portions of your sources, you should therefore look out for slashes +and/or question marks preceding these sections. You may have come +across a bug in the Perl parser (and of course you should report that bug). + +@node Default Keywords, Special Keywords, General Problems, Perl +@subsubsection Which keywords will xgettext look for? +@cindex Perl default keywords + +Unless you instruct @code{xgettext} otherwise by invoking it with one +of the options @code{--keyword} or @code{-k}, it will recognize the +following keywords in your Perl sources: + +@itemize @bullet + +@item @code{gettext} + +@item @code{dgettext} + +@item @code{dcgettext} + +@item @code{ngettext:1,2} + +The first (singular) and the second (plural) argument will be +extracted. + +@item @code{dngettext:1,2} + +The first (singular) and the second (plural) argument will be +extracted. + +@item @code{dcngettext:1,2} + +The first (singular) and the second (plural) argument will be +extracted. + +@item @code{gettext_noop} + +@item @code{%gettext} + +The keys of lookups into the hash @code{%gettext} will be extracted. + +@item @code{$gettext} + +The keys of lookups into the hash reference @code{$gettext} will be extracted. + +@end itemize + +@node Special Keywords, Quote-like Expressions, Default Keywords, Perl +@subsubsection How to Extract Hash Keys? +@cindex Perl special keywords for hash-lookups + +Translating messages at runtime is normally performed by looking up the +original string in the translation database and returning the +translated version. The ``natural'' Perl implementation is a hash +lookup, and, of course, @code{xgettext} supports such practice. + +@example +print __"Hello world!"; +print $__@{"Hello world!"@}; +print $__->@{"Hello world!"@}; +print $$__@{"Hello world!"@}; +@end example + +The above four lines all do the same thing. The Perl module +@code{Locale::TextDomain} exports by default a hash @code{%__} that +is tied to the function @code{__()}. It also exports a reference +@code{$__} to @code{%__}. + +If an argument to the @code{xgettext} option @code{--keyword}, +resp. @code{-k} starts with a percent sign, the rest of the keyword is +interpreted as the name of a hash. If it starts with a dollar +sign, the rest of the keyword is interpreted as a reference to a +hash. + +Note that you can omit the quotation marks (single or double) around +the hash key (almost) whenever Perl itself allows it: + +@example +print $gettext@{Error@}; +@end example + +The exact rule is: You can omit the surrounding quotes, when the hash +key is a valid C (!) identifier, i. e. when it starts with an +underscore or a US-ASCII letter and is followed by an arbitrary number +of underscores, US-ASCII letters or digits. Other Unicode characters +are @emph{not} allowed, regardless of the @code{use utf8} pragma. + +@node Quote-like Expressions, Interpolation I, Special Keywords, Perl +@subsubsection What are Strings And Quote-like Expressions? +@cindex Perl quote-like expressions + +Perl offers a plethora of different string constructs. Those that can +be used either as arguments to functions or inside braces for hash +lookups are generally supported by @code{xgettext}. + +@itemize @bullet +@item @strong{double-quoted strings} +@* +@example +print gettext "Hello World!"; +@end example + +@item @strong{single-quoted strings} +@* +@example +print gettext 'Hello World!'; +@end example + +@item @strong{the operator qq} +@* +@example +print gettext qq |Hello World!|; +print gettext qq >; +@end example + +The operator @code{qq} is fully supported. You can use arbitrary +delimiters, including the four bracketing delimiters (round, angle, +square, curly) that nest. + +@item @strong{the operator q} +@* +@example +print gettext q |Hello World!|; +print gettext q >; +@end example + +The operator @code{q} is fully supported. You can use arbitrary +delimiters, including the four bracketing delimiters (round, angle, +square, curly) that nest. + +@item @strong{the operator qx} +@* +@example +print gettext qx ;LANGUAGE=C /bin/date; +print gettext qx [/usr/bin/ls | grep '^[A-Z]*']; +@end example + +The operator @code{qx} is fully supported. You can use arbitrary +delimiters, including the four bracketing delimiters (round, angle, +square, curly) that nest. + +Use this feature with care! If you use it, you lose control of the +external code executed from within your Perl script to your translator! + +@item @strong{here documents} +@group +@* +@example +print ngettext <My Homepage + +EOF +@end example + +If you exaggerate with constructs like the above, you will run the +risk that the translators of your package will look out for a less +challenging project. You should consider an alternative expression: + +@example +print <$gettext@{"My Homepage"@} + +EOF +@end example + +The resulting PO file will begrudgingly improve in terms of +readability. + +You can interpolate hash lookups in all strings or quote-like +expressions that are subject to interpolation (see the manual page +@code{perlop} for details). Double interpolation is illegal, however: + +@example +# TRANSLATORS: Replace "the earth" with the name of your planet. +print gettext qq@{Welcome to $gettext->@{"the earth"@}@}; +@end example + +The quoted string is recognized as an argument to @code{xgettext} in +the first place, and checked for illegal variable interpolation. The +dollar sign will therefore terminate the parser with an illegal +interpolation error. + +It is legal to interpolate hash lookups in regular expressions: + +@example +if ($var =~ /$gettext@{"the earth"@}/) @{ + print gettext "Match!\n"; +@} +s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g; +@end example + +@node Parentheses, Long Lines, Interpolation II, Perl +@subsubsection When To Use Parentheses +@cindex Perl parentheses + +In Perl, parentheses around function arguments are mostly optional. +The Perl backend for @code{xgettext} will always assume that all +recognized keywords (except for hashs and hash references) are names +of properly prototyped functions, and will (hopefully) only require +parentheses where Perl itself requires them. All constructs in the +following example are therefore ok to use: + +@example +@group +print gettext ("Hello World!\n"); +print gettext "Hello World!\n"; +print dgettext ($package => "Hello World!\n"); +print dgettext $package, "Hello World!\n"; + +# The "fat comma" => turns the left-hand side argument into a +# single-quoted string! +print dgettext smellovision => "Hello World!\n"; + +# The following assignment only works with prototyped functions. +# Otherwise, the functions will act as "greedy" list operators and +# eat up all following arguments. +my $anonymous_hash = @{ + planet => gettext "earth", + cakes => ngettext "one cake", "several cakes", $n, + still => $works, +@}; +# The same w/o fat comma: +my $other_hash = @{ + 'planet', gettext "earth", + 'cakes', ngettext "one cake", "several cakes", $n, + 'still', $works, +@}; + +# Parentheses are only significant for the first argument. +print dngettext 'package', ("one cake", "several cakes", $n), $discarded; +@end group +@end example + +@node Long Lines, Perl Pitfalls, Parentheses, Perl +@subsubsection How To Grok with Long Lines +@cindex Perl long lines + +The necessity of long messages can often lead to a cumbersome or +unreadable coding style. Perl has several options that may prevent +you from writing unreadable code, and the Perl backend for +@code{xgettext} does its best to do likewise. This is where the dot +operator (the string concatenation operator) may come in handy: + +@example +@group +print gettext ("This is a very long" + . " message that is still" + . " readable, because" + . " it is split into" + . " multiple lines.\n"); +@end group +@end example + +Perl is smart enough to concatenate these constant string fragments +into one long string at compile time, and so is the Perl backend to +@code{xgettext}. You will only find one long message in the resulting +PO file. + +If embedded newline characters are not an issue, or even desired, you +may also insert newline characters inside quoted strings wherever you +feel like it: + +@example +@group +print gettext ("In HTML output +embedded newlines are generally no +problem, since adjacent whitespace +is always rendered into a single +space character."); +@end group +@end example + +You may also consider to use here documents: + +@example +@group +print gettext <In HTML output +embedded newlines are generally no +problem, since adjacent whitespace +is always rendered into a single +space character. +EOF +@end group +@end example + +Please do not forget, that the line-breaks are real, i. e. they +translate into newline characters that will consequently show up in +the resulting PO file. + +@node Perl Pitfalls, , Long Lines, Perl +@subsubsection Bugs, Pitfalls, And Things That Do Not Work +@cindex Perl pitfalls + +The foregoing sections should have proven that the Perl backend for +@code{xgettext} is quite smart in extracting translatable strings from +Perl sources. Yet, some more or less exotic constructs that could be +expected to work, actually do not work. + +One of the more relevant limitations can be found in the +implementation of variable interpolation inside quoted strings. Only +simple hash lookups can be used there: + +@example +print </gettext ("Sunday")/e; +@end example + +The modifier @code{e} will cause the substitution to be interpreted as +an evaluable statement. Consequently, at runtime the function +@code{gettext()} is called, but again, the parser fails to extract the +string ``Sunday''. Use a temporary variable as a simple workaround if +you really happen to need this feature: + +@example +my $sunday = gettext "Sunday"; +s//$sunday/; +@end example + +Hash slices would also be handy but are not recognized: + +@example +my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday', + 'Thursday', 'Friday', 'Saturday'@}; +# Or even: +@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday + Friday Saturday) @}; +@end example + +This is perfectly legal usage of the tied hash @code{%gettext} but the +strings are not recognized and therefore will not be extracted. + +Another caveat of the current version is its rudimentary support for +non-US-ASCII characters in identifiers. You may encounter seriouos +problems if you use identifiers with characters outside the range of +'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'. + +Maybe some of these missing features will be implemented in future +versions, but since you can always make do without them at minimal effort, +these todos have very low priority. + @node PHP, Pike, Perl, List of Programming Languages @subsection PHP Hypertext Preprocessor @cindex PHP diff --git a/gettext-tools/doc/xgettext.texi b/gettext-tools/doc/xgettext.texi index 959456296..a709f072c 100644 --- a/gettext-tools/doc/xgettext.texi +++ b/gettext-tools/doc/xgettext.texi @@ -72,7 +72,7 @@ Specifies the language of the input files. The supported languages are @code{C}, @code{C++}, @code{ObjectiveC}, @code{PO}, @code{Python}, @code{Lisp}, @code{EmacsLisp}, @code{librep}, @code{Smalltalk}, @code{Java}, @code{JavaProperties}, @code{awk}, @code{YCP}, @code{Tcl}, @code{PHP}, -@code{RST}, @code{Glade}. +@code{RST}, @code{Glade}, @code{Perl}. @item -C @itemx --c++ diff --git a/gettext-tools/po/ChangeLog b/gettext-tools/po/ChangeLog index 549c6ebd4..e6ff7867d 100644 --- a/gettext-tools/po/ChangeLog +++ b/gettext-tools/po/ChangeLog @@ -1,3 +1,7 @@ +2003-06-11 Guido Flohr + + * POTFILES.in: Add src/format-perl.c and src/x-perl.c. + 2003-06-11 Bruno Haible * de.po: Update from Karl Eichwalder . diff --git a/gettext-tools/po/POTFILES.in b/gettext-tools/po/POTFILES.in index 787a1d162..936c36ae4 100644 --- a/gettext-tools/po/POTFILES.in +++ b/gettext-tools/po/POTFILES.in @@ -31,6 +31,7 @@ src/format-java.c src/format-librep.c src/format-lisp.c src/format-pascal.c +src/format-perl.c src/format-php.c src/format-python.c src/format-tcl.c @@ -40,6 +41,7 @@ src/msgattrib.c src/msgcat.c src/msgcmp.c src/msgcomm.c +src/format-perl.c src/msgconv.c src/msgen.c src/msgexec.c @@ -73,6 +75,7 @@ src/x-elisp.c src/x-glade.c src/x-librep.c src/x-lisp.c +src/x-perl.c src/x-php.c src/x-po.c src/x-python.c diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog index cf763ada9..9f6753a5f 100644 --- a/gettext-tools/src/ChangeLog +++ b/gettext-tools/src/ChangeLog @@ -1,4 +1,21 @@ -2003-05-08 Bruno Haible +2003-06-11 Guido Flohr + + * message.h (enum format_type): New enum value 'format_perl'. + (NFORMATS): Increment. + * message.c (format_language, format_language_pretty): Add entry + for perl. + * format.h (formatstring_perl): New declaration. + * format-perl.c: New file. + * format.c (formatstring_parsers): Add entry for perl. + * x-perl.h: New file. + * x-perl.c: New file. + * xgettext.c: Include x-perl.h. + (main): Call x_perl_extract_all, x_perl_keyword. + (usage): Mention Perl language. + (language_to_extractor): Add Perl rule. + (extension_to_language): Add Perl rule. + +2003-06-08 Bruno Haible * Makefile.vms (LDADD): Take options from .opt files. (libiconv.opt): New rule. diff --git a/gettext-tools/src/format-perl.c b/gettext-tools/src/format-perl.c new file mode 100644 index 000000000..fb48cdd12 --- /dev/null +++ b/gettext-tools/src/format-perl.c @@ -0,0 +1,327 @@ +/* Perl format strings. + Copyright (C) 2002-2003 Free Software Foundation, Inc. + Written by Guido Flohr , 2003. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ + +#ifdef HAVE_CONFIG_H +# include +#endif +#include + +#include +#include +#include + +#include "format.h" +#include "xmalloc.h" +#include "error.h" +#include "progname.h" +#include "gettext.h" +#include "hash.h" + +#define _(str) gettext (str) + +/* Perl format strings are currently quite simple. They consist of + place-holders embedded in the string. + + messageFormatPattern := string ("[" messageFormatElement "]" string)* + + messageFormatElement := [_A-Za-z][_0-9A-Za-z]+ + + However, C format strings are also allowed and used. The following + functions are therefore decorators for the C format checker, and + will only fall back to Perl format if the C check is negative. */ +struct spec +{ + unsigned int directives; + hash_table hash; + void* c_format; +}; + +static void* +format_parse (const char* string, char** invalid_reason) +{ + char* last_pos = (char*) string; + char* pos; + struct spec* spec; + void* c_format = formatstring_c.parse (string, invalid_reason); + + if (c_format == NULL) + return NULL; + + spec = xcalloc (1, sizeof (struct spec)); + + spec->c_format = c_format; + + init_hash (&spec->hash, 13); + while ((pos = strchr (last_pos, '['))) { + char* start = pos + 1; + last_pos = start; + + if (!(*last_pos == '_' + || (*last_pos >= 'A' && *last_pos <= 'Z') + || (*last_pos >= 'a' && *last_pos <= 'z'))) + continue; + + ++last_pos; + + while (*last_pos == '_' + || (*last_pos >= '0' && *last_pos <= '9') + || (*last_pos >= 'A' && *last_pos <= 'Z') + || (*last_pos >= 'a' && *last_pos <= 'z')) + ++last_pos; + + if (*last_pos == ']') + { + size_t len = last_pos - start; + int* hits; + + if (0 == find_entry (&spec->hash, start, len, (void**) &hits)) + { + ++(*hits); + } + else + { + hits = xmalloc (sizeof *hits); + *hits = 1; + insert_entry (&spec->hash, start, len, hits); + } + ++last_pos; + ++spec->directives; + } + } + + return spec; +} + +static void +format_free (void* description) +{ + void* ptr = NULL; + const void* key; + size_t keylen; + void* data; + + struct spec* spec = (struct spec*) description; + + if (spec != NULL) + { + while (0 == iterate_table (&spec->hash, &ptr, &key, &keylen, &data)) + free (data); + + delete_hash (&spec->hash); + + if (spec->c_format) + formatstring_c.free (spec->c_format); + + free (spec); + } +} + +static int +format_get_number_of_directives (void* description) +{ + int c_directives = 0; + + struct spec* spec = (struct spec*) description; + if (spec->c_format) + c_directives = formatstring_c.get_number_of_directives (spec->c_format); + + return c_directives + spec->directives; +} + +static bool +format_check (const lex_pos_ty* pos, void* msgid_descr, void* msgstr_descr, + bool equality, bool noisy, const char* pretty_msgstr) +{ + struct spec* spec1 = (struct spec *) msgid_descr; + struct spec* spec2 = (struct spec *) msgstr_descr; + bool result = false; + void* ptr = NULL; + const void* key; + size_t keylen; + int* hits1; + int* hits2; + + /* First check the perl arguments. We are probably faster than format-c. */ + if (equality) + { + /* Pass 1: Check that every format specification in msgid has its + counterpart in msgstr. This is only necessary for equality. */ + while (0 == iterate_table (&spec1->hash, &ptr, &key, + &keylen, (void**) &hits1)) + { + if (0 == find_entry (&spec2->hash, key, keylen, (void**) &hits2)) + { + if (*hits1 != *hits2) + { + result = true; + if (noisy) + { + char* argname = alloca (keylen + 1); + memcpy (argname, key, keylen); + argname[keylen] = 0; + + /* The next message shows the limitations of ngettext. + It is not smart enough to say "once", "twice", + "thrice/%d times", "%d times", ..., and it cannot + grok with two variable arguments. We have to + work around the problem. */ + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("\ +appearances of named argument '[%s]' do not match \ +(%d in original string, %d in '%s')"), + argname, *hits1, *hits2, pretty_msgstr); + error_with_progname = true; + } + else + return true; + } + } + else + { + result = true; + if (noisy) + { + char* argname = alloca (keylen + 1); + memcpy (argname, key, keylen); + argname[keylen] = 0; + + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("\ +named argument '[%s]' appears in original string but not in '%s'"), + argname, pretty_msgstr); + error_with_progname = true; + } + else + return true; + } + } + } + + /* Pass 2: Check that the number of format specifications in msgstr + does not exceed the number of appearances in msgid. */ + ptr = NULL; + while (0 == iterate_table (&spec2->hash, &ptr, &key, &keylen, + (void**) &hits2)) + { + if (0 != find_entry (&spec1->hash, key, keylen, (void**) &hits1)) + { + result = true; + if (noisy) + { + char* argname = alloca (keylen + 1); + memcpy (argname, key, keylen); + argname[keylen] = 0; + + error_with_progname = false; + error_at_line (0, 0, pos->file_name, pos->line_number, + _("\ +named argument '[%s]' appears only in '%s' but not in the original string"), + argname, pretty_msgstr); + error_with_progname = true; + } + else + return true; + } + } + + if (spec1->c_format && spec2->c_format) + { + result |= formatstring_c.check (pos, spec1->c_format, spec2->c_format, + equality, noisy, pretty_msgstr); + } + + return result; +} + +struct formatstring_parser formatstring_perl = + { + format_parse, + format_free, + format_get_number_of_directives, + format_check + }; + +#ifdef TEST + +/* Test program: Print the argument list specification returned by + format_parse for strings read from standard input. */ + +#include +#include "getline.h" + +static void +format_print (descr, line) + void* descr; + const char* line; +{ + struct spec *spec = (struct spec *) descr; + char* ptr = NULL; + char* key = NULL; + size_t keylen; + int data; + + printf ("%s=> ", line); + + if (spec == NULL) + { + printf ("INVALID\n"); + return; + } + + while (iterate_table (&spec->hash, (void**) &ptr, (const void**) &key, + &keylen, (void**) &data) == 0) + { + key[keylen - 1] = '\0'; + printf (">>>[%s]<<< ", key); + } + + printf ("\n"); +} + +int +main () +{ + char *line = NULL; + size_t line_len = 0; + void *descr; + + for (;;) + { + if (getline (&line, &line_len, stdin) < 0) + break; + + descr = format_parse (line); + + format_print (descr, line); + + format_free (descr); + } + + return 0; +} + +/* + * For Emacs M-x compile + * Local Variables: + * compile-command: "/bin/sh ../libtool --mode=link gcc -o a.out -static -O -g -Wall -I.. -I../lib -I../intl -DHAVE_CONFIG_H -DTEST format-perl.c ../lib/libgettextlib.la" + * End: + */ + +#endif /* TEST */ diff --git a/gettext-tools/src/format.c b/gettext-tools/src/format.c index 00660a780..02e25216c 100644 --- a/gettext-tools/src/format.c +++ b/gettext-tools/src/format.c @@ -34,6 +34,7 @@ struct formatstring_parser *formatstring_parsers[NFORMATS] = /* format_smalltalk */ &formatstring_smalltalk, /* format_java */ &formatstring_java, /* format_awk */ &formatstring_awk, + /* format_perl */ &formatstring_perl, /* format_pascal */ &formatstring_pascal, /* format_ycp */ &formatstring_ycp, /* format_tcl */ &formatstring_tcl, diff --git a/gettext-tools/src/format.h b/gettext-tools/src/format.h index f2857215c..bcc2ab3ca 100644 --- a/gettext-tools/src/format.h +++ b/gettext-tools/src/format.h @@ -63,6 +63,7 @@ extern struct formatstring_parser formatstring_librep; extern struct formatstring_parser formatstring_smalltalk; extern struct formatstring_parser formatstring_java; extern struct formatstring_parser formatstring_awk; +extern struct formatstring_parser formatstring_perl; extern struct formatstring_parser formatstring_pascal; extern struct formatstring_parser formatstring_ycp; extern struct formatstring_parser formatstring_tcl; diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c index 6306bc18c..2d1a86070 100644 --- a/gettext-tools/src/message.c +++ b/gettext-tools/src/message.c @@ -42,6 +42,7 @@ const char *const format_language[NFORMATS] = /* format_smalltalk */ "smalltalk", /* format_java */ "java", /* format_awk */ "awk", + /* format_perl */ "perl", /* format_pascal */ "object-pascal", /* format_ycp */ "ycp", /* format_tcl */ "tcl", @@ -58,6 +59,7 @@ const char *const format_language_pretty[NFORMATS] = /* format_smalltalk */ "Smalltalk", /* format_java */ "Java", /* format_awk */ "awk", + /* format_perl */ "Perl", /* format_pascal */ "Object Pascal", /* format_ycp */ "YCP", /* format_tcl */ "Tcl", diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h index 393ede215..5a89f0adf 100644 --- a/gettext-tools/src/message.h +++ b/gettext-tools/src/message.h @@ -45,9 +45,10 @@ enum format_type format_pascal, format_ycp, format_tcl, - format_php + format_php, + format_perl }; -#define NFORMATS 12 /* Number of format_type enum values. */ +#define NFORMATS 13 /* Number of format_type enum values. */ extern const char *const format_language[NFORMATS]; extern const char *const format_language_pretty[NFORMATS]; diff --git a/gettext-tools/src/x-perl.c b/gettext-tools/src/x-perl.c new file mode 100644 index 000000000..e1be7c57d --- /dev/null +++ b/gettext-tools/src/x-perl.c @@ -0,0 +1,3152 @@ +/* xgettext Perl backend. + Copyright (C) 2002-2003 Free Software Foundation, Inc. + + This file was written by Guido Flohr , 2002-2003. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ + +#ifdef HAVE_CONFIG_H +# include "config.h" +#endif + +#include +#include +#include +#include +#include + +#include "message.h" +#include "x-perl.h" +#include "xgettext.h" +#include "error.h" +#include "progname.h" +#include "xmalloc.h" +#include "exit.h" +#include "gettext.h" +#include "ucs4-utf8.h" +#include "uniname.h" + +#define _(s) gettext(s) + +#if HAVE_C_BACKSLASH_A +# define ALERT_CHAR '\a' +#else +# define ALERT_CHAR '\7' +#endif + +#define ever (;;) + +/* The Perl syntax is defined in perlsyn.pod. Try the command + "perldoc perlsyn". */ + +#define DEBUG_PERL 0 +#define DEBUG_MEMORY 0 + +/* FIXME: All known Perl operators should be listed here. It does not + cost that much and it may improve the stability of the parser. */ +enum token_type_ty +{ + token_type_eof, + token_type_lparen, /* ( */ + token_type_rparen, /* ) */ + token_type_comma, /* , */ + token_type_fat_comma, /* => */ + token_type_dereference, /* , */ + token_type_semicolon, /* ; */ + token_type_lbrace, /* { */ + token_type_rbrace, /* } */ + token_type_lbracket, /* [ */ + token_type_rbracket, /* ] */ + token_type_string, /* quote-like */ + token_type_named_op, /* if, unless, while, ... */ + token_type_variable, /* $... */ + token_type_symbol, /* symbol, number */ + token_type_regex_op, /* s, tr, y, m. */ + token_type_keyword_symbol, /* keyword symbol (used by parser) */ + token_type_dot, /* . */ + token_type_other /* regexp, misc. operator */ +}; +typedef enum token_type_ty token_type_ty; + +/* Subtypes for strings, important for interpolation. */ +enum string_type_ty +{ + string_type_verbatim, /* "<<'EOF'", "m'...'", "s'...''...'", + "tr/.../.../", "y/.../.../". */ + string_type_q, /* "'..'", "q/.../". */ + string_type_qq, /* '"..."', "`...`", "qq/.../", "qx/.../", + "". */ + string_type_qr, /* Not supported. */ +}; +typedef enum string_type_ty string_type_ty; + +typedef struct token_ty token_ty; +struct token_ty +{ + token_type_ty type; + string_type_ty string_type; + char *string; /* for token_type_{symbol,string} */ + int line_number; +}; + +static token_ty token_buf[2]; + +#if DEBUG_PERL +static const char* +token2string (token) + token_ty* token; +{ + switch (token->type) + { + case token_type_eof: + return "token_type_eof"; + case token_type_lparen: + return "token_type_lparen"; + case token_type_rparen: + return "token_type_rparen"; + case token_type_comma: + return "token_type_comma"; + case token_type_fat_comma: + return "token_type_fat_comma"; + case token_type_dereference: + return "token_type_dereference"; + case token_type_semicolon: + return "token_type_semicolon"; + case token_type_lbrace: + return "token_type_lbrace"; + case token_type_rbrace: + return "token_type_rbrace"; + case token_type_lbracket: + return "token_type_lbracket"; + case token_type_rbracket: + return "token_type_rbracket"; + case token_type_string: + return "token_type_string"; + case token_type_named_op: + return "token_type_named_op"; + case token_type_variable: + return "token_type_variable"; + case token_type_symbol: + return "token_type_symbol"; + case token_type_regex_op: + return "token_type_regex_op"; + case token_type_keyword_symbol: + return "token_type_keyword_symbol"; + case token_type_dot: + return "token_type_dot"; + case token_type_other: + return "token_type_other"; + default: + return "unknown"; + } +} +#endif + +struct stack_entry +{ + struct stack_entry* next; + struct stack_entry* prev; + void* data; + void (*destroy) PARAMS ((token_ty* data)); +}; + +struct stack +{ + struct stack_entry* first; + struct stack_entry* last; +}; + +struct stack* token_stack; + +/* Prototypes for local functions. Needed to ensure compiler checking of + function argument counts despite of K&R C function definition syntax. */ +static void init_keywords PARAMS ((void)); +static int phase1_getc PARAMS ((void)); +static void phase1_ungetc PARAMS ((int c)); +static char* get_here_document PARAMS ((const char* delimiter)); +static void skip_pod PARAMS ((void)); +static int phase2_getc PARAMS ((void)); +static void phase2_ungetc PARAMS ((int c)); +static inline void free_token PARAMS ((token_ty *tp)); +static void extract_variable PARAMS ((message_list_ty* mlp, + token_ty *tp, int c)); +static void interpolate_keywords PARAMS ((message_list_ty* mlp, + const char* string)); +static void extract_quotelike PARAMS ((token_ty *tp, int delim)); +static void extract_triple_quotelike PARAMS ((message_list_ty* mlp, + token_ty *tp, int delim, + bool interpolate)); +static char* extract_quotelike_pass1 PARAMS ((int delim)); +static void extract_quotelike_pass3 PARAMS ((token_ty* tp, int error_level)); +static char* extract_hex PARAMS ((char* string, size_t len, + unsigned int* result)); +static char* extract_oct PARAMS ((char* string, size_t len, + unsigned int* result)); +static token_ty* x_perl_lex PARAMS ((message_list_ty* mlp)); +static void x_perl_prelex PARAMS ((message_list_ty* mlp, token_ty* tp)); +static void x_perl_unlex PARAMS ((token_ty* tp)); +static char* collect_message PARAMS ((message_list_ty* mlp, token_ty* tp, + int error_level)); +static bool extract_balanced PARAMS ((message_list_ty *mlp, + int arg_sg, int arg_pl, + int state, token_type_ty delim)); +static void stack_push PARAMS ((struct stack* stack, void* data, + void (*destroy) PARAMS ((token_ty* data)))); +static void stack_unshift PARAMS ((struct stack* stack, void* data, + void (*destroy) PARAMS ((token_ty* data)))); +static void* stack_pop PARAMS ((struct stack* stack)); +static void* stack_shift PARAMS ((struct stack* stack)); +static void* stack_head PARAMS ((struct stack* stack)); +static void stack_free PARAMS ((struct stack* stack)); + +#if DEBUG_MEMORY +static void* xmalloc_debug PARAMS ((size_t bytes)); +static void* xrealloc_debug PARAMS ((void* buf, size_t bytes)); +static void* xrealloc_static_debug PARAMS ((void* buf, size_t bytes)); +static void* xcalloc_debug PARAMS ((size_t nitems, size_t bytes)); +static char* xstrdup_debug PARAMS ((const char* string)); +static void free_debug PARAMS ((void* buf)); +static message_ty *remember_a_message_debug PARAMS ((message_list_ty *mlp, + char *string, + lex_pos_ty *pos)); +static void remember_a_message_plural_debug PARAMS ((message_ty *mp, + char *string, + lex_pos_ty *pos)); + + +static message_ty* +remember_a_message_debug (mlp, string, pos) + message_list_ty* mlp; + char* string; + lex_pos_ty* pos; +{ + void* retval; + + fprintf (stderr, "*** remember_a_message (%p): ", string); fflush (stderr); + retval = remember_a_message (mlp, string, pos); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static void +remember_a_message_plural_debug (mp, string, pos) + message_ty* mp; + char* string; + lex_pos_ty* pos; +{ + fprintf (stderr, "*** remember_a_message_plural (%p, %p): ", mp, string); + fflush (stderr); + remember_a_message_plural (mp, string, pos); + fprintf (stderr, "done\n"); fflush (stderr); +} + +static void* +xmalloc_debug (bytes) + size_t bytes; +{ + void* retval; + + fprintf (stderr, "*** xmalloc (%u): ", bytes); fflush (stderr); + retval = xmalloc (bytes); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static void* +xrealloc_debug (buf, bytes) + void* buf; + size_t bytes; +{ + void* retval; + + fprintf (stderr, "*** xrealloc (%p, %u): ", buf, bytes); fflush (stderr); + retval = xrealloc (buf, bytes); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static void* +xrealloc_static_debug (buf, bytes) + void* buf; + size_t bytes; +{ + void* retval; + + fprintf (stderr, "*** xrealloc_static (%p, %u): ", buf, bytes); + fflush (stderr); + retval = xrealloc (buf, bytes); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static void* +xcalloc_debug (nitems, bytes) + size_t nitems; + size_t bytes; +{ + void* retval; + + fprintf (stderr, "*** xcalloc (%u, %u): ", nitems, bytes); fflush (stderr); + retval = xcalloc (nitems, bytes); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static char* +xstrdup_debug (string) + const char* string; +{ + char* retval; + + fprintf (stderr, "*** xstrdup (%p, %d): ", string, strlen (string)); + fflush (stderr); + retval = xstrdup (string); + fprintf (stderr, "%p\n", retval); fflush (stderr); + return retval; +} + +static void +free_debug (buf) + void* buf; +{ + fprintf (stderr, "*** free (%p): ", buf); fflush (stderr); + free (buf); + fprintf (stderr, "done\n"); fflush (stderr); +} + +# define xmalloc(b) xmalloc_debug (b) +# define xrealloc(b, s) xrealloc_debug (b, s) +# define xcalloc(n, b) xcalloc_debug (n, b) +# define xstrdup(s) xstrdup_debug (s) +# define free(b) free_debug (b) + +# define xrealloc_static(b, s) xrealloc_static_debug (b, s) + +#define remember_a_message(m, s, p) remember_a_message_debug (m, s, p) +#define remember_a_message_plural(m, s, p) \ + remember_a_message_plural_debug (m, s, p) + +#else +# define xrealloc_static(b, s) xrealloc (b, s) +#endif + +#if DEBUG_PERL +/* Dumps all resources allocated by stack STACK. */ +static int +stack_dump (stack) + struct stack* stack; +{ + struct stack_entry* last = stack->last; + + fprintf (stderr, "BEGIN STACK DUMP\n"); + while (last) + { + struct stack_entry* next = last->prev; + + if (last->data) + { + token_ty* token = (token_ty*) last->data; + fprintf (stderr, " [%s]\n", token2string (token)); + fflush (stderr); + switch (token->type) + { + case token_type_named_op: + case token_type_string: + case token_type_symbol: + case token_type_keyword_symbol: + case token_type_variable: + fprintf (stderr, " string: %s\n", token->string); + fflush (stderr); + break; + } + } + last = next; + } + fprintf (stderr, "END STACK DUMP\n"); + return 1; +} +#endif + +/* Pushes the pointer DATA onto the stack STACK. The argument DESTROY + * is a pointer to a function that frees the resources associated with + * DATA or NULL (no destructor). + */ +static void +stack_push (stack, data, destroy) + struct stack* stack; + void* data; + void (*destroy) PARAMS ((token_ty* data)); +{ + struct stack_entry* entry = xmalloc (sizeof (struct stack_entry)); + + if (stack->last == NULL) { + stack->first = entry; + } else { + stack->last->next = entry; + } + + entry->prev = stack->last; + entry->next = NULL; + entry->data = data; + entry->destroy = destroy; + stack->last = entry; +} + +/* Unshifts the pointer DATA onto the stack STACK. The argument DESTROY + * is a pointer to a function that frees the resources associated with + * DATA or NULL (no destructor). + */ +static void +stack_unshift (stack, data, destroy) + struct stack* stack; + void* data; + void (*destroy) PARAMS ((token_ty* data)); +{ + struct stack_entry* entry = xmalloc (sizeof (struct stack_entry)); + + if (stack->first == NULL) + { + stack->last = entry; + } + else + { + stack->first->prev = entry; + } + + entry->next = stack->first; + entry->prev = NULL; + entry->data = data; + entry->destroy = destroy; + + stack->first = entry; +} + +/* Pops the last element from the stack STACK and returns its contents or + * NULL if the stack is empty. + */ +static void* +stack_pop (stack) + struct stack* stack; +{ + struct stack_entry* entry = stack->last; + struct stack_entry* last; + void* data; + + if (!entry) + return NULL; + + last = entry->prev; + if (last) { + stack->last = last; + last->next = NULL; + } else { + stack->first = stack->last = NULL; + } + + data = entry->data; + free (entry); + + return data; +} + +/* Shifts the first element from the stack STACK and returns its contents or + * NULL if the stack is empty. + */ +static void* +stack_shift (stack) + struct stack* stack; +{ + struct stack_entry* entry = stack->first; + void* data; + + if (!entry) + return NULL; + + stack->first = entry->next; + if (!stack->first) + stack->last = NULL; + else + stack->first->prev = NULL; + + data = entry->data; + free (entry); + + return data; +} + +/* Return the bottom of the stack without removing it from the stack or + * NULL if the stack is empty. + */ +static void* +stack_head (stack) + struct stack* stack; +{ + struct stack_entry* entry = stack->first; + struct stack_entry* last; + void* data; + + if (!entry) + return NULL; + + data = entry->data; + + return data; +} + +/* Frees all resources allocated by stack STACK. */ +static void +stack_free (stack) + struct stack* stack; +{ + struct stack_entry* last = stack->last; + + while (last) { + struct stack_entry* next = last->prev; + if (last->data && last->destroy) { + last->destroy (last->data); + } + free (last); + last = next; + } +} + +/* ====================== Keyword set customization. ====================== */ + +/* If true extract all strings. */ +static bool extract_all = false; + +static hash_table keywords; +static bool default_keywords = true; + + +void +x_perl_extract_all () +{ + extract_all = true; +} + + +void +x_perl_keyword (name) + const char *name; +{ + if (name == NULL) + default_keywords = false; + else + { + const char *end; + int argnum1; + int argnum2; + const char *colon; + + if (keywords.table == NULL) + init_hash (&keywords, 100); + + split_keywordspec (name, &end, &argnum1, &argnum2); + + /* The characters between name and end should form a valid C identifier. + A colon means an invalid parse in split_keywordspec(). */ + colon = strchr (name, ':'); + if (colon == NULL || colon >= end) + { + if (argnum1 == 0) + argnum1 = 1; + insert_entry (&keywords, name, end - name, + (void *) (long) (argnum1 + (argnum2 << 10))); + } + } +} + +/* Finish initializing the keywords hash table. + Called after argument processing, before each file is processed. */ +static void +init_keywords () +{ + if (default_keywords) + { + x_perl_keyword ("gettext"); + x_perl_keyword ("%gettext"); + x_perl_keyword ("$gettext"); + x_perl_keyword ("dgettext:2"); + x_perl_keyword ("dcgettext:2"); + x_perl_keyword ("ngettext:1,2"); + x_perl_keyword ("dngettext:2,3"); + x_perl_keyword ("dcngettext:2,3"); + x_perl_keyword ("gettext_noop"); + default_keywords = false; + } +} + + +/* ================== Reading of characters and tokens. =================== */ + +/* Real filename, used in error messages about the input file. */ +static const char *real_file_name; + +/* Logical filename and line number, used to label the extracted messages. */ +static char *logical_file_name; +static int line_number; + +/* The input file stream. */ +static FILE *fp; + +/* These are for tracking whether comments count as immediately before + keyword. */ +static int last_comment_line; +static int last_non_comment_line; + +/* The current line buffer. */ +char* linebuf; + +/* The size of the current line. */ +int linesize; + +/* The position in the current line. */ +int linepos; + +/* The size of the input buffer. */ +size_t linebuf_size; + +/* The last token seen in the token stream. This is important for the + interpretation of '?' and '/'. */ +token_type_ty last_token; + +/* The last string token waiting for a dot operator or finishing. */ +token_ty last_string; + +/* True if LAST_STRING is finished. */ +bool last_string_finished; + +/* Number of lines eaten for here documents. */ +int here_eaten; + +/* Paranoia: EOF marker for __END__ or __DATA__. */ +bool end_of_file; + +/* 1. line_number handling. */ +/* Returns the next character from the input stream or EOF. */ +static int +phase1_getc () +{ + line_number += here_eaten; + here_eaten = 0; + + if (end_of_file) + return EOF; + + if (linepos >= linesize) + { + linesize = getline (&linebuf, &linebuf_size, fp); + + if (linesize == EOF) + { + if (ferror (fp)) + error (EXIT_FAILURE, errno, _("error while reading \"%s\""), + real_file_name); + end_of_file = true; + return EOF; + } + + linepos = 0; + ++line_number; + + /* Undosify. This is important for catching the end of <= 2 && linebuf[linesize - 1] == '\n' + && linebuf[linesize - 2] == '\r') + { + linebuf[linesize - 2] = '\n'; + linebuf[linesize - 1] = '\0'; + --linesize; + } + } + + return linebuf[linepos++]; +} + +static void +phase1_ungetc (c) + int c; +{ + if (c != EOF) + { + if (linepos == 0) + error (EXIT_FAILURE, 0, _("\ +%s:%d: internal error: attempt to ungetc across line boundary"), + real_file_name, line_number); + + --linepos; + } +} + +static char* +get_here_document (delimiter) + const char* delimiter; +{ + static char* buffer; + static size_t bufmax = 0; + size_t bufpos = 0; + static char* my_linebuf = NULL; + static size_t my_linebuf_size = 0; + bool chomp = false; + + if (bufpos >= bufmax) + { + buffer = xrealloc_static (NULL, 1); + buffer[0] = '\0'; + bufmax = 1; + } + + for ever + { + int read_bytes = getline (&my_linebuf, &my_linebuf_size, fp); + + if (read_bytes == EOF) + { + if (ferror (fp)) + { + error (EXIT_FAILURE, errno, _("error while reading \"%s\""), + real_file_name); + } + else + { + error_with_progname = false; + error (EXIT_SUCCESS, 0, _("\ +%s:%d: can\'t find string terminator \"%s\" anywhere before EOF"), + real_file_name, line_number, delimiter); + error_with_progname = true; + fflush (stderr); + + return xstrdup (buffer); + } + } + + ++here_eaten; + + /* Undosify. This is important for catching the end of <= 2 && my_linebuf[read_bytes - 1] == '\n' + && my_linebuf[read_bytes - 2] == '\r') + { + my_linebuf[read_bytes - 2] = '\n'; + my_linebuf[read_bytes - 1] = '\0'; + --read_bytes; + } + + if (read_bytes && my_linebuf[read_bytes - 1] == '\n') + { + chomp = true; + my_linebuf[read_bytes - 1] = '\0'; + } + if (0 == strcmp (my_linebuf, delimiter)) + { + return xstrdup (buffer); + } + if (chomp) + { + my_linebuf[read_bytes - 1] = '\n'; + } + + if (bufpos + read_bytes + 1 >= bufmax) + { + bufmax += read_bytes + 1; + buffer = xrealloc_static (buffer, bufmax); + } + strcpy (buffer + bufpos, my_linebuf); + bufpos += read_bytes; + } +} + +/* Skips pod sections. */ +static void +skip_pod () +{ + line_number += here_eaten; + here_eaten = 0; + linepos = 0; + + for ever + { + linesize = getline (&linebuf, &linebuf_size, fp); + + if (linesize == EOF) + { + if (ferror (fp)) + error (EXIT_FAILURE, errno, _("error while reading \"%s\""), + real_file_name); + return; + } + + ++line_number; + + if (strncmp ("=cut", linebuf, 4) == 0) + { + /* Force reading of a new line on next call to phase1_getc(). */ + linepos = linesize; + return; + } + } +} + +/* 2. Replace each comment that is not inside a string literal or regular + expression with a newline character. We need to remember the comment + for later, because it may be attached to a keyword string. */ + +static int +phase2_getc () +{ + static char *buffer; + static size_t bufmax; + size_t buflen; + int lineno; + int c; + + c = phase1_getc (); + if (c == '#') + { + buflen = 0; + lineno = line_number; + /* Skip leading whitespace. */ + for ever + { + c = phase1_getc (); + if (c == EOF) + break; + if (c != ' ' && c != '\t' && c != '\r' && c != '\f') + { + phase1_ungetc (c); + break; + } + } + for ever + { + c = phase1_getc (); + if (c == '\n' || c == EOF) + break; + if (buflen >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[buflen++] = c; + } + if (buflen >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[buflen] = '\0'; + xgettext_comment_add (buffer); + last_comment_line = lineno; + } + return c; +} + +static void +phase2_ungetc (c) + int c; +{ + if (c != EOF) + phase1_ungetc (c); +} + +/* There is an ambiguity about '/': It can start a division operator ('/' or + '/=') or it can start a regular expression. The distinction is important + because inside regular expressions, '#' and '"' lose its special meanings. + If you look at the awk grammar, you see that the operator is only allowed + right after a 'variable' or 'simp_exp' nonterminal, and these nonterminals + can only end in the NAME, LENGTH, YSTRING, YNUMBER, ')', ']' terminals. + So we prefer the division operator interpretation only right after + symbol, string, number, ')', ']', with whitespace but no newline allowed + in between. */ +static bool prefer_division_over_regexp; + +/* Free the memory pointed to by a 'struct token_ty'. */ +static inline void +free_token (tp) + token_ty *tp; +{ + switch (tp->type) + { + case token_type_named_op: + case token_type_string: + case token_type_symbol: + case token_type_keyword_symbol: + case token_type_variable: + free (tp->string); + break; + default: + break; + } + free (tp); +} + +/* Extract an unsigned hexadecimal number from STRING, considering at + most LEN bytes and place the result in RESULT. Returns a pointer + to the first character past the hexadecimal number. */ +static char* +extract_hex (string, len, result) + char* string; + size_t len; + unsigned int* result; +{ + size_t i; + + *result = 0; + + for (i = 0; i < len; ++i) + { + int number; + + if (string[i] >= 'A' && string[i] <= 'F') + number = 10 + string[i] - 'A'; + else if (string[i] >= 'a' && string[i] <= 'f') + number = 10 + string[i] - 'a'; + else if (string[i] >= '0' && string[i] <= '9') + number = string[i] - '0'; + else + break; + + *result <<= 4; + *result |= number; + } + + return string + i; +} + +/* Extract an unsigned octal number from STRING, considering at + most LEN bytes and place the result in RESULT. Returns a pointer + to the first character past the hexadecimal number. */ +static char* +extract_oct (string, len, result) + char* string; + size_t len; + unsigned int* result; +{ + size_t i; + + *result = 0; + + for (i = 0; i < len; ++i) + { + int number; + + if (string[i] >= '0' && string[i] <= '7') + number = string[i] - '0'; + else + break; + + *result <<= 3; + *result |= number; + } + + return string + i; +} + +/* Extract the various quotelike constructs except for <type = token_type_string; + + string[strlen (string) - 1] = '\0'; + tp->string = xstrdup (string + 1); + free (string); + return; +} + +/* Extract the quotelike constructs with double delimiters, like + s/[SEARCH]/[REPLACE]/. This function does not eat up trailing + modifiers (left to the caller). */ +static void +extract_triple_quotelike (mlp, tp, delim, interpolate) + message_list_ty* mlp; + token_ty* tp; + int delim; + bool interpolate; +{ + char* string = extract_quotelike_pass1 (delim); + + tp->type = token_type_regex_op; + if (interpolate && !extract_all && delim != '\'') + interpolate_keywords (mlp, string); + + free (string); + + if (delim == '(' || delim == '<' || delim == '{' || delim == '[') + { + /* Things can change. */ + delim = phase1_getc (); + while (delim == ' ' || delim == '\t' || delim == '\r' + || delim == '\n' || delim == '\f') + { + /* The hash-sign is not a valid delimiter after whitespace, ergo + use phase2_getc() and not phase1_getc() now. */ + delim = phase2_getc (); + } + } + string = extract_quotelike_pass1 (delim); + if (interpolate && !extract_all && delim != '\'') + interpolate_keywords (mlp, string); + free (string); + + return; +} + +/* Pass 1 of extracting quotes: Find the end of the string, regardless + of the semantics of the construct. */ +static char* +extract_quotelike_pass1 (delim) + int delim; +{ + /* This function is called recursively. No way to allocate stuff + statically. Consider using alloca() instead. */ + char *buffer = xmalloc (100); + int bufmax = 100; + int bufpos = 0; + bool nested = true; + int counter_delim; + + buffer[bufpos++] = delim; + + /* Find the closing delimiter. */ + switch (delim) + { + case '(': + counter_delim = ')'; + break; + case '{': + counter_delim = '}'; + break; + case '[': + counter_delim = ']'; + break; + case '<': + counter_delim = '>'; + break; + default: + nested = false; + counter_delim = delim; + break; + } + + for ever + { + int c = phase1_getc (); + + if (bufpos >= bufmax - 1) + { + bufmax += 100; + buffer = xrealloc (buffer, bufmax); + } + + if (c == counter_delim || c == EOF) + { + /* Copying the EOF (actually 255) is not an error. It will + be stripped off later. */ + buffer[bufpos++] = c; + buffer[bufpos++] = '\0'; +#if DEBUG_PERL + fprintf (stderr, "PASS1: %s\n", buffer); +#endif + return buffer; + } + + if (nested && c == delim) + { + char* inner = extract_quotelike_pass1 (delim); + size_t len = strlen (inner); + + if (bufpos + len >= bufmax) + { + bufmax += len; + buffer = xrealloc (buffer, bufmax); + } + strcpy (buffer + bufpos, inner); + free (inner); + bufpos += len; + continue; + } + + if (c == '\\') + { + c = phase1_getc (); + if (c == '\\') + { + buffer[bufpos++] = '\\'; + buffer[bufpos++] = '\\'; + } + else if (c == delim || c == counter_delim) + { + /* This is pass2 in Perl. */ + buffer[bufpos++] = c; + } + else + { + buffer[bufpos++] = '\\'; + phase1_ungetc (c); + } + } + else + { + buffer[bufpos++] = c; + } + } +} + +/* Perform pass 3 of quotelike extraction (interpolation). */ +/* FIXME: Currently may writes null-bytes into the string. */ +static void +extract_quotelike_pass3 (tp, error_level) + token_ty* tp; + int error_level; +{ + static char *buffer; + static int bufmax = 0; + int bufpos = 0; + int delim = tp->string[0]; + char* string = tp->string; + unsigned char* crs = string; + + bool uppercase = false; + bool lowercase = false; + bool quotemeta = false; + +#if DEBUG_PERL + switch (tp->string_type) + { + case string_type_verbatim: + fprintf (stderr, "Interpolating string_type_verbatim:\n"); + break; + case string_type_q: + fprintf (stderr, "Interpolating string_type_q:\n"); + break; + case string_type_qq: + fprintf (stderr, "Interpolating string_type_qq:\n"); + break; + case string_type_qr: + fprintf (stderr, "Interpolating string_type_qr:\n"); + break; + } + fprintf (stderr, "%s\n", tp->string); + if (tp->string_type == string_type_verbatim) + fprintf (stderr, "---> %s\n", tp->string); +#endif + + if (tp->string_type == string_type_verbatim) + return; + + while (*crs) + { + if (bufpos >= bufmax - 6) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + + if (tp->string_type == string_type_q) + { + switch (*crs) + { + case '\\': + if (crs[1] == '\\') + { + ++crs; + buffer[bufpos++] = '\\'; + continue; + } + /* FALLTHROUGH */ + default: + buffer[bufpos++] = *crs++; + break; + } + continue; + } + + /* We only get here for double-quoted strings or regular expressions. + Unescape escape sequences. */ + if (*crs == '\\') + { + switch (crs[1]) + { + case 't': + crs += 2; + buffer[bufpos++] = '\t'; + continue; + case 'n': + crs += 2; + buffer[bufpos++] = '\n'; + continue; + case 'r': + crs += 2; + buffer[bufpos++] = '\r'; + continue; + case 'f': + crs += 2; + buffer[bufpos++] = '\f'; + continue; + case 'b': + crs += 2; + buffer[bufpos++] = '\b'; + continue; + case 'a': + crs += 2; + buffer[bufpos++] = ALERT_CHAR; + continue; + case 'e': + crs += 2; + buffer[bufpos++] = 0x1b; + continue; + case '0': case '1': case '2': case '3': + case '4': case '5': case '6': case '7': + { + unsigned int oct_number; + int length; + + crs = extract_oct (crs + 1, 3, &oct_number); + length = u8_uctomb (buffer + bufpos, oct_number, 3); + if (length > 0) + bufpos += length; + } + continue; + case 'x': + { + unsigned int hex_number = 0; + int length; + + ++crs; + + if (*crs == '{') + { + char* end = strchr (crs, '}'); + if (end == NULL) + { + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d: missing right brace on \\x{HEXNUMBER}"), real_file_name, line_number); + error_with_progname = true; + ++crs; + continue; + } + else + { + ++crs; + (void) extract_hex (crs, 4, &hex_number); + } + } + else + { + crs = extract_hex (crs, 2, &hex_number); + } + + length = u8_uctomb (buffer + bufpos, hex_number, 6); + if (length > 0) + bufpos += length; + } + continue; + case 'c': + /* Perl's notion of control characters. */ + crs += 2; + if (*crs) + { + int the_char = *crs; + if (the_char >= 'a' || the_char <= 'z') + the_char -= 0x20; + buffer[bufpos++] = the_char + (the_char & 0x40 ? -64 : 64); + } + continue; + case 'N': + crs += 2; + if (*crs == '{') + { + char* name = xstrdup (crs + 1); + char* end = strchr (name, '}'); + if (end != NULL) + { + unsigned int unicode; + int length; + + *end = '\0'; + + crs += 2 + strlen (name); + unicode = unicode_name_character (name); + if (unicode != UNINAME_INVALID) + { + length = u8_uctomb (buffer + bufpos, unicode, 6); + if (length > 0) + bufpos += length; + } + } + free (name); + } + continue; + } + } + + /* No escape sequence, go on. */ + if (*crs == '\\') + { + ++crs; + switch (*crs) + { + case 'E': + quotemeta = uppercase = lowercase = false; + ++crs; + continue; + case 'L': + quotemeta = uppercase = false; + lowercase = true; + ++crs; + continue; + case 'U': + quotemeta = lowercase = false; + uppercase = true; + ++crs; + continue; + case 'Q': + uppercase = lowercase = false; + quotemeta = true; + ++crs; + continue; + case 'l': + ++crs; + if (crs[1] >= 'A' && crs[1] <= 'Z') + { + buffer[bufpos++] = crs[1] + 0x20; + ++crs; + } + else if (crs[1] >= 0x80) + { + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d: illegal interpolation (\"\\l\") of 8bit character \"%c\""), + real_file_name, line_number, *crs); + error_with_progname = true; + ++crs; + } + else + ++crs; + continue; + case 'u': + ++crs; + if (crs[1] >= 'a' && crs[1] <= 'z') + { + buffer[bufpos++] = crs[1] - 0x20; + ++crs; + } + else if (crs[1] >= 0x80) + { + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d: illegal interpolation (\"\\u\") of 8bit character \"%c\""), + real_file_name, line_number, *crs); + error_with_progname = true; + ++crs; + } + else + ++crs; + continue; + case '\\': + if (crs[1]) + buffer[bufpos++] = crs[1]; + crs++; + continue; + } + } + + + if (*crs == '$' || *crs == '@') + { + ++crs; + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d: illegal variable interpolation"), + real_file_name, line_number, *crs); + error_with_progname = true; + } + else if (lowercase) + { + if (*crs >= 'A' && *crs <= 'Z') + buffer[bufpos++] = 0x20 + *crs++; + else if (*crs >= 0x80) + { + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d:illegal interpolation (\"\\L\") of 8bit character \"%c\""), + real_file_name, line_number, *crs); + error_with_progname = true; + buffer[bufpos++] = *crs++; + } + else + buffer[bufpos++] = *crs++; + } + else if (uppercase) + { + if (*crs >= 'a' && *crs <= 'z') + buffer[bufpos++] = *crs++ - 0x20; + else if (*crs >= 0x80) + { + error_with_progname = false; + error (error_level, 0, _("\ +%s:%d: illegal interpolation (\"\\U\") of 8bit character \"%c\""), + real_file_name, line_number, *crs); + error_with_progname = true; + buffer[bufpos++] = *crs++; + } + else + buffer[bufpos++] = *crs++; + } + else if (quotemeta) + { + buffer[bufpos++] = *crs++; + } + else + { + buffer[bufpos++] = *crs++; + } + } + + if (bufpos >= bufmax - 1) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + + buffer[bufpos++] = '\0'; + +#if DEBUG_PERL + fprintf (stderr, "---> %s\n", buffer); +#endif + + free (tp->string); + tp->string = xstrdup (buffer); +} + +/* Parse a variable. This is done in several steps: + * + * 1) Consume all leading occurcencies of '$', '@', '%', and '*'. + * 2) Determine the name of the variable from the following input + * 3) Parse possible following hash keys or array indexes. + */ +static void +extract_variable (mlp, tp, first) + message_list_ty* mlp; + token_ty* tp; + int first; +{ + static char* buffer; + static int bufmax = 0; + int bufpos = 0; + int c = first; + size_t varbody_length = 0; + bool maybe_hash_deref = false; + bool maybe_hash_value = false; + + tp->type = token_type_variable; + +#if DEBUG_PERL + fprintf (stderr, "%s:%d: extracting variable type '%c'\n", + real_file_name, line_number, first); +#endif + + /* + * 1) Consume dollars and so on (not euros ...). Unconditionally + * accepting the hash sign (#) will maybe lead to inaccurate + * results. FIXME! + */ + while (c == '$' || c == '*' || c == '#' || c == '@' || c == '%') + { + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = c; + c = phase1_getc (); + } + + if (c == EOF) + { + tp->type = token_type_eof; + return; + } + + /* Hash references are treated in a special way, when looking for + our keywords. */ + if (buffer[0] == '$') + { + if (bufpos == 1) + maybe_hash_value = true; + else if (bufpos == 2 && buffer[1] == '$') + { + if (c != '{' && c != '_' && (!((c >= '0' && c <= '9') + || (c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || c == ':' || c == '\'' + || c >= 0x80))) + { + /* Special variable $$ for pid. */ + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = '\0'; + tp->string = xstrdup (buffer); +#if DEBUG_PERL + fprintf (stderr, "%s:%d: is PID ($$)\n", + real_file_name, line_number); +#endif + + phase1_ungetc (c); + return; + } + + maybe_hash_deref = true; + bufpos = 1; + } + } + + /* + * 2) Get the name of the variable. The first character is practically + * arbitrary. Punctuation and numbers automagically put a variable + * in the global namespace but that subtle difference is not interesting + * for us. + */ + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + if (c == '{') + { + /* Yuck, we cannot accept ${gettext} as a keyword... Except for + * debugging purposes it is also harmless, that we suppress the + * real name of the variable. + */ +#if DEBUG_PERL + fprintf (stderr, "%s:%d: braced {variable_name}\n", + real_file_name, line_number); +#endif + + if (extract_balanced (mlp, -1, -1, 0, token_type_rbrace)) + return; + buffer[bufpos++] = c; + } + else + { + while ((c >= 'A' && c <= 'Z') || + (c >= 'a' && c <= 'z') || + (c >= '0' && c <= '9') || + c == '_' || c == ':' || c == '\'' || c >= 0x80) + { + ++varbody_length; + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = c; + c = phase1_getc (); + } + phase1_ungetc (c); + } + + if (bufpos >= bufmax - 1) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = '\0'; + + /* Probably some strange Perl variable like $`. */ + if (varbody_length == 0) + { + c = phase1_getc (); + if (c == EOF || c == ' ' || c == '\n' || c == '\r' + || c == '\f' || c == '\t') + phase1_ungetc (c); /* Loser. */ + else + { + buffer[bufpos - 1] = c; + buffer[++bufpos] = '\0'; + } + } + tp->string = xstrdup (buffer); + +#if DEBUG_PERL + fprintf (stderr, "%s:%d: complete variable name: %s\n", + real_file_name, line_number, tp->string); +#endif + prefer_division_over_regexp = true; + + /* + * 3) If the following looks strange to you, this is valid Perl syntax: + * + * $var = $$hashref # We can place a + * # comment here and then ... + * {key_into_hashref}; + * + * POD sections are not allowed but we leave complaints about + * that to the compiler/interpreter. + */ + /* We only extract strings from the first hash key (if present). */ + + if (maybe_hash_deref || maybe_hash_value) + { + bool is_dereference = false; + int c = phase2_getc (); + + while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f') + c = phase2_getc (); + + if (c == '-') + { + int c2 = phase1_getc (); + + if (c2 == '>') + { + is_dereference = true; + c = phase2_getc (); + while (c == ' ' || c == '\t' || c == '\r' + || c == '\n' || c == '\f') + c = phase2_getc (); + } + else if (c2 != '\n') + { + /* Discarding the newline is harmless here. The only + special character recognized after a minus is greater-than + for dereference. However, the sequence "-\n>" that we + treat incorrectly here, is a syntax error. */ + phase1_ungetc (c2); + } + } + + if (maybe_hash_value && is_dereference) + { +#if DEBUG_PERL + fprintf (stderr, "%s:%d: first keys preceded by \"->\"\n", + real_file_name, line_number); +#endif + } + else if (maybe_hash_value) + { + /* Fake it into a hash. */ + tp->string[0] = '%'; + } + + /* Do NOT change that into else if (see above). */ + if ((maybe_hash_value || maybe_hash_deref) && c == '{') + { + void *keyword_value; + +#if DEBUG_PERL + fprintf (stderr, "%s:%d: first keys preceded by '{'\n", + real_file_name, line_number); +#endif + + if (0 == find_entry (&keywords, tp->string, strlen (tp->string), + &keyword_value)) + { + /* Extract a possible string from the key. Before proceeding + we check whether the open curly is followed by a symbol and + then by a right curly. */ + token_ty* t1 = x_perl_lex (mlp); + +#if DEBUG_PERL + fprintf (stderr, "%s:%d: extracting string key\n", + real_file_name, line_number); +#endif + + if (t1->type == token_type_symbol + || t1->type == token_type_named_op) + { + token_ty* t2 = x_perl_lex (mlp); + if (t2->type == token_type_rbrace) + { + lex_pos_ty pos; + pos.line_number = line_number; + pos.file_name = logical_file_name; + remember_a_message (mlp, xstrdup (t1->string), &pos); + free_token (t2); + free_token (t1); + } + else + { + x_perl_unlex (t2); + } + } + else + { + x_perl_unlex (t1); + if (extract_balanced (mlp, 1, -1, 1, token_type_rbrace)) + return; + } + } + else + { + phase2_ungetc (c); + } + } + else + { + phase2_ungetc (c); + } + } + + /* Now consume "->", "[...]", and "{...}". */ + for ever + { + int c = phase2_getc (); + int c2; + + switch (c) + { + case '{': +#if DEBUG_PERL + fprintf (stderr, "%s:%d: extracting balanced '{' after varname\n", + real_file_name, line_number); +#endif + extract_balanced (mlp, -1, -1, 0, token_type_rbrace); + break; + case '[': +#if DEBUG_PERL + fprintf (stderr, "%s:%d: extracting balanced '[' after varname\n", + real_file_name, line_number); +#endif + extract_balanced (mlp, -1, -1, 0, token_type_rbracket); + break; + case '-': + c2 = phase1_getc (); + if (c2 == '>') + { +#if DEBUG_PERL + fprintf (stderr, "%s:%d: another \"->\" after varname\n", + real_file_name, line_number); +#endif + break; + } + else if (c2 != '\n') + { + /* Discarding the newline is harmless here. The only + special character recognized after a minus is greater-than + for dereference. However, the sequence "-\n>" that we + treat incorrectly here, is a syntax error. */ + phase1_ungetc (c2); + } + /* FALLTHROUGH */ + + default: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: variable finished\n", + real_file_name, line_number); +#endif + phase2_ungetc (c); + return; + } + } +} + +/* Actually a simplified version of extract_variable(). It searches for + * variables inside a double-quoted string that may interpolate to + * some keyword hash (reference). + */ +static void +interpolate_keywords (mlp, string) + message_list_ty* mlp; + const char* string; +{ + static char* buffer; + static int bufmax = 0; + int bufpos = 0; + int c = string[0]; + bool maybe_hash_deref = false; + enum parser_state + { + initial, + one_dollar, + two_dollars, + identifier, + minus, + wait_lbrace, + wait_quote, + dquote, + squote, + barekey, + wait_rbrace, + } state; + token_ty token; + + lex_pos_ty pos; + + /* States are: + * + * initial: initial + * one_dollar: dollar sign seen in state INITIAL + * two_dollars: another dollar-sign has been seen in state ONE_DOLLAR + * identifier: a valid identifier character has been seen in state + * ONE_DOLLAR or TWO_DOLLARS + * minus: a minus-sign has been seen in state IDENTIFIER + * wait_lbrace: a greater-than has been seen in state MINUS + * wait_quote: a left brace has been seen in state IDENTIFIER or in + * state WAIT_LBRACE + * dquote: a double-quote has been seen in state WAIT_QUOTE + * squote: a single-quote has been seen in state WAIT_QUOTE + * barekey: an bareword character has been seen in state WAIT_QUOTE + * wait_rbrace: closing quote has been seen in state DQUOTE or SQUOTE + */ + state = initial; + + token.type = token_type_string; + token.string_type = string_type_qq; + token.line_number = line_number; + pos.file_name = logical_file_name; + pos.line_number = line_number; + + while (c = *string++) + { + void *keyword_value; + + if (state == initial) + bufpos = 0; + + if (bufpos >= bufmax - 1) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + + switch (state) + { + case initial: + switch (c) + { + case '\\': + c = *string++; + if (!c) return; + break; + case '$': + buffer[bufpos++] = '$'; + maybe_hash_deref = false; + state = one_dollar; + break; + default: + break; + } + break; + case one_dollar: + switch (c) + { + case '$': + /* + * This is enough to make us believe later that we dereference + * a hash reference. + */ + maybe_hash_deref = true; + state = two_dollars; + break; + default: + if (c == '_' || c == ':' || c == '\'' || c >= 0x80 + || (c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || (c >= '0' && c <= '9')) + { + buffer[bufpos++] = c; + state = identifier; + } + else + state = initial; + break; + } + break; + case two_dollars: + if (c == '_' || c == ':' || c == '\'' || c >= 0x80 + || (c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || (c >= '0' && c <= '9')) + { + buffer[bufpos++] = c; + state = identifier; + break; + } + else + { + state = initial; + } + break; + case identifier: + switch (c) + { + case '-': + if (0 == find_entry (&keywords, buffer, bufpos, + &keyword_value)) + { + state = minus; + } + else + state = initial; + break; + case '{': + if (!maybe_hash_deref) + { + buffer[0] = '%'; + } + if (0 == find_entry (&keywords, buffer, bufpos, + &keyword_value)) + { + state = wait_quote; + } + else + state = initial; + break; + default: + if (c == '_' || c == ':' || c == '\'' || c >= 0x80 + || (c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || (c >= '0' && c <= '9')) + { + buffer[bufpos++] = c; + } + else + { + state = initial; + } + break; + } + break; + case minus: + switch (c) + { + case '>': + state = wait_lbrace; + break; + default: + state = initial; + break; + } + break; + case wait_lbrace: + switch (c) + { + case '{': + state = wait_quote; + break; + default: + state = initial; + break; + } + break; + case wait_quote: + switch (c) + { + case ' ': + case '\n': + case '\t': + case '\r': + case '\f': + break; + case '\'': + bufpos = 0; + state = squote; + break; + case '"': + bufpos = 0; + state = dquote; + break; + default: + if (c == '_' || (c >= '0' && c <= '9') || c >= 0x80 + || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) + { + state = barekey; + bufpos = 0; + buffer[bufpos++] = c; + } + else + state = initial; + break; + } + break; + case dquote: + switch (c) + { + case '"': + /* The resulting string has te be interpolated twice. */ + buffer[bufpos] = '\0'; + token.string = xstrdup (buffer); + extract_quotelike_pass3 (&token, EXIT_FAILURE); + /* The string can only shrink with interpolation (because + we ignore \Q). */ + strcpy (buffer, token.string); + free (token.string); + state = wait_rbrace; + break; + case '\\': + if (string[0] == '\"') + { + buffer[bufpos++] = string++[0]; + } + else if (string[0]) + { + buffer[bufpos++] = '\\'; + buffer[bufpos++] = string++[0]; + } + else + { + state = initial; + } + break; + default: + buffer[bufpos++] = c; + break; + } + break; + case squote: + switch (c) + { + case '\'': + state = wait_rbrace; + break; + case '\\': + if (string[0] == '\'') + { + buffer[bufpos++] = string++[0]; + } + else if (string[0]) + { + buffer[bufpos++] = '\\'; + buffer[bufpos++] = string++[0]; + } + else + { + state = initial; + } + break; + default: + buffer[bufpos++] = c; + break; + } + break; + case barekey: + { + if (c == '_' || (c >= '0' && c <= '9') || c >= 0x80 + || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) + { + buffer[bufpos++] = c; + break; + } + else if (c == ' ' || c == '\n' || c == '\t' + || c == '\r' || c == '\f') + { + state = wait_rbrace; + break; + } + else if (c != '}') + { + state = initial; + break; + } + /* Must be right brace. */ + } + /* FALLTHROUGH */ + case wait_rbrace: + switch (c) + { + case ' ': + case '\n': + case '\t': + case '\r': + case '\f': + break; + case '}': + buffer[bufpos] = '\0'; + token.string = xstrdup (buffer); + extract_quotelike_pass3 (&token, EXIT_FAILURE); + + remember_a_message (mlp, token.string, &pos); + /* FALLTHROUGH */ + default: + state = initial; + break; + } + break; + } + } +} + +/* Combine characters into tokens. Discard whitespace. */ + +static void +x_perl_prelex (mlp, tp) + message_list_ty* mlp; + token_ty *tp; +{ + static char *buffer; + static int bufmax; + int bufpos; + int c; + + for ever + { + c = phase2_getc (); + tp->line_number = line_number; + + switch (c) + { + case EOF: + tp->type = token_type_eof; + return; + + case '\n': + if (last_non_comment_line > last_comment_line) + xgettext_comment_reset (); + /* FALLTHROUGH */ + case '\t': + case ' ': + /* Ignore whitespace. */ + continue; + + case '%': + case '@': + case '*': + case '$': + extract_variable (mlp, tp, c); + prefer_division_over_regexp = true; + return; + } + + last_non_comment_line = tp->line_number; + + switch (c) + { + case '.': + { + int c2 = phase1_getc (); + phase1_ungetc (c2); + if (c2 == '.') + { + tp->type = token_type_other; + prefer_division_over_regexp = false; + return; + } + else if (c2 >= '0' && c2 <= '9') + { + prefer_division_over_regexp = false; + } + else + { + tp->type = token_type_dot; + prefer_division_over_regexp = true; + return; + } + } + /* FALLTHROUGH */ + case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': + case 'G': case 'H': case 'I': case 'J': case 'K': case 'L': + case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': + case 'S': case 'T': case 'U': case 'V': case 'W': case 'X': + case 'Y': case 'Z': + case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': + case 'g': case 'h': case 'i': case 'j': case 'k': case 'l': + case 'm': case 'n': case 'o': case 'p': case 'q': case 'r': + case 's': case 't': case 'u': case 'v': case 'w': case 'x': + case 'y': case 'z': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + /* Symbol, or part of a number. */ + prefer_division_over_regexp = true; + bufpos = 0; + for ever + { + int c2; + + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = c; + c = phase1_getc (); + switch (c) + { + case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': + case 'G': case 'H': case 'I': case 'J': case 'K': case 'L': + case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': + case 'S': case 'T': case 'U': case 'V': case 'W': case 'X': + case 'Y': case 'Z': + case '_': + case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': + case 'g': case 'h': case 'i': case 'j': case 'k': case 'l': + case 'm': case 'n': case 'o': case 'p': case 'q': case 'r': + case 's': case 't': case 'u': case 'v': case 'w': case 'x': + case 'y': case 'z': + case '0': case '1': case '2': case '3': case '4': + case '5': case '6': case '7': case '8': case '9': + continue; + + default: + phase1_ungetc (c); + break; + } + break; + } + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos] = '\0'; + + if (strcmp (buffer, "__END__") == 0 + || strcmp (buffer, "__DATA__") == 0) + { + end_of_file = true; + tp->type = token_type_eof; + return; + } + else if (strcmp (buffer, "and") == 0 + || strcmp (buffer, "cmp") == 0 + || strcmp (buffer, "eq") == 0 + || strcmp (buffer, "if") == 0 + || strcmp (buffer, "ge") == 0 + || strcmp (buffer, "gt") == 0 + || strcmp (buffer, "le") == 0 + || strcmp (buffer, "lt") == 0 + || strcmp (buffer, "ne") == 0 + || strcmp (buffer, "not") == 0 + || strcmp (buffer, "or") == 0 + || strcmp (buffer, "unless") == 0 + || strcmp (buffer, "while") == 0 + || strcmp (buffer, "xor") == 0) + { + tp->type = token_type_named_op; + tp->string = xstrdup (buffer); + prefer_division_over_regexp = false; + return; + } + else if (strcmp (buffer, "s") == 0 + || strcmp (buffer, "y") == 0 + || strcmp (buffer, "tr") == 0) + { + int delim = phase1_getc (); + while (delim == ' ' || delim == '\t' || delim == '\r' + || delim == '\n' || delim == '\f') + { + delim = phase2_getc (); + } + if (delim == EOF) + { + tp->type = token_type_eof; + return; + } + if ((delim >= '0' && delim <= '9') + || (delim >= 'A' && delim <= 'Z') + || (delim >= 'a' && delim <= 'z')) + { + /* False positive. */ + tp->type = token_type_symbol; + phase2_ungetc (delim); + tp->string = xstrdup (buffer); + prefer_division_over_regexp = true; + return; + } + extract_triple_quotelike (mlp, tp, delim, buffer[0] == 's'); + + /* Eat the following modifiers. */ + c = phase1_getc (); + while (c >= 'a' && c <= 'z') + c = phase1_getc (); + phase1_ungetc (c); + return; + } + else if (strcmp (buffer, "m") == 0) + { + int delim = phase1_getc (); + + while (delim == ' ' || delim == '\t' || delim == '\r' + || delim == '\n' || delim == '\f') + { + delim = phase2_getc (); + } + if (delim == EOF) + { + tp->type = token_type_eof; + return; + } + if ((delim >= '0' && delim <= '9') + || (delim >= 'A' && delim <= 'Z') + || (delim >= 'a' && delim <= 'z')) + { + /* False positive. */ + tp->type = token_type_symbol; + phase2_ungetc (delim); + tp->string = xstrdup (buffer); + prefer_division_over_regexp = true; + return; + } + extract_quotelike (tp, delim); + if (!extract_all && delim != '\'') + interpolate_keywords (mlp, tp->string); + + free (tp->string); + tp->type = token_type_regex_op; + prefer_division_over_regexp = true; + + /* Eat the following modifiers. */ + c = phase1_getc (); + while (c >= 'a' && c <= 'z') + c = phase1_getc (); + phase1_ungetc (c); + return; + } + else if (strcmp (buffer, "qq") == 0 + || strcmp (buffer, "q") == 0 + || strcmp (buffer, "qx") == 0 + || strcmp (buffer, "qw") == 0 + || strcmp (buffer, "qr") == 0) + { + /* The qw (...) construct is not really a string but we + can treat in the same manner and then pretend it is + a symbol. Rationale: Saying "qw (foo bar)" is the + same as "my @list = ('foo', 'bar'); @list;". */ + + int delim = phase1_getc (); + + while (delim == ' ' || delim == '\t' || delim == '\r' + || delim == '\n' || delim == '\f') + { + delim = phase2_getc (); + } + if (delim == EOF) + { + tp->type = token_type_eof; + return; + } + prefer_division_over_regexp = true; + + if ((delim >= '0' && delim <= '9') + || (delim >= 'A' && delim <= 'Z') + || (delim >= 'a' && delim <= 'z')) + { + /* False positive. */ + tp->type = token_type_symbol; + phase2_ungetc (delim); + tp->string = xstrdup (buffer); + prefer_division_over_regexp = true; + return; + } + + extract_quotelike (tp, delim); + + switch (buffer[1]) + { + case 'q': + case 'x': + tp->string_type = string_type_qq; + tp->type = token_type_string; + if (!extract_all) + interpolate_keywords (mlp, tp->string); + break; + case 'r': + tp->type = token_type_regex_op; + break; + case 'w': + tp->type = token_type_symbol; + break; + default: /* q\000 */ + tp->type = token_type_string; + tp->string_type = string_type_q; + break; + } + return; + } + else if (strcmp (buffer, "grep") == 0 + || strcmp (buffer, "split") == 0) + { + prefer_division_over_regexp = false; + } + tp->string = xstrdup (buffer); + + tp->type = token_type_symbol; + return; + + case '"': + prefer_division_over_regexp = true; + extract_quotelike (tp, c); + tp->string_type = string_type_qq; + if (!extract_all) + interpolate_keywords (mlp, tp->string); + return; + + case '`': + prefer_division_over_regexp = true; + extract_quotelike (tp, c); + tp->string_type = string_type_qq; + if (!extract_all) + interpolate_keywords (mlp, tp->string); + return; + + case '\'': + prefer_division_over_regexp = true; + extract_quotelike (tp, c); + tp->string_type = string_type_q; + return; + + case '(': + c = phase2_getc (); + if (c == ')') + { + continue; /* Ignore empty list. */ + } + else + phase2_ungetc (c); + tp->type = token_type_lparen; + prefer_division_over_regexp = false; + return; + + case ')': + tp->type = token_type_rparen; + prefer_division_over_regexp = true; + return; + + case '{': + tp->type = token_type_lbrace; + prefer_division_over_regexp = false; + return; + + case '}': + tp->type = token_type_rbrace; + prefer_division_over_regexp = false; + return; + + case '[': + tp->type = token_type_lbracket; + prefer_division_over_regexp = false; + return; + + case ']': + tp->type = token_type_rbracket; + prefer_division_over_regexp = false; + return; + + case ';': + tp->type = token_type_semicolon; + prefer_division_over_regexp = false; + return; + + case ',': + tp->type = token_type_comma; + prefer_division_over_regexp = false; + return; + + case '=': + /* Check for fat comma. */ + c = phase1_getc (); + if (c == '>') + { + tp->type = token_type_fat_comma; + return; + } + else if (linepos == 2 + && (last_token == token_type_semicolon + || last_token == token_type_rbrace) + && ((c >= 'A' && c <='Z') + || (c >= 'a' && c <= 'z'))) + { +#if DEBUG_PERL + fprintf (stderr, "%s:%d: start pod section\n", + real_file_name, line_number); +#endif + skip_pod (); +#if DEBUG_PERL + fprintf (stderr, "%s:%d: end pod section\n", + real_file_name, line_number); +#endif + continue; + } + phase1_ungetc (c); + tp->type = token_type_other; + prefer_division_over_regexp = false; + return; + + case '<': + /* Check for <string); + free (tp->string); + tp->string = string; + tp->type = token_type_string; + tp->string_type = string_type_verbatim; + return; + } + else if (c == '"') + { + char* string; + extract_quotelike (tp, c); + string = get_here_document (tp->string); + free (tp->string); + tp->string = string; + tp->type = token_type_string; + tp->string_type = string_type_qq; + if (!extract_all) + interpolate_keywords (mlp, tp->string); + return; + } + else if ((c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || c == '_') + { + bufpos = 0; + while ((c >= 'A' && c <= 'Z') + || (c >= 'a' && c <= 'z') + || (c >= '0' && c <= '9') + || c == '_' || c >= 0x80) + { + if (bufpos >= bufmax) + { + bufmax += 100; + buffer = xrealloc_static (buffer, bufmax); + } + buffer[bufpos++] = c; + c = phase1_getc (); + } + if (c == EOF) + { + tp->type = token_type_eof; + return; + } + else + { + char* string; + + phase1_ungetc (c); + buffer[bufpos++] = '\0'; + string = get_here_document (buffer); + tp->string = string; + tp->type = token_type_string; + tp->string_type = string_type_qq; + if (!extract_all) + interpolate_keywords (mlp, tp->string); + return; + } + } + else + { + tp->type = token_type_other; + return; + } + } + else + { + phase1_ungetc (c); + tp->type = token_type_other; + } + return; /* End of case '>'. */ + + case '-': + /* Check for dereferencing operator. */ + c = phase1_getc (); + if (c == '>') { + tp->type = token_type_dereference; + return; + } + phase1_ungetc (c); + tp->type = token_type_other; + prefer_division_over_regexp = false; + return; + + case '/': + case '?': + if (!prefer_division_over_regexp) + { + extract_quotelike (tp, c); + if (!extract_all) + interpolate_keywords (mlp, tp->string); + free (tp->string); + tp->type = token_type_other; + prefer_division_over_regexp = true; + /* Eat the following modifiers. */ + c = phase1_getc (); + while (c >= 'a' && c <= 'z') + c = phase1_getc (); + phase1_ungetc (c); + return; + } + /* FALLTHROUGH */ + + default: + /* We could carefully recognize each of the 2 and 3 character + operators, but it is not necessary, as we only need to recognize + gettext invocations. Don't bother. */ + tp->type = token_type_other; + prefer_division_over_regexp = false; + return; + } + } +} + +static token_ty* +x_perl_lex (mlp) + message_list_ty* mlp; +{ +#if DEBUG_PERL + int dummy = stack_dump (token_stack); +#endif + token_ty* tp = stack_shift (token_stack); + + if (!tp) + { + tp = xmalloc (sizeof *tp); + x_perl_prelex (mlp, tp); +#if DEBUG_PERL + fprintf (stderr, "%s:%d: x_perl_prelex returned %s\n", + real_file_name, line_number, token2string (tp)); +#endif + } +#if DEBUG_PERL + else + { + fprintf (stderr, "%s:%d: %s recycled from stack\n", + real_file_name, line_number, token2string (tp)); + } +#endif + + /* A symbol followed by a fat comma is really a single-quoted string. */ + if (tp->type == token_type_symbol || tp->type == token_type_named_op) + { + token_ty* next = stack_head (token_stack); + + if (!next) + { +#if DEBUG_PERL + fprintf (stderr, "%s:%d: pre-fetching next token\n", + real_file_name, line_number); + fflush (stderr); +#endif + next = x_perl_lex (mlp); + x_perl_unlex (next); +#if DEBUG_PERL + fprintf (stderr, "%s:%d: unshifted next token\n", + real_file_name, line_number); +#endif + } + +#if DEBUG_PERL + fprintf (stderr, "%s:%d: next token is %s\n", + real_file_name, line_number, token2string (next)); +#endif + + if (next->type == token_type_fat_comma) + { + tp->type = token_type_string; + tp->string_type = string_type_q; +#if DEBUG_PERL + fprintf (stderr, + "%s:%d: token %s mutated to token_type_string\n", + real_file_name, line_number, token2string (tp)); +#endif + } + } + + return tp; +} + +static void +x_perl_unlex (tp) + token_ty* tp; +{ + stack_unshift (token_stack, tp, free_token); +} + +/* ========================= Extracting strings. ========================== */ + +static char* +collect_message (mlp, tp, error_level) + message_list_ty* mlp; + token_ty* tp; + int error_level; +{ + char* string; + size_t len; + + extract_quotelike_pass3 (tp, error_level); + string = xstrdup (tp->string); + len = strlen (tp->string) + 1; + + for ever + { + int c = phase2_getc (); + while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f') + c = phase2_getc (); + if (c != '.') + { + phase2_ungetc (c); + return string; + } + + c = phase2_getc (); + while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f') + c = phase2_getc (); + phase2_ungetc (c); + + if (c == '"' || c == '\'' || c == '`' + || (!prefer_division_over_regexp && (c == '/' || c == '?')) + || c == 'q') + { + token_ty* qstring = x_perl_lex (mlp); + if (qstring->type != token_type_string) + { + /* assert (qstring->type == token_type_symbol) */ + x_perl_unlex (qstring); + return string; + } + + extract_quotelike_pass3 (qstring, error_level); + len += strlen (qstring->string); + string = xrealloc (string, len); + strcat (string, qstring->string); + free_token (qstring); + } + } +} + +/* The file is broken into tokens. Scan the token stream, looking for + a keyword, followed by a left paren, followed by a string. When we + see this sequence, we have something to remember. We assume we are + looking at a valid C or C++ program, and leave the complaints about + the grammar to the compiler. + + Normal handling: Look for + keyword ( ... msgid ... ) + Plural handling: Look for + keyword ( ... msgid ... msgid_plural ... ) + + We use recursion because the arguments before msgid or between msgid + and msgid_plural can contain subexpressions of the same form. */ + + +/* Extract messages until the next balanced closing parenthesis. + Extracted messages are added to MLP. + + When specific arguments shall be extracted, ARG_SG and ARG_PL are + set to the corresponding argument number or -1 if not applicable. + + Returns the number of requested arguments consumed or -1 for eof. + If - instead of consuming requested arguments - a complete message + has been extracted, the return value will be sufficiently high to + avoid any mis-interpretation. + + States are: + + 0 - initial state + 1 - keyword has been seen + 2 - extractable string has been seen + 3 - a dot operator after an extractable string has been seen + + States 2 and 3 are "fragile", the parser will remain in state 2 + as long as only opening parentheses are seen, a transition to + state 3 is done on appearance of a dot operator, all other tokens + will cause the parser to fall back to state 1 or 0, eventually + with an error message about illegal intermixing of constant and + non-constant strings. + + Likewise, state 3 is fragile. The parser will remain in state 3 + as long as only closing parentheses are seen, a transition to state + 2 is done on appearance of another (literal!) string, all other + tokens will cause a warning. */ +static bool +extract_balanced (mlp, arg_sg, arg_pl, state, delim) + message_list_ty *mlp; + int arg_sg, arg_pl; + int state; + token_type_ty delim; +{ + /* Remember the message containing the msgid, for msgid_plural. */ + message_ty *plural_mp = NULL; + + /* The current argument for a possibly extracted keyword. Counting + starts with 1. */ + int arg_count = 1; + + /* Number of left parentheses seen. */ + int paren_seen = 0; + + /* The current token. */ + token_ty* tp = NULL; + + token_type_ty last_token = token_type_eof; + +#if DEBUG_PERL + static int nesting_level = 0; + + ++nesting_level; +#endif + + for ever + { + int my_last_token = last_token; + + if (tp) + free_token (tp); + + tp = x_perl_lex (mlp); + + last_token = tp->type; + + if (delim == tp->type) + { +#if DEBUG_PERL + fprintf (stderr, "%s:%d: extract_balanced finished (%d)\n", + logical_file_name, tp->line_number, --nesting_level); +#endif + free_token (tp); + return false; + } + + switch (tp->type) + { + case token_type_symbol: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type symbol (%d) \"%s\"\n", + logical_file_name, tp->line_number, nesting_level, tp->string); +#endif + + /* No need to bother if we extract all strings anyway. */ + if (!extract_all) + { + void *keyword_value; + + if (0 == find_entry (&keywords, tp->string, strlen (tp->string), + &keyword_value)) + { + last_token = token_type_keyword_symbol; + + arg_sg = (int) (long) keyword_value & ((1 << 10) - 1); + arg_pl = (int) (long) keyword_value >> 10; + arg_count = 1; + + state = 2; + } + } + continue; + + case token_type_variable: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type variable (%d) \"%s\"\n", + logical_file_name, tp->line_number, nesting_level, tp->string); +#endif + prefer_division_over_regexp = true; + continue; + + case token_type_lparen: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type left parentheses (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + ++paren_seen; + + /* No need to recurse if we extract all strings anyway. */ + if (extract_all) + continue; + else + { + if (extract_balanced (mlp, arg_sg - arg_count + 1, + arg_pl - arg_count + 1, state, + token_type_rparen)) + { + free_token (tp); + return true; + } + if (my_last_token == token_type_keyword_symbol) + arg_sg = arg_pl = -1; + } + continue; + + case token_type_rparen: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type right parentheses(%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + --paren_seen; + + /* No need to return if we extract all strings anyway. */ + if (extract_all) + continue; + + continue; + + case token_type_comma: + case token_type_fat_comma: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type comma (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + /* No need to bother if we extract all strings anyway. */ + if (extract_all) + continue; + ++arg_count; + + if (arg_count > arg_sg && arg_count > arg_pl) + { + /* We have missed the argument. */ + arg_sg = arg_pl = -1; + arg_count = 0; + } +#if DEBUG_PERL + fprintf (stderr, "%s:%d: arg_count: %d, arg_sg: %d, arg_pl: %d\n", + real_file_name, tp->line_number, + arg_count, arg_sg, arg_pl); +#endif + continue; + + case token_type_string: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type string (%d): \"%s\"\n", + logical_file_name, tp->line_number, nesting_level, tp->string); +#endif + + if (extract_all) + { + lex_pos_ty pos; + + pos.file_name = logical_file_name; + pos.line_number = tp->line_number; + remember_a_message (mlp, collect_message (mlp, tp, + EXIT_SUCCESS), + &pos); + } + else if (state) + { + lex_pos_ty pos; + + pos.file_name = logical_file_name; + pos.line_number = tp->line_number; + + if (arg_count == arg_sg) + { + plural_mp = + remember_a_message (mlp, collect_message (mlp, tp, + EXIT_FAILURE), + &pos); + arg_sg = -1; + } + else if (arg_count == arg_pl && plural_mp == NULL) + { + if (plural_mp == NULL) + error (EXIT_FAILURE, 0, _("\ +%s:%d: fatal: plural message seen before singular message\n"), + real_file_name, tp->line_number); + } + else if (arg_count == arg_pl) + { + remember_a_message_plural (plural_mp, + collect_message (mlp, tp, + EXIT_FAILURE), + &pos); + arg_pl = -1; + } + } + + if (arg_sg == -1 && arg_pl == -1) + { + state = 0; + plural_mp = NULL; + } + + break; + + case token_type_eof: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type EOF (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + free_token (tp); + return true; + + case token_type_lbrace: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type lbrace (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + /* No need to recurse if we extract all strings anyway. */ + if (extract_all) + continue; + else + { + if (extract_balanced (mlp, -1, -1, 0, token_type_rbrace)) + { + free_token (tp); + return true; + } + } + continue; + + case token_type_rbrace: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type rbrace (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + state = 0; + continue; + + case token_type_lbracket: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type lbracket (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + /* No need to recurse if we extract all strings anyway. */ + if (extract_all) + continue; + else + { + if (extract_balanced (mlp, -1, -1, 0, token_type_rbracket)) + { + free_token (tp); + return true; + } + } + continue; + + case token_type_rbracket: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type rbracket (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + state = 0; + continue; + + case token_type_semicolon: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type semicolon (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + state = 0; + + /* The ultimate sign. */ + arg_sg = arg_pl = -1; + + continue; + + case token_type_dereference: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type dereference (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + + continue; + + case token_type_dot: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type dot (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + state = 0; + continue; + + case token_type_named_op: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type named operator (%d): %s\n", + logical_file_name, tp->line_number, nesting_level, tp->string); +#endif + state = 0; + continue; + + case token_type_regex_op: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type regex operator (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + continue; + + case token_type_other: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type other (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + state = 0; + continue; + + default: + fprintf (stderr, "%s:%d: unknown token type %d\n", + real_file_name, tp->line_number, tp->type); + abort (); + } + } +} + +void +extract_perl (f, real_filename, logical_filename, mdlp) + FILE *f; + const char *real_filename; + const char *logical_filename; + msgdomain_list_ty *mdlp; +{ + message_list_ty *mlp = mdlp->item[0]->messages; + + fp = f; + real_file_name = real_filename; + logical_file_name = xstrdup (logical_filename); + line_number = 0; + + last_comment_line = -1; + last_non_comment_line = -1; + + last_token = token_type_semicolon; /* Safe assumption. */ + prefer_division_over_regexp = false; + + last_string_finished = false; + + init_keywords (); + + token_stack = xcalloc (1, sizeof *token_stack); + here_eaten = 0; + end_of_file = false; + + /* Eat tokens until eof is seen. When extract_balanced returns + due to an unbalanced closing brace, just restart it. */ + while (!extract_balanced (mlp, -1, -1, 0, token_type_rbrace)) + ; + + fp = NULL; + real_file_name = NULL; + free (logical_file_name); + logical_file_name = NULL; + line_number = 0; + last_token = token_type_semicolon; + last_string_finished = false; + stack_free (token_stack); + free (token_stack); + token_stack = NULL; + here_eaten = 0; + end_of_file = true; +} diff --git a/gettext-tools/src/x-perl.h b/gettext-tools/src/x-perl.h new file mode 100644 index 000000000..7e07a0ebd --- /dev/null +++ b/gettext-tools/src/x-perl.h @@ -0,0 +1,35 @@ +/* xgettext Perl backend. + Copyright (C) 2002-2003 Free Software Foundation, Inc. + Written by Guido Flohr , 2002-2003 + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software Foundation, + Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ + + +#define EXTENSIONS_PERL \ + { "pl", "perl" }, \ + { "pm", "perl" }, \ + { "PL", "perl" }, \ + { "cgi", "perl" }, \ + +#define SCANNERS_PERL \ + { "perl", extract_perl, &formatstring_perl }, \ + +/* Scan a Perl file and add its translatable strings to mdlp. */ +extern void extract_perl PARAMS ((FILE *fp, const char *real_filename, + const char *logical_filename, + msgdomain_list_ty *mdlp)); + +extern void x_perl_keyword PARAMS ((const char *keyword)); +extern void x_perl_extract_all PARAMS ((void)); diff --git a/gettext-tools/src/xgettext.c b/gettext-tools/src/xgettext.c index 231819a1c..bf249219e 100644 --- a/gettext-tools/src/xgettext.c +++ b/gettext-tools/src/xgettext.c @@ -72,6 +72,7 @@ #include "x-java.h" #include "x-properties.h" #include "x-awk.h" +#include "x-perl.h" #include "x-ycp.h" #include "x-tcl.h" #include "x-php.h" @@ -257,6 +258,7 @@ main (int argc, char *argv[]) x_librep_extract_all (); x_java_extract_all (); x_awk_extract_all (); + x_perl_extract_all (); x_tcl_extract_all (); x_php_extract_all (); x_glade_extract_all (); @@ -316,6 +318,7 @@ main (int argc, char *argv[]) x_librep_keyword (optarg); x_java_keyword (optarg); x_awk_keyword (optarg); + x_perl_keyword (optarg); x_tcl_keyword (optarg); x_php_keyword (optarg); x_glade_keyword (optarg); @@ -654,7 +657,8 @@ Choice of input file language:\n")); -L, --language=NAME recognise the specified language\n\ (C, C++, ObjectiveC, PO, Python, Lisp,\n\ EmacsLisp, librep, Smalltalk, Java,\n\ - JavaProperties, awk, YCP, Tcl, PHP, RST, Glade)\n")); + JavaProperties, awk, YCP, Tcl, PHP, RST,\n\ + Glade, Perl)\n")); printf (_("\ -C, --c++ shorthand for --language=C++\n")); printf (_("\ @@ -1456,12 +1460,13 @@ language_to_extractor (const char *name) SCANNERS_JAVA SCANNERS_PROPERTIES SCANNERS_AWK + SCANNERS_PERL SCANNERS_YCP SCANNERS_TCL SCANNERS_PHP SCANNERS_RST SCANNERS_GLADE - /* Here will follow more languages and their scanners: perl, etc... + /* Here will follow more languages and their scanners: VisualBasic, etc... Make sure new scanners honor the --exclude-file option. */ }; @@ -1504,6 +1509,7 @@ extension_to_language (const char *extension) EXTENSIONS_JAVA EXTENSIONS_PROPERTIES EXTENSIONS_AWK + EXTENSIONS_PERL EXTENSIONS_YCP EXTENSIONS_TCL EXTENSIONS_PHP diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog index f44adac1d..9dc5090ae 100644 --- a/gettext-tools/tests/ChangeLog +++ b/gettext-tools/tests/ChangeLog @@ -1,3 +1,10 @@ +2003-06-11 Guido Flohr + + * format-perl-1: New file. + * format-perl-2: New file. + * lang-perl: New file. + * Makefile.am (TESTS): Add them. + 2003-05-27 Bruno Haible * plural-2: Add test for Slovak plural formula. diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am index 602699334..4251cec0e 100644 --- a/gettext-tools/tests/Makefile.am +++ b/gettext-tools/tests/Makefile.am @@ -59,13 +59,14 @@ TESTS = gettext-1 gettext-2 \ format-php-1 format-php-2 \ format-python-1 format-python-2 \ format-pascal-1 format-pascal-2 \ + format-perl-1 format-perl-2 \ format-tcl-1 format-tcl-2 \ format-ycp-1 format-ycp-2 \ plural-1 plural-2 \ lang-c lang-c++ lang-objc lang-python-1 lang-python-2 lang-clisp \ lang-elisp lang-librep lang-smalltalk lang-java lang-gawk lang-pascal \ lang-ycp lang-tcl lang-php lang-po \ - lang-rst + lang-rst lang-perl EXTRA_DIST += $(TESTS) \ test.mo xg-test1.ok.po mex-test2.ok msguniq-a.in msguniq-a.inp \ diff --git a/gettext-tools/tests/format-perl-1 b/gettext-tools/tests/format-perl-1 new file mode 100755 index 000000000..1bbb5b830 --- /dev/null +++ b/gettext-tools/tests/format-perl-1 @@ -0,0 +1,166 @@ +#! /bin/sh + +# Test recognition of Perl format strings. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-a-1.data" +cat <<\EOF > f-a-1.data +# Valid: no argument +'abc%%' +# Valid: one character argument +'abc%c' +# Valid: one string argument +'abc%s' +# Valid: one integer argument +'abc%i' +# Valid: one integer argument +'abc%d' +# Valid: one integer argument +'abc%o' +# Valid: one integer argument +'abc%u' +# Valid: one integer argument +'abc%x' +# Valid: one integer argument +'abc%X' +# Valid: one floating-point argument +'abc%e' +# Valid: one floating-point argument +'abc%E' +# Valid: one floating-point argument +'abc%f' +# Valid: one floating-point argument +'abc%F' +# Valid: one floating-point argument +'abc%g' +# Valid: one floating-point argument +'abc%G' +# Valid: one floating-point argument +'abc%a' +# Valid: one floating-point argument +'abc%A' +# Valid: one pointer argument +'abc%p' +# Valid: one argument with flags +'abc%0#g' +# Valid: one argument with width +'abc%2g' +# Valid: one argument with width +'abc%*g' +# Valid: one argument with precision +'abc%.4g' +# Valid: one argument with precision +'abc%.*g' +# Valid: one argument with width and precision +'abc%14.4g' +# Valid: one argument with width and precision +'abc%14.*g' +# Valid: one argument with width and precision +'abc%*.4g' +# Valid: one argument with width and precision +'abc%*.*g' +# Valid: one argument with size specifier +'abc%hhi' +# Valid: one argument with size specifier +'abc%hi' +# Valid: one argument with size specifier +'abc%li' +# Valid: one argument with size specifier +'abc%lli' +# Valid: one argument with size specifier +'abc%Lg' +# Valid: one argument with size specifier +'abc%qi' +# Valid: one argument with size specifier +'abc%ji' +# Valid: one argument with size specifier +'abc%zi' +# Valid: one argument with size specifier +'abc%ti' +# Invalid: unterminated +'abc%' +# Invalid: unknown format specifier +'abc%y' +# Invalid: flags after width +'abc%*0g' +# Invalid: twice precision +'abc%.4.2g' +# Valid: three arguments +'abc%d%u%u' +# Valid: a numbered argument +'abc%1$d' +# Invalid: zero +'abc%0$d' +# Valid: two-digit numbered arguments +'abc%11$def%10$dgh%9$dij%8$dkl%7$dmn%6$dop%5$dqr%4$dst%3$duv%2$dwx%1$dyz' +# Invalid: unterminated number +'abc%1' +# Invalid: flags before number +'abc%+1$d' +# Valid: three arguments, two with same number +'abc%1$4x,%2$c,%1$u' +# Invalid: argument with conflicting types +'abc%1$4x,%2$c,%1$s' +# Valid: no conflict +'abc%1$4x,%2$c,%1$u' +# Invalid: mixing of numbered and unnumbered arguments +'abc%d%2$x' +# Valid: numbered argument with constant precision +'abc%1$.9x' +# Invalid: mixing of numbered and unnumbered arguments +'abc%1$.*x' +# Invalid: missing non-final argument +'abc%2$x%3$s' +# Valid: permutation +'abc%2$ddef%1$d' +# Valid: multiple uses of same argument +'abc%2$xdef%1$pghi%2$x' +# Valid: one argument with width +'abc%2$#*1$g' +# Valid: one argument with width and precision +'abc%3$*2$.*1$g' +# Invalid: zero +'abc%2$*0$.*1$g' +# Valid: named arguments +'[foo] [bar] [baz]' +EOF + +: ${XGETTEXT=xgettext} +n=0 +while read comment; do + read string + n=`expr $n + 1` + tmpfiles="$tmpfiles f-a-1-$n.in f-a-1-$n.po" + cat < f-a-1-$n.in +gettext(${string}); +EOF + ${XGETTEXT} -L perl -o f-a-1-$n.po f-a-1-$n.in || exit 1 + test -f f-a-1-$n.po || exit 1 + fail= + if echo "$comment" | grep 'Valid:' > /dev/null; then + if grep perl-format f-a-1-$n.po > /dev/null; then + : + else + fail=yes + fi + else + if grep perl-format f-a-1-$n.po > /dev/null; then + fail=yes + else + : + fi + fi + if test -n "$fail"; then + echo "Format string recognition error:" 1>&2 + cat f-a-1-$n.in 1>&2 + echo "Got:" 1>&2 + cat f-a-1-$n.po 1>&2 + exit 1 + fi +done < f-a-1.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/format-perl-2 b/gettext-tools/tests/format-perl-2 new file mode 100755 index 000000000..a8b66441e --- /dev/null +++ b/gettext-tools/tests/format-perl-2 @@ -0,0 +1,159 @@ +#! /bin/sh + +# Test checking of Perl format strings. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles f-a-2.data" +cat <<\EOF > f-a-2.data +# Valid: %% doesn't count +msgid "abc%%def" +msgstr "xyz" +# Invalid: invalid msgstr +msgid "abc%%def" +msgstr "xyz%" +# Valid: same arguments +msgid "abc%s%gdef" +msgstr "xyz%s%g" +# Valid: same arguments, with different widths +msgid "abc%2sdef" +msgstr "xyz%3s" +# Valid: same arguments but in numbered syntax +msgid "abc%s%gdef" +msgstr "xyz%1$s%2$g" +# Valid: permutation +msgid "abc%s%g%cdef" +msgstr "xyz%3$c%2$g%1$s" +# Invalid: too few arguments +msgid "abc%2$udef%1$s" +msgstr "xyz%1$s" +# Invalid: too few arguments +msgid "abc%sdef%u" +msgstr "xyz%s" +# Invalid: too many arguments +msgid "abc%udef" +msgstr "xyz%uvw%c" +# Valid: same numbered arguments, with different widths +msgid "abc%2$5s%1$4s" +msgstr "xyz%2$4s%1$5s" +# Invalid: missing argument +msgid "abc%2$sdef%1$u" +msgstr "xyz%1$u" +# Invalid: missing argument +msgid "abc%1$sdef%2$u" +msgstr "xyz%2$u" +# Invalid: added argument +msgid "abc%1$udef" +msgstr "xyz%1$uvw%2$c" +# Valid: type compatibility +msgid "abc%i" +msgstr "xyz%d" +# Valid: type compatibility +msgid "abc%o" +msgstr "xyz%u" +# Valid: type compatibility +msgid "abc%u" +msgstr "xyz%x" +# Valid: type compatibility +msgid "abc%u" +msgstr "xyz%X" +# Valid: type compatibility +msgid "abc%e" +msgstr "xyz%E" +# Valid: type compatibility +msgid "abc%e" +msgstr "xyz%f" +# Valid: type compatibility +msgid "abc%e" +msgstr "xyz%g" +# Valid: type compatibility +msgid "abc%e" +msgstr "xyz%G" +# Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%s" +# Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%i" +# Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%o" +# Invalid: type incompatibility +msgid "abc%c" +msgstr "xyz%e" +# Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%i" +# Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%o" +# Invalid: type incompatibility +msgid "abc%s" +msgstr "xyz%e" +# Invalid: type incompatibility +msgid "abc%i" +msgstr "xyz%o" +# Invalid: type incompatibility +msgid "abc%i" +msgstr "xyz%e" +# Invalid: type incompatibility +msgid "abc%u" +msgstr "xyz%e" +# Invalid: type incompatibility for width +msgid "abc%g%*g" +msgstr "xyz%*g%g" +# Valid: named arguments in arbitrary order +msgid "[foo] [bar] [baz]: 1" +msgstr "[baz] [bar] [foo]: 1" +# Invalid: missing named argument in msgstr +msgid "[foo] [bar] [baz]: 2" +msgstr "[baz] [foo]: 2" +# Invalid: missing named arguments in msgid +msgid "[foo] [baz]: 3" +msgstr "[baz] [bar] [foo]: 3" +# Invalid: number of named arguments in msgstr exceeds msgid +msgid "[foo] [bar] [baz]: 4" +msgstr "[baz] [bar] [foo] [bar]: 4" +# Invalid: number of named arguments in msgid exceeds msgstr +msgid "[baz] [bar] [foo] [bar]: 5" +msgstr "[foo] [bar] [baz]: 5" +EOF + +: ${MSGFMT=msgfmt} +n=0 +while read comment; do + read msgid_line + read msgstr_line + n=`expr $n + 1` + tmpfiles="$tmpfiles f-a-2-$n.po f-a-2-$n.mo" + cat < f-a-2-$n.po +#, perl-format +${msgid_line} +${msgstr_line} +EOF + fail= + if echo "$comment" | grep 'Valid:' > /dev/null; then + if ${MSGFMT} --check-format -o f-a-2-$n.mo f-a-2-$n.po; then + : + else + fail=yes + fi + else + ${MSGFMT} --check-format -o f-a-2-$n.mo f-a-2-$n.po 2> /dev/null + if test $? = 1; then + : + else + fail=yes + fi + fi + if test -n "$fail"; then + echo "Format string checking error:" 1>&2 + cat f-a-2-$n.po 1>&2 + exit 1 + fi +done < f-a-2.data + +rm -fr $tmpfiles + +exit 0 diff --git a/gettext-tools/tests/lang-perl b/gettext-tools/tests/lang-perl new file mode 100755 index 000000000..457519759 --- /dev/null +++ b/gettext-tools/tests/lang-perl @@ -0,0 +1,186 @@ +#! /bin/sh + +# Test of gettext facilities in the Perl language. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles prog.pl" +here="<" # Help St. Emacs +here="$here<" + +cat < prog.pl +use Locale::Messages; + +textdomain "prog"; +bindtextdomain ("./"); + +s/foo/ + # stress test for string extraction /xe; + +print _"'Your command, please?', asked the waiter."; + +printf ngettext ("a piece of cake", "%d pieces of cake", $n), $n; + +printf _"%s is replaced by %s.", "FF", "EUR"; + +# Should be found. +printf dngettext prog => ("one file deleted", "%d files deleted"), $n, $n; + +# Should not be found. +printf dngettext ("prog"), ("one file created", "%d files created"), $n, $n; + +printf dngettext "prog", ${here}PERL, ${here}PERL; +Singular +PERL +Plural +PERL + +print ${here}PERL +tied hash \$__{ Bareword +} +tied hash \$__->{"quoted string"} +tied hash \$__->{ "weird +formatting"} +PERL + +print \$__ # Welcome + -> # to the + { # Republic of + 'Welcome to the Republic of Perl!' # +# Perl! +}; + +\$! ? ?\$__{"pattern match"}? : s # This is no delimiter. +{\$__{substitution}}<\$__-\>{"find me"}>; + +# No interpolation! +m'\$__{secret}'; + +__END__ +gettext "Discarded!"; + +EOF + +tmpfiles="$tmpfiles prog.pot" +: ${XGETTEXT=xgettext} +${XGETTEXT} -k_ -k%__ -k\$__ -o prog.pot --omit-header --no-location prog.pl + +tmpfiles="$tmpfiles prog.ok" +cat < prog.ok +msgid "'Your command, please?', asked the waiter." +msgstr "" + +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "" +msgstr[1] "" + +#, perl-format +msgid "%s is replaced by %s." +msgstr "" + +#, perl-format +msgid "one file deleted" +msgid_plural "%d files deleted" +msgstr[0] "" +msgstr[1] "" + +msgid "Singular\n" +msgid_plural "Plural\n" +msgstr[0] "" +msgstr[1] "" + +msgid "Bareword" +msgstr "" + +msgid "quoted string" +msgstr "" + +msgid "" +"weird\n" +"formatting" +msgstr "" + +msgid "Welcome to the Republic of Perl!" +msgstr "" + +msgid "pattern match" +msgstr "" + +msgid "substitution" +msgstr "" + +msgid "find me" +msgstr "" +EOF + +: ${DIFF=diff} +${DIFF} prog.ok prog.pot || exit 1 + +tmpfiles="$tmpfiles fr.po" +cat <<\EOF > fr.po +msgid "" +msgstr "" +"Content-Type: text/plain; charset=ISO-8859-1\n" +"Plural-Forms: nplurals=2; plural=(n > 1);\n" + +msgid "'Your command, please?', asked the waiter." +msgstr "«Votre commande, s'il vous plaît», dit le garçon." + +# Les gateaux allemands sont les meilleurs du monde. +#, perl-format +msgid "a piece of cake" +msgid_plural "%d pieces of cake" +msgstr[0] "un morceau de gateau" +msgstr[1] "%d morceaux de gateau" + +# Reverse the arguments. +#, perl-format +msgid "%s is replaced by %s." +msgstr "%2$s remplace %1$s." + +#, perl-format +msgid "one file deleted" +msgid_plural "%d files deleted" +msgstr[0] "" +msgstr[1] "" + +msgid "Singular\n" +msgid_plural "Plural\n" +msgstr[0] "" +msgstr[1] "" + +msgid "Bareword" +msgstr "" + +msgid "quoted string" +msgstr "" + +msgid "" +"weird\n" +"formatting" +msgstr "" + +msgid "Welcome to the Republic of Perl!" +msgstr "" + +msgid "pattern match" +msgstr "" + +msgid "substitution" +msgstr "" + +msgid "find me" +msgstr "" +EOF + +tmpfiles="$tmpfiles fr.po.new" +: ${MSGMERGE=msgmerge} +${MSGMERGE} -q -o fr.po.new fr.po prog.pot + +: ${DIFF=diff} +${DIFF} fr.po fr.po.new || exit 1 + +exit 0