xgettext support for Perl. Contributed by Guido Flohr.

author Bruno Haible <bruno@clisp.org>

Thu, 12 Jun 2003 12:09:38 +0000 (12:09 +0000)

committer Bruno Haible <bruno@clisp.org>

Tue, 23 Jun 2009 10:10:39 +0000 (12:10 +0200)
author Bruno Haible <bruno@clisp.org>
Thu, 12 Jun 2003 12:09:38 +0000 (12:09 +0000)
committer Bruno Haible <bruno@clisp.org>
Tue, 23 Jun 2009 10:10:39 +0000 (12:10 +0200)
diff --git a/gettext-tools/doc/ChangeLog b/gettext-tools/doc/ChangeLog

index 1d87cd228b85f75c05df0c308de51f19febc618e..b74d4a973718b4cb153b523f1462825aef39626f 100644 (file)
--- a/gettext-tools/doc/ChangeLog
+++ b/gettext-tools/doc/ChangeLog
@@ -1,3 +1,8 @@
+2003-06-11  Guido Flohr  <guido@imperia.net>
+
+       * gettext.texi (Perl): Extend and update.
+       * xgettext.texi: Mention language Perl.
+
  2003-06-10  Stepan Kasal  <kasal@math.cas.cz>
  
         * gettext.texi: Fix a few typos.
diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi

index 69237bd89564d4771f94974294302e45ddc8b47c..08de720b74a719d2a3ac33b8e3cf0ea86ac15c54 100644 (file)
--- a/gettext-tools/doc/gettext.texi
+++ b/gettext-tools/doc/gettext.texi
@@ -338,6 +338,7 @@ The Translator's View
  * ycp-format::                  YCP Format Strings
  * tcl-format::                  Tcl Format Strings
  * php-format::                  PHP Format Strings
+* perl-format::                        Perl Format Strings
  
  Individual Programming Languages
  
@@ -7196,6 +7197,7 @@ strings.
  * ycp-format::                  YCP Format Strings
  * tcl-format::                  Tcl Format Strings
  * php-format::                  PHP Format Strings
+* perl-format::                        Perl Format Strings
  @end menu
  
  @node c-format, python-format, Translators for other Languages, Translators for other Languages
@@ -7288,13 +7290,20 @@ or a nonzero digit (@samp{1} to @samp{9}).
  Tcl format strings are described in the @file{format.n} manual page,
  @uref{http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm}.
  
-@node php-format,  , tcl-format, Translators for other Languages
+@node php-format, perl-format, tcl-format, Translators for other Languages
  @subsection PHP Format Strings
  
  PHP format strings are described in the documentation of the PHP function
  @code{sprintf}, in @file{phpdoc/manual/function.sprintf.html} or
  @uref{http://www.php.net/manual/en/function.sprintf.php}.
  
+@node perl-format,  , php-format, Translators for other Languages
+@subsection Perl Format Strings
+
+Perl format strings are described in the @file{Locale::TextDomain}
+manual page, or in
+@uref{http://let.imperia.org/howto/en/perl-i18n/}.
+
  @node Maintainers for other Languages, List of Programming Languages, Translators for other Languages, Programming Languages
  @section The Maintainer's View
  
@@ -8241,19 +8250,46 @@ argument is given.
  
  @table @asis
  @item RPMs
-perl, perl-gettext
+---
  
  @item File extension
-@code{pl}, @code{PL}
+@code{pl}, @code{PL}, @code{pm}, @code{cgi}
  
  @item String syntax
-@code{"abc"}
+@itemize @bullet
+
+@item @code{"abc"}
+
+@item @code{'abc'}
+
+@item @code{qq (abc)}
+
+@item @code{q (abc)}
+
+@item @code{qr /abc/}
+
+@item @code{qx (/bin/date)}
+
+@item @code{/pattern match/}
+
+@item @code{?pattern match?}
+
+@item @code{s/substitution/operators/}
+
+@item @code{$tied_hash@{"message"@}}
+
+@item @code{$tied_hash_reference->@{"message"@}}
+
+@item etc., issue the command @code{perldoc perlsyn} for details
+
+@end itemize
  
  @item gettext shorthand
----
+@code{__} (double underscore)
  
  @item gettext/ngettext functions
-@code{gettext}, @code{dgettext}, @code{dcgettext}
+@code{gettext}, @code{dgettext}, @code{dcgettext}, @code{ngettext},
+@code{dngettext}, @code{dcngettext}
  
  @item textdomain
  @code{textdomain} function
@@ -8261,29 +8297,665 @@ perl, perl-gettext
  @item bindtextdomain
  @code{bindtextdomain} function
  
+@item bind_textdomain_codeset 
+@code{bind_textdomain_codeset} function
+
  @item setlocale
  Use @code{setlocale (LC_ALL, "");}
  
  @item Prerequisite
  @code{use POSIX;}
-@*@code{use Locale::gettext;}
+@*@code{use Locale::TextDomain;} (included in the package libintl-perl
+which is available on the Comprehensive Perl Authors Network CPAN,
+http://www.cpan.org/).
  
  @item Use or emulate GNU gettext
-use
+emulate
  
  @item Extractor
-?
+@code{xgettext -k__ -k\$__ -k%__ -k__x -kN__ -k}
  
  @item Formatting with positions
----
+@code{__x("error opening '[file]': [err]", file => $file, err => $!);}
  
  @item Portability
-?
+The @code{libintl-perl} package is platform-independent but does
+not belong to the Perl core.  The programmer is responsible for
+providing a dummy implementation of the require functions if the 
+package is not installed on the target system.
  
  @item po-mode marking
----
+yes
+
+@item Documentation
+In-depth documentation can be found at
+http://let.imperia.org/howto/en/perl-18n/.
+
  @end table
  
+@cindex marking Perl sources
+
+The @code{xgettext} parser backend for Perl differs significantly from
+the parser backends for other programming languages, just as Perl
+itself differs significantly from other programming languages.  The
+Perl parser backend offers many more string marking facilities than
+the other backends but it also has some Perl-specific limitations, the
+worst probably being its imperfectness.
+
+@menu
+* General Problems::         General Problems Parsing Perl Code
+* Default Keywords::         Which Keywords Will xgettext Look For?
+* Special Keywords::         How to Extract Hash Keys?
+* Quote-like Expressions::   What are Strings And Quote-like Expressions?
+* Interpolation I::          Illegal String Interpolation
+* Interpolation II::         Legal String Interpolation
+* Parentheses::              When To Use Parentheses
+* Long Lines::               How To Grok with Long Lines
+* Perl Pitfalls::            Bugs, Pitfalls, and Things That Do Not Work
+@end menu
+
+@node General Problems, Default Keywords,  , Perl
+@subsubsection General Problems Parsing Perl Code
+
+It is often heard that only Perl can parse Perl.  This is not true.
+Perl cannot be @emph{parsed} at all, it can only be @emph{executed}.
+Perl has various built-in ambiguities that can only be resolved at runtime.
+
+The following example may illustrate one common problem:
+
+@example
+print gettext "Hello World!";
+@end example
+
+Although this example looks like a bullet-proof case of a function
+invocation, it is not:
+
+@example
+open gettext, ">testfile" or die;
+print gettext "Hello world!"
+@end example
+
+In this context, the string @code{gettext} looks more like a
+file handle.  But not necessarily:
+
+@example
+use Locale::Messages;
+open gettext ">testfile" or die;
+print gettext "Hello world!";
+@end example
+
+Now, the file is probably syntactically incorrect, provided that the module
+@code{Locale::Messages} found first in the Perl include path exports a
+function @code{gettext}.  But what if the module
+@code{Locale::Messages} really looks like this?
+
+@example
+use vars qw (*gettext);
+
+1;
+@end example
+
+In this case, the string @code{gettext} will be interpreted as a file
+handle again, and the above example will create a file @file{testfile}
+and write the string ``Hello world!'' into it.  Even advanced
+control flow analysis will not really help:
+
+@example
+if (0.5 < rand) @{
+   eval "use Sane";
+@} else @{
+   eval "use InSane";
+@}
+print gettext "Hello world!";
+@end example
+
+If the module @code{Sane} exports a function @code{gettext} that does
+what we expect, and the module @code{InSane} opens a file for writing
+and associates the @emph{handle} @code{gettext} with this output
+stream, we are clueless again about what will happen at runtime.  It is
+completely unpredictable.  The truth is that Perl has so many ways to
+fill its symbol table at runtime that it is impossible to interpret a
+particular piece of code without executing it.
+
+Of course, @code{xgettext} will not execute your Perl sources while
+scanning for translatable strings, but rather use heuristics in order
+to guess what you meant.
+
+Another problem is the ambiguity of the slash and the question mark.
+Their interpretation depends on the context:
+
+@example
+# A pattern match.
+print "OK\n" if /foobar/;
+
+# A division.
+print 1 / 2;
+
+# Another pattern match.
+print "OK\n" if ?foobar?;
+
+# Conditional.
+print $x ? "foo" : "bar";
+@end example
+
+The slash may either act as the division operator or introduce a
+pattern match, whereas the question mark may act as the ternary
+question mark operator or as a pattern match, too.  Other programming
+languages like awk arise similar problems, but the consequences of a
+misinterpretation are particularly nasty with Perl sources.  In awk
+for instance, a statement can never exceed one line and the parser
+can recover from a parsing error at the next newline and interpret
+the rest of the input stream correctly.  Perl is different, as a
+pattern match is terminated by the next appearance of the delimiter
+(the slash or the question mark) in the input stream, regardless of
+the semantic context.  If a slash is really a division sign but
+mis-interpreted as a pattern match, the rest of the input file is most
+probably parsed incorrectly.
+
+If you find that @code{xgettext} fails to extract strings from
+portions of your sources, you should therefore look out for slashes
+and/or question marks preceding these sections.  You may have come
+across a bug in the Perl parser (and of course you should report that bug).
+
+@node Default Keywords, Special Keywords, General Problems, Perl
+@subsubsection Which keywords will xgettext look for?
+@cindex Perl default keywords
+
+Unless you instruct @code{xgettext} otherwise by invoking it with one
+of the options @code{--keyword} or @code{-k}, it will recognize the
+following keywords in your Perl sources:
+
+@itemize @bullet
+
+@item @code{gettext}
+
+@item @code{dgettext}
+
+@item @code{dcgettext}
+
+@item @code{ngettext:1,2}
+
+The first (singular) and the second (plural) argument will be
+extracted.
+
+@item @code{dngettext:1,2}
+
+The first (singular) and the second (plural) argument will be
+extracted.
+
+@item @code{dcngettext:1,2}
+
+The first (singular) and the second (plural) argument will be
+extracted.
+
+@item @code{gettext_noop}
+
+@item @code{%gettext}
+
+The keys of lookups into the hash @code{%gettext} will be extracted.
+
+@item @code{$gettext}
+
+The keys of lookups into the hash reference @code{$gettext} will be extracted.
+
+@end itemize
+
+@node Special Keywords, Quote-like Expressions, Default Keywords, Perl
+@subsubsection How to Extract Hash Keys?
+@cindex Perl special keywords for hash-lookups
+
+Translating messages at runtime is normally performed by looking up the
+original string in the translation database and returning the
+translated version.  The ``natural'' Perl implementation is a hash
+lookup, and, of course, @code{xgettext} supports such practice.
+
+@example
+print __"Hello world!";
+print $__@{"Hello world!"@};
+print $__->@{"Hello world!"@};
+print $$__@{"Hello world!"@};
+@end example  
+
+The above four lines all do the same thing.  The Perl module 
+@code{Locale::TextDomain} exports by default a hash @code{%__} that
+is tied to the function @code{__()}.  It also exports a reference
+@code{$__} to @code{%__}.
+
+If an argument to the @code{xgettext} option @code{--keyword},
+resp. @code{-k} starts with a percent sign, the rest of the keyword is
+interpreted as the name of a hash.  If it starts with a dollar
+sign, the rest of the keyword is interpreted as a reference to a
+hash.
+
+Note that you can omit the quotation marks (single or double) around
+the hash key (almost) whenever Perl itself allows it:
+
+@example
+print $gettext@{Error@};
+@end example
+
+The exact rule is: You can omit the surrounding quotes, when the hash
+key is a valid C (!) identifier, i. e. when it starts with an
+underscore or a US-ASCII letter and is followed by an arbitrary number
+of underscores, US-ASCII letters or digits.  Other Unicode characters
+are @emph{not} allowed, regardless of the @code{use utf8} pragma.
+
+@node Quote-like Expressions, Interpolation I, Special Keywords, Perl
+@subsubsection What are Strings And Quote-like Expressions?
+@cindex Perl quote-like expressions
+
+Perl offers a plethora of different string constructs.  Those that can
+be used either as arguments to functions or inside braces for hash
+lookups are generally supported by @code{xgettext}.  
+
+@itemize @bullet
+@item @strong{double-quoted strings}
+@*
+@example
+print gettext "Hello World!";
+@end example
+
+@item @strong{single-quoted strings}
+@*
+@example
+print gettext 'Hello World!';
+@end example
+
+@item @strong{the operator qq}
+@*
+@example
+print gettext qq |Hello World!|;
+print gettext qq <E-mail: <guido\@@imperia.net>>;
+@end example
+
+The operator @code{qq} is fully supported.  You can use arbitrary
+delimiters, including the four bracketing delimiters (round, angle,
+square, curly) that nest.
+
+@item @strong{the operator q}
+@*
+@example
+print gettext q |Hello World!|;
+print gettext q <E-mail: <guido@@imperia.net>>;
+@end example
+
+The operator @code{q} is fully supported.  You can use arbitrary
+delimiters, including the four bracketing delimiters (round, angle,
+square, curly) that nest.
+
+@item @strong{the operator qx}
+@*
+@example
+print gettext qx ;LANGUAGE=C /bin/date;
+print gettext qx [/usr/bin/ls | grep '^[A-Z]*'];
+@end example
+
+The operator @code{qx} is fully supported.  You can use arbitrary
+delimiters, including the four bracketing delimiters (round, angle,
+square, curly) that nest.
+
+Use this feature with care!  If you use it, you lose control of the
+external code executed from within your Perl script to your translator!
+
+@item @strong{here documents}
+@group
+@*
+@example
+print ngettext <<EOF, <<"EOF";
+one file deleted
+EOF
+several files deleted
+EOF
+
+print gettext <<'EOF';
+program not found in $PATH
+EOF
+@end example
+@end group
+
+Here-documents are recognized.  If the delimiter is enclosed in single
+quotes, the string is not interpolated.  If it is enclosed in double
+quotes or has no quotes at all, the string is interpolated.
+
+Delimiters that start with a digit are not supported!
+
+@end itemize
+
+@node Interpolation I, Interpolation II, Quote-like Expressions, Perl
+@subsubsection Illegal Uses Of String Interpolation
+@cindex Perl illegal string interpolation
+
+Perl is capable of interpolating variables into strings.  This offers
+some nice features in localized programs but can also lead to
+problems.
+
+A common error is a construct like the following:
+
+@example
+print gettext "This is the program $0!\n";
+@end example
+
+Perl will interpolate at runtime the value of the variable @code{$0}
+into the argument of the @code{gettext()} function.  Hence, this
+argument is not a string constant but a variable argument (@code{$0}
+is a global variable that holds the name of the Perl script being
+executed).  The interpolation is performed by Perl before the string
+argument is passed to @code{gettext()} and will therefore depend on
+the name of the script which can only be determined at runtime.
+Consequently, it is almost impossible that a translation can be looked
+up at runtime (except if by accident the interpolated string is found
+in the message catalog).
+
+The Perl backend will therefore terminate parsing with a fatal error
+if it encounters a variable inside of an extracted string.  In
+general, this will happen for all kinds of string interpolations that
+cannot be safely performed at compile time.  If you absolutely know
+what you are doing, you can always circumvent this behavior:
+
+@example
+my $know_what_i_am_doing = "This is program $0!\n";
+print gettext $know_what_i_am_doing;
+@end example
+
+Since the parser only recognizes strings and quote-like expressions,
+but not variables or other terms, the above construct will be
+accepted.  You will have to find another way, however, to let your
+original string make it into your message catalog.
+
+If invoked with the option @code{--extract-all}, resp. @code{-a},
+variable interpolation will be accepted.  Rationale: You will
+generally use this option in order to prepare your sources for
+internationalization.
+
+Please see the manual page @code{perlop} for details of strings and
+quote-like expressions that are subject to interpolation and those
+that are not.  Safe interpolations (that will not lead to a fatal
+error) are:
+
+@itemize @bullet
+
+@item the escape sequences @code{\t} (tab, HT, TAB), @code{\n}
+(newline, NL), @code{\r} (return, CR), @code{\f} (form feed, FF),
+@code{\b} (backspace, BS), @code{\a} (alarm, bell, BEL), and @code{\e}
+(escape, ESC).
+
+@item octal chars @code{\033}
+@*
+Note that octal escapes in the range of 400-777 are translated into a 
+UTF-8 representation, regardless of the presence of the @code{use utf8} pragma.
+
+@item hex chars @code{\x1b}
+
+@item wide hex chars @code{\x263a}
+@*
+Note that this escape is translated into a UTF-8 representation,
+regardless of the presence of the @code{use utf8} pragma.
+
+@item @code{\c[} (CTRL-[)
+
+@item @code{\N@{LATIN CAPITAL LETTER C WITH CEDILLA@}}
+@*
+Note that this escape is translated into a UTF-8 representation,
+regardless of the presence of the @code{use utf8} pragma.
+@end itemize
+
+The following escapes are considered partially safe:
+
+@itemize @bullet
+
+@item @code{\l} lowercase next char
+
+@item @code{\u} uppercase next char
+
+@item @code{\L} lowercase till \E
+
+@item @code{\U} uppercase till \E
+
+@item @code{\E} end case modification
+
+@item @code{\Q} quote non-word characters till \E
+
+@end itemize
+
+These escapes are only considered safe if the string consists of
+US-ASCII characters only.  Translation of characters outside the range
+defined by US-ASCII is locale-dependent and can only be performed at
+runtime.  Even ASCII characters are not safe in some more exotic
+environment like EBCDIC.
+
+Except for the modifier @code{\Q}, these translations, albeit legal,
+are generally useless and only obfuscate your sources.  If a
+translation can be safely performed at compile time you can just as
+well write what you mean.
+
+@node Interpolation II, Parentheses, Interpolation I, Perl
+@subsubsection Legal Uses Of String Interpolation
+@cindex Perl legal string interpolation
+
+Perl is often used to generate sources for other programming languages
+or arbitrary file formats.  Web applications that output HTML code
+make a prominent example for such usage.
+
+You will often come across situations where you want to intersperse
+code written in the target (programming) language with translatable
+messages, like in the following HTML example:
+
+@example
+print gettext <<EOF;
+<h1>My Homepage</h1>
+<script language="JavaScript"><!--
+for (i = 0; i < 100; ++i) @{
+    alert ("Thank you so much for visiting my homepage!");
+@}
+//--></script>
+EOF
+@end example
+
+If you exaggerate with constructs like the above, you will run the
+risk that the translators of your package will look out for a less
+challenging project.  You should consider an alternative expression:
+
+@example
+print <<EOF;
+<h1>$gettext@{"My Homepage"@}</h1>
+<script language="JavaScript"><!--
+for (i = 0; i < 100; ++i) @{
+    alert ("$gettext@{'Thank you so much for visiting my homepage!'@}");
+@}
+//--></script>
+EOF
+@end example
+
+The resulting PO file will begrudgingly improve in terms of
+readability.
+
+You can interpolate hash lookups in all strings or quote-like
+expressions that are subject to interpolation (see the manual page
+@code{perlop} for details).  Double interpolation is illegal, however:
+
+@example
+# TRANSLATORS: Replace "the earth" with the name of your planet.
+print gettext qq@{Welcome to $gettext->@{"the earth"@}@};
+@end example
+
+The quoted string is recognized as an argument to @code{xgettext} in
+the first place, and checked for illegal variable interpolation.  The
+dollar sign will therefore terminate the parser with an illegal
+interpolation error.
+
+It is legal to interpolate hash lookups in regular expressions:
+
+@example
+if ($var =~ /$gettext@{"the earth"@}/) @{
+   print gettext "Match!\n";
+@}
+s/$gettext@{"U. S. A."@}/$gettext@{"U. S. A."@} $gettext@{"(dial +0)"@}/g;
+@end example
+
+@node Parentheses, Long Lines, Interpolation II, Perl
+@subsubsection When To Use Parentheses
+@cindex Perl parentheses
+
+In Perl, parentheses around function arguments are mostly optional.
+The Perl backend for @code{xgettext} will always assume that all
+recognized keywords (except for hashs and hash references) are names
+of properly prototyped functions, and will (hopefully) only require
+parentheses where Perl itself requires them.  All constructs in the
+following example are therefore ok to use:
+
+@example
+@group
+print gettext ("Hello World!\n");
+print gettext "Hello World!\n";
+print dgettext ($package => "Hello World!\n");
+print dgettext $package, "Hello World!\n";
+
+# The "fat comma" => turns the left-hand side argument into a
+# single-quoted string!
+print dgettext smellovision => "Hello World!\n";
+
+# The following assignment only works with prototyped functions.
+# Otherwise, the functions will act as "greedy" list operators and
+# eat up all following arguments.
+my $anonymous_hash = @{
+   planet => gettext "earth",
+   cakes  => ngettext "one cake", "several cakes", $n,
+   still => $works,
+@};
+# The same w/o fat comma:
+my $other_hash = @{
+   'planet', gettext "earth",
+   'cakes', ngettext "one cake", "several cakes", $n,
+   'still', $works,
+@};
+
+# Parentheses are only significant for the first argument.
+print dngettext 'package', ("one cake", "several cakes", $n), $discarded;
+@end group
+@end example
+
+@node Long Lines, Perl Pitfalls, Parentheses, Perl
+@subsubsection How To Grok with Long Lines
+@cindex Perl long lines
+
+The necessity of long messages can often lead to a cumbersome or
+unreadable coding style.  Perl has several options that may prevent
+you from writing unreadable code, and the Perl backend for
+@code{xgettext} does its best to do likewise.  This is where the dot
+operator (the string concatenation operator) may come in handy:
+
+@example
+@group
+print gettext ("This is a very long"
+               . " message that is still"
+               . " readable, because"
+               . " it is split into"
+               . " multiple lines.\n");
+@end group
+@end example
+
+Perl is smart enough to concatenate these constant string fragments
+into one long string at compile time, and so is the Perl backend to
+@code{xgettext}.  You will only find one long message in the resulting
+PO file.
+
+If embedded newline characters are not an issue, or even desired, you
+may also insert newline characters inside quoted strings wherever you
+feel like it:
+
+@example
+@group
+print gettext ("<em>In HTML output
+embedded newlines are generally no
+problem, since adjacent whitespace
+is always rendered into a single
+space character.</em>");
+@end group
+@end example
+
+You may also consider to use here documents:
+
+@example
+@group
+print gettext <<EOF;
+<em>In HTML output
+embedded newlines are generally no
+problem, since adjacent whitespace
+is always rendered into a single
+space character.</em>
+EOF
+@end group
+@end example
+
+Please do not forget, that the line-breaks are real, i. e. they
+translate into newline characters that will consequently show up in
+the resulting PO file.
+
+@node Perl Pitfalls,  , Long Lines, Perl
+@subsubsection Bugs, Pitfalls, And Things That Do Not Work
+@cindex Perl pitfalls
+
+The foregoing sections should have proven that the Perl backend for
+@code{xgettext} is quite smart in extracting translatable strings from
+Perl sources.  Yet, some more or less exotic constructs that could be
+expected to work, actually do not work.  
+
+One of the more relevant limitations can be found in the
+implementation of variable interpolation inside quoted strings.  Only
+simple hash lookups can be used there:
+
+@example
+print <<EOF;
+$gettext@{"The dot operator"
+          . " does not work"
+          . "here!"@}
+Likewise, you cannot @@@{[ gettext ("interpolate function calls") ]@}
+inside quoted strings or quote-like expressions.
+EOF
+@end example
+
+This is valid Perl code and will actually trigger invocations of the
+@code{gettext} function at runtime.  Yet, the Perl parser in
+@code{xgettext} will fail to recognize the strings.  A less obvious
+example can be found in the interpolation of regular expressions:
+
+@example
+s/<!--START_OF_WEEK-->/gettext ("Sunday")/e;
+@end example
+
+The modifier @code{e} will cause the substitution to be interpreted as
+an evaluable statement.  Consequently, at runtime the function
+@code{gettext()} is called, but again, the parser fails to extract the
+string ``Sunday''.  Use a temporary variable as a simple workaround if
+you really happen to need this feature:
+
+@example
+my $sunday = gettext "Sunday";
+s/<!--START_OF_WEEK-->/$sunday/;
+@end example
+
+Hash slices would also be handy but are not recognized:
+
+@example
+my @@weekdays = @@gettext@{'Sunday', 'Monday', 'Tuesday', 'Wednesday',
+                        'Thursday', 'Friday', 'Saturday'@};
+# Or even:
+@@weekdays = @@gettext@{qw (Sunday Monday Tuesday Wednesday Thursday
+                         Friday Saturday) @};
+@end example
+
+This is perfectly legal usage of the tied hash @code{%gettext} but the
+strings are not recognized and therefore will not be extracted.
+
+Another caveat of the current version is its rudimentary support for
+non-US-ASCII characters in identifiers.  You may encounter seriouos
+problems if you use identifiers with characters outside the range of
+'A'-'Z', 'a'-'z', '0'-'9' and the underscore '_'.
+
+Maybe some of these missing features will be implemented in future
+versions, but since you can always make do without them at minimal effort,
+these todos have very low priority.
+
  @node PHP, Pike, Perl, List of Programming Languages
  @subsection PHP Hypertext Preprocessor
  @cindex PHP
diff --git a/gettext-tools/doc/xgettext.texi b/gettext-tools/doc/xgettext.texi

index 959456296390fadeced1e942f6a61f033c219ae5..a709f072c066aec6b5f393106a8e94d80c8518c2 100644 (file)
--- a/gettext-tools/doc/xgettext.texi
+++ b/gettext-tools/doc/xgettext.texi
@@ -72,7 +72,7 @@ Specifies the language of the input files.  The supported languages
  are @code{C}, @code{C++}, @code{ObjectiveC}, @code{PO}, @code{Python},
  @code{Lisp}, @code{EmacsLisp}, @code{librep}, @code{Smalltalk}, @code{Java},
  @code{JavaProperties}, @code{awk}, @code{YCP}, @code{Tcl}, @code{PHP},
-@code{RST}, @code{Glade}.
+@code{RST}, @code{Glade}, @code{Perl}.
  
  @item -C
  @itemx --c++
diff --git a/gettext-tools/po/ChangeLog b/gettext-tools/po/ChangeLog

index 549c6ebd40242521abbe5c63039d2d441a9ac6bc..e6ff7867d04ffa302c691ad28edbac6bcbe0d1c4 100644 (file)
--- a/gettext-tools/po/ChangeLog
+++ b/gettext-tools/po/ChangeLog
@@ -1,3 +1,7 @@
+2003-06-11  Guido Flohr  <guido@imperia.net>
+
+       * POTFILES.in: Add src/format-perl.c and src/x-perl.c.
+
  2003-06-11  Bruno Haible  <bruno@clisp.org>
  
         * de.po: Update from Karl Eichwalder <ke@suse.de>.
diff --git a/gettext-tools/po/POTFILES.in b/gettext-tools/po/POTFILES.in

index 787a1d162cb1ce26e7d0544a2c6e61fb76d6cfc3..936c36ae4e884aa68af526cac2c6b79d96735ceb 100644 (file)
--- a/gettext-tools/po/POTFILES.in
+++ b/gettext-tools/po/POTFILES.in
@@ -31,6 +31,7 @@ src/format-java.c
  src/format-librep.c
  src/format-lisp.c
  src/format-pascal.c
+src/format-perl.c
  src/format-php.c
  src/format-python.c
  src/format-tcl.c
@@ -40,6 +41,7 @@ src/msgattrib.c
  src/msgcat.c
  src/msgcmp.c
  src/msgcomm.c
+src/format-perl.c
  src/msgconv.c
  src/msgen.c
  src/msgexec.c
@@ -73,6 +75,7 @@ src/x-elisp.c
  src/x-glade.c
  src/x-librep.c
  src/x-lisp.c
+src/x-perl.c
  src/x-php.c
  src/x-po.c
  src/x-python.c
diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog

index cf763ada93aef9ed71076d62b938d110e881d11e..9f6753a5f8f0eff443cb33724ed4342dd9499603 100644 (file)
--- a/gettext-tools/src/ChangeLog
+++ b/gettext-tools/src/ChangeLog
@@ -1,4 +1,21 @@
-2003-05-08  Bruno Haible  <bruno@clisp.org>
+2003-06-11  Guido Flohr  <guido@imperia.net>
+
+       * message.h (enum format_type): New enum value 'format_perl'.
+       (NFORMATS): Increment.
+       * message.c (format_language, format_language_pretty): Add entry
+       for perl.
+       * format.h (formatstring_perl): New declaration.
+       * format-perl.c: New file.
+       * format.c (formatstring_parsers): Add entry for perl.
+       * x-perl.h: New file.
+       * x-perl.c: New file.
+       * xgettext.c: Include x-perl.h.
+       (main): Call x_perl_extract_all, x_perl_keyword.
+       (usage): Mention Perl language.
+       (language_to_extractor): Add Perl rule.
+       (extension_to_language): Add Perl rule.
+
+2003-06-08  Bruno Haible  <bruno@clisp.org>
  
         * Makefile.vms (LDADD): Take options from .opt files.
         (libiconv.opt): New rule.
diff --git a/gettext-tools/src/format-perl.c b/gettext-tools/src/format-perl.c

new file mode 100644 (file)

index 0000000..fb48cdd
--- /dev/null
+++ b/gettext-tools/src/format-perl.c
@@ -0,0 +1,327 @@
+/* Perl format strings.
+   Copyright (C) 2002-2003 Free Software Foundation, Inc.
+   Written by Guido Flohr <guido@imperia.net>, 2003.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+#include <alloca.h>
+
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "format.h"
+#include "xmalloc.h"
+#include "error.h"
+#include "progname.h"
+#include "gettext.h"
+#include "hash.h"
+
+#define _(str) gettext (str)
+
+/* Perl format strings are currently quite simple.  They consist of 
+   place-holders embedded in the string.
+
+   messageFormatPattern := string ("[" messageFormatElement "]" string)*
+   
+   messageFormatElement := [_A-Za-z][_0-9A-Za-z]+
+
+   However, C format strings are also allowed and used.  The following
+   functions are therefore decorators for the C format checker, and
+   will only fall back to Perl format if the C check is negative.  */
+struct spec
+{
+  unsigned int directives;
+  hash_table hash;
+  void* c_format;
+};
+
+static void*
+format_parse (const char* string, char** invalid_reason)
+{
+  char* last_pos = (char*) string;
+  char* pos;
+   struct spec* spec;
+  void* c_format = formatstring_c.parse (string, invalid_reason);
+
+  if (c_format == NULL)
+    return NULL;
+
+  spec = xcalloc (1, sizeof (struct spec));
+
+  spec->c_format = c_format;
+
+  init_hash (&spec->hash, 13);
+  while ((pos = strchr (last_pos, '['))) {
+    char* start = pos + 1;
+    last_pos = start;
+
+    if (!(*last_pos == '_' 
+         || (*last_pos >= 'A' && *last_pos <= 'Z') 
+         || (*last_pos >= 'a' && *last_pos <= 'z')))
+      continue;
+    
+    ++last_pos;
+    
+    while (*last_pos == '_' 
+          || (*last_pos >= '0' && *last_pos <= '9') 
+          || (*last_pos >= 'A' && *last_pos <= 'Z') 
+          || (*last_pos >= 'a' && *last_pos <= 'z'))
+      ++last_pos;
+
+    if (*last_pos == ']') 
+      {
+       size_t len = last_pos - start;
+       int* hits;
+
+       if (0 == find_entry (&spec->hash, start, len, (void**) &hits))
+         {
+           ++(*hits);
+         }
+       else
+         {
+           hits = xmalloc (sizeof *hits);
+           *hits = 1;
+           insert_entry (&spec->hash, start, len, hits);
+         }
+       ++last_pos;
+       ++spec->directives;
+      }
+  }
+
+  return spec;
+}
+
+static void
+format_free (void* description)
+{
+  void* ptr = NULL;
+  const void* key;
+  size_t keylen;
+  void* data;
+
+  struct spec* spec = (struct spec*) description;
+
+  if (spec != NULL)
+    {
+      while (0 == iterate_table (&spec->hash, &ptr, &key, &keylen, &data))
+       free (data);
+
+      delete_hash (&spec->hash);
+      
+      if (spec->c_format)
+       formatstring_c.free (spec->c_format);
+      
+      free (spec);
+    }
+}
+
+static int
+format_get_number_of_directives (void* description)
+{
+  int c_directives = 0;
+
+  struct spec* spec = (struct spec*) description;
+  if (spec->c_format)
+    c_directives = formatstring_c.get_number_of_directives (spec->c_format);
+
+  return c_directives + spec->directives;
+}
+
+static bool 
+format_check (const lex_pos_ty* pos, void* msgid_descr, void* msgstr_descr, 
+             bool equality, bool noisy, const char* pretty_msgstr) 
+{
+  struct spec* spec1 = (struct spec *) msgid_descr;
+  struct spec* spec2 = (struct spec *) msgstr_descr;
+  bool result = false;
+  void* ptr = NULL;
+  const void* key;
+  size_t keylen;
+  int* hits1;
+  int* hits2;
+
+  /* First check the perl arguments.  We are probably faster than format-c.  */
+  if (equality)
+    {
+      /* Pass 1: Check that every format specification in msgid has its
+        counterpart in msgstr.  This is only necessary for equality.  */
+      while (0 == iterate_table (&spec1->hash, &ptr, &key, 
+                                &keylen, (void**) &hits1))
+       {
+         if (0 == find_entry (&spec2->hash, key, keylen, (void**) &hits2))
+           {
+             if (*hits1 != *hits2)
+               {
+                 result = true;
+                 if (noisy)
+                   {
+                     char* argname = alloca (keylen + 1);
+                     memcpy (argname, key, keylen);
+                     argname[keylen] = 0;
+                     
+                     /* The next message shows the limitations of ngettext.
+                        It is not smart enough to say "once", "twice",
+                        "thrice/%d times", "%d times", ..., and it cannot
+                        grok with two variable arguments.  We have to
+                        work around the problem.  */
+                     error_with_progname = false;
+                     error_at_line (0, 0, pos->file_name, pos->line_number,
+                                _("\
+appearances of named argument '[%s]' do not match \
+(%d in original string, %d in '%s')"),
+                                    argname, *hits1, *hits2, pretty_msgstr);
+                     error_with_progname = true;
+                   }
+                 else
+                   return true;
+               }
+           }
+         else
+           {
+             result = true;
+             if (noisy)
+               {
+                 char* argname = alloca (keylen + 1);
+                 memcpy (argname, key, keylen);
+                 argname[keylen] = 0;
+                 
+                 error_with_progname = false;
+                 error_at_line (0, 0, pos->file_name, pos->line_number,
+                                _("\
+named argument '[%s]' appears in original string but not in '%s'"),
+                                argname, pretty_msgstr);
+                 error_with_progname = true;
+               }
+             else
+               return true;
+           }
+       }
+    }
+
+  /* Pass 2: Check that the number of format specifications in msgstr
+     does not exceed the number of appearances in msgid.  */
+  ptr = NULL;
+  while (0 == iterate_table (&spec2->hash, &ptr, &key, &keylen, 
+                            (void**) &hits2))
+    {
+      if (0 != find_entry (&spec1->hash, key, keylen, (void**) &hits1))
+       {
+         result = true;
+         if (noisy)
+           {
+             char* argname = alloca (keylen + 1);
+             memcpy (argname, key, keylen);
+             argname[keylen] = 0;
+             
+             error_with_progname = false;
+             error_at_line (0, 0, pos->file_name, pos->line_number,
+                            _("\
+named argument '[%s]' appears only in '%s' but not in the original string"),
+                            argname, pretty_msgstr);
+             error_with_progname = true;
+           }
+         else
+           return true;
+       }
+    }
+
+  if (spec1->c_format && spec2->c_format)
+    {
+      result |= formatstring_c.check (pos, spec1->c_format, spec2->c_format,
+                                     equality, noisy, pretty_msgstr);
+    }
+
+  return result;
+}
+
+struct formatstring_parser formatstring_perl =
+  {
+    format_parse,
+    format_free,
+    format_get_number_of_directives,
+    format_check
+  };
+
+#ifdef TEST
+
+/* Test program: Print the argument list specification returned by
+   format_parse for strings read from standard input.  */
+
+#include <stdio.h>
+#include "getline.h"
+
+static void
+format_print (descr, line)
+     void* descr;
+     const char* line;
+{
+  struct spec *spec = (struct spec *) descr;
+  char* ptr = NULL;
+  char* key = NULL;
+  size_t keylen;
+  int data;
+  
+  printf ("%s=> ", line);
+  
+  if (spec == NULL)
+    {
+      printf ("INVALID\n");
+      return;
+    }
+  
+  while (iterate_table (&spec->hash, (void**) &ptr, (const void**) &key, 
+                       &keylen, (void**) &data) == 0) 
+    {
+      key[keylen - 1] = '\0';
+      printf (">>>[%s]<<< ", key);
+    }
+  
+  printf ("\n");
+}
+
+int
+main ()
+{
+  char *line = NULL;
+  size_t line_len = 0;
+  void *descr;
+  
+  for (;;)
+    {      
+      if (getline (&line, &line_len, stdin) < 0)
+       break;
+      
+      descr = format_parse (line);
+      
+      format_print (descr, line);
+      
+      format_free (descr);
+    }
+  
+  return 0;
+}
+
+/*
+ * For Emacs M-x compile
+ * Local Variables:
+ * compile-command: "/bin/sh ../libtool --mode=link gcc -o a.out -static -O -g -Wall -I.. -I../lib -I../intl -DHAVE_CONFIG_H -DTEST format-perl.c ../lib/libgettextlib.la"
+ * End:
+ */
+
+#endif /* TEST */
diff --git a/gettext-tools/src/format.c b/gettext-tools/src/format.c

index 00660a78039adef6f9840cf7773ffcad4608396d..02e25216c83f7376e25d5b558d850541f438b060 100644 (file)
--- a/gettext-tools/src/format.c
+++ b/gettext-tools/src/format.c
@@ -34,6 +34,7 @@ struct formatstring_parser *formatstring_parsers[NFORMATS] =
    /* format_smalltalk */       &formatstring_smalltalk,
    /* format_java */            &formatstring_java,
    /* format_awk */             &formatstring_awk,
+  /* format_perl */            &formatstring_perl,
    /* format_pascal */          &formatstring_pascal,
    /* format_ycp */             &formatstring_ycp,
    /* format_tcl */             &formatstring_tcl,
diff --git a/gettext-tools/src/format.h b/gettext-tools/src/format.h

index f2857215c5497d75d7757e762903eab287c2a6fe..bcc2ab3ca83f6c1c899d02560c4f1e8acc18aff8 100644 (file)
--- a/gettext-tools/src/format.h
+++ b/gettext-tools/src/format.h
@@ -63,6 +63,7 @@ extern struct formatstring_parser formatstring_librep;
  extern struct formatstring_parser formatstring_smalltalk;
  extern struct formatstring_parser formatstring_java;
  extern struct formatstring_parser formatstring_awk;
+extern struct formatstring_parser formatstring_perl;
  extern struct formatstring_parser formatstring_pascal;
  extern struct formatstring_parser formatstring_ycp;
  extern struct formatstring_parser formatstring_tcl;
diff --git a/gettext-tools/src/message.c b/gettext-tools/src/message.c

index 6306bc18c5c4554d7085f6e354766467d1a43ff8..2d1a860706367c600993bd6bf5c749b77e60ce06 100644 (file)
--- a/gettext-tools/src/message.c
+++ b/gettext-tools/src/message.c
@@ -42,6 +42,7 @@ const char *const format_language[NFORMATS] =
    /* format_smalltalk */       "smalltalk",
    /* format_java */            "java",
    /* format_awk */             "awk",
+  /* format_perl */            "perl",
    /* format_pascal */          "object-pascal",
    /* format_ycp */             "ycp",
    /* format_tcl */             "tcl",
@@ -58,6 +59,7 @@ const char *const format_language_pretty[NFORMATS] =
    /* format_smalltalk */       "Smalltalk",
    /* format_java */            "Java",
    /* format_awk */             "awk",
+  /* format_perl */            "Perl",
    /* format_pascal */          "Object Pascal",
    /* format_ycp */             "YCP",
    /* format_tcl */             "Tcl",
diff --git a/gettext-tools/src/message.h b/gettext-tools/src/message.h

index 393ede215a5a395d911197b6b10b96186b19e513..5a89f0adf4a04f0c2333255d8264ed9b24c8fdb1 100644 (file)
--- a/gettext-tools/src/message.h
+++ b/gettext-tools/src/message.h
@@ -45,9 +45,10 @@ enum format_type
    format_pascal,
    format_ycp,
    format_tcl,
-  format_php
+  format_php,
+  format_perl
  };
-#define NFORMATS 12    /* Number of format_type enum values.  */
+#define NFORMATS 13    /* Number of format_type enum values.  */
  extern const char *const format_language[NFORMATS];
  extern const char *const format_language_pretty[NFORMATS];
  
diff --git a/gettext-tools/src/x-perl.c b/gettext-tools/src/x-perl.c

new file mode 100644 (file)

index 0000000..e1be7c5
--- /dev/null
+++ b/gettext-tools/src/x-perl.c
@@ -0,0 +1,3152 @@
+/* xgettext Perl backend.
+   Copyright (C) 2002-2003 Free Software Foundation, Inc.
+
+   This file was written by Guido Flohr <guido@imperia.net>, 2002-2003.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+#ifdef HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <errno.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "message.h"
+#include "x-perl.h"
+#include "xgettext.h"
+#include "error.h"
+#include "progname.h"
+#include "xmalloc.h"
+#include "exit.h"
+#include "gettext.h"
+#include "ucs4-utf8.h"
+#include "uniname.h"
+
+#define _(s) gettext(s)
+
+#if HAVE_C_BACKSLASH_A
+# define ALERT_CHAR '\a'
+#else
+# define ALERT_CHAR '\7'
+#endif
+
+#define ever (;;)
+
+/* The Perl syntax is defined in perlsyn.pod.  Try the command
+   "perldoc perlsyn".  */
+
+#define DEBUG_PERL 0
+#define DEBUG_MEMORY 0
+
+/* FIXME: All known Perl operators should be listed here.  It does not 
+   cost that much and it may improve the stability of the parser.  */
+enum token_type_ty
+{
+  token_type_eof,
+  token_type_lparen,           /* ( */
+  token_type_rparen,           /* ) */
+  token_type_comma,            /* , */
+  token_type_fat_comma,                /* => */
+  token_type_dereference,      /* , */
+  token_type_semicolon,         /* ; */
+  token_type_lbrace,            /* { */
+  token_type_rbrace,            /* } */
+  token_type_lbracket,          /* [ */
+  token_type_rbracket,          /* ] */
+  token_type_string,           /* quote-like */
+  token_type_named_op,          /* if, unless, while, ... */
+  token_type_variable,          /* $... */
+  token_type_symbol,           /* symbol, number */
+  token_type_regex_op,         /* s, tr, y, m.  */
+  token_type_keyword_symbol,    /* keyword symbol (used by parser) */
+  token_type_dot,               /* . */
+  token_type_other             /* regexp, misc. operator */
+};
+typedef enum token_type_ty token_type_ty;
+
+/* Subtypes for strings, important for interpolation.  */
+enum string_type_ty
+{
+    string_type_verbatim,     /* "<<'EOF'", "m'...'", "s'...''...'", 
+                                "tr/.../.../", "y/.../.../".  */
+    string_type_q,            /* "'..'", "q/.../".  */
+    string_type_qq,           /* '"..."', "`...`", "qq/.../", "qx/.../",
+                                "<file*glob>".  */
+    string_type_qr,           /* Not supported.  */
+};
+typedef enum string_type_ty string_type_ty;
+
+typedef struct token_ty token_ty;
+struct token_ty
+{
+  token_type_ty type;
+  string_type_ty string_type;
+  char *string;                /* for token_type_{symbol,string} */
+  int line_number;
+};
+
+static token_ty token_buf[2];
+
+#if DEBUG_PERL
+static const char*
+token2string (token)
+     token_ty* token;
+{
+  switch (token->type)
+    {
+    case token_type_eof:
+      return "token_type_eof";
+    case token_type_lparen:
+      return "token_type_lparen";
+    case token_type_rparen:
+      return "token_type_rparen";
+    case token_type_comma:
+      return "token_type_comma";
+    case token_type_fat_comma:
+      return "token_type_fat_comma";
+    case token_type_dereference:
+      return "token_type_dereference";
+    case token_type_semicolon:
+      return "token_type_semicolon";
+    case token_type_lbrace:
+      return "token_type_lbrace";
+    case token_type_rbrace:
+      return "token_type_rbrace";
+    case token_type_lbracket:
+      return "token_type_lbracket";
+    case token_type_rbracket:
+      return "token_type_rbracket";
+    case token_type_string:
+      return "token_type_string";
+    case token_type_named_op:
+      return "token_type_named_op";
+    case token_type_variable:
+      return "token_type_variable";
+    case token_type_symbol:
+      return "token_type_symbol";
+    case token_type_regex_op:
+      return "token_type_regex_op";
+    case token_type_keyword_symbol:
+      return "token_type_keyword_symbol";
+    case token_type_dot:
+      return "token_type_dot";
+    case token_type_other:
+      return "token_type_other";
+    default:
+      return "unknown";
+    }
+}
+#endif
+
+struct stack_entry
+{
+  struct stack_entry* next;
+  struct stack_entry* prev;
+  void* data;
+  void (*destroy) PARAMS ((token_ty* data));
+};
+
+struct stack
+{
+  struct stack_entry* first;
+  struct stack_entry* last;
+};
+
+struct stack* token_stack;
+
+/* Prototypes for local functions.  Needed to ensure compiler checking of
+   function argument counts despite of K&R C function definition syntax.  */
+static void init_keywords PARAMS ((void));
+static int phase1_getc PARAMS ((void));
+static void phase1_ungetc PARAMS ((int c));
+static char* get_here_document PARAMS ((const char* delimiter));
+static void skip_pod PARAMS ((void));
+static int phase2_getc PARAMS ((void));
+static void phase2_ungetc PARAMS ((int c));
+static inline void free_token PARAMS ((token_ty *tp));
+static void extract_variable PARAMS ((message_list_ty* mlp, 
+                                     token_ty *tp, int c));
+static void interpolate_keywords PARAMS ((message_list_ty* mlp, 
+                                         const char* string));
+static void extract_quotelike PARAMS ((token_ty *tp, int delim));
+static void extract_triple_quotelike PARAMS ((message_list_ty* mlp,
+                                             token_ty *tp, int delim,
+                                             bool interpolate));
+static char* extract_quotelike_pass1 PARAMS ((int delim));
+static void extract_quotelike_pass3 PARAMS ((token_ty* tp, int error_level));
+static char* extract_hex PARAMS ((char* string, size_t len, 
+                                 unsigned int* result));
+static char* extract_oct PARAMS ((char* string, size_t len, 
+                                 unsigned int* result));
+static token_ty* x_perl_lex PARAMS ((message_list_ty* mlp));
+static void x_perl_prelex PARAMS ((message_list_ty* mlp, token_ty* tp));
+static void x_perl_unlex PARAMS ((token_ty* tp));
+static char* collect_message PARAMS ((message_list_ty* mlp, token_ty* tp, 
+                                     int error_level));
+static bool extract_balanced PARAMS ((message_list_ty *mlp,
+                                     int arg_sg, int arg_pl,
+                                     int state, token_type_ty delim));
+static void stack_push PARAMS ((struct stack* stack, void* data, 
+                               void (*destroy) PARAMS ((token_ty* data))));
+static void stack_unshift PARAMS ((struct stack* stack, void* data, 
+                                  void (*destroy) PARAMS ((token_ty* data))));
+static void* stack_pop PARAMS ((struct stack* stack));
+static void* stack_shift PARAMS ((struct stack* stack));
+static void* stack_head PARAMS ((struct stack* stack));
+static void stack_free PARAMS ((struct stack* stack));
+
+#if DEBUG_MEMORY
+static void* xmalloc_debug PARAMS ((size_t bytes));
+static void* xrealloc_debug PARAMS ((void* buf, size_t bytes));
+static void* xrealloc_static_debug PARAMS ((void* buf, size_t bytes));
+static void* xcalloc_debug PARAMS ((size_t nitems, size_t bytes));
+static char* xstrdup_debug PARAMS ((const char* string));
+static void free_debug PARAMS ((void* buf));
+static message_ty *remember_a_message_debug PARAMS ((message_list_ty *mlp,
+                                                    char *string, 
+                                                    lex_pos_ty *pos));
+static void remember_a_message_plural_debug PARAMS ((message_ty *mp,
+                                                    char *string, 
+                                                    lex_pos_ty *pos));
+
+
+static message_ty*
+remember_a_message_debug (mlp, string, pos)
+     message_list_ty* mlp;
+     char* string;
+     lex_pos_ty* pos;
+{
+  void* retval;
+  
+  fprintf (stderr, "*** remember_a_message (%p): ", string); fflush (stderr);
+  retval = remember_a_message (mlp, string, pos);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static void
+remember_a_message_plural_debug (mp, string, pos)
+     message_ty* mp;
+     char* string;
+     lex_pos_ty* pos;
+{
+  fprintf (stderr, "*** remember_a_message_plural (%p, %p): ", mp, string); 
+  fflush (stderr);
+  remember_a_message_plural (mp, string, pos);
+  fprintf (stderr, "done\n"); fflush (stderr);  
+}
+
+static void*
+xmalloc_debug (bytes)
+     size_t bytes;
+{
+  void* retval;
+
+  fprintf (stderr, "*** xmalloc (%u): ", bytes); fflush (stderr);
+  retval = xmalloc (bytes);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static void*
+xrealloc_debug (buf, bytes)
+     void* buf;
+     size_t bytes;
+{
+  void* retval;
+
+  fprintf (stderr, "*** xrealloc (%p, %u): ", buf, bytes); fflush (stderr);
+  retval = xrealloc (buf, bytes);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static void*
+xrealloc_static_debug (buf, bytes)
+     void* buf;
+     size_t bytes;
+{
+  void* retval;
+
+  fprintf (stderr, "*** xrealloc_static (%p, %u): ", buf, bytes); 
+  fflush (stderr);
+  retval = xrealloc (buf, bytes);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static void*
+xcalloc_debug (nitems, bytes)
+     size_t nitems;
+     size_t bytes;
+{
+  void* retval;
+
+  fprintf (stderr, "*** xcalloc (%u, %u): ", nitems, bytes); fflush (stderr);
+  retval = xcalloc (nitems, bytes);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static char*
+xstrdup_debug (string)
+     const char* string;
+{
+  char* retval;
+
+  fprintf (stderr, "*** xstrdup (%p, %d): ", string, strlen (string)); 
+  fflush (stderr);
+  retval = xstrdup (string);
+  fprintf (stderr, "%p\n", retval); fflush (stderr);
+  return retval;
+}
+
+static void
+free_debug (buf)
+     void* buf;
+{
+  fprintf (stderr, "*** free (%p): ", buf); fflush (stderr);
+  free (buf);
+  fprintf (stderr, "done\n"); fflush (stderr);
+}
+
+# define xmalloc(b) xmalloc_debug (b)
+# define xrealloc(b, s) xrealloc_debug (b, s)
+# define xcalloc(n, b) xcalloc_debug (n, b)
+# define xstrdup(s) xstrdup_debug (s)
+# define free(b) free_debug (b)
+
+# define xrealloc_static(b, s) xrealloc_static_debug (b, s)
+
+#define remember_a_message(m, s, p) remember_a_message_debug (m, s, p)
+#define remember_a_message_plural(m, s, p) \
+    remember_a_message_plural_debug (m, s, p)
+
+#else
+# define xrealloc_static(b, s) xrealloc (b, s)
+#endif
+
+#if DEBUG_PERL
+/* Dumps all resources allocated by stack STACK.  */
+static int
+stack_dump (stack)
+     struct stack* stack;
+{
+  struct stack_entry* last = stack->last;
+
+  fprintf (stderr, "BEGIN STACK DUMP\n");
+  while (last) 
+    {
+      struct stack_entry* next = last->prev;
+
+      if (last->data) 
+       {
+         token_ty* token = (token_ty*) last->data;
+         fprintf (stderr, "  [%s]\n", token2string (token));
+         fflush (stderr);
+         switch (token->type)
+           {
+           case token_type_named_op:
+           case token_type_string:
+           case token_type_symbol:
+           case token_type_keyword_symbol:
+           case token_type_variable:
+             fprintf (stderr, "    string: %s\n", token->string);
+             fflush (stderr);
+             break;
+           }
+       }
+      last = next;
+    }
+  fprintf (stderr, "END STACK DUMP\n");
+  return 1;
+}
+#endif
+
+/* Pushes the pointer DATA onto the stack STACK.  The argument DESTROY
+ * is a pointer to a function that frees the resources associated with 
+ * DATA or NULL (no destructor).
+ */
+static void
+stack_push (stack, data, destroy)
+     struct stack* stack;
+     void* data;
+     void (*destroy) PARAMS ((token_ty* data));
+{
+    struct stack_entry* entry = xmalloc (sizeof (struct stack_entry));
+
+    if (stack->last == NULL) {
+       stack->first = entry;
+    } else {
+       stack->last->next = entry;
+    }
+
+    entry->prev = stack->last;
+    entry->next = NULL;
+    entry->data = data;
+    entry->destroy = destroy;
+    stack->last = entry;
+}
+
+/* Unshifts the pointer DATA onto the stack STACK.  The argument DESTROY
+ * is a pointer to a function that frees the resources associated with 
+ * DATA or NULL (no destructor).
+ */
+static void
+stack_unshift (stack, data, destroy)
+     struct stack* stack;
+     void* data;
+     void (*destroy) PARAMS ((token_ty* data));
+{
+    struct stack_entry* entry = xmalloc (sizeof (struct stack_entry));
+
+    if (stack->first == NULL) 
+      {
+       stack->last = entry;
+      } 
+    else 
+      {
+       stack->first->prev = entry;
+      }
+    
+    entry->next = stack->first;
+    entry->prev = NULL;
+    entry->data = data;
+    entry->destroy = destroy;
+
+    stack->first = entry;
+}
+
+/* Pops the last element from the stack STACK and returns its contents or
+ * NULL if the stack is empty.
+ */
+static void*
+stack_pop (stack)
+    struct stack* stack;
+{
+    struct stack_entry* entry = stack->last;
+    struct stack_entry* last;
+    void* data;
+
+    if (!entry)
+       return NULL;
+
+    last = entry->prev;
+    if (last) {
+       stack->last = last;
+       last->next = NULL;
+    } else {
+       stack->first = stack->last = NULL;
+    }
+
+    data = entry->data;
+    free (entry);
+
+    return data;
+}
+
+/* Shifts the first element from the stack STACK and returns its contents or
+ * NULL if the stack is empty.
+ */
+static void*
+stack_shift (stack)
+    struct stack* stack;
+{
+    struct stack_entry* entry = stack->first;
+    void* data;
+
+    if (!entry)
+       return NULL;
+
+    stack->first = entry->next;
+    if (!stack->first)
+       stack->last = NULL;
+    else 
+       stack->first->prev = NULL;
+
+    data = entry->data;
+    free (entry);
+
+    return data;
+}
+
+/* Return the bottom of the stack without removing it from the stack or
+ * NULL if the stack is empty.
+ */
+static void*
+stack_head (stack)
+    struct stack* stack;
+{
+    struct stack_entry* entry = stack->first;
+    struct stack_entry* last;
+    void* data;
+
+    if (!entry)
+       return NULL;
+
+    data = entry->data;
+
+    return data;
+}
+
+/* Frees all resources allocated by stack STACK.  */
+static void
+stack_free (stack)
+     struct stack* stack;
+{
+    struct stack_entry* last = stack->last;
+
+    while (last) {
+       struct stack_entry* next = last->prev;
+       if (last->data && last->destroy) {
+           last->destroy (last->data);
+       }
+       free (last);
+       last = next;
+    }
+}
+
+/* ====================== Keyword set customization.  ====================== */
+
+/* If true extract all strings.  */
+static bool extract_all = false;
+
+static hash_table keywords;
+static bool default_keywords = true;
+
+
+void
+x_perl_extract_all ()
+{
+  extract_all = true;
+}
+
+
+void
+x_perl_keyword (name)
+     const char *name;
+{
+  if (name == NULL)
+    default_keywords = false;
+  else
+    {
+      const char *end;
+      int argnum1;
+      int argnum2;
+      const char *colon;
+
+      if (keywords.table == NULL)
+       init_hash (&keywords, 100);
+
+      split_keywordspec (name, &end, &argnum1, &argnum2);
+
+      /* The characters between name and end should form a valid C identifier.
+        A colon means an invalid parse in split_keywordspec().  */
+      colon = strchr (name, ':');
+      if (colon == NULL || colon >= end)
+       {
+         if (argnum1 == 0)
+           argnum1 = 1;
+         insert_entry (&keywords, name, end - name,
+                       (void *) (long) (argnum1 + (argnum2 << 10)));
+       }
+    }
+}
+
+/* Finish initializing the keywords hash table.
+   Called after argument processing, before each file is processed.  */
+static void
+init_keywords ()
+{
+  if (default_keywords)
+    {
+      x_perl_keyword ("gettext");
+      x_perl_keyword ("%gettext");
+      x_perl_keyword ("$gettext");
+      x_perl_keyword ("dgettext:2");
+      x_perl_keyword ("dcgettext:2");
+      x_perl_keyword ("ngettext:1,2");
+      x_perl_keyword ("dngettext:2,3");
+      x_perl_keyword ("dcngettext:2,3");
+      x_perl_keyword ("gettext_noop");
+      default_keywords = false;
+    }
+}
+
+
+/* ================== Reading of characters and tokens.  =================== */
+
+/* Real filename, used in error messages about the input file.  */
+static const char *real_file_name;
+
+/* Logical filename and line number, used to label the extracted messages.  */
+static char *logical_file_name;
+static int line_number;
+
+/* The input file stream.  */
+static FILE *fp;
+
+/* These are for tracking whether comments count as immediately before
+   keyword.  */
+static int last_comment_line;
+static int last_non_comment_line;
+
+/* The current line buffer.  */
+char* linebuf;
+
+/* The size of the current line.  */
+int linesize;
+
+/* The position in the current line.  */
+int linepos;
+
+/* The size of the input buffer.  */
+size_t linebuf_size;
+
+/* The last token seen in the token stream.  This is important for the
+   interpretation of '?' and '/'.  */
+token_type_ty last_token;
+
+/* The last string token waiting for a dot operator or finishing.  */
+token_ty last_string;
+
+/* True if LAST_STRING is finished.  */
+bool last_string_finished;
+
+/* Number of lines eaten for here documents.  */
+int here_eaten;
+
+/* Paranoia: EOF marker for __END__ or __DATA__.  */
+bool end_of_file;
+
+/* 1. line_number handling.  */
+/* Returns the next character from the input stream or EOF.  */
+static int
+phase1_getc ()
+{
+  line_number += here_eaten;
+  here_eaten = 0;
+
+  if (end_of_file)
+    return EOF;
+
+  if (linepos >= linesize) 
+    {
+      linesize = getline (&linebuf, &linebuf_size, fp);
+      
+      if (linesize == EOF)
+       {
+         if (ferror (fp))
+           error (EXIT_FAILURE, errno, _("error while reading \"%s\""),
+                  real_file_name);
+         end_of_file = true;
+         return EOF;
+       }
+      
+      linepos = 0;
+      ++line_number;
+      
+      /* Undosify.  This is important for catching the end of <<EOF and
+        <<'EOF'.  We could rely on stdio doing this for us but you
+        it is not uncommon to to come across Perl scripts with CRLF
+        newline conventions on systems that do not follow this 
+        convention.  */
+      if (linesize >= 2 && linebuf[linesize - 1] == '\n'
+         && linebuf[linesize - 2] == '\r') 
+       {
+         linebuf[linesize - 2] = '\n';
+         linebuf[linesize - 1] = '\0';
+         --linesize;
+       }
+    }
+
+  return linebuf[linepos++];
+}
+
+static void
+phase1_ungetc (c)
+     int c;
+{
+  if (c != EOF)
+    {
+      if (linepos == 0)
+       error (EXIT_FAILURE, 0, _("\
+%s:%d: internal error: attempt to ungetc across line boundary"), 
+              real_file_name, line_number);
+
+      --linepos;
+    }
+}
+
+static char*
+get_here_document (delimiter)
+     const char* delimiter;
+{
+  static char* buffer;
+  static size_t bufmax = 0;
+  size_t bufpos = 0;
+  static char* my_linebuf = NULL;
+  static size_t my_linebuf_size = 0;
+  bool chomp = false;
+
+  if (bufpos >= bufmax)
+    {
+      buffer = xrealloc_static (NULL, 1);
+      buffer[0] = '\0';
+      bufmax = 1;
+    }
+
+  for ever
+    {
+      int read_bytes = getline (&my_linebuf, &my_linebuf_size, fp);
+  
+      if (read_bytes == EOF)
+       {
+         if (ferror (fp))
+           {
+             error (EXIT_FAILURE, errno, _("error while reading \"%s\""),
+                    real_file_name);
+           }
+         else
+           {
+             error_with_progname = false;
+             error (EXIT_SUCCESS, 0, _("\
+%s:%d: can\'t find string terminator \"%s\" anywhere before EOF"),
+                    real_file_name, line_number, delimiter);
+             error_with_progname = true;
+             fflush (stderr);
+
+             return xstrdup (buffer);
+           }
+       }
+
+      ++here_eaten;
+
+      /* Undosify.  This is important for catching the end of <<EOF and
+        <<'EOF'.  We could rely on stdio doing this for us but you
+        it is not uncommon to to come across Perl scripts with CRLF
+        newline conventions on systems that do not follow this 
+        convention.  */
+      if (read_bytes >= 2 && my_linebuf[read_bytes - 1] == '\n'
+         && my_linebuf[read_bytes - 2] == '\r') 
+       {
+         my_linebuf[read_bytes - 2] = '\n';
+         my_linebuf[read_bytes - 1] = '\0';
+         --read_bytes;
+       }
+
+      if (read_bytes && my_linebuf[read_bytes - 1] == '\n')
+       {
+         chomp = true;
+         my_linebuf[read_bytes - 1] = '\0';
+       }
+      if (0 == strcmp (my_linebuf, delimiter))
+       {
+         return xstrdup (buffer);
+       }
+      if (chomp)
+       {
+         my_linebuf[read_bytes - 1] = '\n';
+       }
+
+      if (bufpos + read_bytes + 1 >= bufmax)
+       {
+         bufmax += read_bytes + 1;
+         buffer = xrealloc_static (buffer, bufmax);
+       }
+      strcpy (buffer + bufpos, my_linebuf);
+      bufpos += read_bytes;
+    }
+}
+
+/* Skips pod sections.  */
+static void
+skip_pod ()
+{
+  line_number += here_eaten;
+  here_eaten = 0;
+  linepos = 0;
+
+  for ever
+    {
+      linesize = getline (&linebuf, &linebuf_size, fp);
+      
+      if (linesize == EOF)
+       {
+         if (ferror (fp))
+           error (EXIT_FAILURE, errno, _("error while reading \"%s\""),
+                  real_file_name);
+         return;
+       }
+      
+      ++line_number;
+      
+      if (strncmp ("=cut", linebuf, 4) == 0)
+       {
+         /* Force reading of a new line on next call to phase1_getc().  */
+         linepos = linesize;
+         return;
+       }
+    }
+}
+
+/* 2. Replace each comment that is not inside a string literal or regular
+   expression with a newline character.  We need to remember the comment
+   for later, because it may be attached to a keyword string.  */
+
+static int
+phase2_getc ()
+{
+  static char *buffer;
+  static size_t bufmax;
+  size_t buflen;
+  int lineno;
+  int c;
+
+  c = phase1_getc ();
+  if (c == '#')
+    {
+      buflen = 0;
+      lineno = line_number;
+      /* Skip leading whitespace.  */
+      for ever
+       {
+         c = phase1_getc ();
+         if (c == EOF)
+           break;
+         if (c != ' ' && c != '\t' && c != '\r' && c != '\f')
+           {
+             phase1_ungetc (c);
+             break;
+           }
+       }
+      for ever
+       {
+         c = phase1_getc ();
+         if (c == '\n' || c == EOF)
+           break;
+         if (buflen >= bufmax)
+           {
+             bufmax += 100;
+             buffer = xrealloc_static (buffer, bufmax);
+           }
+         buffer[buflen++] = c;
+       }
+      if (buflen >= bufmax)
+       {
+         bufmax += 100;
+         buffer = xrealloc_static (buffer, bufmax);
+       }
+      buffer[buflen] = '\0';
+      xgettext_comment_add (buffer);
+      last_comment_line = lineno;
+    }
+  return c;
+}
+
+static void
+phase2_ungetc (c)
+     int c;
+{
+  if (c != EOF)
+    phase1_ungetc (c);
+}
+
+/* There is an ambiguity about '/': It can start a division operator ('/' or
+   '/=') or it can start a regular expression.  The distinction is important
+   because inside regular expressions, '#' and '"' lose its special meanings.
+   If you look at the awk grammar, you see that the operator is only allowed
+   right after a 'variable' or 'simp_exp' nonterminal, and these nonterminals
+   can only end in the NAME, LENGTH, YSTRING, YNUMBER, ')', ']' terminals.
+   So we prefer the division operator interpretation only right after
+   symbol, string, number, ')', ']', with whitespace but no newline allowed
+   in between.  */
+static bool prefer_division_over_regexp;
+
+/* Free the memory pointed to by a 'struct token_ty'.  */
+static inline void
+free_token (tp)
+     token_ty *tp;
+{
+  switch (tp->type)
+    {
+    case token_type_named_op:
+    case token_type_string:
+    case token_type_symbol:
+    case token_type_keyword_symbol:
+    case token_type_variable:
+      free (tp->string);
+      break;
+    default:
+      break;
+    }
+  free (tp);
+}
+
+/* Extract an unsigned hexadecimal number from STRING, considering at
+   most LEN bytes and place the result in RESULT.  Returns a pointer
+   to the first character past the hexadecimal number.  */
+static char* 
+extract_hex (string, len, result)
+     char* string;
+     size_t len;
+     unsigned int* result;
+{
+  size_t i;
+
+  *result = 0;
+  
+  for (i = 0; i < len; ++i)
+    {
+      int number;
+
+      if (string[i] >= 'A' && string[i] <= 'F')
+       number = 10 + string[i] - 'A';
+      else if (string[i] >= 'a' && string[i] <= 'f')
+       number = 10 + string[i] - 'a';
+      else if (string[i] >= '0' && string[i] <= '9')
+       number = string[i] - '0';
+      else
+       break;
+
+      *result <<= 4;
+      *result |= number;
+    }
+
+  return string + i;
+}
+
+/* Extract an unsigned octal number from STRING, considering at
+   most LEN bytes and place the result in RESULT.  Returns a pointer
+   to the first character past the hexadecimal number.  */
+static char* 
+extract_oct (string, len, result)
+     char* string;
+     size_t len;
+     unsigned int* result;
+{
+  size_t i;
+
+  *result = 0;
+  
+  for (i = 0; i < len; ++i)
+    {
+      int number;
+
+      if (string[i] >= '0' && string[i] <= '7')
+       number = string[i] - '0';
+      else
+       break;
+
+      *result <<= 3;
+      *result |= number;
+    }
+
+  return string + i;
+}
+
+/* Extract the various quotelike constructs except for <<EOF.  See the
+   section "Gory details of parsing quoted constructs" in perlop.pod.  */
+static void
+extract_quotelike (tp, delim)
+     token_ty* tp;
+     int delim;
+{
+  char* string = extract_quotelike_pass1 (delim);
+  tp->type = token_type_string;
+  
+  string[strlen (string) - 1] = '\0';
+  tp->string = xstrdup (string + 1);
+  free (string);
+  return;
+}
+
+/* Extract the quotelike constructs with double delimiters, like
+   s/[SEARCH]/[REPLACE]/.  This function does not eat up trailing
+   modifiers (left to the caller).  */
+static void
+extract_triple_quotelike (mlp, tp, delim, interpolate)
+     message_list_ty* mlp;
+     token_ty* tp;
+     int delim;
+     bool interpolate;
+{
+  char* string = extract_quotelike_pass1 (delim);
+
+  tp->type = token_type_regex_op;
+  if (interpolate && !extract_all && delim != '\'')
+    interpolate_keywords (mlp, string);
+
+  free (string);
+
+  if (delim == '(' || delim == '<' || delim == '{' || delim == '[')
+    {
+      /* Things can change.  */
+      delim = phase1_getc ();
+      while (delim == ' ' || delim == '\t' || delim == '\r' 
+            || delim == '\n' || delim  == '\f')
+       {
+         /* The hash-sign is not a valid delimiter after whitespace, ergo
+            use phase2_getc() and not phase1_getc() now.  */
+         delim = phase2_getc ();
+       }
+    }
+  string = extract_quotelike_pass1 (delim);
+  if (interpolate && !extract_all && delim != '\'')
+    interpolate_keywords (mlp, string);
+  free (string);
+
+  return;
+}
+
+/* Pass 1 of extracting quotes: Find the end of the string, regardless
+   of the semantics of the construct.  */
+static char*
+extract_quotelike_pass1 (delim)
+     int delim;
+{
+  /* This function is called recursively.  No way to allocate stuff
+     statically.  Consider using alloca() instead.  */
+  char *buffer = xmalloc (100);
+  int bufmax = 100;
+  int bufpos = 0;
+  bool nested = true;
+  int counter_delim;
+
+  buffer[bufpos++] = delim;
+
+  /* Find the closing delimiter.  */
+  switch (delim) 
+    {
+    case '(':
+      counter_delim = ')';
+      break;
+    case '{':
+      counter_delim = '}';
+      break;
+    case '[':
+      counter_delim = ']';
+      break;
+    case '<':
+      counter_delim = '>';
+      break;
+    default:
+      nested = false;
+      counter_delim = delim;
+      break;
+    }
+
+  for ever
+    {
+      int c = phase1_getc ();
+
+      if (bufpos >= bufmax - 1)
+       {
+         bufmax += 100;
+         buffer = xrealloc (buffer, bufmax);
+       }
+
+      if (c == counter_delim || c == EOF)
+       {
+         /* Copying the EOF (actually 255) is not an error.  It will
+            be stripped off later.  */
+         buffer[bufpos++] = c;
+         buffer[bufpos++] = '\0';
+#if DEBUG_PERL
+         fprintf (stderr, "PASS1: %s\n", buffer);
+#endif
+         return buffer;
+       }
+     
+      if (nested && c == delim)
+       {
+         char* inner = extract_quotelike_pass1 (delim);
+         size_t len = strlen (inner);
+
+         if (bufpos + len >= bufmax)
+           {
+             bufmax += len;
+             buffer = xrealloc (buffer, bufmax);
+           }
+         strcpy (buffer + bufpos, inner);
+         free (inner);
+         bufpos += len;
+         continue;
+       }
+
+      if (c == '\\')
+       {
+         c = phase1_getc ();
+         if (c == '\\')
+           {
+             buffer[bufpos++] = '\\';
+             buffer[bufpos++] = '\\';
+           }
+         else if (c == delim || c == counter_delim)
+           {
+             /* This is pass2 in Perl.  */
+             buffer[bufpos++] = c;
+           }
+         else
+           {
+             buffer[bufpos++] = '\\';
+             phase1_ungetc (c);
+           }
+       }
+      else
+       {
+         buffer[bufpos++] = c;
+       }
+    }
+}
+
+/* Perform pass 3 of quotelike extraction (interpolation).  */
+/* FIXME: Currently may writes null-bytes into the string.  */
+static void
+extract_quotelike_pass3 (tp, error_level)
+     token_ty* tp;
+     int error_level;
+{
+  static char *buffer;
+  static int bufmax = 0;
+  int bufpos = 0;
+  int delim = tp->string[0];
+  char* string = tp->string;
+  unsigned char* crs = string;
+
+  bool uppercase = false;
+  bool lowercase = false;
+  bool quotemeta = false;
+
+#if DEBUG_PERL
+  switch (tp->string_type)
+    {
+    case string_type_verbatim:
+      fprintf (stderr, "Interpolating string_type_verbatim:\n");
+      break;
+    case string_type_q:
+      fprintf (stderr, "Interpolating string_type_q:\n");
+      break;
+    case string_type_qq:
+      fprintf (stderr, "Interpolating string_type_qq:\n");
+      break;
+    case string_type_qr:
+      fprintf (stderr, "Interpolating string_type_qr:\n");
+      break;
+    }
+  fprintf (stderr, "%s\n", tp->string);
+  if (tp->string_type == string_type_verbatim)
+    fprintf (stderr, "---> %s\n", tp->string);
+#endif
+
+  if (tp->string_type == string_type_verbatim)
+    return;
+
+  while (*crs) 
+    {
+      if (bufpos >= bufmax - 6)
+       {
+         bufmax += 100;
+         buffer = xrealloc_static (buffer, bufmax);
+       }
+      
+      if (tp->string_type == string_type_q) 
+       {
+         switch (*crs)
+           {
+           case '\\':
+             if (crs[1] == '\\') 
+               {
+                 ++crs;
+                 buffer[bufpos++] = '\\';
+                 continue;
+               }
+             /* FALLTHROUGH */
+           default:
+             buffer[bufpos++] = *crs++;
+             break;
+           }
+         continue;
+       }
+      
+      /* We only get here for double-quoted strings or regular expressions.
+        Unescape escape sequences.  */
+      if (*crs == '\\')
+       {
+         switch (crs[1])
+           {
+           case 't':
+             crs += 2;
+             buffer[bufpos++] = '\t';
+             continue;
+           case 'n':
+             crs += 2;
+             buffer[bufpos++] = '\n';
+             continue;
+           case 'r':
+             crs += 2;
+             buffer[bufpos++] = '\r';
+             continue;
+           case 'f':
+             crs += 2;
+             buffer[bufpos++] = '\f';
+             continue;
+           case 'b':
+             crs += 2;
+             buffer[bufpos++] = '\b';
+             continue;
+           case 'a':
+             crs += 2;
+             buffer[bufpos++] = ALERT_CHAR;
+             continue;
+           case 'e':
+             crs += 2;
+             buffer[bufpos++] = 0x1b;
+             continue;
+           case '0': case '1': case '2': case '3':
+           case '4': case '5': case '6': case '7':
+             {
+               unsigned int oct_number;
+               int length;
+               
+               crs = extract_oct (crs + 1, 3, &oct_number);
+               length = u8_uctomb (buffer + bufpos, oct_number, 3);
+               if (length > 0)
+                 bufpos += length;
+             }
+             continue;
+           case 'x':
+             {
+               unsigned int hex_number = 0;
+               int length;
+               
+               ++crs;
+               
+               if (*crs == '{') 
+                 {
+                   char* end = strchr (crs, '}');
+                   if (end == NULL)
+                     {
+                       error_with_progname = false;
+                       error (error_level, 0, _("\
+%s:%d: missing right brace on \\x{HEXNUMBER}"), real_file_name, line_number);
+                       error_with_progname = true;
+                       ++crs;
+                       continue;
+                     }
+                   else
+                     {
+                       ++crs;
+                       (void) extract_hex (crs, 4, &hex_number);
+                     }
+                 }
+               else
+                 {
+                   crs = extract_hex (crs, 2, &hex_number);
+                 }
+               
+               length = u8_uctomb (buffer + bufpos, hex_number, 6);
+               if (length > 0)
+                 bufpos += length;
+             }
+             continue;
+           case 'c':
+             /* Perl's notion of control characters.  */
+             crs += 2;
+             if (*crs)
+               {
+                 int the_char = *crs;
+                 if (the_char >= 'a' || the_char <= 'z')
+                   the_char -= 0x20;
+                 buffer[bufpos++] = the_char + (the_char & 0x40 ? -64 : 64); 
+               }
+             continue;
+           case 'N':
+             crs += 2;
+             if (*crs == '{')
+               {
+                 char* name = xstrdup (crs + 1);
+                 char* end = strchr (name, '}');
+                 if (end != NULL)
+                   {
+                     unsigned int unicode;
+                     int length;
+                     
+                     *end = '\0';
+                     
+                     crs += 2 + strlen (name);
+                     unicode = unicode_name_character (name);
+                     if (unicode != UNINAME_INVALID)
+                       {
+                         length = u8_uctomb (buffer + bufpos, unicode, 6);
+                         if (length > 0)
+                           bufpos += length;
+                       }
+                   }
+                 free (name);
+               }
+             continue;
+           }
+       }
+
+      /* No escape sequence, go on.  */
+      if (*crs == '\\')
+       {
+         ++crs;
+         switch (*crs)
+           {
+           case 'E':
+             quotemeta = uppercase = lowercase = false;
+             ++crs;
+             continue;
+           case 'L':
+             quotemeta = uppercase = false;
+             lowercase = true;
+             ++crs;
+             continue;
+           case 'U':
+             quotemeta = lowercase = false;
+             uppercase = true;
+             ++crs;
+             continue;
+           case 'Q':
+             uppercase = lowercase = false;
+             quotemeta = true;
+             ++crs;
+             continue;
+           case 'l':
+             ++crs;
+             if (crs[1] >= 'A' && crs[1] <= 'Z')
+               {
+                 buffer[bufpos++] = crs[1] + 0x20;
+                 ++crs;
+               }
+             else if (crs[1] >= 0x80)
+               {
+                 error_with_progname = false;
+                 error (error_level, 0, _("\
+%s:%d: illegal interpolation (\"\\l\") of 8bit character \"%c\""), 
+                        real_file_name, line_number, *crs);
+                 error_with_progname = true;
+                 ++crs;
+               }
+             else
+               ++crs;
+             continue;
+           case 'u':
+             ++crs;
+             if (crs[1] >= 'a' && crs[1] <= 'z')
+               {
+                 buffer[bufpos++] = crs[1] - 0x20;
+                 ++crs;
+               }
+             else if (crs[1] >= 0x80)
+               {
+                 error_with_progname = false;
+                 error (error_level, 0, _("\
+%s:%d: illegal interpolation (\"\\u\") of 8bit character \"%c\""), 
+                        real_file_name, line_number, *crs);
+                 error_with_progname = true;
+                 ++crs;
+               }
+             else
+               ++crs;
+             continue;
+           case '\\':
+             if (crs[1])
+               buffer[bufpos++] = crs[1];
+             crs++;
+             continue;
+           }
+       }
+
+      
+      if (*crs == '$' || *crs == '@')
+       {
+         ++crs;
+         error_with_progname = false;
+         error (error_level, 0, _("\
+%s:%d: illegal variable interpolation"), 
+                real_file_name, line_number, *crs); 
+         error_with_progname = true;
+       }
+      else if (lowercase)
+       { 
+         if (*crs >= 'A' && *crs <= 'Z')
+           buffer[bufpos++] = 0x20 + *crs++;
+         else if (*crs >= 0x80)
+           {
+             error_with_progname = false;
+             error (error_level, 0, _("\
+%s:%d:illegal interpolation (\"\\L\") of 8bit character \"%c\""), 
+                    real_file_name, line_number, *crs);
+             error_with_progname = true;
+             buffer[bufpos++] = *crs++;
+           }
+         else
+           buffer[bufpos++] = *crs++;
+       }
+      else if (uppercase)
+       {
+         if (*crs >= 'a' && *crs <= 'z')
+           buffer[bufpos++] = *crs++ - 0x20;
+         else if (*crs >= 0x80)
+           {
+             error_with_progname = false;
+             error (error_level, 0, _("\
+%s:%d: illegal interpolation (\"\\U\") of 8bit character \"%c\""), 
+                    real_file_name, line_number, *crs);
+             error_with_progname = true;
+             buffer[bufpos++] = *crs++;
+           }
+         else
+           buffer[bufpos++] = *crs++;
+       }
+      else if (quotemeta)
+       {
+         buffer[bufpos++] = *crs++;
+       }
+      else
+       {
+         buffer[bufpos++] = *crs++;
+       }
+    }
+
+  if (bufpos >= bufmax - 1)
+  {
+      bufmax += 100;
+      buffer = xrealloc_static (buffer, bufmax);
+  }
+
+  buffer[bufpos++] = '\0';
+
+#if DEBUG_PERL
+  fprintf (stderr, "---> %s\n", buffer);
+#endif
+
+  free (tp->string);
+  tp->string = xstrdup (buffer);
+}
+
+/* Parse a variable.  This is done in several steps:
+ *
+ * 1) Consume all leading occurcencies of '$', '@', '%', and '*'.
+ * 2) Determine the name of the variable from the following input
+ * 3) Parse possible following hash keys or array indexes.
+ */
+static void
+extract_variable (mlp, tp, first)
+     message_list_ty* mlp;
+     token_ty* tp;
+     int first;
+{
+  static char* buffer;
+  static int bufmax = 0;
+  int bufpos = 0;
+  int c = first;
+  size_t varbody_length = 0;
+  bool maybe_hash_deref = false;
+  bool maybe_hash_value = false;
+
+  tp->type = token_type_variable;
+
+#if DEBUG_PERL
+  fprintf (stderr, "%s:%d: extracting variable type '%c'\n",
+          real_file_name, line_number, first);
+#endif
+
+  /* 
+   * 1) Consume dollars and so on (not euros ...).  Unconditionally 
+   *    accepting the hash sign (#) will maybe lead to inaccurate
+   *    results.  FIXME!
+   */
+  while (c == '$' || c == '*' || c == '#' || c == '@' || c == '%')
+    {
+      if (bufpos >= bufmax)
+       {
+         bufmax += 100;
+         buffer = xrealloc_static (buffer, bufmax);
+       }
+      buffer[bufpos++] = c;
+      c = phase1_getc ();
+    }
+
+  if (c == EOF) 
+    {
+      tp->type = token_type_eof;
+      return;
+    }
+  
+  /* Hash references are treated in a special way, when looking for
+     our keywords.  */
+  if (buffer[0] == '$')
+    {
+      if (bufpos == 1)
+       maybe_hash_value = true;
+      else if (bufpos == 2 && buffer[1] == '$')
+       {
+         if (c != '{' && c != '_' && (!((c >= '0' && c <= '9')
+                                        || (c >= 'A' && c <= 'Z')
+                                        || (c >= 'a' && c <= 'z')
+                                        || c == ':' || c == '\'' 
+                                        || c >= 0x80)))
+           {
+             /* Special variable $$ for pid.  */
+             if (bufpos >= bufmax)
+               {
+                 bufmax += 100;
+                 buffer = xrealloc_static (buffer, bufmax);
+               }
+             buffer[bufpos++] = '\0';
+             tp->string = xstrdup (buffer);
+#if DEBUG_PERL
+             fprintf (stderr, "%s:%d: is PID ($$)\n",
+                      real_file_name, line_number);
+#endif
+
+             phase1_ungetc (c);
+             return;
+           }
+
+         maybe_hash_deref = true;
+         bufpos = 1;
+       }
+    }
+
+  /* 
+   * 2) Get the name of the variable.  The first character is practically
+   *    arbitrary.  Punctuation and numbers automagically put a variable
+   *    in the global namespace but that subtle difference is not interesting
+   *    for us.  
+   */
+  if (bufpos >= bufmax)
+    {
+      bufmax += 100;
+      buffer = xrealloc_static (buffer, bufmax);
+    }
+  if (c == '{')
+    {
+      /* Yuck, we cannot accept ${gettext} as a keyword...  Except for
+       * debugging purposes it is also harmless, that we suppress the
+       * real name of the variable.
+       */
+#if DEBUG_PERL
+      fprintf (stderr, "%s:%d: braced {variable_name}\n",
+              real_file_name, line_number);
+#endif
+
+      if (extract_balanced (mlp, -1, -1, 0, token_type_rbrace))
+       return;
+      buffer[bufpos++] = c;
+    }
+  else
+    {
+      while ((c >= 'A' && c <= 'Z') ||
+            (c >= 'a' && c <= 'z') ||
+            (c >= '0' && c <= '9') ||
+            c == '_' || c == ':' || c == '\'' || c >= 0x80)
+       {
+         ++varbody_length;
+         if (bufpos >= bufmax)
+           {
+             bufmax += 100;
+             buffer = xrealloc_static (buffer, bufmax);
+           }
+         buffer[bufpos++] = c;
+         c = phase1_getc ();
+       }
+      phase1_ungetc (c);
+    }
+
+  if (bufpos >= bufmax - 1)
+    {
+      bufmax += 100;
+      buffer = xrealloc_static (buffer, bufmax);
+    }
+  buffer[bufpos++] = '\0';
+
+  /* Probably some strange Perl variable like $`.  */
+  if (varbody_length == 0)
+    {
+      c = phase1_getc ();
+      if (c == EOF || c == ' ' || c == '\n' || c == '\r' 
+         || c == '\f' || c == '\t')
+       phase1_ungetc (c);  /* Loser.  */
+      else
+       {
+         buffer[bufpos - 1] = c;
+         buffer[++bufpos] = '\0';
+       }
+    }
+  tp->string = xstrdup (buffer);
+
+#if DEBUG_PERL
+  fprintf (stderr, "%s:%d: complete variable name: %s\n",
+          real_file_name, line_number, tp->string);
+#endif
+  prefer_division_over_regexp = true;
+
+  /* 
+   * 3) If the following looks strange to you, this is valid Perl syntax:
+   *
+   *      $var = $$hashref    # We can place a
+   *                          # comment here and then ...
+   *             {key_into_hashref};
+   *
+   *    POD sections are not allowed but we leave complaints about
+   *    that to the compiler/interpreter.
+   */
+  /* We only extract strings from the first hash key (if present).  */
+
+  if (maybe_hash_deref || maybe_hash_value)
+    {
+      bool is_dereference = false;
+      int c = phase2_getc ();
+
+      while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f')
+         c = phase2_getc ();
+
+      if (c == '-')
+       {
+         int c2 = phase1_getc ();
+
+         if (c2 == '>')
+           {
+             is_dereference = true;
+             c = phase2_getc ();
+             while (c == ' ' || c == '\t' || c == '\r' 
+                    || c == '\n' || c == '\f')
+                 c = phase2_getc ();
+           }
+         else if (c2 != '\n')
+           {
+             /* Discarding the newline is harmless here.  The only 
+                special character recognized after a minus is greater-than
+                for dereference.  However, the sequence "-\n>" that we
+                treat incorrectly here, is a syntax error.  */
+               phase1_ungetc (c2);
+           }
+       }
+
+      if (maybe_hash_value && is_dereference)
+       {
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: first keys preceded by \"->\"\n",
+                  real_file_name, line_number);
+#endif
+       }
+      else if (maybe_hash_value)
+       {
+         /* Fake it into a hash.  */
+         tp->string[0] = '%';
+       }
+
+      /* Do NOT change that into else if (see above).  */
+      if ((maybe_hash_value || maybe_hash_deref) && c == '{')
+       {
+         void *keyword_value;
+
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: first keys preceded by '{'\n",
+                  real_file_name, line_number);
+#endif
+
+         if (0 == find_entry (&keywords, tp->string, strlen (tp->string),
+                              &keyword_value))
+           {
+             /* Extract a possible string from the key.  Before proceeding
+                we check whether the open curly is followed by a symbol and
+                then by a right curly.  */
+             token_ty* t1 = x_perl_lex (mlp);
+
+#if DEBUG_PERL
+             fprintf (stderr, "%s:%d: extracting string key\n",
+                      real_file_name, line_number);
+#endif
+
+             if (t1->type == token_type_symbol 
+                 || t1->type == token_type_named_op) 
+               {
+                 token_ty* t2 = x_perl_lex (mlp);
+                 if (t2->type == token_type_rbrace)
+                   {
+                     lex_pos_ty pos;
+                     pos.line_number = line_number;
+                     pos.file_name = logical_file_name;
+                     remember_a_message (mlp, xstrdup (t1->string), &pos);
+                     free_token (t2);
+                     free_token (t1);
+                   }
+                 else
+                   {
+                     x_perl_unlex (t2);
+                   }
+               }
+             else
+               {
+                 x_perl_unlex (t1);
+                 if (extract_balanced (mlp, 1, -1, 1, token_type_rbrace))
+                   return;
+               }
+           }
+         else
+           {
+             phase2_ungetc (c);
+           }
+       }
+      else
+       {
+         phase2_ungetc (c);
+       }
+    }
+
+  /* Now consume "->", "[...]", and "{...}".  */
+  for ever
+    {
+      int c = phase2_getc ();
+      int c2;
+
+      switch (c)
+       {
+       case '{':
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: extracting balanced '{' after varname\n",
+                  real_file_name, line_number);
+#endif
+         extract_balanced (mlp, -1, -1, 0, token_type_rbrace);
+         break;
+       case '[':
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: extracting balanced '[' after varname\n",
+                  real_file_name, line_number);
+#endif
+         extract_balanced (mlp, -1, -1, 0, token_type_rbracket);
+         break;
+       case '-':
+         c2 = phase1_getc ();
+         if (c2 == '>')
+           {
+#if DEBUG_PERL
+             fprintf (stderr, "%s:%d: another \"->\" after varname\n",
+                      real_file_name, line_number);
+#endif
+             break;
+           }     
+         else if (c2 != '\n')
+           {
+             /* Discarding the newline is harmless here.  The only 
+                special character recognized after a minus is greater-than
+                for dereference.  However, the sequence "-\n>" that we
+                treat incorrectly here, is a syntax error.  */
+             phase1_ungetc (c2);
+           }
+         /* FALLTHROUGH */
+
+       default:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: variable finished\n",
+                  real_file_name, line_number);
+#endif
+         phase2_ungetc (c);
+         return;
+       }
+    }
+}
+
+/* Actually a simplified version of extract_variable().  It searches for
+ * variables inside a double-quoted string that may interpolate to
+ * some keyword hash (reference).
+ */
+static void
+interpolate_keywords (mlp, string)
+     message_list_ty* mlp;
+     const char* string;
+{
+  static char* buffer;
+  static int bufmax = 0;
+  int bufpos = 0;
+  int c = string[0];
+  bool maybe_hash_deref = false;
+  enum parser_state
+    {
+      initial,
+      one_dollar,
+      two_dollars,
+      identifier,
+      minus,
+      wait_lbrace,
+      wait_quote,
+      dquote,
+      squote,
+      barekey,
+      wait_rbrace,
+    } state;
+  token_ty token;
+
+  lex_pos_ty pos;
+             
+  /* States are:
+   * 
+   * initial:      initial
+   * one_dollar:   dollar sign seen in state INITIAL
+   * two_dollars:  another dollar-sign has been seen in state ONE_DOLLAR
+   * identifier:   a valid identifier character has been seen in state 
+   *               ONE_DOLLAR or TWO_DOLLARS
+   * minus:        a minus-sign has been seen in state IDENTIFIER
+   * wait_lbrace:  a greater-than has been seen in state MINUS
+   * wait_quote:   a left brace has been seen in state IDENTIFIER or in
+   *               state WAIT_LBRACE
+   * dquote:       a double-quote has been seen in state WAIT_QUOTE
+   * squote:       a single-quote has been seen in state WAIT_QUOTE
+   * barekey:      an bareword character has been seen in state WAIT_QUOTE
+   * wait_rbrace:  closing quote has been seen in state DQUOTE or SQUOTE
+   */
+  state = initial;
+  
+  token.type = token_type_string;
+  token.string_type = string_type_qq;
+  token.line_number = line_number;
+  pos.file_name = logical_file_name;
+  pos.line_number = line_number;
+
+  while (c = *string++)
+    {
+      void *keyword_value;
+
+      if (state == initial)
+       bufpos = 0;
+
+      if (bufpos >= bufmax - 1)
+       {
+         bufmax += 100;
+         buffer = xrealloc_static (buffer, bufmax);
+       }
+      
+      switch (state)
+       {
+       case initial:
+         switch (c)
+           {
+           case '\\':
+             c = *string++;
+             if (!c) return;
+             break;
+           case '$':
+             buffer[bufpos++] = '$';
+             maybe_hash_deref = false;
+             state = one_dollar;
+             break;
+           default:
+             break;
+           }
+         break;
+       case one_dollar:
+         switch (c)
+           {
+           case '$':
+             /* 
+              * This is enough to make us believe later that we dereference
+              * a hash reference.
+              */
+             maybe_hash_deref = true;
+             state = two_dollars;
+             break;
+           default:
+             if (c == '_' || c == ':' || c == '\'' || c >= 0x80
+                 || (c >= 'A' && c <= 'Z')
+                 || (c >= 'a' && c <= 'z')
+                 || (c >= '0' && c <= '9'))
+               {
+                 buffer[bufpos++] = c;
+                 state = identifier;
+               }
+             else
+               state = initial;
+             break;
+           }
+         break;
+       case two_dollars:
+         if (c == '_' || c == ':' || c == '\'' || c >= 0x80
+             || (c >= 'A' && c <= 'Z')
+             || (c >= 'a' && c <= 'z')
+             || (c >= '0' && c <= '9'))
+           {
+             buffer[bufpos++] = c;
+             state = identifier;
+             break;
+           }
+         else
+           {
+             state = initial;
+           }
+         break;
+       case identifier:
+         switch (c)
+           {
+           case '-':
+             if (0 == find_entry (&keywords, buffer, bufpos,
+                                  &keyword_value))
+               {
+                 state = minus;
+               }
+             else
+               state = initial;
+             break;
+           case '{':
+             if (!maybe_hash_deref)
+               {
+                 buffer[0] = '%';
+               }
+             if (0 == find_entry (&keywords, buffer, bufpos,
+                                  &keyword_value))
+               {
+                 state = wait_quote;
+               }
+             else
+               state = initial;
+             break;
+           default: 
+             if (c == '_' || c == ':' || c == '\'' || c >= 0x80
+                 || (c >= 'A' && c <= 'Z')
+                 || (c >= 'a' && c <= 'z')
+                 || (c >= '0' && c <= '9'))
+               {
+                 buffer[bufpos++] = c;
+               }
+             else
+               {
+                 state = initial;
+               }
+             break;
+           }
+         break;
+       case minus:
+         switch (c)
+           {
+           case '>':
+             state = wait_lbrace;
+             break;
+           default:
+             state = initial;
+             break;
+           }
+         break;
+       case wait_lbrace:
+         switch (c)
+           {
+           case '{':
+             state = wait_quote;
+             break;
+           default:
+             state = initial;
+             break;
+           }
+         break;
+       case wait_quote:
+         switch (c)
+           {
+           case ' ':
+           case '\n':
+           case '\t':
+           case '\r':
+           case '\f':
+             break;
+           case '\'':
+             bufpos = 0;
+             state = squote;
+             break;
+           case '"':
+             bufpos = 0;
+             state = dquote;
+             break;
+           default:
+             if (c == '_' || (c >= '0' && c <= '9') || c >= 0x80
+                 || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
+               {
+                 state = barekey;
+                 bufpos = 0;
+                 buffer[bufpos++] = c;
+               }
+             else
+               state = initial;
+             break;
+           }
+         break;
+       case dquote:
+         switch (c)
+           {
+           case '"':
+             /* The resulting string has te be interpolated twice.  */
+             buffer[bufpos] = '\0';
+             token.string = xstrdup (buffer);
+             extract_quotelike_pass3 (&token, EXIT_FAILURE);
+             /* The string can only shrink with interpolation (because
+                we ignore \Q).  */
+             strcpy (buffer, token.string);
+             free (token.string);
+             state = wait_rbrace;
+             break;
+           case '\\':
+             if (string[0] == '\"')
+               {
+                 buffer[bufpos++] = string++[0];
+               }
+             else if (string[0])
+               {
+                 buffer[bufpos++] = '\\';
+                 buffer[bufpos++] = string++[0];
+               }
+             else
+               {
+                 state = initial;
+               }
+             break;
+           default:
+             buffer[bufpos++] = c;
+             break;
+           }
+         break;
+       case squote:
+         switch (c)
+           {
+           case '\'':
+             state = wait_rbrace;
+             break;
+           case '\\':
+             if (string[0] == '\'')
+               {
+                 buffer[bufpos++] = string++[0];
+               }
+             else if (string[0])
+               {
+                 buffer[bufpos++] = '\\';
+                 buffer[bufpos++] = string++[0];
+               }
+             else
+               {
+                 state = initial;
+               }
+             break;
+           default:
+             buffer[bufpos++] = c;
+             break;
+           }
+         break;
+       case barekey:
+         {
+           if (c == '_' || (c >= '0' && c <= '9') || c >= 0x80
+               || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
+             {
+               buffer[bufpos++] = c;
+               break;
+             }
+           else if (c == ' ' || c == '\n' || c == '\t' 
+                    || c == '\r' || c == '\f')
+             {
+               state = wait_rbrace;
+               break;
+             }
+           else if (c != '}')
+             {
+               state = initial;
+               break;
+             }
+           /* Must be right brace.  */
+         }
+         /* FALLTHROUGH */
+       case wait_rbrace:
+         switch (c)
+           {
+           case ' ':
+           case '\n':
+           case '\t':
+           case '\r':
+           case '\f':
+             break;
+           case '}':
+             buffer[bufpos] = '\0';
+             token.string = xstrdup (buffer);
+             extract_quotelike_pass3 (&token, EXIT_FAILURE);
+
+             remember_a_message (mlp, token.string, &pos);
+             /* FALLTHROUGH */
+           default:
+             state = initial;
+             break;
+           }
+         break;
+       }
+    }
+}
+
+/* Combine characters into tokens.  Discard whitespace.  */
+
+static void
+x_perl_prelex (mlp, tp)
+     message_list_ty* mlp;
+     token_ty *tp;
+{
+  static char *buffer;
+  static int bufmax;
+  int bufpos;
+  int c;
+
+  for ever
+    {
+      c = phase2_getc ();
+      tp->line_number = line_number;
+
+      switch (c)
+       {
+       case EOF:
+         tp->type = token_type_eof;
+         return;
+
+       case '\n':
+         if (last_non_comment_line > last_comment_line)
+           xgettext_comment_reset ();
+         /* FALLTHROUGH */
+       case '\t':
+       case ' ':
+         /* Ignore whitespace.  */
+         continue;
+
+       case '%':
+       case '@':
+       case '*':
+       case '$':
+         extract_variable (mlp, tp, c);
+         prefer_division_over_regexp = true;
+         return;
+       }
+
+      last_non_comment_line = tp->line_number;
+
+      switch (c)
+       {
+       case '.':
+         {
+           int c2 = phase1_getc ();
+           phase1_ungetc (c2);
+           if (c2 == '.')
+             {
+               tp->type = token_type_other;
+               prefer_division_over_regexp = false;
+               return;
+             }
+           else if (c2 >= '0' && c2 <= '9')
+             {
+               prefer_division_over_regexp = false;
+             }
+           else
+             {
+               tp->type = token_type_dot;
+               prefer_division_over_regexp = true;
+               return;
+             }
+         }
+         /* FALLTHROUGH */
+       case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+       case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+       case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+       case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+       case 'Y': case 'Z':
+       case '_':
+       case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+       case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+       case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+       case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+       case 'y': case 'z':
+       case '0': case '1': case '2': case '3': case '4':
+       case '5': case '6': case '7': case '8': case '9':
+         /* Symbol, or part of a number.  */
+         prefer_division_over_regexp = true;
+         bufpos = 0;
+         for ever
+           {
+             int c2;
+
+             if (bufpos >= bufmax)
+               {
+                 bufmax += 100;
+                 buffer = xrealloc_static (buffer, bufmax);
+               }
+             buffer[bufpos++] = c;
+             c = phase1_getc ();
+             switch (c)
+               {
+               case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+               case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+               case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+               case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+               case 'Y': case 'Z':
+               case '_':
+               case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+               case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+               case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+               case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+               case 'y': case 'z':
+               case '0': case '1': case '2': case '3': case '4':
+               case '5': case '6': case '7': case '8': case '9':
+                 continue;
+
+               default:
+                 phase1_ungetc (c);
+                 break;
+               }
+             break;
+           }
+         if (bufpos >= bufmax)
+           {
+             bufmax += 100;
+             buffer = xrealloc_static (buffer, bufmax);
+           }
+         buffer[bufpos] = '\0';
+
+         if (strcmp (buffer, "__END__") == 0
+             || strcmp (buffer, "__DATA__") == 0)
+           {
+             end_of_file = true;
+             tp->type = token_type_eof;
+             return;
+           }
+         else if (strcmp (buffer, "and") == 0
+                  || strcmp (buffer, "cmp") == 0
+                  || strcmp (buffer, "eq") == 0
+                  || strcmp (buffer, "if") == 0
+                  || strcmp (buffer, "ge") == 0
+                  || strcmp (buffer, "gt") == 0
+                  || strcmp (buffer, "le") == 0
+                  || strcmp (buffer, "lt") == 0
+                  || strcmp (buffer, "ne") == 0
+                  || strcmp (buffer, "not") == 0
+                  || strcmp (buffer, "or") == 0
+                  || strcmp (buffer, "unless") == 0
+                  || strcmp (buffer, "while") == 0
+                  || strcmp (buffer, "xor") == 0)
+           {
+             tp->type = token_type_named_op;
+             tp->string = xstrdup (buffer);
+             prefer_division_over_regexp = false;
+             return;
+           }
+         else if (strcmp (buffer, "s") == 0
+                || strcmp (buffer, "y") == 0
+                || strcmp (buffer, "tr") == 0)
+           {
+             int delim = phase1_getc ();
+             while (delim == ' ' || delim == '\t' || delim == '\r'
+                    || delim == '\n' || delim == '\f')
+               {
+                 delim = phase2_getc ();
+               }
+             if (delim == EOF)
+               {
+                 tp->type = token_type_eof;
+                 return;
+               }
+             if ((delim >= '0' && delim <= '9')
+                 || (delim >= 'A' && delim <= 'Z')
+                 || (delim >= 'a' && delim <= 'z'))
+               {
+                 /* False positive.  */
+                 tp->type = token_type_symbol;
+                 phase2_ungetc (delim);
+                 tp->string = xstrdup (buffer);
+                 prefer_division_over_regexp = true;
+                 return;
+               }
+             extract_triple_quotelike (mlp, tp, delim, buffer[0] == 's');
+
+             /* Eat the following modifiers.  */
+             c = phase1_getc ();
+             while (c >= 'a' && c <= 'z')
+               c = phase1_getc ();
+             phase1_ungetc (c);
+             return;
+           }
+         else if (strcmp (buffer, "m") == 0)
+           {
+             int delim = phase1_getc ();
+
+             while (delim == ' ' || delim == '\t' || delim == '\r'
+                    || delim == '\n' || delim == '\f')
+               {
+                 delim = phase2_getc ();
+               }
+             if (delim == EOF)
+               {
+                 tp->type = token_type_eof;
+                 return;
+               }
+             if ((delim >= '0' && delim <= '9')
+                 || (delim >= 'A' && delim <= 'Z')
+                 || (delim >= 'a' && delim <= 'z'))
+               {
+                 /* False positive.  */
+                 tp->type = token_type_symbol;
+                 phase2_ungetc (delim);
+                 tp->string = xstrdup (buffer);
+                 prefer_division_over_regexp = true;
+                 return;
+               }
+             extract_quotelike (tp, delim);
+             if (!extract_all && delim != '\'')
+                 interpolate_keywords (mlp, tp->string);
+
+             free (tp->string);
+             tp->type = token_type_regex_op;
+             prefer_division_over_regexp = true;
+
+             /* Eat the following modifiers.  */
+             c = phase1_getc ();
+             while (c >= 'a' && c <= 'z')
+               c = phase1_getc ();
+             phase1_ungetc (c);
+             return;
+           }
+         else if (strcmp (buffer, "qq") == 0
+                  || strcmp (buffer, "q") == 0
+                  || strcmp (buffer, "qx") == 0
+                  || strcmp (buffer, "qw") == 0
+                  || strcmp (buffer, "qr") == 0)
+           {
+             /* The qw (...) construct is not really a string but we
+                can treat in the same manner and then pretend it is 
+                a symbol.  Rationale: Saying "qw (foo bar)" is the
+                same as "my @list = ('foo', 'bar'); @list;".  */
+
+             int delim = phase1_getc ();
+
+             while (delim == ' ' || delim == '\t' || delim == '\r'
+                    || delim == '\n' || delim == '\f')
+               {
+                 delim = phase2_getc ();
+               }
+             if (delim == EOF)
+               {
+                 tp->type = token_type_eof;
+                 return;
+               }
+             prefer_division_over_regexp = true;
+
+             if ((delim >= '0' && delim <= '9')
+                 || (delim >= 'A' && delim <= 'Z')
+                 || (delim >= 'a' && delim <= 'z'))
+               {
+                 /* False positive.  */
+                 tp->type = token_type_symbol;
+                 phase2_ungetc (delim);
+                 tp->string = xstrdup (buffer);
+                 prefer_division_over_regexp = true;
+                 return;
+               }
+
+             extract_quotelike (tp, delim);
+
+             switch (buffer[1])
+               {
+               case 'q':
+               case 'x':
+                 tp->string_type = string_type_qq;
+                 tp->type = token_type_string;
+                 if (!extract_all)
+                   interpolate_keywords (mlp, tp->string);
+                 break;
+               case 'r':
+                 tp->type = token_type_regex_op;
+                 break;
+               case 'w':
+                 tp->type = token_type_symbol;
+                 break;
+               default: /* q\000 */
+                 tp->type = token_type_string;
+                 tp->string_type = string_type_q;
+                 break;
+               }
+             return;
+           }
+         else if (strcmp (buffer, "grep") == 0
+                  || strcmp (buffer, "split") == 0)
+           {
+             prefer_division_over_regexp = false;
+           }
+         tp->string = xstrdup (buffer);
+
+         tp->type = token_type_symbol;
+         return;
+
+       case '"':
+         prefer_division_over_regexp = true;
+         extract_quotelike (tp, c);
+         tp->string_type = string_type_qq;
+         if (!extract_all)
+           interpolate_keywords (mlp, tp->string);
+         return;
+
+       case '`':
+         prefer_division_over_regexp = true;
+         extract_quotelike (tp, c);
+         tp->string_type = string_type_qq;
+         if (!extract_all)
+           interpolate_keywords (mlp, tp->string);
+         return;
+
+       case '\'':
+         prefer_division_over_regexp = true;
+         extract_quotelike (tp, c);
+         tp->string_type = string_type_q;
+         return;
+
+       case '(':
+         c = phase2_getc ();
+         if (c == ')')
+           {
+             continue;  /* Ignore empty list.  */
+           }
+         else
+           phase2_ungetc (c);
+         tp->type = token_type_lparen;
+         prefer_division_over_regexp = false;
+         return;
+
+       case ')':
+         tp->type = token_type_rparen;
+         prefer_division_over_regexp = true;
+         return;
+
+       case '{':
+         tp->type = token_type_lbrace;
+         prefer_division_over_regexp = false;
+         return;
+
+       case '}':
+         tp->type = token_type_rbrace;
+         prefer_division_over_regexp = false;
+         return;
+
+       case '[':
+         tp->type = token_type_lbracket;
+         prefer_division_over_regexp = false;
+         return;
+
+       case ']':
+         tp->type = token_type_rbracket;
+         prefer_division_over_regexp = false;
+         return;
+
+       case ';':
+         tp->type = token_type_semicolon;
+         prefer_division_over_regexp = false;
+         return;
+
+       case ',':
+         tp->type = token_type_comma;
+         prefer_division_over_regexp = false;
+         return;
+
+       case '=':
+         /* Check for fat comma.  */
+         c = phase1_getc ();
+         if (c == '>') 
+           {
+             tp->type = token_type_fat_comma;
+             return;
+           }
+         else if (linepos == 2
+                  && (last_token == token_type_semicolon
+                      || last_token == token_type_rbrace)
+                  && ((c >= 'A' && c <='Z') 
+                      || (c >= 'a' && c <= 'z')))
+           {
+#if DEBUG_PERL
+             fprintf (stderr, "%s:%d: start pod section\n",
+                      real_file_name, line_number);
+#endif
+             skip_pod ();
+#if DEBUG_PERL
+             fprintf (stderr, "%s:%d: end pod section\n",
+                      real_file_name, line_number);
+#endif
+             continue;
+           }
+         phase1_ungetc (c);
+         tp->type = token_type_other;
+         prefer_division_over_regexp = false;
+         return;
+
+       case '<':
+         /* Check for <<EOF and friends.  */
+         prefer_division_over_regexp = false;
+         c = phase1_getc ();
+         if (c == '<') 
+           {
+             c = phase1_getc ();
+             if (c == '\'')
+               {
+                 char* string;
+                 extract_quotelike (tp, c);
+                 string = get_here_document (tp->string);
+                 free (tp->string);
+                 tp->string = string;
+                 tp->type = token_type_string;
+                 tp->string_type = string_type_verbatim;
+                 return;
+               }
+             else if (c == '"')
+               {
+                 char* string;
+                 extract_quotelike (tp, c);
+                 string = get_here_document (tp->string);
+                 free (tp->string);
+                 tp->string = string;
+                 tp->type = token_type_string;
+                 tp->string_type = string_type_qq;
+                 if (!extract_all)
+                   interpolate_keywords (mlp, tp->string);
+                 return;
+               }
+             else if ((c >= 'A' && c <= 'Z')
+                      || (c >= 'a' && c <= 'z')
+                      || c == '_')
+               {
+                 bufpos = 0;
+                 while ((c >= 'A' && c <= 'Z')
+                        || (c >= 'a' && c <= 'z')
+                        || (c >= '0' && c <= '9')
+                        || c == '_' || c >= 0x80)
+                   {
+                     if (bufpos >= bufmax)
+                       {
+                         bufmax += 100;
+                         buffer = xrealloc_static (buffer, bufmax);
+                       }
+                     buffer[bufpos++] = c;
+                     c = phase1_getc ();
+                   }
+                 if (c == EOF)
+                   {
+                     tp->type = token_type_eof;
+                     return;
+                   }
+                 else
+                   {
+                     char* string;
+
+                     phase1_ungetc (c);
+                     buffer[bufpos++] = '\0';
+                     string = get_here_document (buffer);
+                     tp->string = string;
+                     tp->type = token_type_string;
+                     tp->string_type = string_type_qq;
+                     if (!extract_all)
+                       interpolate_keywords (mlp, tp->string);
+                     return;
+                   }
+               }
+             else
+               {
+                 tp->type = token_type_other;
+                 return;
+               }
+           }
+         else
+           {
+             phase1_ungetc (c);
+             tp->type = token_type_other;
+           }
+         return;  /* End of case '>'.  */
+
+       case '-':
+         /* Check for dereferencing operator.  */
+         c = phase1_getc ();
+         if (c == '>') {
+           tp->type = token_type_dereference;
+           return;
+         }
+         phase1_ungetc (c);
+         tp->type = token_type_other;
+         prefer_division_over_regexp = false;
+         return;
+
+       case '/':
+       case '?':
+         if (!prefer_division_over_regexp)
+           {
+             extract_quotelike (tp, c);
+             if (!extract_all)
+               interpolate_keywords (mlp, tp->string);
+             free (tp->string);
+             tp->type = token_type_other;
+             prefer_division_over_regexp = true;
+             /* Eat the following modifiers.  */
+             c = phase1_getc ();
+             while (c >= 'a' && c <= 'z')
+               c = phase1_getc ();
+             phase1_ungetc (c);
+             return;
+           }
+         /* FALLTHROUGH */
+
+       default:
+         /* We could carefully recognize each of the 2 and 3 character
+            operators, but it is not necessary, as we only need to recognize
+            gettext invocations.  Don't bother.  */
+         tp->type = token_type_other;
+         prefer_division_over_regexp = false;
+         return;
+       }
+    }
+}
+
+static token_ty*
+x_perl_lex (mlp)
+     message_list_ty* mlp;
+{
+#if DEBUG_PERL
+  int dummy = stack_dump (token_stack);
+#endif
+  token_ty* tp = stack_shift (token_stack);
+
+  if (!tp)
+    {
+      tp = xmalloc (sizeof *tp);
+      x_perl_prelex (mlp, tp);
+#if DEBUG_PERL
+      fprintf (stderr, "%s:%d: x_perl_prelex returned %s\n",
+              real_file_name, line_number, token2string (tp));
+#endif
+    }
+#if DEBUG_PERL
+  else
+    {
+      fprintf (stderr, "%s:%d: %s recycled from stack\n",
+              real_file_name, line_number, token2string (tp));
+    }
+#endif
+
+  /* A symbol followed by a fat comma is really a single-quoted string.  */
+  if (tp->type == token_type_symbol || tp->type == token_type_named_op)
+    {
+      token_ty* next = stack_head (token_stack);
+
+      if (!next)
+       {
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: pre-fetching next token\n",
+                  real_file_name, line_number);
+         fflush (stderr);
+#endif      
+         next = x_perl_lex (mlp);
+         x_perl_unlex (next);
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: unshifted next token\n",
+                  real_file_name, line_number);
+#endif
+       }
+
+#if DEBUG_PERL
+      fprintf (stderr, "%s:%d: next token is %s\n",
+              real_file_name, line_number, token2string (next));
+#endif
+      
+      if (next->type == token_type_fat_comma)
+       {
+         tp->type = token_type_string;
+         tp->string_type = string_type_q;
+#if DEBUG_PERL
+         fprintf (stderr, 
+                  "%s:%d: token %s mutated to token_type_string\n",
+                  real_file_name, line_number, token2string (tp));
+#endif
+       }
+    }
+
+  return tp;
+}
+
+static void
+x_perl_unlex (tp)
+     token_ty* tp;
+{
+  stack_unshift (token_stack, tp, free_token);
+}
+
+/* ========================= Extracting strings.  ========================== */
+
+static char*
+collect_message (mlp, tp, error_level)
+     message_list_ty* mlp;
+     token_ty* tp;
+     int error_level;
+{
+  char* string;
+  size_t len;
+
+  extract_quotelike_pass3 (tp, error_level);
+  string = xstrdup (tp->string);
+  len = strlen (tp->string) + 1;
+
+  for ever
+    {
+      int c = phase2_getc ();
+      while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f')
+       c = phase2_getc ();
+      if (c != '.')
+       {
+         phase2_ungetc (c);
+         return string;
+       }
+
+      c = phase2_getc ();
+      while (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f')
+       c = phase2_getc ();
+      phase2_ungetc (c);
+
+      if (c == '"' || c == '\'' || c == '`'
+         || (!prefer_division_over_regexp && (c == '/' || c == '?'))
+         || c == 'q')
+       {
+         token_ty* qstring = x_perl_lex (mlp);
+         if (qstring->type != token_type_string)
+           {
+             /* assert (qstring->type == token_type_symbol) */
+             x_perl_unlex (qstring);
+             return string;
+           }
+         
+         extract_quotelike_pass3 (qstring, error_level);
+         len += strlen (qstring->string);
+         string = xrealloc (string, len);
+         strcat (string, qstring->string);
+         free_token (qstring);
+       }
+    }
+}
+
+/* The file is broken into tokens.  Scan the token stream, looking for
+   a keyword, followed by a left paren, followed by a string.  When we
+   see this sequence, we have something to remember.  We assume we are
+   looking at a valid C or C++ program, and leave the complaints about
+   the grammar to the compiler.
+
+     Normal handling: Look for
+       keyword ( ... msgid ... )
+     Plural handling: Look for
+       keyword ( ... msgid ... msgid_plural ... )
+
+   We use recursion because the arguments before msgid or between msgid
+   and msgid_plural can contain subexpressions of the same form.  */
+
+
+/* Extract messages until the next balanced closing parenthesis.
+   Extracted messages are added to MLP.
+
+   When specific arguments shall be extracted, ARG_SG and ARG_PL are
+   set to the corresponding argument number or -1 if not applicable.
+
+   Returns the number of requested arguments consumed or -1 for eof.
+   If - instead of consuming requested arguments - a complete message
+   has been extracted, the return value will be sufficiently high to
+   avoid any mis-interpretation.  
+
+   States are:
+   
+   0 - initial state
+   1 - keyword has been seen
+   2 - extractable string has been seen
+   3 - a dot operator after an extractable string has been seen
+
+   States 2 and 3 are "fragile", the parser will remain in state 2
+   as long as only opening parentheses are seen, a transition to 
+   state 3 is done on appearance of a dot operator, all other tokens
+   will cause the parser to fall back to state 1 or 0, eventually
+   with an error message about illegal intermixing of constant and
+   non-constant strings.
+
+   Likewise, state 3 is fragile.  The parser will remain in state 3
+   as long as only closing parentheses are seen, a transition to state
+   2 is done on appearance of another (literal!) string, all other
+   tokens will cause a warning.  */
+static bool
+extract_balanced (mlp, arg_sg, arg_pl, state, delim)
+     message_list_ty *mlp;
+     int arg_sg, arg_pl;
+     int state;
+     token_type_ty delim;
+{
+  /* Remember the message containing the msgid, for msgid_plural.  */
+  message_ty *plural_mp = NULL;
+  
+  /* The current argument for a possibly extracted keyword.  Counting
+     starts with 1.  */
+  int arg_count = 1;
+  
+  /* Number of left parentheses seen.  */
+  int paren_seen = 0;
+
+  /* The current token.  */
+  token_ty* tp = NULL;
+
+  token_type_ty last_token = token_type_eof;
+
+#if DEBUG_PERL
+  static int nesting_level = 0;
+
+  ++nesting_level;
+#endif
+
+  for ever
+    {
+      int my_last_token = last_token;
+
+      if (tp)
+       free_token (tp);
+      
+      tp = x_perl_lex (mlp);
+
+      last_token = tp->type;
+
+      if (delim == tp->type)
+       {
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: extract_balanced finished (%d)\n",
+                  logical_file_name, tp->line_number, --nesting_level);
+#endif
+         free_token (tp);
+         return false;
+       }
+
+      switch (tp->type)
+       {
+       case token_type_symbol:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type symbol (%d) \"%s\"\n", 
+                  logical_file_name, tp->line_number, nesting_level, tp->string);
+#endif
+
+         /* No need to bother if we extract all strings anyway.  */
+         if (!extract_all)
+           {
+             void *keyword_value;
+
+             if (0 == find_entry (&keywords, tp->string, strlen (tp->string),
+                                  &keyword_value))
+               {
+                 last_token = token_type_keyword_symbol;
+
+                 arg_sg = (int) (long) keyword_value & ((1 << 10) - 1);
+                 arg_pl = (int) (long) keyword_value >> 10;
+                 arg_count = 1;
+                 
+                 state = 2;
+               }
+           }
+         continue;
+
+       case token_type_variable:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type variable (%d) \"%s\"\n", 
+                  logical_file_name, tp->line_number, nesting_level, tp->string);
+#endif
+         prefer_division_over_regexp = true;
+         continue;
+
+       case token_type_lparen:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type left parentheses (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         ++paren_seen;
+
+         /* No need to recurse if we extract all strings anyway.  */
+         if (extract_all)
+           continue;
+         else
+           {
+             if (extract_balanced (mlp, arg_sg - arg_count + 1, 
+                                   arg_pl - arg_count + 1, state, 
+                                   token_type_rparen))
+               {
+                 free_token (tp);
+                 return true;
+               }
+             if (my_last_token == token_type_keyword_symbol)
+               arg_sg = arg_pl = -1;
+           }
+         continue;
+
+       case token_type_rparen:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type right parentheses(%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         --paren_seen;
+
+         /* No need to return if we extract all strings anyway.  */
+         if (extract_all)
+           continue;
+         
+         continue;
+
+       case token_type_comma:
+       case token_type_fat_comma:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type comma (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         /* No need to bother if we extract all strings anyway.  */
+         if (extract_all)
+           continue;
+         ++arg_count;
+
+         if (arg_count > arg_sg && arg_count > arg_pl)
+           {
+             /* We have missed the argument.  */
+             arg_sg = arg_pl = -1;
+             arg_count = 0;
+           }
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: arg_count: %d, arg_sg: %d, arg_pl: %d\n",
+                  real_file_name, tp->line_number, 
+                  arg_count, arg_sg, arg_pl);
+#endif
+         continue;
+
+       case token_type_string:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type string (%d): \"%s\"\n", 
+                  logical_file_name, tp->line_number, nesting_level, tp->string);
+#endif
+
+         if (extract_all)
+           {
+             lex_pos_ty pos;
+             
+             pos.file_name = logical_file_name;
+             pos.line_number = tp->line_number;
+             remember_a_message (mlp, collect_message (mlp, tp, 
+                                                       EXIT_SUCCESS), 
+                                 &pos);
+           }
+         else if (state)
+           {
+             lex_pos_ty pos;
+             
+             pos.file_name = logical_file_name;
+             pos.line_number = tp->line_number;
+             
+             if (arg_count == arg_sg)
+               {
+                 plural_mp = 
+                   remember_a_message (mlp, collect_message (mlp, tp, 
+                                                             EXIT_FAILURE),
+                                       &pos);
+                 arg_sg = -1;
+               }
+             else if (arg_count == arg_pl && plural_mp == NULL)
+               {
+                 if (plural_mp == NULL)
+                   error (EXIT_FAILURE, 0, _("\
+%s:%d: fatal: plural message seen before singular message\n"),
+                          real_file_name, tp->line_number);
+               }
+             else if (arg_count == arg_pl)
+               {
+                 remember_a_message_plural (plural_mp, 
+                                            collect_message (mlp, tp, 
+                                                             EXIT_FAILURE),
+                                            &pos);
+                 arg_pl = -1;
+               }
+           }
+
+         if (arg_sg == -1 && arg_pl == -1)
+           {
+             state = 0;
+             plural_mp = NULL;
+           }
+
+         break;
+
+       case token_type_eof:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type EOF (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         free_token (tp);
+         return true;
+
+       case token_type_lbrace:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type lbrace (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         /* No need to recurse if we extract all strings anyway.  */
+         if (extract_all)
+           continue;
+         else
+           {
+             if (extract_balanced (mlp, -1, -1, 0, token_type_rbrace))
+               {
+                 free_token (tp);
+                 return true;
+               }
+           }
+         continue;
+
+       case token_type_rbrace:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type rbrace (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         state = 0;
+         continue;
+
+       case token_type_lbracket:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type lbracket (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         /* No need to recurse if we extract all strings anyway.  */
+         if (extract_all)
+           continue;
+         else
+           {
+             if (extract_balanced (mlp, -1, -1, 0, token_type_rbracket))
+               {
+                 free_token (tp);
+                 return true;
+               }
+           }
+         continue;
+
+       case token_type_rbracket:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type rbracket (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         state = 0;
+         continue;
+
+       case token_type_semicolon:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type semicolon (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         state = 0;
+
+         /* The ultimate sign.  */
+         arg_sg = arg_pl = -1;
+         
+         continue;
+
+       case token_type_dereference:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type dereference (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         
+         continue;
+
+       case token_type_dot:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type dot (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         state = 0;
+         continue;
+
+       case token_type_named_op:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type named operator (%d): %s\n", 
+                  logical_file_name, tp->line_number, nesting_level, tp->string);
+#endif
+         state = 0;
+         continue;
+
+       case token_type_regex_op:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type regex operator (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         continue;
+
+       case token_type_other:
+#if DEBUG_PERL
+         fprintf (stderr, "%s:%d: type other (%d)\n", 
+                  logical_file_name, tp->line_number, nesting_level);
+#endif
+         state = 0;
+         continue;
+
+       default:
+         fprintf (stderr, "%s:%d: unknown token type %d\n",
+                  real_file_name, tp->line_number, tp->type);
+         abort ();
+       }
+    }
+}
+
+void
+extract_perl (f, real_filename, logical_filename, mdlp)
+     FILE *f;
+     const char *real_filename;
+     const char *logical_filename;
+     msgdomain_list_ty *mdlp;
+{
+  message_list_ty *mlp = mdlp->item[0]->messages;
+
+  fp = f;
+  real_file_name = real_filename;
+  logical_file_name = xstrdup (logical_filename);
+  line_number = 0;
+
+  last_comment_line = -1;
+  last_non_comment_line = -1;
+
+  last_token = token_type_semicolon;  /* Safe assumption.  */
+  prefer_division_over_regexp = false;
+
+  last_string_finished = false;
+
+  init_keywords ();
+
+  token_stack = xcalloc (1, sizeof *token_stack);
+  here_eaten = 0;
+  end_of_file = false;
+
+  /* Eat tokens until eof is seen.  When extract_balanced returns
+     due to an unbalanced closing brace, just restart it.  */
+  while (!extract_balanced (mlp, -1, -1, 0, token_type_rbrace))
+    ;
+
+  fp = NULL;
+  real_file_name = NULL;
+  free (logical_file_name);
+  logical_file_name = NULL;
+  line_number = 0;
+  last_token = token_type_semicolon;
+  last_string_finished = false;
+  stack_free (token_stack);
+  free (token_stack);
+  token_stack = NULL;
+  here_eaten = 0;
+  end_of_file = true;
+}
diff --git a/gettext-tools/src/x-perl.h b/gettext-tools/src/x-perl.h

new file mode 100644 (file)

index 0000000..7e07a0e
--- /dev/null
+++ b/gettext-tools/src/x-perl.h
@@ -0,0 +1,35 @@
+/* xgettext Perl backend.
+   Copyright (C) 2002-2003 Free Software Foundation, Inc.
+   Written by Guido Flohr <guido@imperia.net>, 2002-2003
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
+
+
+#define EXTENSIONS_PERL \
+  { "pl",    "perl"   },                                               \
+  { "pm",    "perl"   },                                               \
+  { "PL",    "perl"   },                                               \
+  { "cgi",   "perl"   },                                               \
+
+#define SCANNERS_PERL \
+  { "perl",        extract_perl, &formatstring_perl },                 \
+
+/* Scan a Perl file and add its translatable strings to mdlp.  */
+extern void extract_perl PARAMS ((FILE *fp, const char *real_filename,
+                                 const char *logical_filename,
+                                 msgdomain_list_ty *mdlp));
+
+extern void x_perl_keyword PARAMS ((const char *keyword));
+extern void x_perl_extract_all PARAMS ((void));
diff --git a/gettext-tools/src/xgettext.c b/gettext-tools/src/xgettext.c

index 231819a1c7ffb86c7c6bef5a12e51a3cc65e4b61..bf249219e24cb8b3f22940b953aa98198a1781ef 100644 (file)
--- a/gettext-tools/src/xgettext.c
+++ b/gettext-tools/src/xgettext.c
@@ -72,6 +72,7 @@
  #include "x-java.h"
  #include "x-properties.h"
  #include "x-awk.h"
+#include "x-perl.h"
  #include "x-ycp.h"
  #include "x-tcl.h"
  #include "x-php.h"
@@ -257,6 +258,7 @@ main (int argc, char *argv[])
         x_librep_extract_all ();
         x_java_extract_all ();
         x_awk_extract_all ();
+       x_perl_extract_all ();
         x_tcl_extract_all ();
         x_php_extract_all ();
         x_glade_extract_all ();
@@ -316,6 +318,7 @@ main (int argc, char *argv[])
             x_librep_keyword (optarg);
             x_java_keyword (optarg);
             x_awk_keyword (optarg);
+           x_perl_keyword (optarg);
             x_tcl_keyword (optarg);
             x_php_keyword (optarg);
             x_glade_keyword (optarg);
@@ -654,7 +657,8 @@ Choice of input file language:\n"));
    -L, --language=NAME         recognise the specified language\n\
                                  (C, C++, ObjectiveC, PO, Python, Lisp,\n\
                                  EmacsLisp, librep, Smalltalk, Java,\n\
-                                JavaProperties, awk, YCP, Tcl, PHP, RST, Glade)\n"));
+                                JavaProperties, awk, YCP, Tcl, PHP, RST,\n\
+                                Glade, Perl)\n"));
        printf (_("\
    -C, --c++                   shorthand for --language=C++\n"));
        printf (_("\
@@ -1456,12 +1460,13 @@ language_to_extractor (const char *name)
      SCANNERS_JAVA
      SCANNERS_PROPERTIES
      SCANNERS_AWK
+    SCANNERS_PERL
      SCANNERS_YCP
      SCANNERS_TCL
      SCANNERS_PHP
      SCANNERS_RST
      SCANNERS_GLADE
-    /* Here will follow more languages and their scanners: perl, etc...
+    /* Here will follow more languages and their scanners: VisualBasic, etc...
         Make sure new scanners honor the --exclude-file option.  */
    };
  
@@ -1504,6 +1509,7 @@ extension_to_language (const char *extension)
      EXTENSIONS_JAVA
      EXTENSIONS_PROPERTIES
      EXTENSIONS_AWK
+    EXTENSIONS_PERL
      EXTENSIONS_YCP
      EXTENSIONS_TCL
      EXTENSIONS_PHP
diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog

index f44adac1d658beeb0734aff68100dd7430965ec0..9dc5090aecccad7da6f0a7cc744a482c6814103c 100644 (file)
--- a/gettext-tools/tests/ChangeLog
+++ b/gettext-tools/tests/ChangeLog
@@ -1,3 +1,10 @@
+2003-06-11  Guido Flohr  <guido@imperia.net>
+
+       * format-perl-1: New file.
+       * format-perl-2: New file.
+       * lang-perl: New file.
+       * Makefile.am (TESTS): Add them.
+
  2003-05-27  Bruno Haible  <bruno@clisp.org>
  
         * plural-2: Add test for Slovak plural formula.
diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am

index 60269933442af391104da7749d4744609a4e7f3b..4251cec0e20b7490c05f41b5f1d43ad135d05a09 100644 (file)
--- a/gettext-tools/tests/Makefile.am
+++ b/gettext-tools/tests/Makefile.am
@@ -59,13 +59,14 @@ TESTS = gettext-1 gettext-2 \
         format-php-1 format-php-2 \
         format-python-1 format-python-2 \
         format-pascal-1 format-pascal-2 \
+       format-perl-1 format-perl-2 \
         format-tcl-1 format-tcl-2 \
         format-ycp-1 format-ycp-2 \
         plural-1 plural-2 \
         lang-c lang-c++ lang-objc lang-python-1 lang-python-2 lang-clisp \
         lang-elisp lang-librep lang-smalltalk lang-java lang-gawk lang-pascal \
         lang-ycp lang-tcl lang-php lang-po \
-       lang-rst
+       lang-rst lang-perl
  
  EXTRA_DIST += $(TESTS) \
         test.mo xg-test1.ok.po mex-test2.ok msguniq-a.in msguniq-a.inp \
diff --git a/gettext-tools/tests/format-perl-1 b/gettext-tools/tests/format-perl-1

new file mode 100755 (executable)

index 0000000..1bbb5b8
--- /dev/null
+++ b/gettext-tools/tests/format-perl-1
@@ -0,0 +1,166 @@
+#! /bin/sh
+
+# Test recognition of Perl format strings.
+
+tmpfiles=""
+trap 'rm -fr $tmpfiles' 1 2 3 15
+
+tmpfiles="$tmpfiles f-a-1.data"
+cat <<\EOF > f-a-1.data
+# Valid: no argument
+'abc%%'
+# Valid: one character argument
+'abc%c'
+# Valid: one string argument
+'abc%s'
+# Valid: one integer argument
+'abc%i'
+# Valid: one integer argument
+'abc%d'
+# Valid: one integer argument
+'abc%o'
+# Valid: one integer argument
+'abc%u'
+# Valid: one integer argument
+'abc%x'
+# Valid: one integer argument
+'abc%X'
+# Valid: one floating-point argument
+'abc%e'
+# Valid: one floating-point argument
+'abc%E'
+# Valid: one floating-point argument
+'abc%f'
+# Valid: one floating-point argument
+'abc%F'
+# Valid: one floating-point argument
+'abc%g'
+# Valid: one floating-point argument
+'abc%G'
+# Valid: one floating-point argument
+'abc%a'
+# Valid: one floating-point argument
+'abc%A'
+# Valid: one pointer argument
+'abc%p'
+# Valid: one argument with flags
+'abc%0#g'
+# Valid: one argument with width
+'abc%2g'
+# Valid: one argument with width
+'abc%*g'
+# Valid: one argument with precision
+'abc%.4g'
+# Valid: one argument with precision
+'abc%.*g'
+# Valid: one argument with width and precision
+'abc%14.4g'
+# Valid: one argument with width and precision
+'abc%14.*g'
+# Valid: one argument with width and precision
+'abc%*.4g'
+# Valid: one argument with width and precision
+'abc%*.*g'
+# Valid: one argument with size specifier
+'abc%hhi'
+# Valid: one argument with size specifier
+'abc%hi'
+# Valid: one argument with size specifier
+'abc%li'
+# Valid: one argument with size specifier
+'abc%lli'
+# Valid: one argument with size specifier
+'abc%Lg'
+# Valid: one argument with size specifier
+'abc%qi'
+# Valid: one argument with size specifier
+'abc%ji'
+# Valid: one argument with size specifier
+'abc%zi'
+# Valid: one argument with size specifier
+'abc%ti'
+# Invalid: unterminated
+'abc%'
+# Invalid: unknown format specifier
+'abc%y'
+# Invalid: flags after width
+'abc%*0g'
+# Invalid: twice precision
+'abc%.4.2g'
+# Valid: three arguments
+'abc%d%u%u'
+# Valid: a numbered argument
+'abc%1$d'
+# Invalid: zero
+'abc%0$d'
+# Valid: two-digit numbered arguments
+'abc%11$def%10$dgh%9$dij%8$dkl%7$dmn%6$dop%5$dqr%4$dst%3$duv%2$dwx%1$dyz'
+# Invalid: unterminated number
+'abc%1'
+# Invalid: flags before number
+'abc%+1$d'
+# Valid: three arguments, two with same number
+'abc%1$4x,%2$c,%1$u'
+# Invalid: argument with conflicting types
+'abc%1$4x,%2$c,%1$s'
+# Valid: no conflict
+'abc%1$4x,%2$c,%1$u'
+# Invalid: mixing of numbered and unnumbered arguments
+'abc%d%2$x'
+# Valid: numbered argument with constant precision
+'abc%1$.9x'
+# Invalid: mixing of numbered and unnumbered arguments
+'abc%1$.*x'
+# Invalid: missing non-final argument
+'abc%2$x%3$s'
+# Valid: permutation
+'abc%2$ddef%1$d'
+# Valid: multiple uses of same argument
+'abc%2$xdef%1$pghi%2$x'
+# Valid: one argument with width
+'abc%2$#*1$g'
+# Valid: one argument with width and precision
+'abc%3$*2$.*1$g'
+# Invalid: zero
+'abc%2$*0$.*1$g'
+# Valid: named arguments
+'[foo] [bar] [baz]'
+EOF
+
+: ${XGETTEXT=xgettext}
+n=0
+while read comment; do
+  read string
+  n=`expr $n + 1`
+  tmpfiles="$tmpfiles f-a-1-$n.in f-a-1-$n.po"
+  cat <<EOF > f-a-1-$n.in
+gettext(${string});
+EOF
+  ${XGETTEXT} -L perl -o f-a-1-$n.po f-a-1-$n.in || exit 1
+  test -f f-a-1-$n.po || exit 1
+  fail=
+  if echo "$comment" | grep 'Valid:' > /dev/null; then
+    if grep perl-format f-a-1-$n.po > /dev/null; then
+      :
+    else
+      fail=yes
+    fi
+  else
+    if grep perl-format f-a-1-$n.po > /dev/null; then
+      fail=yes
+    else
+      :
+    fi
+  fi
+  if test -n "$fail"; then
+    echo "Format string recognition error:" 1>&2
+    cat f-a-1-$n.in 1>&2
+    echo "Got:" 1>&2
+    cat f-a-1-$n.po 1>&2
+    exit 1
+  fi
+done < f-a-1.data
+
+rm -fr $tmpfiles
+
+exit 0
diff --git a/gettext-tools/tests/format-perl-2 b/gettext-tools/tests/format-perl-2

new file mode 100755 (executable)

index 0000000..a8b6644
--- /dev/null
+++ b/gettext-tools/tests/format-perl-2
@@ -0,0 +1,159 @@
+#! /bin/sh
+
+# Test checking of Perl format strings.
+
+tmpfiles=""
+trap 'rm -fr $tmpfiles' 1 2 3 15
+
+tmpfiles="$tmpfiles f-a-2.data"
+cat <<\EOF > f-a-2.data
+# Valid: %% doesn't count
+msgid  "abc%%def"
+msgstr "xyz"
+# Invalid: invalid msgstr
+msgid  "abc%%def"
+msgstr "xyz%"
+# Valid: same arguments
+msgid  "abc%s%gdef"
+msgstr "xyz%s%g"
+# Valid: same arguments, with different widths
+msgid  "abc%2sdef"
+msgstr "xyz%3s"
+# Valid: same arguments but in numbered syntax
+msgid  "abc%s%gdef"
+msgstr "xyz%1$s%2$g"
+# Valid: permutation
+msgid  "abc%s%g%cdef"
+msgstr "xyz%3$c%2$g%1$s"
+# Invalid: too few arguments
+msgid  "abc%2$udef%1$s"
+msgstr "xyz%1$s"
+# Invalid: too few arguments
+msgid  "abc%sdef%u"
+msgstr "xyz%s"
+# Invalid: too many arguments
+msgid  "abc%udef"
+msgstr "xyz%uvw%c"
+# Valid: same numbered arguments, with different widths
+msgid  "abc%2$5s%1$4s"
+msgstr "xyz%2$4s%1$5s"
+# Invalid: missing argument
+msgid  "abc%2$sdef%1$u"
+msgstr "xyz%1$u"
+# Invalid: missing argument
+msgid  "abc%1$sdef%2$u"
+msgstr "xyz%2$u"
+# Invalid: added argument
+msgid  "abc%1$udef"
+msgstr "xyz%1$uvw%2$c"
+# Valid: type compatibility
+msgid  "abc%i"
+msgstr "xyz%d"
+# Valid: type compatibility
+msgid  "abc%o"
+msgstr "xyz%u"
+# Valid: type compatibility
+msgid  "abc%u"
+msgstr "xyz%x"
+# Valid: type compatibility
+msgid  "abc%u"
+msgstr "xyz%X"
+# Valid: type compatibility
+msgid  "abc%e"
+msgstr "xyz%E"
+# Valid: type compatibility
+msgid  "abc%e"
+msgstr "xyz%f"
+# Valid: type compatibility
+msgid  "abc%e"
+msgstr "xyz%g"
+# Valid: type compatibility
+msgid  "abc%e"
+msgstr "xyz%G"
+# Invalid: type incompatibility
+msgid  "abc%c"
+msgstr "xyz%s"
+# Invalid: type incompatibility
+msgid  "abc%c"
+msgstr "xyz%i"
+# Invalid: type incompatibility
+msgid  "abc%c"
+msgstr "xyz%o"
+# Invalid: type incompatibility
+msgid  "abc%c"
+msgstr "xyz%e"
+# Invalid: type incompatibility
+msgid  "abc%s"
+msgstr "xyz%i"
+# Invalid: type incompatibility
+msgid  "abc%s"
+msgstr "xyz%o"
+# Invalid: type incompatibility
+msgid  "abc%s"
+msgstr "xyz%e"
+# Invalid: type incompatibility
+msgid  "abc%i"
+msgstr "xyz%o"
+# Invalid: type incompatibility
+msgid  "abc%i"
+msgstr "xyz%e"
+# Invalid: type incompatibility
+msgid  "abc%u"
+msgstr "xyz%e"
+# Invalid: type incompatibility for width
+msgid  "abc%g%*g"
+msgstr "xyz%*g%g"
+# Valid: named arguments in arbitrary order
+msgid "[foo] [bar] [baz]: 1"
+msgstr "[baz] [bar] [foo]: 1"
+# Invalid: missing named argument in msgstr
+msgid "[foo] [bar] [baz]: 2"
+msgstr "[baz] [foo]: 2"
+# Invalid: missing named arguments in msgid
+msgid "[foo] [baz]: 3"
+msgstr "[baz] [bar] [foo]: 3"
+# Invalid: number of named arguments in msgstr exceeds msgid
+msgid "[foo] [bar] [baz]: 4"
+msgstr "[baz] [bar] [foo] [bar]: 4"
+# Invalid: number of named arguments in msgid exceeds msgstr
+msgid "[baz] [bar] [foo] [bar]: 5"
+msgstr "[foo] [bar] [baz]: 5"
+EOF
+
+: ${MSGFMT=msgfmt}
+n=0
+while read comment; do
+  read msgid_line
+  read msgstr_line
+  n=`expr $n + 1`
+  tmpfiles="$tmpfiles f-a-2-$n.po f-a-2-$n.mo"
+  cat <<EOF > f-a-2-$n.po
+#, perl-format
+${msgid_line}
+${msgstr_line}
+EOF
+  fail=
+  if echo "$comment" | grep 'Valid:' > /dev/null; then
+    if ${MSGFMT} --check-format -o f-a-2-$n.mo f-a-2-$n.po; then
+      :
+    else
+      fail=yes
+    fi
+  else
+    ${MSGFMT} --check-format -o f-a-2-$n.mo f-a-2-$n.po 2> /dev/null
+    if test $? = 1; then
+      :
+    else
+      fail=yes
+    fi
+  fi
+  if test -n "$fail"; then
+    echo "Format string checking error:" 1>&2
+    cat f-a-2-$n.po 1>&2
+    exit 1
+  fi
+done < f-a-2.data
+
+rm -fr $tmpfiles
+
+exit 0
diff --git a/gettext-tools/tests/lang-perl b/gettext-tools/tests/lang-perl

new file mode 100755 (executable)

index 0000000..4575197
--- /dev/null
+++ b/gettext-tools/tests/lang-perl
@@ -0,0 +1,186 @@
+#! /bin/sh
+
+# Test of gettext facilities in the Perl language.
+
+tmpfiles=""
+trap 'rm -fr $tmpfiles' 1 2 3 15
+
+tmpfiles="$tmpfiles prog.pl"
+here="<" # Help St. Emacs
+here="$here<"
+
+cat <<EOF > prog.pl
+use Locale::Messages;
+
+textdomain "prog";
+bindtextdomain ("./");
+
+s/foo/
+          # stress test for string extraction /xe;
+
+print _"'Your command, please?', asked the waiter.";
+
+printf ngettext ("a piece of cake", "%d pieces of cake", $n), $n;
+
+printf _"%s is replaced by %s.", "FF", "EUR";
+
+# Should be found.
+printf dngettext prog => ("one file deleted", "%d files deleted"), $n, $n;
+
+# Should not be found.
+printf dngettext ("prog"), ("one file created", "%d files created"), $n, $n;
+
+printf dngettext "prog", ${here}PERL, ${here}PERL;
+Singular
+PERL
+Plural
+PERL
+
+print ${here}PERL
+tied hash \$__{   Bareword
+}
+tied hash \$__->{"quoted string"}
+tied hash \$__->{  "weird
+formatting"}
+PERL
+
+print \$__  # Welcome
+   ->   # to the
+ { # Republic of
+ 'Welcome to the Republic of Perl!' # 
+# Perl!
+}; 
+
+\$! ? ?\$__{"pattern match"}? : s  # This is no delimiter.
+{\$__{substitution}}<\$__-\>{"find me"}>;
+
+# No interpolation!
+m'\$__{secret}';
+
+__END__
+gettext "Discarded!";
+         
+EOF
+
+tmpfiles="$tmpfiles prog.pot"
+: ${XGETTEXT=xgettext}
+${XGETTEXT} -k_ -k%__ -k\$__ -o prog.pot --omit-header --no-location prog.pl
+
+tmpfiles="$tmpfiles prog.ok"
+cat <<EOF > prog.ok
+msgid "'Your command, please?', asked the waiter."
+msgstr ""
+
+#, perl-format
+msgid "a piece of cake"
+msgid_plural "%d pieces of cake"
+msgstr[0] ""
+msgstr[1] ""
+
+#, perl-format
+msgid "%s is replaced by %s."
+msgstr ""
+
+#, perl-format
+msgid "one file deleted"
+msgid_plural "%d files deleted"
+msgstr[0] ""
+msgstr[1] ""
+
+msgid "Singular\n"
+msgid_plural "Plural\n"
+msgstr[0] ""
+msgstr[1] ""
+
+msgid "Bareword"
+msgstr ""
+
+msgid "quoted string"
+msgstr ""
+
+msgid ""
+"weird\n"
+"formatting"
+msgstr ""
+
+msgid "Welcome to the Republic of Perl!"
+msgstr ""
+
+msgid "pattern match"
+msgstr ""
+
+msgid "substitution"
+msgstr ""
+
+msgid "find me"
+msgstr ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} prog.ok prog.pot || exit 1
+
+tmpfiles="$tmpfiles fr.po"
+cat <<\EOF > fr.po
+msgid ""
+msgstr ""
+"Content-Type: text/plain; charset=ISO-8859-1\n"
+"Plural-Forms: nplurals=2; plural=(n > 1);\n"
+
+msgid "'Your command, please?', asked the waiter."
+msgstr "«Votre commande, s'il vous plaît», dit le garçon."
+
+# Les gateaux allemands sont les meilleurs du monde.
+#, perl-format
+msgid "a piece of cake"
+msgid_plural "%d pieces of cake"
+msgstr[0] "un morceau de gateau"
+msgstr[1] "%d morceaux de gateau"
+
+# Reverse the arguments.
+#, perl-format
+msgid "%s is replaced by %s."
+msgstr "%2$s remplace %1$s."
+
+#, perl-format
+msgid "one file deleted"
+msgid_plural "%d files deleted"
+msgstr[0] ""
+msgstr[1] ""
+
+msgid "Singular\n"
+msgid_plural "Plural\n"
+msgstr[0] ""
+msgstr[1] ""
+
+msgid "Bareword"
+msgstr ""
+
+msgid "quoted string"
+msgstr ""
+
+msgid ""
+"weird\n"
+"formatting"
+msgstr ""
+
+msgid "Welcome to the Republic of Perl!"
+msgstr ""
+
+msgid "pattern match"
+msgstr ""
+
+msgid "substitution"
+msgstr ""
+
+msgid "find me"
+msgstr ""
+EOF
+
+tmpfiles="$tmpfiles fr.po.new"
+: ${MSGMERGE=msgmerge}
+${MSGMERGE} -q -o fr.po.new fr.po prog.pot
+
+: ${DIFF=diff}
+${DIFF} fr.po fr.po.new || exit 1
+
+exit 0
author	Bruno Haible <bruno@clisp.org>
	Thu, 12 Jun 2003 12:09:38 +0000 (12:09 +0000)
committer	Bruno Haible <bruno@clisp.org>
	Tue, 23 Jun 2009 10:10:39 +0000 (12:10 +0200)
gettext-tools/doc/ChangeLog		patch \| blob \| blame \| history
gettext-tools/doc/gettext.texi		patch \| blob \| blame \| history
gettext-tools/doc/xgettext.texi		patch \| blob \| blame \| history
gettext-tools/po/ChangeLog		patch \| blob \| blame \| history
gettext-tools/po/POTFILES.in		patch \| blob \| blame \| history
gettext-tools/src/ChangeLog		patch \| blob \| blame \| history
gettext-tools/src/format-perl.c	[new file with mode: 0644]	patch \| blob
gettext-tools/src/format.c		patch \| blob \| blame \| history
gettext-tools/src/format.h		patch \| blob \| blame \| history
gettext-tools/src/message.c		patch \| blob \| blame \| history
gettext-tools/src/message.h		patch \| blob \| blame \| history
gettext-tools/src/x-perl.c	[new file with mode: 0644]	patch \| blob
gettext-tools/src/x-perl.h	[new file with mode: 0644]	patch \| blob
gettext-tools/src/xgettext.c		patch \| blob \| blame \| history
gettext-tools/tests/ChangeLog		patch \| blob \| blame \| history
gettext-tools/tests/Makefile.am		patch \| blob \| blame \| history
gettext-tools/tests/format-perl-1	[new file with mode: 0755]	patch \| blob
gettext-tools/tests/format-perl-2	[new file with mode: 0755]	patch \| blob
gettext-tools/tests/lang-perl	[new file with mode: 0755]	patch \| blob