From: Guido Flohr Date: Thu, 1 Apr 2010 10:25:06 +0000 (+0200) Subject: Improve how xgettext handles Perl syntax ambiguities. X-Git-Tag: v0.18~42 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e53359672a6c99adeee2c0234dae0d0fbc47feed;p=thirdparty%2Fgettext.git Improve how xgettext handles Perl syntax ambiguities. --- diff --git a/gettext-tools/doc/ChangeLog b/gettext-tools/doc/ChangeLog index 6105b937d..4afbf4f77 100644 --- a/gettext-tools/doc/ChangeLog +++ b/gettext-tools/doc/ChangeLog @@ -1,3 +1,9 @@ +2010-03-31 Guido Flohr + + More explanations about how xgettext handles Perl syntax ambiguities. + * gettext.texi (General Problems): Explain how xgettext disambiguates + conditional operator vs. regular expression. + 2010-03-13 Bruno Haible New options --color, --style for many programs. diff --git a/gettext-tools/doc/gettext.texi b/gettext-tools/doc/gettext.texi index 04b9ee322..d342601f3 100644 --- a/gettext-tools/doc/gettext.texi +++ b/gettext-tools/doc/gettext.texi @@ -10997,6 +10997,48 @@ the semantic context. If a slash is really a division sign but mis-interpreted as a pattern match, the rest of the input file is most probably parsed incorrectly. +There are certain cases, where the ambiguity cannot be resolved at all: + +@example +$x = wantarray ? 1 : 0; +@end example + +The Perl built-in function @code{wantarray} does not accept any arguments. +The Perl parser therefore knows that the question mark does not start +a regular expression but is the ternary conditional operator. + +@example +sub wantarrays @{@} +$x = wantarrays ? 1 : 0; +@end example + +Now the situation is different. The function @code{wantarrays} takes +a variable number of arguments (like any non-prototyped Perl function). +The question mark is now the delimiter of a pattern match, and hence +the piece of code does not compile. + +@example +sub wantarrays() @{@} +$x = wantarrays ? 1 : 0; +@end example + +Now the function is prototyped, Perl knows that it does not accept any +arguments, and the question mark is therefore interpreted as the +ternaray operator again. But that unfortunately outsmarts @code{xgettext}. + +The Perl parser in @code{xgettext} cannot know whether a function has +a prototype and what that prototype would look like. It therefore makes +an educated guess. If a function is known to be a Perl built-in and +this function does not accept any arguments, a following question mark +or slash is treated as an operator, otherwise as the delimiter of a +following regular expression. The Perl built-ins that do not accept +arguments are @code{wantarray}, @code{fork}, @code{time}, @code{times}, +@code{getlogin}, @code{getppid}, @code{getpwent}, @code{getgrent}, +@code{gethostent}, @code{getnetent}, @code{getprotoent}, @code{getservent}, +@code{setpwent}, @code{setgrent}, @code{endpwent}, @code{endgrent}, +@code{endhostent}, @code{endnetent}, @code{endprotoent}, and +@code{endservent}. + If you find that @code{xgettext} fails to extract strings from portions of your sources, you should therefore look out for slashes and/or question marks preceding these sections. You may have come @@ -11004,6 +11046,17 @@ across a bug in @code{xgettext}'s Perl parser (and of course you should report that bug). In the meantime you should consider to reformulate your code in a manner less challenging to @code{xgettext}. +In particular, if the parser is too dumb to see that a function +does not accept arguments, use parentheses: + +@example +$x = somefunc() ? 1 : 0; +$y = (somefunc) ? 1 : 0; +@end example + +In fact the Perl parser itself has similar problems and warns you +about such constructs. + @node Default Keywords, Special Keywords, General Problems, Perl @subsubsection Which keywords will xgettext look for? @cindex Perl default keywords diff --git a/gettext-tools/src/ChangeLog b/gettext-tools/src/ChangeLog index 47183dd53..cb00dc6ba 100644 --- a/gettext-tools/src/ChangeLog +++ b/gettext-tools/src/ChangeLog @@ -1,5 +1,28 @@ 2010-03-31 Guido Flohr + Improve how xgettext handles Perl syntax ambiguities. + * x-perl.c(enum token_type_ty): New enumeration items + token_type_number, token_type_object. + (struct token_ty): New field 'last_type'. + (token2string): Handle token_type_number, token_type_object. + (free_token): Likewise. + (prefer_division_over_regexp): Remove variable. + (extract_variable): Recognize token of type token_type_object. + (prefer_regexp_over_division): New function. + (last_token_type): Renamed from last_token. + (x_perl_prelex): Assign the token's last_type. Recognize token of type + token_type_number. Don't special-case "grep" and "split". Invoke + prefer_regexp_over_division for disambiguation. + (token_stack_dump): Handle token_type_number, token_type_object. + (x_perl_lex): Assign the token's last_type. Update last_token_type + intelligently. + (collect_message): Invoke prefer_regexp_over_division for + disambiguation. + (extract_balanced): Don't set last_token_type here. Handle + token_type_number, token_type_object. + (extract_perl): Initialize last_token_type here. + Reported by Guillem Jover via Santiago Vila. + * x-perl.c (x_perl_prelex): Clarify interpolate_keywords arguments. * x-perl.c (eaten_here): Renamed from here_eaten. diff --git a/gettext-tools/src/x-perl.c b/gettext-tools/src/x-perl.c index c2e9fee96..c7843c405 100644 --- a/gettext-tools/src/x-perl.c +++ b/gettext-tools/src/x-perl.c @@ -1,7 +1,7 @@ /* xgettext Perl backend. Copyright (C) 2002-2010 Free Software Foundation, Inc. - This file was written by Guido Flohr , 2002-2003. + This file was written by Guido Flohr , 2002-2010. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -234,7 +234,7 @@ phase1_getc () ++line_number; /* Undosify. This is important for catching the end of < */ - token_type_dereference, /* , */ + token_type_dereference, /* -> */ token_type_semicolon, /* ; */ token_type_lbrace, /* { */ token_type_rbrace, /* } */ token_type_lbracket, /* [ */ token_type_rbracket, /* ] */ token_type_string, /* quote-like */ + token_type_number, /* starting with a digit o dot */ token_type_named_op, /* if, unless, while, ... */ token_type_variable, /* $... */ + token_type_object, /* A dereferenced variable, maybe a blessed + object. */ token_type_symbol, /* symbol, number */ token_type_regex_op, /* s, tr, y, m. */ token_type_dot, /* . */ @@ -548,12 +551,14 @@ typedef struct token_ty token_ty; struct token_ty { token_type_ty type; + token_type_ty last_type; int sub_type; /* for token_type_string, token_type_symbol */ char *string; /* for: in encoding: token_type_named_op ASCII token_type_string UTF-8 token_type_symbol ASCII token_type_variable global_source_encoding + token_type_object global_source_encoding */ refcounted_string_list_ty *comment; /* for token_type_string */ int line_number; @@ -589,10 +594,14 @@ token2string (const token_ty *token) return "token_type_rbracket"; case token_type_string: return "token_type_string"; + case token_type_number: + return "token type number"; case token_type_named_op: return "token_type_named_op"; case token_type_variable: return "token_type_variable"; + case token_type_object: + return "token_type_object"; case token_type_symbol: return "token_type_symbol"; case token_type_regex_op: @@ -617,6 +626,7 @@ free_token (token_ty *tp) case token_type_string: case token_type_symbol: case token_type_variable: + case token_type_object: free (tp->string); break; default: @@ -748,14 +758,6 @@ extract_quotelike_pass1_utf8 (int delim) /* ========= Reading of tokens and commands. Extracting strings. ========= */ -/* There is an ambiguity about '/': It can start a division operator ('/' or - '/=') or it can start a regular expression. The distinction is important - because inside regular expressions, '#' loses its special meaning. - The distinction is possible depending on the parsing state: After a - variable or simple expression, it's a division operator; at the beginning - of an expression, it's a regexp. */ -static bool prefer_division_over_regexp; - /* Context lookup table. */ static flag_context_list_table_ty *flag_context_list_table; @@ -1436,8 +1438,6 @@ extract_variable (message_list_ty *mlp, token_ty *tp, int first) real_file_name, line_number, tp->string); #endif - prefer_division_over_regexp = true; - /* * 3) If the following looks strange to you, this is valid Perl syntax: * @@ -1483,6 +1483,7 @@ extract_variable (message_list_ty *mlp, token_ty *tp, int first) if (maybe_hash_value && is_dereference) { + tp->type = token_type_object; #if DEBUG_PERL fprintf (stderr, "%s:%d: first keys preceded by \"->\"\n", real_file_name, line_number); @@ -2002,9 +2003,94 @@ interpolate_keywords (message_list_ty *mlp, const char *string, int lineno) } } -/* The last token seen in the token stream. This is important for the - interpretation of '?' and '/'. */ -static token_type_ty last_token; +/* There is an ambiguity about '/' and '?': They can start an operator + (division operator '/' or '/=' or the conditional operator '?'), or they can + start a regular expression. The distinction is important because inside + regular expressions, '#' loses its special meaning. This function helps + making the decision (a heuristic). See the documentation for details. */ +static bool +prefer_regexp_over_division (token_type_ty type) +{ + bool retval = true; + + switch (type) + { + case token_type_eof: + retval = true; + break; + case token_type_lparen: + retval = true; + break; + case token_type_rparen: + retval = false; + break; + case token_type_comma: + retval = true; + break; + case token_type_fat_comma: + retval = true; + break; + case token_type_dereference: + retval = true; + break; + case token_type_semicolon: + retval = true; + break; + case token_type_lbrace: + retval = true; + break; + case token_type_rbrace: + retval = false; + break; + case token_type_lbracket: + retval = true; + break; + case token_type_rbracket: + retval = false; + break; + case token_type_string: + retval = false; + break; + case token_type_number: + retval = false; + break; + case token_type_named_op: + retval = true; + break; + case token_type_variable: + retval = false; + break; + case token_type_object: + retval = false; + break; + case token_type_symbol: + case token_type_keyword_symbol: + retval = true; + break; + case token_type_regex_op: + retval = false; + break; + case token_type_dot: + retval = true; + break; + case token_type_other: + retval = true; + break; + } + +#if DEBUG_PERL + token_ty ty; + ty.type = type; + fprintf (stderr, "Prefer regexp over division after %s: %s\n", + token2string (&ty), retval ? "true" : "false"); +#endif + + return retval; +} + +/* Last token type seen in the stream. Important for the interpretation + of slash and question mark. */ +static token_type_ty last_token_type; /* Combine characters into tokens. Discard whitespace. */ @@ -2020,6 +2106,7 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) { c = phase2_getc (); tp->line_number = line_number; + tp->last_type = last_token_type; switch (c) { @@ -2043,7 +2130,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) if (!extract_all) { extract_variable (mlp, tp, c); - prefer_division_over_regexp = true; return; } break; @@ -2060,17 +2146,11 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) if (c2 == '.') { tp->type = token_type_other; - prefer_division_over_regexp = false; return; } - else if (c2 >= '0' && c2 <= '9') - { - prefer_division_over_regexp = false; - } - else + else if (!(c2 >= '0' && c2 <= '9')) { tp->type = token_type_dot; - prefer_division_over_regexp = true; return; } } @@ -2089,7 +2169,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': /* Symbol, or part of a number. */ - prefer_division_over_regexp = true; bufpos = 0; for (;;) { @@ -2154,7 +2233,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) { tp->type = token_type_named_op; tp->string = xstrdup (buffer); - prefer_division_over_regexp = false; return; } else if (strcmp (buffer, "s") == 0 @@ -2180,7 +2258,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) tp->type = token_type_symbol; tp->sub_type = symbol_type_none; tp->string = xstrdup (buffer); - prefer_division_over_regexp = true; return; } extract_triple_quotelike (mlp, tp, delim, @@ -2214,7 +2291,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) tp->type = token_type_symbol; tp->sub_type = symbol_type_none; tp->string = xstrdup (buffer); - prefer_division_over_regexp = true; return; } extract_quotelike (tp, delim); @@ -2223,7 +2299,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) free (tp->string); drop_reference (tp->comment); tp->type = token_type_regex_op; - prefer_division_over_regexp = true; /* Eat the following modifiers. */ do @@ -2253,7 +2328,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) tp->type = token_type_eof; return; } - prefer_division_over_regexp = true; if ((delim >= '0' && delim <= '9') || (delim >= 'A' && delim <= 'Z') @@ -2264,7 +2338,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) tp->type = token_type_symbol; tp->sub_type = symbol_type_none; tp->string = xstrdup (buffer); - prefer_division_over_regexp = true; return; } @@ -2296,10 +2369,10 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) } return; } - else if (strcmp (buffer, "grep") == 0 - || strcmp (buffer, "split") == 0) + else if ((buffer[0] >= '0' && buffer[0] <= '9') || buffer[0] == '.') { - prefer_division_over_regexp = false; + tp->type = token_type_number; + return; } tp->type = token_type_symbol; tp->sub_type = (strcmp (buffer, "sub") == 0 @@ -2309,21 +2382,18 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) return; case '"': - prefer_division_over_regexp = true; extract_quotelike (tp, c); tp->sub_type = string_type_qq; interpolate_keywords (mlp, tp->string, line_number); return; case '`': - prefer_division_over_regexp = true; extract_quotelike (tp, c); tp->sub_type = string_type_qq; interpolate_keywords (mlp, tp->string, line_number); return; case '\'': - prefer_division_over_regexp = true; extract_quotelike (tp, c); tp->sub_type = string_type_q; return; @@ -2336,42 +2406,34 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) else phase2_ungetc (c); tp->type = token_type_lparen; - prefer_division_over_regexp = false; return; case ')': tp->type = token_type_rparen; - prefer_division_over_regexp = true; return; case '{': tp->type = token_type_lbrace; - prefer_division_over_regexp = false; return; case '}': tp->type = token_type_rbrace; - prefer_division_over_regexp = false; return; case '[': tp->type = token_type_lbracket; - prefer_division_over_regexp = false; return; case ']': tp->type = token_type_rbracket; - prefer_division_over_regexp = false; return; case ';': tp->type = token_type_semicolon; - prefer_division_over_regexp = false; return; case ',': tp->type = token_type_comma; - prefer_division_over_regexp = false; return; case '=': @@ -2383,8 +2445,8 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) return; } else if (linepos == 2 - && (last_token == token_type_semicolon - || last_token == token_type_rbrace) + && (last_token_type == token_type_semicolon + || last_token_type == token_type_rbrace) && ((c >= 'A' && c <='Z') || (c >= 'a' && c <= 'z'))) { @@ -2401,12 +2463,10 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) } phase1_ungetc (c); tp->type = token_type_other; - prefer_division_over_regexp = false; return; case '<': /* Check for <type = token_type_other; - prefer_division_over_regexp = false; return; case '/': case '?': - if (!prefer_division_over_regexp) + if (prefer_regexp_over_division (tp->last_type)) { extract_quotelike (tp, c); interpolate_keywords (mlp, tp->string, line_number); free (tp->string); drop_reference (tp->comment); - tp->type = token_type_other; - prefer_division_over_regexp = true; + tp->type = token_type_regex_op; /* Eat the following modifiers. */ do c = phase1_getc (); @@ -2544,7 +2602,6 @@ x_perl_prelex (message_list_ty *mlp, token_ty *tp) as we only need to recognize gettext invocations. Don't bother. */ tp->type = token_type_other; - prefer_division_over_regexp = false; return; } } @@ -2583,6 +2640,10 @@ token_stack_dump (token_stack_ty *stack) case token_type_variable: fprintf (stderr, " string: %s\n", token->string); break; + case token_type_object: + fprintf (stderr, " string: %s->\n", token->string); + default: + break; } } fprintf (stderr, "END STACK DUMP\n"); @@ -2651,10 +2712,69 @@ x_perl_lex (message_list_ty *mlp) { tp = XMALLOC (token_ty); x_perl_prelex (mlp, tp); + tp->last_type = last_token_type; + last_token_type = tp->type; + #if DEBUG_PERL fprintf (stderr, "%s:%d: x_perl_prelex returned %s\n", real_file_name, line_number, token2string (tp)); #endif + + /* The interpretation of a slash or question mark after a function call + depends on the prototype of that function. If the function expects + at least one argument, a regular expression is preferred, otherwise + an operator. With our limited means, we can only guess here. If + the function is a builtin that takes no arguments, we prefer an + operator by silently turning the last symbol into a variable instead + of a symbol. + + Method calls without parentheses are not ambiguous. After them, an + operator must follow. Due to some ideosyncrasies in this parser + they are treated in two different manners. If the call is + chained ($foo->bar->baz) the token left of the symbol is a + dereference operator. If it is not chained ($foo->bar) the + dereference operator is consumed with the extracted variable. The + latter case is handled below. */ + if (tp->type == token_type_symbol) + { + if (tp->last_type == token_type_dereference) + { + /* Class method call or chained method call (with at least + two arrow operators). */ + last_token_type = token_type_variable; + } + else if (tp->last_type == token_type_object) + { + /* Instance method, not chained. */ + last_token_type = token_type_variable; + } + else if (strcmp (tp->string, "wantarray") == 0 + || strcmp (tp->string, "fork") == 0 + || strcmp (tp->string, "getlogin") == 0 + || strcmp (tp->string, "getppid") == 0 + || strcmp (tp->string, "getpwent") == 0 + || strcmp (tp->string, "getgrent") == 0 + || strcmp (tp->string, "gethostent") == 0 + || strcmp (tp->string, "getnetent") == 0 + || strcmp (tp->string, "getprotoent") == 0 + || strcmp (tp->string, "getservent") == 0 + || strcmp (tp->string, "setpwent") == 0 + || strcmp (tp->string, "setgrent") == 0 + || strcmp (tp->string, "endpwent") == 0 + || strcmp (tp->string, "endgrent") == 0 + || strcmp (tp->string, "endhostent") == 0 + || strcmp (tp->string, "endnetent") == 0 + || strcmp (tp->string, "endprotoent") == 0 + || strcmp (tp->string, "endservent") == 0 + || strcmp (tp->string, "time") == 0 + || strcmp (tp->string, "times") == 0 + || strcmp (tp->string, "wait") == 0 + || strcmp (tp->string, "wantarray") == 0) + { + /* A Perl built-in function that does not accept arguments. */ + last_token_type = token_type_variable; + } + } } #if DEBUG_PERL else @@ -2789,7 +2909,8 @@ collect_message (message_list_ty *mlp, token_ty *tp, int error_level) phase2_ungetc (c); if (c == '"' || c == '\'' || c == '`' - || (!prefer_division_over_regexp && (c == '/' || c == '?')) + || ((c == '/' || c == '?') + && prefer_regexp_over_division (tp->last_type)) || c == 'q') { token_ty *qstring = x_perl_lex (mlp); @@ -2916,9 +3037,6 @@ extract_balanced (message_list_ty *mlp, ++nesting_level; #endif - last_token = token_type_semicolon; /* Safe assumption. */ - prefer_division_over_regexp = false; - for (;;) { /* The current token. */ @@ -2926,8 +3044,6 @@ extract_balanced (message_list_ty *mlp, tp = x_perl_lex (mlp); - last_token = tp->type; - if (delim == tp->type) { xgettext_current_source_encoding = po_charset_utf8; @@ -3016,6 +3132,7 @@ extract_balanced (message_list_ty *mlp, switch (tp->type) { case token_type_symbol: + case token_type_keyword_symbol: #if DEBUG_PERL fprintf (stderr, "%s:%d: type symbol (%d) \"%s\"\n", logical_file_name, tp->line_number, nesting_level, @@ -3031,7 +3148,6 @@ extract_balanced (message_list_ty *mlp, const struct callshapes *shapes = (const struct callshapes *) keyword_value; - last_token = token_type_keyword_symbol; next_shapes = shapes; next_argparser = arglist_parser_alloc (mlp, shapes); } @@ -3052,9 +3168,22 @@ extract_balanced (message_list_ty *mlp, case token_type_variable: #if DEBUG_PERL fprintf (stderr, "%s:%d: type variable (%d) \"%s\"\n", - logical_file_name, tp->line_number, nesting_level, tp->string); + logical_file_name, tp->line_number, nesting_level, + tp->string); +#endif + next_is_argument = false; + if (next_argparser != NULL) + free (next_argparser); + next_argparser = NULL; + next_context_iter = null_context_list_iterator; + break; + + case token_type_object: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type object (%d) \"%s->\"\n", + logical_file_name, tp->line_number, nesting_level, + tp->string); #endif - prefer_division_over_regexp = true; next_is_argument = false; if (next_argparser != NULL) free (next_argparser); @@ -3217,6 +3346,18 @@ extract_balanced (message_list_ty *mlp, next_context_iter = null_context_list_iterator; break; + case token_type_number: +#if DEBUG_PERL + fprintf (stderr, "%s:%d: type number (%d)\n", + logical_file_name, tp->line_number, nesting_level); +#endif + next_is_argument = false; + if (next_argparser != NULL) + free (next_argparser); + next_argparser = NULL; + next_context_iter = null_context_list_iterator; + break; + case token_type_eof: #if DEBUG_PERL fprintf (stderr, "%s:%d: type EOF (%d)\n", @@ -3429,6 +3570,9 @@ extract_perl (FILE *f, const char *real_filename, const char *logical_filename, eaten_here = 0; end_of_file = false; + /* Safe assumption. */ + last_token_type = token_type_semicolon; + /* Eat tokens until eof is seen. When extract_balanced returns due to an unbalanced closing brace, just restart it. */ while (!extract_balanced (mlp, token_type_rbrace, true, false, @@ -3441,7 +3585,7 @@ extract_perl (FILE *f, const char *real_filename, const char *logical_filename, free (logical_file_name); logical_file_name = NULL; line_number = 0; - last_token = token_type_semicolon; + last_token_type = token_type_semicolon; token_stack_free (&token_stack); eaten_here = 0; end_of_file = true; diff --git a/gettext-tools/tests/ChangeLog b/gettext-tools/tests/ChangeLog index 4ca6a394f..deb0123f4 100644 --- a/gettext-tools/tests/ChangeLog +++ b/gettext-tools/tests/ChangeLog @@ -1,3 +1,10 @@ +2010-03-31 Guido Flohr + + Improve how xgettext handles Perl syntax ambiguities. + * xgettext-perl-8: New file. + * Makefile.am (TESTS): Add it. + Reported by Guillem Jover via Santiago Vila. + 2009-12-26 Bruno Haible Enable the gettext-6 and gettext-7 tests also on MacOS X. diff --git a/gettext-tools/tests/Makefile.am b/gettext-tools/tests/Makefile.am index 74dc13f58..5a89a6e02 100644 --- a/gettext-tools/tests/Makefile.am +++ b/gettext-tools/tests/Makefile.am @@ -1,5 +1,5 @@ ## Makefile for the gettext-tools/tests subdirectory of GNU gettext -## Copyright (C) 1995-1997, 2001-2009 Free Software Foundation, Inc. +## Copyright (C) 1995-1997, 2001-2010 Free Software Foundation, Inc. ## ## This program is free software: you can redistribute it and/or modify ## it under the terms of the GNU General Public License as published by @@ -86,7 +86,7 @@ TESTS = gettext-1 gettext-2 gettext-3 gettext-4 gettext-5 gettext-6 gettext-7 \ xgettext-lisp-1 xgettext-lisp-2 \ xgettext-objc-1 xgettext-objc-2 \ xgettext-perl-1 xgettext-perl-2 xgettext-perl-3 xgettext-perl-4 \ - xgettext-perl-5 xgettext-perl-6 xgettext-perl-7 \ + xgettext-perl-5 xgettext-perl-6 xgettext-perl-7 xgettext-perl-8 \ xgettext-php-1 xgettext-php-2 xgettext-php-3 xgettext-php-4 \ xgettext-po-1 \ xgettext-properties-1 \ diff --git a/gettext-tools/tests/xgettext-perl-8 b/gettext-tools/tests/xgettext-perl-8 new file mode 100755 index 000000000..74247cd34 --- /dev/null +++ b/gettext-tools/tests/xgettext-perl-8 @@ -0,0 +1,133 @@ +#! /bin/sh + +# The slash (/) and the question mark (?) serve a double-purpose in Perl. +# Depending on the context they can either be an operator (division +# or ternary respectively) or they are regex delimiters for pattern +# matches. This test case checks the proper recognition. + +tmpfiles="" +trap 'rm -fr $tmpfiles' 1 2 3 15 + +tmpfiles="$tmpfiles xg-pl-8.pl" +cat <<\EOF > xg-pl-8.pl +info(__("using %s."), ($a->b() eq "auto" ? "" : "")); + +print __"Question mark after string is an operator!\n"; +# ?; Re-sync. + +@times = sort {$a - $b} split /,\s*/, $options + if (defined $options && $options); + +print __"First slash in a an argument to a function starts a pattern match."; +# /; Re-sync. + +$0 =~ /xyz/ ? 'foo' : 'bar'; + +print __"Question mark after a regular pattern match is an operator!"; +# ?; Re-sync. + +$0 =~ m{xyz} ? 'foo' : 'bar'; + +print __"Question mark after a nesting pattern match is an operator!"; +# ?; Re-sync. + +$0 =~ m|xyz| ? 'foo' : 'bar'; + +print __"Question mark after a non-nesting pattern match is an operator!"; +# ?; Re-sync. + +print __(<method ? 1 : 0; + +print __"Class method calls without parentheses do not accept arguments!"; +# ?; Re-sync. + +$foo = $Something->method ? 1 : 0; + +print __"Instance method calls without parentheses do not accept arguments!"; +# ?; Re-sync. + +$foo = $Some->thing->method ? 1 : 0; + +print __"Chained method calls without parentheses do not accept arguments!"; +# ?; Re-sync. + +print __"Synching works."; +EOF + +tmpfiles="$tmpfiles xg-pl-8.tmp.po xg-pl-8.po" +: ${XGETTEXT=xgettext} +${XGETTEXT} --omit-header -n \ + -k__ \ + -d xg-pl-8.tmp xg-pl-8.pl +test $? = 0 || { rm -fr $tmpfiles; exit 1; } +LC_ALL=C tr -d '\r' < xg-pl-8.tmp.po > xg-pl-8.po +test $? = 0 || { rm -fr $tmpfiles; exit 1; } + +tmpfiles="$tmpfiles xg-pl-8.ok" +cat <<\EOF > xg-pl-8.ok +#: xg-pl-8.pl:1 +#, perl-format +msgid "using %s." +msgstr "" + +#: xg-pl-8.pl:3 +msgid "Question mark after string is an operator!\n" +msgstr "" + +#: xg-pl-8.pl:9 +msgid "First slash in a an argument to a function starts a pattern match." +msgstr "" + +#: xg-pl-8.pl:14 +msgid "Question mark after a regular pattern match is an operator!" +msgstr "" + +#: xg-pl-8.pl:19 +msgid "Question mark after a nesting pattern match is an operator!" +msgstr "" + +#: xg-pl-8.pl:24 +msgid "Question mark after a non-nesting pattern match is an operator!" +msgstr "" + +#: xg-pl-8.pl:28 +msgid "Line number for here documents is not correct.\n" +msgstr "" + +#: xg-pl-8.pl:33 +msgid "The function wantarray does not take arguments!" +msgstr "" + +#: xg-pl-8.pl:38 +msgid "Class method calls without parentheses do not accept arguments!" +msgstr "" + +#: xg-pl-8.pl:43 +msgid "Instance method calls without parentheses do not accept arguments!" +msgstr "" + +#: xg-pl-8.pl:48 +msgid "Chained method calls without parentheses do not accept arguments!" +msgstr "" + +#: xg-pl-8.pl:51 +msgid "Synching works." +msgstr "" +EOF + +: ${DIFF=diff} +${DIFF} xg-pl-8.ok xg-pl-8.po +result=$? + +rm -fr $tmpfiles + +exit $result