Plural form handling.

author Bruno Haible <bruno@clisp.org>

Fri, 12 Jan 2001 17:00:31 +0000 (17:00 +0000)

committer Bruno Haible <bruno@clisp.org>

Fri, 12 Jan 2001 17:00:31 +0000 (17:00 +0000)
diff --git a/Admin/plans b/Admin/plans

index fd891267ed87f577c9273bf41b8b0d660aabb65c..5d335f5254e36f95434f06efe81d13047fe8f908 100644
--- a/Admin/plans
+++ b/Admin/plans
@@ -15,24 +15,12 @@ Things we plan to do. Comments welcome.
  
  - Merge clisp specific changes.
  
-- Update documentation for plural features and bind_textdomain_codeset.
-
-- Treatment of plurals in pot-files: Use the following pattern:
-    msgid "a piece of cake"
-    msgid_plural "%d pieces of cake"
-    msgstr0 "un morceau de gateau"
-    msgstr1 "%d morceaux de gateau"
-  or possibly:
-    msgid "a piece of cake"
-    msgid_plural "%d pieces of cake"
-    msgstr[0] "un morceau de gateau"
-    msgstr[1] "%d morceaux de gateau"
-
  - Work towards integration with automake.
  
  - Stop documenting AM_WITH_NLS. AM_GNU_GETTEXT is the right macro to use.
  
-- Unify intlh.inst.in and libgettext.h.
+- Unify intlh.inst.in and libgettext.h. libgettext.h depends on
+  HAVE_LC_MESSAGES and on <locale.h> being already included.
  
  - What about gettext_noop? Kill it or maybe apply this:
  
diff --git a/doc/ChangeLog b/doc/ChangeLog

index d67275ce8c7e881eeea4dcfd9509c1a1e7c51711..228bd6e8e3d62c695559a69967dfb6a3d7c2a5a7 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,19 @@
+2001-01-01  Bruno Haible  <haible@clisp.cons.org>
+
+       Implement plural form handling.
+       * gettext.texi: Fix menus.
+       (PO Files): Document entries for plural forms.
+       (xgettext Invocation): Extended --keyword argument syntax. More
+       default keywords.
+       (MO Files): Document format of entries for plural forms.
+       (Charset conversion): New node, mostly from glibc-2.2 manual.
+       (Plural forms): Likewise.
+       (GUI program problems): Likewise, without the GCC function macro.
+       (Optimized gettext): Remove section about dcgettext macro. All
+       caching is now done inside the *gettext functions.
+       (Comparison): Move the example about Polish to node "Plural forms".
+       Remove the print_month_info example.
+
  2000-11-12  Bruno Haible  <haible@clisp.cons.org>
  
         * matrix.texi: Update.
diff --git a/doc/gettext.texi b/doc/gettext.texi

index 870db5bc4b2024f09f2915fb7868ebd806f14cee..137c1ee0981b0cab5a6616cb61b999bbbbedc242 100644
--- a/doc/gettext.texi
+++ b/doc/gettext.texi
@@ -20,7 +20,7 @@
  This file provides documentation for GNU @code{gettext} utilities.
  It also serves as a reference for the free Translation Project.
  
-Copyright (C) 1995, 1996, 1997, 1998 Free Software Foundation, Inc.
+Copyright (C) 1995, 1996, 1997, 1998, 2001 Free Software Foundation, Inc.
  
  Permission is granted to make and distribute verbatim copies of
  this manual provided the copyright notice and this permission notice
@@ -54,7 +54,7 @@ by the Foundation.
  
  @page
  @vskip 0pt plus 1filll
-Copyright @copyright{} 1995, 1996, 1997, 1998 Free Software Foundation, Inc.
+Copyright @copyright{} 1995, 1996, 1997, 1998, 2001 Free Software Foundation, Inc.
  
  Permission is granted to make and distribute verbatim copies of
  this manual provided the copyright notice and this permission notice
@@ -132,6 +132,7 @@ Updating Existing PO Files
  * Obsolete Entries::            Obsolete Entries
  * Modifying Translations::      Modifying Translations
  * Modifying Comments::          Modifying Comments
+* Subedit::                     Mode for Editing Translations
  * Auxiliary::                   Consulting Auxiliary PO Files
  
  Producing Binary MO Files
@@ -164,6 +165,9 @@ About @code{gettext}
  * Interface to gettext::        The interface
  * Ambiguities::                 Solving ambiguities
  * Locating Catalogs::           Locating message catalog files
+* Charset conversion::          How to request conversion to Unicode
+* Plural forms::                Additional functions for handling plurals
+* GUI program problems::        Another technique for solving ambiguities
  * Optimized gettext::           Optimization of the *gettext functions
  
  Temporary Notes for the Programmers Chapter
@@ -899,6 +903,22 @@ does some more tests to check to validity of the translation.
  
  @end table
  
+A different kind of entry is used for translations which involve
+plural forms.
+
+@example
+@var{white-space}
+#  @var{translator-comments}
+#. @var{automatic-comments}
+#: @var{reference}@dots{}
+#, @var{flag}@dots{}
+msgid @var{untranslated-string-singular}
+msgid_plural @var{untranslated-string-plural}
+msgstr[0] @var{translated-string-case-0}
+...
+msgstr[N] @var{translated-string-case-n}
+@end example
+
  It happens that some lines, usually whitespace or comments, follow the
  very last entry of a PO file.  Such lines are not part of any entry,
  and PO mode is unable to take action on those lines.  By using the
@@ -1867,11 +1887,16 @@ If @var{keywordspec} is a C identifer @var{id}, @code{xgettext} looks
  for strings in the first argument of each call to the function or macro
  @var{id}.  If @var{keywordspec} is of the form
  @samp{@var{id}:@var{argnum}}, @code{xgettext} looks for strings in the
-@var{argnum}th argument of the call.
+@var{argnum}th argument of the call.  If @var{keywordspec} is of the form
+@samp{@var{id}:@var{argnum1},@var{argnum2}}, @code{xgettext} looks for
+strings in the @var{argnum1}st argument and in the @var{argnum2}nd argument
+of the call, and treats them as singular/plural variants for a message
+with plural handling.
  
  The default keyword specifications, which are always looked for if not
  explicitly disabled, are @code{gettext}, @code{dgettext:2},
-@code{dcgettext:2} and @code{gettext_noop}.
+@code{dcgettext:2}, @code{ngettext:1,2}, @code{dngettext:2,3},
+@code{dcngettext:2,3}, and @code{gettext_noop}.
  
  @item -m [@var{string}]
  @itemx --msgstr-prefix[=@var{string}]
@@ -2826,13 +2851,25 @@ With this option, each string is separately aligned so it starts at
  an offset which is a multiple of the alignment value.  On some RISC
  machines, a correct alignment will speed things up.
  
+Plural forms are stored by letting the plural of the original string
+follow the singular of the original string, separated by a
+@key{NUL} byte.  The length which appears in the string descriptor
+includes both.  However, only the singular of the original string
+takes part in the hash table lookup.  The plural variants of the
+translation are all stored consecutively, separated by a
+@key{NUL} byte.  Here also, the length in the string descriptor
+includes all of them.
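The layout described above can be read back with a few lines of C. The following is a minimal sketch, not code from the gettext sources: the helper name nth_variant is hypothetical, and it assumes that the length from the string descriptor does not count the final terminating NUL and that the buffer is NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the idx-th NUL-separated variant stored
   in data[0..len), or NULL if there are fewer variants.  Assumes len
   excludes the final terminating NUL byte.  */
const char *
nth_variant (const char *data, size_t len, unsigned int idx)
{
  const char *p = data;
  const char *end = data + len;

  while (idx > 0)
    {
      /* Skip one variant by looking for its separating NUL byte.  */
      const char *nul = memchr (p, '\0', (size_t) (end - p));
      if (nul == NULL)
        return NULL;            /* ran out of variants */
      p = nul + 1;
      idx--;
    }
  return p;
}
```

For a translation stored as "un", "deux", "trois" in one descriptor, index 1 would select "deux" and index 3 would yield a null pointer.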
+
  Nothing prevents a MO file from having embedded @key{NUL}s in strings.
  However, the program interface currently used already presumes
  that strings are @key{NUL} terminated, so embedded @key{NUL}s are
  somewhat useless.  But MO file format is general enough so other
  interfaces would be later possible, if for example, we ever want to
  implement wide characters right in MO files, where @key{NUL} bytes may
-accidently appear.
+accidently appear.  (No, we don't want to have wide characters in MO
+files.  They would make the file unnecessarily large, and the
+@samp{wchar_t} type being platform dependent, MO files would be
+platform dependent as well.)
  
  This particular issue has been strongly debated in the GNU
  @code{gettext} development forum, and it is expectable that MO file
@@ -3133,6 +3170,9 @@ in using this library will be interested in this description.
  * Interface to gettext::        The interface
  * Ambiguities::                 Solving ambiguities
  * Locating Catalogs::           Locating message catalog files
+* Charset conversion::          How to request conversion to Unicode
+* Plural forms::                Additional functions for handling plurals
+* GUI program problems::        Another technique for solving ambiguities
  * Optimized gettext::           Optimization of the *gettext functions
  @end menu
  
@@ -3244,7 +3284,7 @@ achieved when the program executes a @code{chdir} command.  Relative
  paths should always be avoided to avoid dependencies and
  unreliabilities.
  
-@node Locating Catalogs, Optimized gettext, Ambiguities, gettext
+@node Locating Catalogs, Charset conversion, Ambiguities, gettext
  @subsection Locating Message Catalog Files
  
  Because many different languages for many different packages have to be
@@ -3275,7 +3315,471 @@ less arbitrary value for it.} @footnote{When the system does not support
  @code{setlocale} its behavior in setting the locale values is simulated
  by looking at the environment variables.}
  
-@node Optimized gettext,  , Locating Catalogs, gettext
+@node Charset conversion, Plural forms, Locating Catalogs, gettext
+@subsection How to specify the output character set @code{gettext} uses
+
+@code{gettext} not only looks up a translation in a message catalog.  It
+also converts the translation on the fly to the desired output character
+set.  This is useful if the user is working in a different character set
+than the translator who created the message catalog, because it avoids
+distributing variants of message catalogs which differ only in the
+character set.
+
+The output character set is, by default, the value of @code{nl_langinfo
+(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
+locale.  But programs which store strings in a locale independent way
+(e.g. UTF-8) can request that @code{gettext} and related functions
+return the translations in that encoding, by use of the
+@code{bind_textdomain_codeset} function.
+
+Note that the @var{msgid} argument to @code{gettext} is not subject to
+character set conversion.  Also, when @code{gettext} does not find a
+translation for @var{msgid}, it returns @var{msgid} unchanged --
+independently of the current output character set.  It is therefore
+recommended that all @var{msgid}s be US-ASCII strings.
+
+@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
+The @code{bind_textdomain_codeset} function can be used to specify the
+output character set for message catalogs for domain @var{domainname}.
+The @var{codeset} argument must be a valid codeset name which can be used
+for the @code{iconv_open} function, or a null pointer.
+
+If the @var{codeset} parameter is the null pointer,
+@code{bind_textdomain_codeset} returns the currently selected codeset
+for the domain with the name @var{domainname}. It returns @code{NULL} if
+no codeset has yet been selected.
+
+The @code{bind_textdomain_codeset} function can be used several times. 
+If used multiple times with the same @var{domainname} argument, the
+later call overrides the settings made by the earlier one.
+
+The @code{bind_textdomain_codeset} function returns a pointer to a
+string containing the name of the selected codeset.  The string is
+allocated internally in the function and must not be changed by the
+user.  If the system ran out of memory during the execution of
+@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
+global variable @var{errno} is set accordingly.
+@end deftypefun
+
+@node Plural forms, GUI program problems, Charset conversion, gettext
+@subsection Additional functions for plural forms
+
+The functions of the @code{gettext} family described so far (and all the
+@code{catgets} functions as well) have one problem in the real world
+which has been neglected completely in all existing approaches.  What
+is meant here is the handling of plural forms.
+
+Looking through Unix source code before the time anybody thought about
+internationalization (and, sadly, even afterwards) one can often find
+code similar to the following:
+
+@smallexample
+   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
+@end smallexample
+
+@noindent
+After the first complaints from people internationalizing the code, people
+either completely avoided formulations like this or used strings like
+@code{"file(s)"}.  Both look unnatural and should be avoided.  First
+tries to solve the problem correctly looked like this:
+
+@smallexample
+   if (n == 1)
+     printf ("%d file deleted", n);
+   else
+     printf ("%d files deleted", n);
+@end smallexample
+
+But this does not solve the problem.  It helps languages where the
+plural form of a noun is not simply constructed by adding an `s' but
+that is all.  Once again people fell into the trap of believing the
+rules their language is using are universal.  But the handling of plural
+forms differs widely between the language families.  For example,
+Rafal Maszkowski @code{<rzm@@mat.uni.torun.pl>} reports:
+
+@quotation
+In Polish we use e.g. plik (file) this way:
+@example
+1 plik
+2,3,4 pliki
+5-21 pliko'w
+22-24 pliki
+25-31 pliko'w
+@end example
+and so on (o' means 8859-2 oacute which should be rather okreska,
+similar to aogonek).
+@end quotation
+
+There are two things which can differ between languages (and even inside
+language families):
+
+@itemize @bullet
+@item
+The way plural forms are built differs.  This is a problem with
+languages which have many irregularities.  German, for instance, is a
+drastic case.  Though English and German are part of the same language
+family (Germanic), the almost regular forming of plural noun forms
+(appending an `s') is hardly found in German.
+
+@item
+The number of plural forms differs.  This is somewhat surprising for
+those who only have experience with Romanic and Germanic languages
+since here the number is the same (there are two).
+
+But other language families have only one form or many forms.  More
+information on this in an extra section.
+@end itemize
+
+The consequence of this is that application writers should not try to
+solve the problem in their code.  This would be localization since it is
+only usable for certain, hardcoded language environments.  Instead the
+extended @code{gettext} interface should be used.
+
+These extra functions take, instead of the one key string, two
+strings and a numerical argument.  The idea behind this is that using
+the numerical argument and the first string as a key, the implementation
+can select using rules specified by the translator the right plural
+form.  The two string arguments then will be used to provide a return
+value in case no message catalog is found (similar to the normal
+@code{gettext} behavior).  In this case the rules for Germanic languages
+are used and it is assumed that the first string argument is the singular
+form, the second the plural form.
+
+This has the consequence that programs without language catalogs can
+display the correct strings only if the program itself is written using
+a Germanic language.  This is a limitation but since the GNU C library
+(as well as the GNU @code{gettext} package) are written as part of the
+GNU package and the coding standards for the GNU project require programs
+to be written in English, this solution nevertheless fulfills its
+purpose.
+
+@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{ngettext} function is similar to the @code{gettext} function
+as it finds the message catalogs in the same way.  But it takes two
+extra arguments.  The @var{msgid1} parameter must contain the singular
+form of the string to be converted.  It is also used as the key for the
+search in the catalog.  The @var{msgid2} parameter is the plural form.
+The parameter @var{n} is used to determine the plural form.  If no
+message catalog is found @var{msgid1} is returned if @code{n == 1},
+otherwise @code{msgid2}.
+
+An example for the use of this function is:
+
+@smallexample
+  printf (ngettext ("%d file removed", "%d files removed", n), n);
+@end smallexample
+
+Please note that the numeric value @var{n} has to be passed to the
+@code{printf} function as well.  It is not sufficient to pass it only to
+@code{ngettext}.
+@end deftypefun
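The fallback described above, used when no message catalog is found, can be modelled in plain C. This is only a sketch of the documented behavior, not the library's implementation; the names ngettext_fallback and format_removed are hypothetical, and %lu is used here because the count is an unsigned long.

```c
#include <stdio.h>

/* Hypothetical model of the result when no message catalog is found:
   msgid1 for n == 1, msgid2 otherwise (the Germanic rule).  */
const char *
ngettext_fallback (const char *msgid1, const char *msgid2,
                   unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}

/* Format a message about n files into buf.  Note that n is passed
   twice: once to select the form, once to fill in the directive.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext_fallback ("%lu file removed",
                                      "%lu files removed", n),
                   n);
}
```

In a real program the call to ngettext_fallback would simply be a call to ngettext, which applies the translator's rule when a catalog is available.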
+
+@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
+The @code{dngettext} function is similar to the @code{dgettext} function in
+the way the message catalog is selected.  The difference is that it takes
+two extra parameters to provide the correct plural form.  These two
+parameters are handled in the same way @code{ngettext} handles them.
+@end deftypefun
+
+@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
+The @code{dcngettext} function is similar to the @code{dcgettext} function in
+the way the message catalog is selected.  The difference is that it takes
+two extra parameters to provide the correct plural form.  These two
+parameters are handled in the same way @code{ngettext} handles them.
+@end deftypefun
+
+Now, how do these functions solve the problem of the plural forms?
+Without the input of linguists (which was not available) it was not
+possible to determine whether there are only a few different forms in
+which plural forms are formed or whether the number can increase with
+every new supported language.
+
+Therefore the solution implemented is to allow the translator to specify
+the rules of how to select the plural form.  Since the formula varies
+with every language this is the only viable solution except for
+hardcoding the information in the code (which still would require the
+possibility of extensions to not prevent the use of new languages).  The
+details are explained in the GNU @code{gettext} manual.  Here only a
+bit of information is provided.
+
+The information about the plural form selection has to be stored in the
+header entry (the one with the empty @code{msgid} string).  There should
+be something like:
+
+@smallexample
+  nplurals=2; plural=n == 1 ? 0 : 1
+@end smallexample
+
+The @code{nplurals} value must be a decimal number which specifies how
+many different plural forms exist for this language.  The string
+following @code{plural} is an expression which is using the C language
+syntax.  Exceptions are that no negative numbers are allowed, numbers
+must be decimal, and the only variable allowed is @code{n}.  This
+expression will be evaluated whenever one of the functions
+@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
+numeric value passed to these functions is then substituted for all uses
+of the variable @code{n} in the expression.  The resulting value then
+must be greater than or equal to zero and smaller than the value given as the
+value of @code{nplurals}.
+
+@noindent
+The following rules are known at this point.  The languages are listed
+with their families.  But this does not necessarily mean the information can be
+generalized for the whole family (as can be easily seen in the table
+below).@footnote{Additions are welcome.  Send appropriate information to
+@email{bug-glibc-manual@@gnu.org}.}
+
+@table @asis
+@item Only one form:
+Some languages only require one single form.  There is no distinction
+between the singular and plural form.  An appropriate header entry
+would look like this:
+
+@smallexample
+nplurals=1; plural=0
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Finno-Ugric family
+Hungarian
+@item Asian family
+Japanese
+@item Turkic/Altaic family
+Turkish
+@end table
+
+@item Two forms, singular used for one only
+This is the form used in most existing programs since it is what English
+is using.  A header entry would look like this:
+
+@smallexample
+nplurals=2; plural=n != 1
+@end smallexample
+
+(Note: this uses the feature of C expressions that boolean expressions
+have the value zero or one.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Germanic family
+Danish, Dutch, English, German, Norwegian, Swedish
+@item Finno-Ugric family
+Finnish
+@item Latin/Greek family
+Greek
+@item Semitic family
+Hebrew
+@item Romanic family
+Italian, Spanish
+@item Artificial
+Esperanto
+@end table
+
+@item Two forms, singular used for zero and one
+Exceptional case in the language family.  The header entry would be:
+
+@smallexample
+nplurals=2; plural=n>1
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Romanic family
+French
+@end table
+
+@item Three forms, special cases for one and two
+The header entry would be:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Celtic
+Gaeilge
+@end table
+
+@item Three forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Russian
+@end table
+
+@item Three forms, special case for one and some numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=3; plural=n==1 ? 0 : \
+  n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
+@end smallexample
+
+(Continuation in the next line is possible.)
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Polish
+@end table
+
+@item Four forms, special case for one and all numbers ending in 2, 3, or 4
+The header entry would look like this:
+
+@smallexample
+nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n%10==3 || n%10==4 ? 2 : 3
+@end smallexample
+
+@noindent
+Languages with this property include:
+
+@table @asis
+@item Slavic family
+Slovenian
+@end table
+@end table
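Because the plural expressions use C syntax, some of the rules listed above can be transcribed directly into C functions and checked against known cases, such as the Polish example quoted earlier. This is a sketch for experimentation only; the function names are hypothetical and the library does not evaluate the header expression this way.

```c
/* Each function mirrors one "plural=" expression from the table and
   returns the index of the msgstr[] form to use.  */

/* nplurals=2; plural=n != 1  (e.g. English, German) */
unsigned long int
plural_germanic (unsigned long int n)
{
  return n != 1;
}

/* nplurals=2; plural=n>1  (French) */
unsigned long int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* nplurals=3; the Polish rule: form 1 for numbers ending in 2, 3 or 4,
   except for those ending in 12, 13 or 14.  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4
            && (n % 100 < 10 || n % 100 >= 20)) ? 1
         : 2;
}
```

With these definitions, plural_polish maps 1 to form 0, 2 through 4 and 22 through 24 to form 1, and 5 through 21 and 25 through 31 to form 2, which matches the usage of plik, pliki and pliko'w reported above.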
+
+@node GUI program problems, Optimized gettext, Plural forms, gettext
+@subsection How to use @code{gettext} in GUI programs
+
+One place where the @code{gettext} functions, if used normally, have big
+problems is within programs with graphical user interfaces (GUIs).  The
+problem is that many of the strings which have to be translated are very
+short.  They have to appear in pull-down menus which restricts the
+length.  But strings which are not containing entire sentences or at
+least large fragments of a sentence may appear in more than one
+situation in the program but might have different translations.  This is
+especially true for the one-word strings which are frequently used in
+GUI programs.
+
+As a consequence many people say that the @code{gettext} approach is
+wrong and instead @code{catgets} should be used which indeed does not
+have this problem.  But there is a very simple and powerful method to
+handle this kind of problem with the @code{gettext} functions.
+
+@noindent
+As an example consider the following fictional situation.  A GUI program
+has a menu bar with the following entries:
+
+@smallexample
++------------+------------+--------------------------------------+
+| File       | Printer    |                                      |
++------------+------------+--------------------------------------+
+| Open     | | Select   |
+| New      | | Open     |
++----------+ | Connect  |
+             +----------+
+@end smallexample
+
+To have the strings @code{File}, @code{Printer}, @code{Open},
+@code{New}, @code{Select}, and @code{Connect} translated there has to be
+at some point in the code a call to a function of the @code{gettext}
+family.  But in two places the string passed into the function would be
+@code{Open}.  The translations might not be the same and therefore we
+are in the dilemma described above.
+
+One solution to this problem is to artificially enlengthen the strings
+to make them unambiguous.  But what would the program do if no
+translation is available?  The enlengthened string is not what should be
+printed.  So we should use a slightly modified version of the functions.
+
+To enlengthen the strings a uniform method should be used.  E.g., in the
+example above the strings could be chosen as
+
+@smallexample
+Menu|File
+Menu|Printer
+Menu|File|Open
+Menu|File|New
+Menu|Printer|Select
+Menu|Printer|Open
+Menu|Printer|Connect
+@end smallexample
+
+Now all the strings are different and if now instead of @code{gettext}
+the following little wrapper function is used, everything works just
+fine:
+
+@cindex sgettext
+@smallexample
+  char *
+  sgettext (const char *msgid)
+  @{
+    char *msgval = gettext (msgid);
+    if (msgval == msgid)
+      msgval = strrchr (msgid, '|') + 1;
+    return msgval;
+  @}
+@end smallexample
+
+What this little function does is to recognize the case when no
+translation is available.  This can be done very efficiently by a
+pointer comparison since the return value is the input value.  If there
+is no translation we know that the input string is in the format we used
+for the Menu entries and therefore contains a @code{|} character.  We
+simply search for the last occurrence of this character and return a
+pointer to the character following it.  That's it!
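The wrapper shown above relies on every msgid containing a | character. If a string without one is ever passed, strrchr returns a null pointer and adding 1 to it gives an invalid result. A more defensive variant might look like the following sketch. The names sgettext_checked and lookup_untranslated are hypothetical; the latter stands in for gettext in the case where no translation is available, so that the example does not depend on any catalog.

```c
#include <string.h>

/* Stand-in for gettext when no translation is available: it returns
   its argument unchanged.  For illustration only.  */
const char *
lookup_untranslated (const char *msgid)
{
  return msgid;
}

/* Variant of the wrapper that tolerates a msgid without a '|'.  */
const char *
sgettext_checked (const char *msgid)
{
  const char *msgval = lookup_untranslated (msgid);

  if (msgval == msgid)
    {
      /* No translation found: strip the prefix, if there is one.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

Here both Menu|File|Open and Menu|Printer|Open fall back to Open, while a plain string such as Quit is returned unchanged.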
+
+If one now consistently uses the enlengthened string form and replaces
+the @code{gettext} calls with calls to @code{sgettext} (this is normally
+limited to very few places in the GUI implementation) then it is
+possible to produce a program which can be internationalized.
+
+The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
+and the @code{ngettext} equivalents) can and should have corresponding
+functions as well which look almost identical, except for the parameters
+and the call to the underlying function.
+
+Now there is of course the question why such functions do not exist in
+the GNU gettext package?  There are two parts of the answer to this question.
+
+@itemize @bullet
+@item
+They are easy to write and therefore can be provided by the project they
+are used in.  This is not an answer by itself and must be seen together
+with the second part which is:
+
+@item
+There is no way the gettext package can contain a version which can work
+everywhere.  The problem is the selection of the character to separate
+the prefix from the actual string in the enlengthened string.  The
+examples above used @code{|} which is a quite good choice because it
+resembles a notation frequently used in this context and it also is a
+character not often used in message strings.
+
+But what if the character is used in message strings?  Or if the chosen
+character is not available in the character set on the machine one
+compiles (e.g., @code{|} is not required to exist for @w{ISO C}; this is
+why the @file{iso646.h} file exists in @w{ISO C} programming environments)?
+@end itemize
+
+There is only one more comment to be said.  The wrapper function above
+requires that the translation strings are not enlengthened themselves.
+This is only logical.  There is no need to disambiguate the strings
+(since they are never used as keys for a search) and one also saves
+quite some memory and disk space by doing this.
+
+@node Optimized gettext,  , GUI program problems, gettext
  @subsection Optimization of the *gettext functions
  
  At this point of the discussion we should talk about an advantage of the
@@ -3315,55 +3819,12 @@ string is always the same.  One way to use this is:
  
  @noindent
  But this solution is not usable in all situation (e.g. when the locale
-selection changes) nor is it good readable.
-
-The GNU C compiler, version 2.7 and above, provide another solution for
-this.  To describe this we show here some lines of the
-@file{intl/libgettext.h} file.  For an explanation of the expression
-command block see @ref{Statement Exprs, , Statements and Declarations in
-Expressions, gcc, The GNU CC Manual}.
-
-@example
-@group
-#  if defined __GNUC__ && __GNUC__ == 2 && __GNUC_MINOR__ >= 7
-extern int _nl_msg_cat_cntr;
-#   define     dcgettext(domainname, msgid, category)           \
-  (__extension__                                                 \
-   (@{                                                            \
-     char *result;                                               \
-     if (__builtin_constant_p (msgid))                           \
-       @{                                                         \
-         static char *__translation__;                           \
-         static int __catalog_counter__;                         \
-         if (! __translation__                                   \
-             || __catalog_counter__ != _nl_msg_cat_cntr)         \
-           @{                                                     \
-             __translation__ =                                   \
-               dcgettext__ ((domainname), (msgid), (category));  \
-             __catalog_counter__ = _nl_msg_cat_cntr;             \
-           @}                                                     \
-         result = __translation__;                               \
-       @}                                                         \
-     else                                                        \
-       result = dcgettext__ ((domainname), (msgid), (category)); \
-     result;                                                     \
-    @}))
-#  endif
-@end group
-@end example
-
-The interesting thing here is the @code{__builtin_constant_p} predicate.
-This is evaluated at compile time and so optimization can take place
-immediately.  Here two cases are distinguished: the argument to
-@code{gettext} is not a constant value in which case simply the function
-@code{dcgettext__} is called, the real implementation of the
-@code{dcgettext} function.
+selection changes) nor does it lead to legible code.
  
-If the string argument @emph{is} constant we can reuse the once gained
-translation when the locale selection has not changed.  This is exactly
-what is done here.  The @code{_nl_msg_cat_cntr} variable is defined in
-the @file{loadmsgcat.c} which is available in @file{libintl.a} and is
-changed whenever a new message catalog is loaded.
+For this reason, GNU @code{gettext} caches previous translation results.
+When the same translation is requested twice, with no new message
+catalogs being loaded in between, @code{gettext} will, the second time,
+find the result through a single cache lookup.
  
  @node Comparison, Using libintl.a, gettext, Programmers
  @section Comparing the Two Interfaces
@@ -3480,53 +3941,9 @@ We believe that we can solve all conflicts with this method.  If it is
  difficult one can also consider changing one of the conflicting string a
  little bit.  But it is not impossible to overcome.
  
-@c Should this be here?
-Translator note: It is perhaps appropriate here to tell those English
-speaking programmers that the plural form of a noun cannot be formed by
-appending a single `s'.  Most other languages use different methods.
-Even the above form is not general enough to cope with all languages.
-Rafal Maszkowski <rzm@@mat.uni.torun.pl> reports:
-
-@quotation
-In Polish we use e.g. plik (file) this way:
-@example
-1 plik
-2,3,4 pliki
-5-21 pliko'w
-22-24 pliki
-25-31 pliko'w
-@end example
-and so on (o' means 8859-2 oacute which should be rather okreska,
-similar to aogonek).
-@end quotation
-
-A workable approach might be to consider methods like the one used for
-@code{LC_TIME} in the POSIX.2 standard.  The value of the
-@code{alt_digits} field can be up to 100 strings which represent the
-numbers 1 to 100.  Using this in a situation of an internationalized
-program means that an array of translatable strings should be indexed by
-the number which should represent.  A small example:
-
-@example
-@group
-void
-print_month_info (int month)
-@{
-  const char *month_pos[12] =
-  @{ N_("first"), N_("second"), N_("third"),    N_("fourth"),
-    N_("fifth"), N_("sixth"),  N_("seventh"),  N_("eighth"),
-    N_("ninth"), N_("tenth"),  N_("eleventh"), N_("twelfth") @};
-  printf (_("%s is the %s month\n"), nl_langinfo (MON_1 + month),
-          _(month_pos[month]));
-@}
-@end group
-@end example
-
-@noindent
-It should be obvious that this method is only reasonable for small
-ranges of numbers.
-
-@c catgets allows same original entry to have different translations
+@code{catgets} allows the same original entry to have different translations,
+but @code{gettext} has another, scalable approach for solving ambiguities
+of this kind: @xref{Ambiguities}.
  
  @node Using libintl.a, gettext grok, Comparison, Programmers
  @section Using libintl.a in own programs
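The Polish example removed in the hunk above (1 plik; 2,3,4 pliki; 5-21 pliko'w; 22-24 pliki) shows why a table indexed by the number does not scale, while an arithmetic rule on n does. The following is a minimal sketch of such a rule, assuming the three-form Polish expression commonly seen in PO file headers; the function name `polish_plural_index` is hypothetical and is not part of this commit.

```c
/* Hypothetical helper, for illustration only: map a count N to one of
   three Polish plural forms.
     0: singular             (1 plik)
     1: "few" form           (2-4, 22-24, ... pliki)
     2: "many" form          (0, 5-21, 25-31, ... plikow)  */
unsigned long int
polish_plural_index (unsigned long int n)
{
  if (n == 1)
    return 0;

  /* Last digit 2..4 selects the "few" form, except for 12..14.  */
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;

  return 2;
}
```

A rule of this shape is what a catalog would supply as its plural expression; the index it yields then selects one of the msgstr[i] strings.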
diff --git a/doc/version.texi b/doc/version.texi

index 4e5b91aece458157da50979531eaa824e49bf337..80506d70ad4c7e8776ca872c4125cde20983abf3 100644 (file)
--- a/doc/version.texi
+++ b/doc/version.texi
@@ -1,3 +1,3 @@
-@set UPDATED 6 May 2000
+@set UPDATED 1 January 2001
  @set EDITION 0.10.36
  @set VERSION 0.10.36
diff --git a/intl/ChangeLog b/intl/ChangeLog

index 2521469dc445b4599945afd4df07d75763bf262b..1fc403312c8eb92854b94d2093c1b7f4944c9461 100644 (file)
--- a/intl/ChangeLog
+++ b/intl/ChangeLog
@@ -1,3 +1,30 @@
+2001-01-01  Bruno Haible  <haible@clisp.cons.org>
+
+       Finish implementation of plural form handling.
+       * dcigettext.c (known_translation_t): Rename 'domain' field to
+       'domainname'. Remove 'plindex' field. Add 'domain' and
+       'translation_length' fields.
+       (transcmp): Don't compare 'plindex' fields.
+       (plural_lookup): New function.
+       (DCIGETTEXT): Change cache handling in the plural case. Don't call
+       plural_eval before the translation and its catalog file have been
+       found. Remove plindex from cache key. Add 'translation_length' and
+       'domain' to cache result. 
+       (_nl_find_msg): Remove index argument, return length of translation
+       to the caller instead. Weaken comparison of string lengths, to account
+       for plural entries. Call iconv() on the entire result string, not
+       only on the portion needed so far.
+       * loadinfo.h (_nl_find_msg): Remove index argument, add lengthp
+       argument.
+       * loadmsgcat.c (_nl_load_domain): Adapt to _nl_find_msg change.
+
+       * intl-compat.c (dcngettext, dngettext, ngettext): New functions.
+       * libgettext.h (ngettext__, dngettext__, dcngettext__): New
+       declarations.
+       (ngettext, dngettext): Add missing macro argument.
+
+       * intlh.inst.in (ngettext, dngettext): Add missing macro argument.
+
  2000-12-31  Bruno Haible  <haible@clisp.cons.org>
  
         * gettextP.h (ZERO): New macro.
diff --git a/intl/dcigettext.c b/intl/dcigettext.c

index 5dcb90a4c95911ed31ac47668a007a1d1b35b185..fd50bd7344ed6b8f8ef275aa8b02189a792cca9d 100644 (file)
--- a/intl/dcigettext.c
+++ b/intl/dcigettext.c
@@ -1,5 +1,5 @@
  /* Implementation of the internal dcigettext function.
-   Copyright (C) 1995-1999, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1999, 2000, 2001 Free Software Foundation, Inc.
  
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
@@ -203,10 +203,7 @@ static void *mempcpy PARAMS ((void *dest, const void *src, size_t n));
  struct known_translation_t
  {
    /* Domain in which to search.  */
-  char *domain;
-
-  /* Plural index.  */
-  unsigned long int plindex;
+  char *domainname;
  
    /* The category.  */
    int category;
@@ -214,8 +211,12 @@ struct known_translation_t
    /* State of the catalog counter at the point the string was found.  */
    int counter;
  
+  /* Catalog where the string was found.  */
+  struct loaded_l10nfile *domain;
+
    /* And finally the translation.  */
    const char *translation;
+  size_t translation_length;
  
    /* Pointer to the string in question.  */
    char msgid[ZERO];
@@ -243,16 +244,12 @@ transcmp (const void *p1, const void *p2)
    result = strcmp (s1->msgid, s2->msgid);
    if (result == 0)
      {
-      result = strcmp (s1->domain, s2->domain);
+      result = strcmp (s1->domainname, s2->domainname);
        if (result == 0)
-       {
-         result = s1->plindex - s2->plindex;
-         if (result == 0)
-           /* We compare the category last (though this is the cheapest
-              operation) since it is hopefully always the same (namely
-              LC_MESSAGES).  */
-           result = s1->category - s2->category;
-       }
+       /* We compare the category last (though this is the cheapest
+          operation) since it is hopefully always the same (namely
+          LC_MESSAGES).  */
+       result = s1->category - s2->category;
      }
  
    return result;
@@ -274,8 +271,14 @@ const char _nl_default_dirname[] = LOCALEDIR;
  struct binding *_nl_domain_bindings;
  
  /* Prototypes for local functions.  */
-static unsigned long int plural_eval (struct expression *pexp,
-                                     unsigned long int n) internal_function;
+static char *plural_lookup PARAMS ((struct loaded_l10nfile *domain,
+                                   unsigned long int n,
+                                   const char *translation,
+                                   size_t translation_len))
+     internal_function;
+static unsigned long int plural_eval PARAMS ((struct expression *pexp,
+                                             unsigned long int n))
+     internal_function;
  static const char *category_to_name PARAMS ((int category)) internal_function;
  static const char *guess_category_value PARAMS ((int category,
                                                  const char *categoryname))
@@ -384,6 +387,7 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
    char *dirname, *xdomainname;
    char *single_locale;
    char *retval;
+  size_t retlen;
    int saved_errno;
  #if defined HAVE_TSEARCH || defined _LIBC
    struct known_translation_t *search;
@@ -407,23 +411,26 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
  #if defined HAVE_TSEARCH || defined _LIBC
    msgid_len = strlen (msgid1) + 1;
  
-  if (plural == 0)
+  /* Try to find the translation among those which we found at
+     some time.  */
+  search = (struct known_translation_t *)
+          alloca (offsetof (struct known_translation_t, msgid) + msgid_len);
+  memcpy (search->msgid, msgid1, msgid_len);
+  search->domainname = (char *) domainname;
+  search->category = category;
+
+  foundp = (struct known_translation_t **) tfind (search, &root, transcmp);
+  if (foundp != NULL && (*foundp)->counter == _nl_msg_cat_cntr)
      {
-      /* Try to find the translation among those which we found at
-        some time.  */
-      search = (struct known_translation_t *) alloca (sizeof (*search)
-                                                     + msgid_len);
-      memcpy (search->msgid, msgid1, msgid_len);
-      search->domain = (char *) domainname;
-      search->plindex = 0;
-      search->category = category;
-
-      foundp = (struct known_translation_t **) tfind (search, &root, transcmp);
-      if (foundp != NULL && (*foundp)->counter == _nl_msg_cat_cntr)
-       {
-         __libc_rwlock_unlock (_nl_state_lock);
-         return (char *) (*foundp)->translation;
-       }
+      /* Now deal with plural.  */
+      if (plural)
+       retval = plural_lookup ((*foundp)->domain, n, (*foundp)->translation,
+                               (*foundp)->translation_length);
+      else
+       retval = (char *) (*foundp)->translation;
+
+      __libc_rwlock_unlock (_nl_state_lock);
+      return retval;
      }
  #endif
  
@@ -558,39 +565,7 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
  
        if (domain != NULL)
         {
-         unsigned long int index = 0;
-
-         if (plural != 0)
-           {
-             struct loaded_domain *domaindata =
-               (struct loaded_domain *) domain->data;
-             index = plural_eval (domaindata->plural, n);
-             if (index >= domaindata->nplurals)
-               /* This should never happen.  It means the plural expression
-                  and the given maximum value do not match.  */
-               index = 0;
-
-#if defined HAVE_TSEARCH || defined _LIBC
-             /* Try to find the translation among those which we
-                found at some time.  */
-             search = (struct known_translation_t *) alloca (sizeof (*search)
-                                                             + msgid_len);
-             memcpy (search->msgid, msgid1, msgid_len);
-             search->domain = (char *) domainname;
-             search->plindex = index;
-             search->category = category;
-
-             foundp = (struct known_translation_t **) tfind (search, &root,
-                                                             transcmp);
-             if (foundp != NULL && (*foundp)->counter == _nl_msg_cat_cntr)
-               {
-                 __libc_rwlock_unlock (_nl_state_lock);
-                 return (char *) (*foundp)->translation;
-               }
-#endif
-           }
-
-         retval = _nl_find_msg (domain, msgid1, index);
+         retval = _nl_find_msg (domain, msgid1, &retlen);
  
           if (retval == NULL)
             {
@@ -599,15 +574,20 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
               for (cnt = 0; domain->successor[cnt] != NULL; ++cnt)
                 {
                   retval = _nl_find_msg (domain->successor[cnt], msgid1,
-                                        index);
+                                        &retlen);
  
                   if (retval != NULL)
-                   break;
+                   {
+                     domain = domain->successor[cnt];
+                     break;
+                   }
                 }
             }
  
           if (retval != NULL)
             {
+             /* Found the translation of MSGID1 in domain DOMAIN:
+                starting at RETVAL, RETLEN bytes.  */
               FREE_BLOCKS (block_list);
               __set_errno (saved_errno);
  #if defined HAVE_TSEARCH || defined _LIBC
@@ -621,12 +601,14 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
                             + msgid_len + domainname_len + 1);
                   if (newp != NULL)
                     {
-                     newp->domain = mempcpy (newp->msgid, msgid1, msgid_len);
-                     memcpy (newp->domain, domainname, domainname_len + 1);
-                     newp->plindex = index;
+                     newp->domainname =
+                       mempcpy (newp->msgid, msgid1, msgid_len);
+                     memcpy (newp->domainname, domainname, domainname_len + 1);
                       newp->category = category;
                       newp->counter = _nl_msg_cat_cntr;
+                     newp->domain = domain;
                       newp->translation = retval;
+                     newp->translation_length = retlen;
  
                       /* Insert the entry in the search tree.  */
                       foundp = (struct known_translation_t **)
@@ -641,9 +623,15 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
                 {
                   /* We can update the existing entry.  */
                   (*foundp)->counter = _nl_msg_cat_cntr;
+                 (*foundp)->domain = domain;
                   (*foundp)->translation = retval;
+                 (*foundp)->translation_length = retlen;
                 }
  #endif
+             /* Now deal with plural.  */
+             if (plural)
+               retval = plural_lookup (domain, n, retval, retlen);
+
               __libc_rwlock_unlock (_nl_state_lock);
               return retval;
             }
@@ -655,14 +643,15 @@ DCIGETTEXT (domainname, msgid1, msgid2, plural, n, category)
  
  char *
  internal_function
-_nl_find_msg (domain_file, msgid, index)
+_nl_find_msg (domain_file, msgid, lengthp)
       struct loaded_l10nfile *domain_file;
       const char *msgid;
-     unsigned long int index;
+     size_t *lengthp;
  {
    struct loaded_domain *domain;
    size_t act;
    char *result;
+  size_t resultlen;
  
    if (domain_file->decided == 0)
      _nl_load_domain (domain_file);
@@ -686,17 +675,21 @@ _nl_find_msg (domain_file, msgid, index)
         /* Hash table entry is empty.  */
         return NULL;
  
-      if (W (domain->must_swap, domain->orig_tab[nstr - 1].length) == len
-         && strcmp (msgid,
-                    domain->data + W (domain->must_swap,
-                                      domain->orig_tab[nstr - 1].offset)) == 0)
-       {
-         act = nstr - 1;
-         goto found;
-       }
-
        while (1)
         {
+         /* Compare msgid with the original string at index nstr-1.
+            We compare the lengths with >=, not ==, because plural entries
+            are represented by strings with an embedded NUL.  */
+         if (W (domain->must_swap, domain->orig_tab[nstr - 1].length) >= len
+             && (strcmp (msgid,
+                         domain->data + W (domain->must_swap,
+                                           domain->orig_tab[nstr - 1].offset))
+                 == 0))
+           {
+             act = nstr - 1;
+             goto found;
+           }
+
           if (idx >= domain->hash_size - incr)
             idx -= domain->hash_size - incr;
           else
@@ -706,16 +699,6 @@ _nl_find_msg (domain_file, msgid, index)
           if (nstr == 0)
             /* Hash table entry is empty.  */
             return NULL;
-
-         if (W (domain->must_swap, domain->orig_tab[nstr - 1].length) == len
-             && (strcmp (msgid,
-                         domain->data + W (domain->must_swap,
-                                           domain->orig_tab[nstr - 1].offset))
-                 == 0))
-           {
-             act = nstr - 1;
-             goto found;
-           }
         }
        /* NOTREACHED */
      }
@@ -751,6 +734,7 @@ _nl_find_msg (domain_file, msgid, index)
       string to use a different character set, this is the time.  */
    result = ((char *) domain->data
             + W (domain->must_swap, domain->trans_tab[act].offset));
+  resultlen = W (domain->must_swap, domain->trans_tab[act].length) + 1;
  
  #if defined _LIBC || HAVE_ICONV
    if (
@@ -767,9 +751,10 @@ _nl_find_msg (domain_file, msgid, index)
          appropriate table with the same structure as the table
          of translations in the file, where we can put the pointers
          to the converted strings in.
-        There is a slight complication with the INDEX: We don't know
-        a priori which entries are plural entries. Therefore at any
-        moment we can only translate the variants 0 .. INDEX.  */
+        There is a slight complication with plural entries.  They
+        are represented by consecutive NUL terminated strings.  We
+        handle this case by converting RESULTLEN bytes, including
+        NULs.  */
  
        if (domain->conv_tab == NULL
           && ((domain->conv_tab = (char **) calloc (domain->nstrings,
@@ -782,8 +767,7 @@ _nl_find_msg (domain_file, msgid, index)
         /* Nothing we can do, no more memory.  */
         goto converted;
  
-      if (domain->conv_tab[act] == NULL
-         || *(nls_uint32 *) domain->conv_tab[act] < index)
+      if (domain->conv_tab[act] == NULL)
         {
           /* We haven't used this string so far, so it is not
              translated yet.  Do this now.  */
@@ -795,7 +779,6 @@ _nl_find_msg (domain_file, msgid, index)
           static unsigned char *freemem;
           static size_t freemem_size;
  
-         size_t resultlen;
           const unsigned char *inbuf;
           unsigned char *outbuf;
           int malloc_count;
@@ -803,21 +786,10 @@ _nl_find_msg (domain_file, msgid, index)
           transmem_block_t *transmem_list = NULL;
  # endif
  
-         /* Note that we translate (index + 1) consecutive strings at
-            once, including the final NUL byte.  */
-         {
-           unsigned long int i = index;
-           char *p = result;
-           do
-             p += strlen (p) + 1;
-           while (i-- > 0);
-           resultlen = p - result;
-         }
-
           __libc_lock_lock (lock);
  
           inbuf = result;
-         outbuf = freemem + sizeof (nls_uint32);
+         outbuf = freemem + sizeof (size_t);
  
           malloc_count = 0;
           while (1)
@@ -827,13 +799,13 @@ _nl_find_msg (domain_file, msgid, index)
               size_t non_reversible;
               int res;
  
-             if (freemem_size < sizeof (nls_uint32))
+             if (freemem_size < sizeof (size_t))
                 goto resize_freemem;
  
               res = __gconv (domain->conv,
                              &inbuf, inbuf + resultlen,
                              &outbuf,
-                            outbuf + freemem_size - sizeof (nls_uint32),
+                            outbuf + freemem_size - sizeof (size_t),
                              &non_reversible);
  
               if (res == __GCONV_OK || res == __GCONV_EMPTY_INPUT)
@@ -853,10 +825,10 @@ _nl_find_msg (domain_file, msgid, index)
               char *outptr = (char *) outbuf;
               size_t outleft;
  
-             if (freemem_size < sizeof (nls_uint32))
+             if (freemem_size < sizeof (size_t))
                 goto resize_freemem;
  
-             outleft = freemem_size - sizeof (nls_uint32);
+             outleft = freemem_size - sizeof (size_t);
               if (iconv (domain->conv, &inptr, &inleft, &outptr, &outleft)
                   != (size_t) (-1))
                 {
@@ -918,25 +890,26 @@ _nl_find_msg (domain_file, msgid, index)
               freemem = newmem;
  # endif
  
-             outbuf = freemem + sizeof (nls_uint32);
+             outbuf = freemem + sizeof (size_t);
             }
  
           /* We have now in our buffer a converted string.  Put this
              into the table of conversions.  */
-         *(nls_uint32 *) freemem = index;
+         *(size_t *) freemem = outbuf - freemem - sizeof (size_t);
           domain->conv_tab[act] = freemem;
           /* Shrink freemem, but keep it aligned.  */
           freemem_size -= outbuf - freemem;
           freemem = outbuf;
-         freemem += freemem_size & (alignof (nls_uint32) - 1);
-         freemem_size = freemem_size & ~ (alignof (nls_uint32) - 1);
+         freemem += freemem_size & (alignof (size_t) - 1);
+         freemem_size = freemem_size & ~ (alignof (size_t) - 1);
  
           __libc_lock_unlock (lock);
         }
  
-      /* Now domain->conv_tab[act] contains the translation of at least
-        the variants 0 .. INDEX.  */
-      result = domain->conv_tab[act] + sizeof (nls_uint32);
+      /* Now domain->conv_tab[act] contains the translation of all
+        the plural variants.  */
+      result = domain->conv_tab[act] + sizeof (size_t);
+      resultlen = *(size_t *) domain->conv_tab[act];
      }
  
   converted:
@@ -944,26 +917,58 @@ _nl_find_msg (domain_file, msgid, index)
  
  #endif /* _LIBC || HAVE_ICONV */
  
-  /* Now skip some strings.  How much depends on the index passed in.  */
+  *lengthp = resultlen;
+  return result;
+}
+
+
+/* Look up a plural variant.  */
+static char *
+internal_function
+plural_lookup (domain, n, translation, translation_len)
+     struct loaded_l10nfile *domain;
+     unsigned long int n;
+     const char *translation;
+     size_t translation_len;
+{
+  struct loaded_domain *domaindata = (struct loaded_domain *) domain->data;
+  unsigned long int index;
+  const char *p;
+
+  index = plural_eval (domaindata->plural, n);
+  if (index >= domaindata->nplurals)
+    /* This should never happen.  It means the plural expression and the
+       given maximum value do not match.  */
+    index = 0;
+
+  /* Skip INDEX strings at TRANSLATION.  */
+  p = translation;
    while (index-- > 0)
      {
  #ifdef _LIBC
-      result = __rawmemchr (result, '\0');
+      p = __rawmemchr (p, '\0');
  #else
-      result = strchr (result, '\0');
+      p = strchr (p, '\0');
  #endif
        /* And skip over the NUL byte.  */
-      ++result;
-    }
+      p++;
  
-  return result;
+      if (p >= translation + translation_len)
+       /* This should never happen.  It means the plural expression
+          evaluated to a value larger than the number of variants
+          available for MSGID1.  */
+       return (char *) translation;
+    }
+  return (char *) p;
  }
  
  
  /* Function to evaluate the plural expression and return an index value.  */
  static unsigned long int
  internal_function
-plural_eval (struct expression *pexp, unsigned long int n)
+plural_eval (pexp, n)
+     struct expression *pexp;
+     unsigned long int n;
  {
    switch (pexp->operation)
      {
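The new plural_lookup relies on plural variants being stored as consecutive NUL-terminated strings inside one buffer of known length. The sketch below isolates just that walk, without the catalog structures or plural_eval; `pick_variant` is a hypothetical name, and it assumes the last byte of the buffer is a NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone helper, not the library function: return the
   INDEX-th of the NUL-terminated strings packed into the LEN bytes at
   TRANSLATION.  If INDEX runs past the last string, fall back to the
   first one, as plural_lookup does.  */
static const char *
pick_variant (const char *translation, size_t len, unsigned long int index)
{
  const char *p = translation;
  const char *end = translation + len;

  assert (translation != NULL && len > 0);

  while (index-- > 0)
    {
      /* Advance past the current string and its terminating NUL.  */
      p += strlen (p) + 1;
      if (p >= end)
        return translation;
    }
  return p;
}
```

Because the whole buffer and its length are cached, one cache entry per msgid can serve every value of n, which is presumably why plindex could be dropped from the cache key.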
diff --git a/intl/intl-compat.c b/intl/intl-compat.c

index c6aeb4677d93210b40a83b3a4dd0085e69aa7f81..299773b851e1fe316d3c21cc984733bd11aaac49 100644 (file)
--- a/intl/intl-compat.c
+++ b/intl/intl-compat.c
@@ -1,6 +1,6 @@
  /* intl-compat.c - Stub functions to call gettext functions from GNU gettext
     Library.
-   Copyright (C) 1995, 2000 Software Foundation, Inc.
+   Copyright (C) 1995, 2000, 2001 Software Foundation, Inc.
  
  This program is free software; you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
@@ -28,6 +28,9 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
  #undef gettext
  #undef dgettext
  #undef dcgettext
+#undef ngettext
+#undef dngettext
+#undef dcngettext
  #undef textdomain
  #undef bindtextdomain
  #undef bind_textdomain_codeset
@@ -78,6 +81,39 @@ gettext (msgid)
  }
  
  
+char *
+dcngettext (domainname, msgid1, msgid2, n, category)
+     const char *domainname;
+     const char *msgid1;
+     const char *msgid2;
+     unsigned long int n;
+     int category;
+{
+  return dcngettext__ (domainname, msgid1, msgid2, n, category);
+}
+
+
+char *
+dngettext (domainname, msgid1, msgid2, n)
+     const char *domainname;
+     const char *msgid1;
+     const char *msgid2;
+     unsigned long int n;
+{
+  return dngettext__ (domainname, msgid1, msgid2, n);
+}
+
+
+char *
+ngettext (msgid1, msgid2, n)
+     const char *msgid1;
+     const char *msgid2;
+     unsigned long int n;
+{
+  return ngettext__ (msgid1, msgid2, n);
+}
+
+
  char *
  textdomain (domainname)
       const char *domainname;
diff --git a/intl/intlh.inst.in b/intl/intlh.inst.in

index 8a30511f7700d8d9ac3c962adaa4cb9fea8c9dad..fe8281fecea682b94586d466a6fea8845d708733 100644 (file)
--- a/intl/intlh.inst.in
+++ b/intl/intlh.inst.in
@@ -1,5 +1,5 @@
  /* Message catalogs for internationalization.
-   Copyright (C) 1995, 1996, 1997, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1997, 2000, 2001 Free Software Foundation, Inc.
  
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
@@ -95,11 +95,11 @@ extern char *bind_textdomain_codeset PARAMS ((const char *__domainname,
  # define dgettext(domainname, msgid)                                         \
    dcgettext (domainname, msgid, LC_MESSAGES)
  
-# define ngettext(msgid, N)                                                  \
-    dngettext (NULL, msgid, N)
+# define ngettext(msgid1, Msgid2, N)                                         \
+    dngettext (NULL, msgid1, Msgid2, N)
  
-# define dngettext(domainname, msgid, n)                                     \
-    dcngettext (domainname, msgid, n, LC_MESSAGES)
+# define dngettext(domainname, msgid1, Msgid2, n)                            \
+    dcngettext (domainname, msgid1, Msgid2, n, LC_MESSAGES)
  
  #endif /* Optimizing. */
  
diff --git a/intl/libgettext.h b/intl/libgettext.h

index 58c83e6fa04d63412aad015cfcc01de390ccc99f..ff453ecacd58aff4936f5ae613e24501f6d1f606 100644 (file)
--- a/intl/libgettext.h
+++ b/intl/libgettext.h
@@ -1,5 +1,5 @@
  /* Message catalogs for internationalization.
-   Copyright (C) 1995, 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1998, 2000, 2001 Free Software Foundation, Inc.
  
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
@@ -112,17 +112,25 @@ extern char *dcgettext__ PARAMS ((const char *__domainname,
     number N.  */
  extern char *ngettext PARAMS ((const char *__msgid1, const char *__msgid2,
                                unsigned long int __n));
+extern char *ngettext__ PARAMS ((const char *__msgid1, const char *__msgid2,
+                                unsigned long int __n));
  
  /* Similar to `dgettext' but select the plural form corresponding to the
     number N.  */
  extern char *dngettext PARAMS ((const char *__domainname, const char *__msgid1,
                                 const char *__msgid2, unsigned long int __n));
+extern char *dngettext__ PARAMS ((const char *__domainname,
+                                 const char *__msgid1, const char *__msgid2,
+                                 unsigned long int __n));
  
  /* Similar to `dcgettext' but select the plural form corresponding to the
     number N.  */
  extern char *dcngettext PARAMS ((const char *__domainname, const char *__msgid1,
                                  const char *__msgid2, unsigned long int __n,
                                  int __category));
+extern char *dcngettext__ PARAMS ((const char *__domainname,
+                                  const char *__msgid1, const char *__msgid2,
+                                  unsigned long int __n, int __category));
  
  
  /* Set the current default message catalog to DOMAINNAME.
@@ -158,11 +166,11 @@ extern char *bind_textdomain_codeset__ PARAMS ((const char *__domainname,
  #  define dgettext(Domainname, Msgid)                                        \
       dcgettext (Domainname, Msgid, LC_MESSAGES)
  
-#  define ngettext(Msgid, N)                                                 \
-     dngettext (NULL, Msgid, N)
+#  define ngettext(Msgid1, Msgid2, N)                                        \
+     dngettext (NULL, Msgid1, Msgid2, N)
  
-#  define dngettext(Domainname, Msgid, N)                                    \
-     dcngettext (Domainname, Msgid, N, LC_MESSAGES)
+#  define dngettext(Domainname, Msgid1, Msgid2, N)                           \
+     dcngettext (Domainname, Msgid1, Msgid2, N, LC_MESSAGES)
  
  # endif
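The macro fix above matters because the plural functions take two msgids; the old two-argument macros dropped the second one. The following sketch mirrors the corrected forwarding chain with stand-in names. Everything prefixed `demo_` or `DEMO_` is hypothetical, and the stub body only imitates an untranslated lookup; it is not the library code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder category value, used only by this sketch.  */
#define DEMO_LC_MESSAGES 5

/* Stub standing in for the real lookup: with no catalog, pick MSGID1
   for n == 1 and MSGID2 otherwise.  */
static const char *
demo_dcngettext (const char *domainname, const char *msgid1,
                 const char *msgid2, unsigned long int n, int category)
{
  (void) domainname;
  (void) category;
  assert (msgid1 != NULL && msgid2 != NULL);
  return n == 1 ? msgid1 : msgid2;
}

/* Same shape as the corrected macros: both msgids are passed through.  */
#define demo_dngettext(Domainname, Msgid1, Msgid2, N) \
  demo_dcngettext (Domainname, Msgid1, Msgid2, N, DEMO_LC_MESSAGES)

#define demo_ngettext(Msgid1, Msgid2, N) \
  demo_dngettext (NULL, Msgid1, Msgid2, N)
```

A call such as demo_ngettext ("a piece of cake", "%d pieces of cake", n) then expands to a five-argument call, which the old macro definitions could not produce.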
  
diff --git a/intl/loadinfo.h b/intl/loadinfo.h

index 3942763b7210f4043fe2d731411ccf5485d727cb..32184509998792c0fd8b49ebf247d94ebf9ec875 100644 (file)
--- a/intl/loadinfo.h
+++ b/intl/loadinfo.h
@@ -100,7 +100,7 @@ extern char *_nl_find_language PARAMS ((const char *name));
  
  
  extern char *_nl_find_msg PARAMS ((struct loaded_l10nfile *domain_file,
-                                  const char *msgid, unsigned long int index))
+                                  const char *msgid, size_t *lengthp))
       internal_function;
  
  #endif /* loadinfo.h */
diff --git a/intl/loadmsgcat.c b/intl/loadmsgcat.c

index a679de48cc92f5a06452a368031d0240cf8e5302..6324553671e798c6a35fa904121eeaa3bc2874d1 100644 (file)
--- a/intl/loadmsgcat.c
+++ b/intl/loadmsgcat.c
@@ -163,6 +163,7 @@ _nl_load_domain (domain_file)
    int use_mmap = 0;
    struct loaded_domain *domain;
    char *nullentry;
+  size_t nullentrylen;
  
    domain_file->decided = 1;
    domain_file->data = NULL;
@@ -306,7 +307,7 @@ _nl_load_domain (domain_file)
  # endif
  #endif
    domain->conv_tab = NULL;
-  nullentry = _nl_find_msg (domain_file, "", 0);
+  nullentry = _nl_find_msg (domain_file, "", &nullentrylen);
    if (nullentry != NULL)
      {
  #if defined _LIBC || HAVE_ICONV
diff --git a/src/ChangeLog b/src/ChangeLog

index 7d4beb53194cd02e53cc53d6cf249ec90024b013..d061b04bee3fc2b1f5abde2f89d8acc3e3335338 100644 (file)
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,73 @@
+2001-01-01  Bruno Haible  <haible@clisp.cons.org>
+
+       Implement plural form handling.
+       * message.h (struct message_variant_ty): Add msgstr_len field.
+       (struct message_ty): Add msgid_plural field.
+       (message_alloc): Take additional msgid_plural argument.
+       (message_variant_append): Take additional msgstr_len argument.
+       * message.c (message_alloc): Take additional msgid_plural argument.
+       (message_free): Free msgid_plural field.
+       (message_variant_append): Take additional msgstr_len argument.
+       (message_copy): Copy msgid_plural as well. Pass msgstr_len.
+       (message_merge): Likewise.
+       (message_print): Print plural entries using a different format.
+       (message_print_obsolete): Likewise.
+       * msgunfmt.c (string32): Return the string's size as well. Verify
+       the string is NUL terminated.
+       (read_mo_file): Split the original string into msgid and msgid_plural.
+       Pass msgstr_len.
+       * po-lex.h (msgstr_def): New definition, taken from msgfmt.c.
+       * po-lex.c (keyword_p): Recognize the msgid_plural keyword.
+       (po_gram_lex): Accept brackets as single-character tokens.
+       * po.h (struct po_method_ty): Method 'directive_message' takes
+       additional arguments 'msgid_plural', 'msgstr_len'.
+       (po_callback_message): Additional arguments 'msgid_plural',
+       'msgstr_len'.
+       * po-hash-gen.y (yyerror): Effectively rename to po_hash_error.
+       * po-gram-gen.y (yyerror): Effectively rename to po_gram_error,
+       thus enabling reporting of syntax errors.
+       (plural_counter): New variable.
+       (%token): Add MSGID_PLURAL, '[', ']' as new tokens.
+       (%union): Add new alternative of type 'struct msgstr_def'.
+       (msgid_pluralform, pluralform, pluralform_list): New productions.
+       (message): Add plural rules.
+       * po.c (po_directive_message): Additional arguments 'msgid_plural',
+       'msgstr_len'.
+       (po_callback_message): Likewise.
+       * msgfmt.c (SIZEOF): New macro.
+       (struct id_str_pair): Add id_len, id_plural, id_plural_len, str_len
+       fields.
+       (struct hashtable_entry): Renamed from struct msgstr_def. Add
+       'msgid_plural', 'msgstr_len' fields.
+       (format_directive_message): Additional arguments 'msgid_plural',
+       'msgstr_len'. Verify the validity of the charset field in the header.
+       Compare msgstr using memcmp, not strcmp.
+       (check_pair): Additional arguments 'msgid_plural', 'msgstr_len'.
+       Apply the tests to msgid_plural and each msgstr[i] string.
+       (format_debrief): Change error message.
+       (write_table): Store msgid_plural and msgstr_len in msg_arr[], then
+       output the strings including embedded NULs.
+       * msgcmp.c (compare_directive_message): Additional arguments
+       'msgid_plural', 'msgstr_len'.
+       * msgcomm.c (extract_directive_message): Additional arguments
+       'msgid_plural', 'msgstr_len'.
+       * msgmerge.c (merge_directive_message): Additional arguments
+       'msgid_plural', 'msgstr_len'.
+       * xget-lex.h (struct xgettext_token_ty): Replace argnum field with
+       argnum1, argnum2.
+       * xget-lex.c (xgettext_lex): Add to default keywords: "ngettext:1,2",
+       "dngettext:2,3", "dcngettext:2,3".
+       (xgettext_lex_keyword): Accept new syntax "id:argnum1,argnum2".
+       * xgettext.c (exclude_directive_message): Additional arguments
+       'msgid_plural', 'msgstr_len'.
+       (remember_a_message): Return the new message.
+       (remember_a_message_plural): New function.
+       (scan_c_file): Extend state machine to allow remembering msgid1 and
+       msgid2 later.
+       (extract_directive_message): Additional arguments 'msgid_plural',
+       'msgstr_len'. Compare msgstr using memcmp, not strcmp.
+       (construct_header): Update.
+
  2000-12-31  Bruno Haible  <haible@clisp.cons.org>
  
         * msgfmt.c (format_directive_message): Pass to insert_entry and
diff --git a/src/message.c b/src/message.c

index 26c21eb45a9ac51fa6f685c3612a0d73d840f360..5749a6e2177ac50dde096f3f94971a925167f3fb 100644
--- a/src/message.c
+++ b/src/message.c
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1998, 2000, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <millerp@canb.auug.org.au>
  
@@ -186,13 +186,15 @@ make_c_width_description_string (do_wrap)
  
  
  message_ty *
-message_alloc (msgid)
+message_alloc (msgid, msgid_plural)
       char *msgid;
+     const char *msgid_plural;
  {
    message_ty *mp;
  
    mp = xmalloc (sizeof (message_ty));
    mp->msgid = msgid;
+  mp->msgid_plural = (msgid_plural != NULL ? xstrdup (msgid_plural) : NULL);
    mp->comment = NULL;
    mp->comment_dot = NULL;
    mp->filepos_count = 0;
@@ -219,6 +221,8 @@ message_free (mp)
    if (mp->comment_dot != NULL)
      string_list_free (mp->comment_dot);
    free ((char *) mp->msgid);
+  if (mp->msgid_plural != NULL)
+    free ((char *) mp->msgid_plural);
    for (j = 0; j < mp->variant_count; ++j)
      free ((char *) mp->variant[j].msgstr);
    if (mp->variant != NULL)
@@ -250,10 +254,11 @@ message_variant_search (mp, domain)
  
  
  void
-message_variant_append (mp, domain, msgstr, pp)
+message_variant_append (mp, domain, msgstr, msgstr_len, pp)
       message_ty *mp;
       const char *domain;
       const char *msgstr;
+     size_t msgstr_len;
       const lex_pos_ty *pp;
  {
    size_t nbytes;
@@ -264,6 +269,7 @@ message_variant_append (mp, domain, msgstr, pp)
    mvp = &mp->variant[mp->variant_count++];
    mvp->domain = domain;
    mvp->msgstr = msgstr;
+  mvp->msgstr_len = msgstr_len;
    mvp->pos = *pp;
  }
  
@@ -297,12 +303,13 @@ message_copy (mp)
    message_ty *result;
    size_t j;
  
-  result = message_alloc (xstrdup (mp->msgid));
+  result = message_alloc (xstrdup (mp->msgid), mp->msgid_plural);
  
    for (j = 0; j < mp->variant_count; ++j)
      {
        message_variant_ty *mvp = &mp->variant[j];
-      message_variant_append (result, mvp->domain, mvp->msgstr, &mvp->pos);
+      message_variant_append (result, mvp->domain, mvp->msgstr, mvp->msgstr_len,
+                             &mvp->pos);
      }
    if (mp->comment)
      {
@@ -339,7 +346,7 @@ message_merge (def, ref)
    /* Take the msgid from the reference.  When fuzzy matches are made,
       the definition will not be unique, but the reference will be -
       usually because it has a typo.  */
-  result = message_alloc (xstrdup (ref->msgid));
+  result = message_alloc (xstrdup (ref->msgid), ref->msgid_plural);
  
    /* If msgid is the header entry (i.e., "") we find the
       POT-Creation-Date line in the reference.  */
@@ -503,10 +510,12 @@ message_merge (def, ref)
           if (header_fields[UNKNOWN].string != NULL)
             stpcpy (newp, header_fields[UNKNOWN].string);
  
-         message_variant_append (result, mvp->domain, cp, &mvp->pos);
+         message_variant_append (result, mvp->domain, cp, strlen (cp) + 1,
+                                 &mvp->pos);
         }
        else
-       message_variant_append (result, mvp->domain, mvp->msgstr, &mvp->pos);
+       message_variant_append (result, mvp->domain, mvp->msgstr,
+                               mvp->msgstr_len, &mvp->pos);
      }
  
    /* Take the comments from the definition file.  There will be none at
@@ -1203,7 +1212,37 @@ message_print (mp, fp, domain, blank_line, debug)
       are as readable as possible.  If there is no recorded msgstr for
       this domain, emit an empty string.  */
    wrap (fp, NULL, "msgid", mp->msgid, mp->do_wrap);
-  wrap (fp, NULL, "msgstr", mvp ? mvp->msgstr : "", mp->do_wrap);
+  if (mp->msgid_plural != NULL)
+    wrap (fp, NULL, "msgid_plural", mp->msgid_plural, mp->do_wrap);
+
+  if (mp->msgid_plural == NULL)
+    wrap (fp, NULL, "msgstr", mvp ? mvp->msgstr : "", mp->do_wrap);
+  else
+    {
+      char prefix_buf[20];
+      unsigned int i;
+
+      if (mvp)
+       {
+         const char *p;
+
+         for (p = mvp->msgstr, i = 0;
+              p < mvp->msgstr + mvp->msgstr_len;
+              p += strlen (p) + 1, i++)
+           {
+             sprintf (prefix_buf, "msgstr[%u]", i);
+             wrap (fp, NULL, prefix_buf, p, mp->do_wrap);
+           }
+       }
+      else
+       {
+         for (i = 0; i < 2; i++)
+           {
+             sprintf (prefix_buf, "msgstr[%u]", i);
+             wrap (fp, NULL, prefix_buf, "", mp->do_wrap);
+           }
+       }
+    }
  }
  
  
@@ -1289,7 +1328,25 @@ message_print_obsolete (mp, fp, domain, blank_line)
    /* Print each of the message components.  Wrap them nicely so they
       are as readable as possible.  */
    wrap (fp, "#~ ", "msgid", mp->msgid, mp->do_wrap);
-  wrap (fp, "#~ ", "msgstr", mvp->msgstr, mp->do_wrap);
+  if (mp->msgid_plural != NULL)
+    wrap (fp, "#~ ", "msgid_plural", mp->msgid_plural, mp->do_wrap);
+
+  if (mp->msgid_plural == NULL)
+    wrap (fp, "#~ ", "msgstr", mvp->msgstr, mp->do_wrap);
+  else
+    {
+      char prefix_buf[20];
+      unsigned int i;
+      const char *p;
+
+      for (p = mvp->msgstr, i = 0;
+          p < mvp->msgstr + mvp->msgstr_len;
+          p += strlen (p) + 1, i++)
+       {
+         sprintf (prefix_buf, "msgstr[%u]", i);
+         wrap (fp, "#~ ", prefix_buf, p, mp->do_wrap);
+       }
+    }
  }
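Both message_print and message_print_obsolete above walk mvp->msgstr as a sequence of NUL-terminated strings packed into one buffer of msgstr_len bytes. The following standalone sketch isolates that traversal. The helper names count_plural_forms and print_plural_forms are hypothetical and do not exist in gettext; the output format is simplified and does no wrapping or escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the number of NUL-terminated strings
   packed into buf[0..len).  len counts every byte, including the
   final NUL, as msgstr_len does in the patch.  */
size_t
count_plural_forms (const char *buf, size_t len)
{
  const char *p;
  size_t n = 0;

  /* The buffer must be non-empty and end with a NUL, otherwise the
     strlen calls below could run past its end.  */
  assert (len > 0 && buf[len - 1] == '\0');
  for (p = buf; p < buf + len; p += strlen (p) + 1)
    n++;
  return n;
}

/* Hypothetical helper: print each packed string on its own line,
   prefixed with its index in PO notation.  */
void
print_plural_forms (FILE *fp, const char *buf, size_t len)
{
  const char *p;
  unsigned int i;

  assert (len > 0 && buf[len - 1] == '\0');
  for (p = buf, i = 0; p < buf + len; p += strlen (p) + 1, i++)
    fprintf (fp, "msgstr[%u] \"%s\"\n", i, p);
}
```

A singular-only message is the degenerate case of the same layout: one string, with a length of strlen (msgstr) + 1.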
  
  
diff --git a/src/message.h b/src/message.h

index ed9e8f8a7c588e913573c7dcf1c870f785ce0bf7..95e6e3db9e6eee753fcf3f05559f2dc1a55cf5fa 100644
--- a/src/message.h
+++ b/src/message.h
@@ -63,8 +63,11 @@ typedef struct message_variant_ty message_variant_ty;
  struct message_variant_ty
  {
    const char *domain;
+
    lex_pos_ty pos;
+
    const char *msgstr;
+  size_t msgstr_len;
  };
  
  typedef struct message_ty message_ty;
@@ -92,6 +95,9 @@ struct message_ty
    /* The msgid string.  */
    const char *msgid;
  
+  /* The msgid's plural, if present.  */
+  const char *msgid_plural;
+
    /* The msgstr strings, one for each observed domain in the file.  */
    size_t variant_count;
    message_variant_ty *variant;
@@ -105,13 +111,13 @@ struct message_ty
    int obsolete;
  };
  
-message_ty *message_alloc PARAMS ((char *msgid));
+message_ty *message_alloc PARAMS ((char *msgid, const char *msgid_plural));
  void message_free PARAMS ((message_ty *));
  
  message_variant_ty *message_variant_search PARAMS ((message_ty *mp,
                                                     const char *domain));
  void message_variant_append PARAMS ((message_ty *mp, const char *domain,
-                                    const char *msgstr,
+                                    const char *msgstr, size_t msgstr_len,
                                      const lex_pos_ty *pp));
  void message_comment_append PARAMS ((message_ty *, const char *));
  void message_comment_dot_append PARAMS ((message_ty *, const char *));
diff --git a/src/msgcmp.c b/src/msgcmp.c

index 785162847d9b48871af53b8e374b4fea5d7fea4f..059849e04811be44e27e72b0e0588e373b380e86 100644
--- a/src/msgcmp.c
+++ b/src/msgcmp.c
@@ -83,7 +83,9 @@ static void compare_destructor PARAMS ((po_ty *__that));
  static void compare_directive_domain PARAMS ((po_ty *__that, char *__name));
  static void compare_directive_message PARAMS ((po_ty *__that, char *__msgid,
                                                lex_pos_ty *msgid_pos,
+                                              char *__msgid_plural,
                                                char *__msgstr,
+                                              size_t __msgstr_len,
                                                lex_pos_ty *__msgstr_pos));
  static void compare_parse_debrief PARAMS ((po_ty *__that));
  
@@ -324,11 +326,14 @@ compare_directive_domain (that, name)
  
  
  static void
-compare_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
+compare_directive_message (that, msgid, msgid_pos, msgid_plural,
+                          msgstr, msgstr_len, msgstr_pos)
       po_ty *that;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    compare_class_ty *this = (compare_class_ty *) that;
@@ -344,7 +349,7 @@ compare_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
      free (msgid);
    else
      {
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, msgid_plural);
        message_list_append (this->mlp, mp);
      }
  
@@ -358,7 +363,7 @@ compare_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
        free (msgstr);
      }
    else
-    message_variant_append (mp, this->domain, msgstr, msgstr_pos);
+    message_variant_append (mp, this->domain, msgstr, msgstr_len, msgstr_pos);
  }
  
  
diff --git a/src/msgcomm.c b/src/msgcomm.c

index e78da184eb3f8b55733e1e6ed8e8e87c8039277c..95d7c8840ca84431b04a9d56eb5ea32535656766 100644
--- a/src/msgcomm.c
+++ b/src/msgcomm.c
@@ -118,7 +118,9 @@ static void extract_constructor PARAMS ((po_ty *__that));
  static void extract_directive_domain PARAMS ((po_ty *__that, char *__name));
  static void extract_directive_message PARAMS ((po_ty *__that, char *__msgid,
                                                lex_pos_ty *__msgid_pos,
+                                              char *__msgid_plural,
                                                char *__msgstr,
+                                              size_t __msgstr_len,
                                                lex_pos_ty *__msgstr_pos));
  static void extract_parse_brief PARAMS ((po_ty *__that));
  static void extract_comment PARAMS ((po_ty *__that, const char *__s));
@@ -559,11 +561,14 @@ extract_directive_domain (that, name)
  
  
  static void
-extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
+extract_directive_message (that, msgid, msgid_pos, msgid_plural,
+                          msgstr, msgstr_len, msgstr_pos)
       po_ty *that;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    extract_class_ty *this = (extract_class_ty *)that;
@@ -599,7 +604,7 @@ extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
      free (msgid);
    else
      {
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, msgid_plural);
        message_list_append (this->mlp, mp);
      }
  
@@ -656,7 +661,8 @@ extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
    if (mvp != NULL)
      free (msgstr);
    else
-    message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, msgstr_pos);
+    message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, msgstr_len,
+                           msgstr_pos);
  }
  
  
diff --git a/src/msgfmt.c b/src/msgfmt.c

index 19b69c199bf845cca109852dcf5c770176b4594c..97a76b83234add93f27853453eb37f3b4c7c27ca 100644
--- a/src/msgfmt.c
+++ b/src/msgfmt.c
@@ -55,18 +55,26 @@
  extern int errno;
  #endif
  
+#define SIZEOF(a) (sizeof(a) / sizeof(a[0]))
+
  /* Define the data structure which we need to represent the data to
     be written out.  */
  struct id_str_pair
  {
    char *id;
+  size_t id_len;
+  char *id_plural;
+  size_t id_plural_len;
    char *str;
+  size_t str_len;
  };
  
  /* Contains information about the definition of one translation.  */
-struct msgstr_def
+struct hashtable_entry
  {
+  char *msgid_plural;
    char *msgstr;
+  size_t msgstr_len;
    lex_pos_ty pos;
  };
  
@@ -172,7 +180,9 @@ static void format_constructor PARAMS ((po_ty *__that));
  static void format_directive_domain PARAMS ((po_ty *__pop, char *__name));
  static void format_directive_message PARAMS ((po_ty *__pop, char *__msgid,
                                               lex_pos_ty *__msgid_pos,
+                                             char *__msgid_plural,
                                               char *__msgstr,
+                                             size_t __msgstr_len,
                                               lex_pos_ty *__msgstr_pos));
  static void format_comment_special PARAMS ((po_ty *pop, const char *s));
  static void format_debrief PARAMS((po_ty *));
@@ -180,7 +190,8 @@ static struct msg_domain *new_domain PARAMS ((const char *name));
  static int compare_id PARAMS ((const void *pval1, const void *pval2));
  static void write_table PARAMS ((FILE *output_file, hash_table *tab));
  static void check_pair PARAMS ((const char *msgid, const lex_pos_ty *msgid_pos,
-                               const char *msgstr,
+                               const char *msgid_plural,
+                               const char *msgstr, size_t msgstr_len,
                                 const lex_pos_ty *msgstr_pos, int is_format));
  static const char *add_mo_suffix PARAMS ((const char *));
  
@@ -454,10 +465,13 @@ format_debrief (that)
  {
    msgfmt_class_ty *this = (msgfmt_class_ty *) that;
  
-  /* If in verbose mode, test whether header entry was found.  */
+  /* Test whether header entry was found.
+     FIXME: Should do this even if not in verbose mode, because the
+     consequences are not harmless.  But it breaks the test suite.  */
    if (verbose_level > 0 && this->has_header_entry == 0)
-    error (0, 0, _("%s: warning: PO file header missing, fuzzy, or invalid"),
-          gram_pos.file_name);
+    error (0, 0, _("%s: warning: PO file header missing, fuzzy, or invalid\n\
+%*s  warning: charset conversion will not work"),
+          gram_pos.file_name, strlen (gram_pos.file_name), "");
  }
  
  
@@ -507,16 +521,18 @@ domain name \"%s\" not suitable as file name: will use prefix"), name);
  
  /* Process `msgid'/`msgstr' pair from .po file.  */
  static void
-format_directive_message (that, msgid_string, msgid_pos, msgstr_string,
-                         msgstr_pos)
+format_directive_message (that, msgid_string, msgid_pos, msgid_plural,
+                         msgstr_string, msgstr_len, msgstr_pos)
       po_ty *that;
       char *msgid_string;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr_string;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    msgfmt_class_ty *this = (msgfmt_class_ty *) that;
-  struct msgstr_def *entry;
+  struct hashtable_entry *entry;
  
    if (msgstr_string[0] == '\0' || (!include_all && this->is_fuzzy))
      {
@@ -559,8 +575,7 @@ format_directive_message (that, msgid_string, msgid_pos, msgstr_string,
             "PACKAGE VERSION", "YEAR-MO-DA", "FULL NAME", "LANGUAGE",
             NULL, "text/plain; charset=CHARSET", "ENCODING"
           };
-         const size_t nfields = (sizeof (required_fields)
-                                 / sizeof (required_fields[0]));
+         const size_t nfields = SIZEOF (required_fields);
           int initial = -1;
           int cnt;
  
@@ -596,6 +611,87 @@ some header fields still have the initial default value"));
             error (0, 0, _("field `%s' still has initial default value"),
                    required_fields[initial]);
         }
+
+      /* Verify the validity of CHARSET.  Even if not in verbose mode,
+        because the consequences are not harmless.  */
+      {
+       const char *charsetstr = strstr (msgstr_string, "charset=");
+
+       if (charsetstr != NULL)
+         {
+           /* The list of charsets supported by glibc's iconv() and by
+              the portable iconv() across platforms.  Taken from
+              intl/config.charset.  */
+           static const char *standard_charsets[] =
+           {
+             "ASCII", "ANSI_X3.4-1968", "US-ASCII",
+             "ISO-8859-1", "ISO_8859-1",
+             "ISO-8859-2", "ISO_8859-2",
+             "ISO-8859-3", "ISO_8859-3",
+             "ISO-8859-4", "ISO_8859-4",
+             "ISO-8859-5", "ISO_8859-5",
+             "ISO-8859-6", "ISO_8859-6",
+             "ISO-8859-7", "ISO_8859-7",
+             "ISO-8859-8", "ISO_8859-8",
+             "ISO-8859-9", "ISO_8859-9",
+             "ISO-8859-13", "ISO_8859-13",
+             "ISO-8859-15", "ISO_8859-15",
+             "KOI8-R",
+             "KOI8-U",
+             "CP850",
+             "CP866",
+             "CP874",
+             "CP932",
+             "CP949",
+             "CP950",
+             "CP1250",
+             "CP1251",
+             "CP1252",
+             "CP1253",
+             "CP1254",
+             "CP1255",
+             "CP1256",
+             "CP1257",
+             "GB2312",
+             "EUC-JP",
+             "EUC-KR",
+             "EUC-TW",
+             "BIG5",
+             "BIG5HKSCS",
+             "GBK",
+             "GB18030",
+             "SJIS",
+             "JOHAB",
+             "TIS-620",
+             "VISCII",
+             "UTF-8"
+           };
+           size_t len;
+           char *charset;
+           size_t i;
+
+           charsetstr += strlen ("charset=");
+           len = strcspn (charsetstr, " \t\n");
+           charset = (char *) alloca (len + 1);
+           memcpy (charset, charsetstr, len);
+           charset[len] = '\0';
+
+           for (i = 0; i < SIZEOF (standard_charsets); i++)
+             if (strcasecmp (charset, standard_charsets[i]) == 0)
+               break;
+           if (i == SIZEOF (standard_charsets))
+             error (0, 0, _("\
+%s: warning: charset \"%s\" is not a portable encoding name\n\
+%*s  warning: charset conversion might not work"),
+                    gram_pos.file_name, charset,
+                    strlen (gram_pos.file_name), "");
+         }
+       else
+         error (0, 0, _("\
+%s: warning: charset missing in header\n\
+%*s  warning: charset conversion will not work"),
+                gram_pos.file_name, strlen (gram_pos.file_name), "");
+      }
      }
    else
      /* We don't count the header entry in the statistic so place the
@@ -607,13 +703,16 @@ some header fields still have the initial default value"));
  
    /* We found a valid pair of msgid/msgstr.
       Construct struct to describe msgstr definition.  */
-  entry = (struct msgstr_def *) xmalloc (sizeof (*entry));
+  entry = (struct hashtable_entry *) xmalloc (sizeof (*entry));
  
+  entry->msgid_plural = msgid_plural;
    entry->msgstr = msgstr_string;
+  entry->msgstr_len = msgstr_len;
    entry->pos = *msgstr_pos;
  
    /* Do some more checks on both strings.  */
-  check_pair (msgid_string, msgid_pos, msgstr_string, msgstr_pos,
+  check_pair (msgid_string, msgid_pos, msgid_plural,
+             msgstr_string, msgstr_len, msgstr_pos,
               do_check && possible_c_format_p (this->is_c_format));
  
    /* Check whether already a domain is specified.  If not use default
@@ -636,7 +735,8 @@ some header fields still have the initial default value"));
              definition for reference.  */
           find_entry (&current_domain->symbol_tab, msgid_string,
                       strlen (msgid_string) + 1, (void **) &entry);
-         if (0 != strcmp(msgstr_string, entry->msgstr))
+         if (msgstr_len != entry->msgstr_len
+             || memcmp (msgstr_string, entry->msgstr, msgstr_len) != 0)
             {
               po_gram_error_at_line (msgid_pos, _("\
  duplicate message definition"));
@@ -739,7 +839,7 @@ write_table (output_file, tab)
    size_t cnt;
    const void *id;
    size_t id_len;
-  struct msgstr_def *entry;
+  struct hashtable_entry *entry;
    struct string_desc sd;
  
    /* Fill the structure describing the header.  */
@@ -769,7 +869,12 @@ write_table (output_file, tab)
         ++cnt)
      {
        msg_arr[cnt].id = (char *) id;
+      msg_arr[cnt].id_len = id_len;
+      msg_arr[cnt].id_plural = entry->msgid_plural;
+      msg_arr[cnt].id_plural_len =
+       (entry->msgid_plural != NULL ? strlen (entry->msgid_plural) + 1 : 0);
        msg_arr[cnt].str = entry->msgstr;
+      msg_arr[cnt].str_len = entry->msgstr_len;
      }
  
    /* Sort the table according to original string.  */
@@ -785,7 +890,8 @@ write_table (output_file, tab)
    /* Write out length and starting offset for all original strings.  */
    for (cnt = 0; cnt < tab->filled; ++cnt)
      {
-      sd.length = strlen (msg_arr[cnt].id);
+      /* Subtract 1 because of the terminating NUL.  */
+      sd.length = msg_arr[cnt].id_len + msg_arr[cnt].id_plural_len - 1;
        fwrite (&sd, sizeof (sd), 1, output_file);
        sd.offset += roundup (sd.length + 1, alignment);
      }
@@ -793,7 +899,8 @@ write_table (output_file, tab)
    /* Write out length and starting offset for all translation strings.  */
    for (cnt = 0; cnt < tab->filled; ++cnt)
      {
-      sd.length = strlen (msg_arr[cnt].str);
+      /* Subtract 1 because of the terminating NUL.  */
+      sd.length = msg_arr[cnt].str_len - 1;
        fwrite (&sd, sizeof (sd), 1, output_file);
        sd.offset += roundup (sd.length + 1, alignment);
      }
@@ -840,19 +947,22 @@ write_table (output_file, tab)
    /* Now write the original strings.  */
    for (cnt = 0; cnt < tab->filled; ++cnt)
      {
-      size_t len = strlen (msg_arr[cnt].id);
+      size_t len = msg_arr[cnt].id_len + msg_arr[cnt].id_plural_len;
  
-      fwrite (msg_arr[cnt].id, len + 1, 1, output_file);
-      fwrite (&null, 1, roundup (len + 1, alignment) - (len + 1), output_file);
+      fwrite (msg_arr[cnt].id, msg_arr[cnt].id_len, 1, output_file);
+      if (msg_arr[cnt].id_plural_len > 0)
+       fwrite (msg_arr[cnt].id_plural, msg_arr[cnt].id_plural_len, 1,
+               output_file);
+      fwrite (&null, 1, roundup (len, alignment) - len, output_file);
      }
  
    /* Now write the translation strings.  */
    for (cnt = 0; cnt < tab->filled; ++cnt)
      {
-      size_t len = strlen (msg_arr[cnt].str);
+      size_t len = msg_arr[cnt].str_len;
  
-      fwrite (msg_arr[cnt].str, len + 1, 1, output_file);
-      fwrite (&null, 1, roundup (len + 1, alignment) - (len + 1), output_file);
+      fwrite (msg_arr[cnt].str, len, 1, output_file);
+      fwrite (&null, 1, roundup (len, alignment) - len, output_file);
  
        free (msg_arr[cnt].str);
      }
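The write_table hunks above store an original string as msgid, its NUL, then (if present) msgid_plural and its NUL, followed by padding up to the alignment. The recorded descriptor length is the total minus one. The sketch below reproduces that arithmetic on an in-memory buffer instead of a FILE. The function pack_original and the macro ROUNDUP are hypothetical names introduced here, not taken from msgfmt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of align (align > 0).  */
#define ROUNDUP(n, align) ((((n) + (align) - 1) / (align)) * (align))

/* Hypothetical helper: copy msgid and, if non-NULL, msgid_plural into
   out, each with its terminating NUL, then zero-pad to a multiple of
   alignment.  Return the number of bytes written and store in
   *desc_len the length a string descriptor would record, which
   excludes the final NUL.  */
size_t
pack_original (char *out, size_t out_size,
               const char *msgid, const char *msgid_plural,
               size_t alignment, size_t *desc_len)
{
  size_t id_len = strlen (msgid) + 1;
  size_t id_plural_len =
    (msgid_plural != NULL ? strlen (msgid_plural) + 1 : 0);
  size_t len = id_len + id_plural_len;
  size_t padded;

  assert (alignment > 0);
  padded = ROUNDUP (len, alignment);
  assert (padded <= out_size);

  memcpy (out, msgid, id_len);
  if (id_plural_len > 0)
    memcpy (out + id_len, msgid_plural, id_plural_len);
  memset (out + len, 0, padded - len);

  /* Subtract 1 because of the terminating NUL.  */
  *desc_len = len - 1;
  return padded;
}
```

For "cake" and "cakes" with an alignment of 4, the packed bytes occupy 11 bytes before padding, the descriptor length is 10, and 12 bytes are written in total.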
@@ -863,39 +973,93 @@ write_table (output_file, tab)
  
  
  static void
-check_pair (msgid, msgid_pos, msgstr, msgstr_pos, is_format)
+check_pair (msgid, msgid_pos, msgid_plural, msgstr, msgstr_len, msgstr_pos,
+           is_format)
       const char *msgid;
       const lex_pos_ty *msgid_pos;
+     const char *msgid_plural;
       const char *msgstr;
+     size_t msgstr_len;
       const lex_pos_ty *msgstr_pos;
       int is_format;
  {
-  size_t msgid_len = strlen (msgid);
-  size_t msgstr_len = strlen (msgstr);
+  int has_newline;
+  unsigned int i;
+  const char *p;
    size_t nidfmts, nstrfmts;
  
    /* If the msgid string is empty we have the special entry reserved for
       information about the translation.  */
-  if (msgid_len == 0)
+  if (msgid[0] == '\0')
      return;
  
-  /* Test 1: check whether both or none of the strings begin with a '\n'.  */
-  if (((msgid[0] == '\n') ^ (msgstr[0] == '\n')) != 0)
+  /* Test 1: check whether all or none of the strings begin with a '\n'.  */
+  has_newline = (msgid[0] == '\n');
+#define TEST_NEWLINE(p) (p[0] == '\n')
+  if (msgid_plural != NULL)
+    {
+      if (TEST_NEWLINE(msgid_plural) != has_newline)
+       {
+         error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                        _("\
+`msgid' and `msgid_plural' entries do not both begin with '\\n'"));
+         exit_status = EXIT_FAILURE;
+       }
+      for (p = msgstr, i = 0; p < msgstr + msgstr_len; p += strlen (p) + 1, i++)
+       if (TEST_NEWLINE(p) != has_newline)
+         {
+           error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                          _("\
+`msgid' and `msgstr[%u]' entries do not both begin with '\\n'"), i);
+           exit_status = EXIT_FAILURE;
+         }
+    }
+  else
      {
-      error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number, _("\
+      if (TEST_NEWLINE(msgstr) != has_newline)
+       {
+         error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                        _("\
  `msgid' and `msgstr' entries do not both begin with '\\n'"));
-      exit_status = EXIT_FAILURE;
+         exit_status = EXIT_FAILURE;
+       }
      }
+#undef TEST_NEWLINE
  
-  /* Test 2: check whether both or none of the strings end with a '\n'.  */
-  if (((msgid[msgid_len - 1] == '\n') ^ (msgstr[msgstr_len - 1] == '\n')) != 0)
+  /* Test 2: check whether all or none of the strings end with a '\n'.  */
+  has_newline = (msgid[strlen (msgid) - 1] == '\n');
+#define TEST_NEWLINE(p) (p[0] != '\0' && p[strlen (p) - 1] == '\n')
+  if (msgid_plural != NULL)
      {
-      error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number, _("\
+      if (TEST_NEWLINE(msgid_plural) != has_newline)
+       {
+         error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                        _("\
+`msgid' and `msgid_plural' entries do not both end with '\\n'"));
+         exit_status = EXIT_FAILURE;
+       }
+      for (p = msgstr, i = 0; p < msgstr + msgstr_len; p += strlen (p) + 1, i++)
+       if (TEST_NEWLINE(p) != has_newline)
+         {
+           error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                          _("\
+`msgid' and `msgstr[%u]' entries do not both end with '\\n'"), i);
+           exit_status = EXIT_FAILURE;
+         }
+    }
+  else
+    {
+      if (TEST_NEWLINE(msgstr) != has_newline)
+       {
+         error_at_line (0, 0, msgid_pos->file_name, msgid_pos->line_number,
+                        _("\
  `msgid' and `msgstr' entries do not both end with '\\n'"));
-      exit_status = EXIT_FAILURE;
+         exit_status = EXIT_FAILURE;
+       }
      }
+#undef TEST_NEWLINE
  
-  if (is_format != 0)
+  if (is_format != 0 && msgid_plural == NULL)
      {
        /* Test 3: check whether both formats strings contain the same
          number of format specifications.  */
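Tests 1 and 2 in check_pair above require that either all or none of msgid, msgid_plural and every msgstr[i] begin with a newline, and likewise for ending with one. The sketch below applies the same rule to the packed msgstr buffer only and counts disagreements rather than reporting them. The function count_newline_mismatches and its two predicates are hypothetical names, and msgid is assumed to be non-empty, as it is at that point in check_pair.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int
begins_with_newline (const char *s)
{
  return s[0] == '\n';
}

static int
ends_with_newline (const char *s)
{
  return s[0] != '\0' && s[strlen (s) - 1] == '\n';
}

/* Hypothetical helper: return how many of the NUL-terminated strings
   packed into msgstr[0..msgstr_len) disagree with msgid about
   starting or ending with a newline.  */
unsigned int
count_newline_mismatches (const char *msgid,
                          const char *msgstr, size_t msgstr_len)
{
  int starts = begins_with_newline (msgid);
  int ends = ends_with_newline (msgid);
  unsigned int mismatches = 0;
  const char *p;

  assert (msgid[0] != '\0');
  assert (msgstr_len > 0 && msgstr[msgstr_len - 1] == '\0');
  for (p = msgstr; p < msgstr + msgstr_len; p += strlen (p) + 1)
    if (begins_with_newline (p) != starts || ends_with_newline (p) != ends)
      mismatches++;
  return mismatches;
}
```

A translation whose second plural form drops the trailing newline of the msgid would be counted once here; in the patch the corresponding case produces an error naming msgstr[1].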
diff --git a/src/msgmerge.c b/src/msgmerge.c

index 78b7cded7df830b4dd000ce2b9d4d7c1406fe786..4f38857f9c362ad60e0b8d5ca2fce683b5164048 100644
--- a/src/msgmerge.c
+++ b/src/msgmerge.c
@@ -132,7 +132,9 @@ static void merge_destructor PARAMS ((po_ty *__that));
  static void merge_directive_domain PARAMS ((po_ty *__that, char *__name));
  static void merge_directive_message PARAMS ((po_ty *__that, char *__msgid,
                                              lex_pos_ty *__msgid_pos,
+                                            char *__msgid_plural,
                                              char *__msgstr,
+                                            size_t __msgstr_len,
                                              lex_pos_ty *__msgstr_pos));
  static void merge_parse_brief PARAMS ((po_ty *__that));
  static void merge_parse_debrief PARAMS ((po_ty *__that));
@@ -449,11 +451,14 @@ merge_directive_domain (that, name)
  
  
  static void
-merge_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
+merge_directive_message (that, msgid, msgid_pos, msgid_plural,
+                        msgstr, msgstr_len, msgstr_pos)
       po_ty *that;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    merge_class_ty *this = (merge_class_ty *) that;
@@ -470,7 +475,7 @@ merge_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
      free (msgid);
    else
      {
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, msgid_plural);
        message_list_append (this->mlp, mp);
      }
  
@@ -520,7 +525,7 @@ merge_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
        free (msgstr);
      }
    else
-    message_variant_append (mp, this->domain, msgstr, msgstr_pos);
+    message_variant_append (mp, this->domain, msgstr, msgstr_len, msgstr_pos);
  }
  
  
diff --git a/src/msgunfmt.c b/src/msgunfmt.c

index 758cbedf44c99351abac0ae4584f3ae4bfd81631..48f41a5ae29bbbe756e845b6a8641f4bc3b4272b 100644
--- a/src/msgunfmt.c
+++ b/src/msgunfmt.c
@@ -1,5 +1,5 @@
  /* msgunfmt - converts binary .mo files to Uniforum style .po files
-   Copyright (C) 1995, 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1998, 2000, 2001 Free Software Foundation, Inc.
     Written by Ulrich Drepper <drepper@gnu.ai.mit.edu>, April 1995.
  
     This program is free software; you can redistribute it and/or modify
@@ -86,7 +86,8 @@ static void usage PARAMS ((int __status));
  static void error_print PARAMS ((void));
  static nls_uint32 read32 PARAMS ((FILE *__fp, const char *__fn));
  static void seek32 PARAMS ((FILE *__fp, const char *__fn, long __offset));
-static char *string32 PARAMS ((FILE *__fp, const char *__fn, long __offset));
+static char *string32 PARAMS ((FILE *__fp, const char *__fn, long __offset,
+                              size_t *lengthp));
  static message_list_ty *read_mo_file PARAMS ((message_list_ty *__mlp,
                                               const char *__fn));
  
@@ -300,10 +301,11 @@ seek32 (fp, fn, offset)
  
  
  static char *
-string32 (fp, fn, offset)
+string32 (fp, fn, offset, lengthp)
       FILE *fp;
       const char *fn;
       long offset;
+     size_t *lengthp;
  {
    long length;
    char *buffer;
@@ -322,16 +324,21 @@ string32 (fp, fn, offset)
    /* Read in the string.  Complain if there is an error or it comes up
       short.  Add the NUL ourselves.  */
    seek32 (fp, fn, offset);
-  n = fread (buffer, 1, length, fp);
-  if (n != length)
+  n = fread (buffer, 1, length + 1, fp);
+  if (n != length + 1)
      {
        if (ferror (fp))
         error (EXIT_FAILURE, errno, _("error while reading \"%s\""), fn);
        error (EXIT_FAILURE, 0, _("file \"%s\" truncated"), fn);
      }
-  buffer[length] = '\0';
+  if (buffer[length] != '\0')
+    {
+      error (EXIT_FAILURE, 0,
+            _("file \"%s\" contains a not NUL terminated string"), fn);
+    }
  
    /* Return the string to the caller.  */
+  *lengthp = length + 1;
    return buffer;
  }
  
@@ -391,16 +398,22 @@ read_mo_file (mlp, fn)
        static lex_pos_ty pos = { __FILE__, __LINE__ };
        message_ty *mp;
        char *msgid;
+      size_t msgid_len;
        char *msgstr;
+      size_t msgstr_len;
  
        /* Read the msgid.  */
-      msgid = string32 (fp, fn, header.orig_tab_offset + j * 8);
+      msgid = string32 (fp, fn, header.orig_tab_offset + j * 8, &msgid_len);
  
        /* Read the msgstr.  */
-      msgstr = string32 (fp, fn, header.trans_tab_offset + j * 8);
-
-      mp = message_alloc (msgid);
-      message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, &pos);
+      msgstr = string32 (fp, fn, header.trans_tab_offset + j * 8, &msgstr_len);
+
+      mp = message_alloc (msgid,
+                         (strlen (msgid) + 1 < msgid_len
+                          ? msgid + strlen (msgid) + 1
+                          : NULL));
+      message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, msgstr_len,
+                             &pos);
        message_list_append (mlp, mp);
      }
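In read_mo_file above, the plural form is recovered from the original string by checking whether any bytes follow the first NUL: if strlen (msgid) + 1 is less than the total length, the remainder is msgid_plural. The sketch below isolates that test. The function name find_msgid_plural is hypothetical; total_len plays the role of the msgid_len value returned through string32's lengthp argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: msgid points to total_len bytes, the last of
   which is a NUL.  If a second string follows the first NUL, return
   a pointer to it; otherwise return NULL.  */
const char *
find_msgid_plural (const char *msgid, size_t total_len)
{
  size_t first_len;

  assert (total_len > 0 && msgid[total_len - 1] == '\0');
  first_len = strlen (msgid) + 1;
  return (first_len < total_len ? msgid + first_len : NULL);
}
```

The returned pointer aliases the input buffer, which is why message_alloc in this patch copies msgid_plural with xstrdup rather than taking ownership of it.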
  
diff --git a/src/po-gram-gen.y b/src/po-gram-gen.y

index 9153139d20194a1fbd9b8db275403309672f1307..edbfb5a959130e4960cda354e769a13863f4894f 100644
--- a/src/po-gram-gen.y
+++ b/src/po-gram-gen.y
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1998, 2000, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <pmiller@agso.gov.au>
  
@@ -44,6 +44,7 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
  #define yymaxdepth po_gram_maxdepth
  #define yyparse po_gram_parse
  #define yylex   po_gram_lex
+#define yyerror po_gram_error
  #define yylval  po_gram_lval
  #define yychar  po_gram_char
  #define yydebug po_gram_debug
@@ -78,14 +79,18 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
  #define yygindex po_gram_yygindex
  #define yytable  po_gram_yytable
  #define yycheck  po_gram_yycheck
+
+static long plural_counter;
  %}
  
  %token COMMENT
  %token DOMAIN
  %token JUNK
  %token MSGID
+%token MSGID_PLURAL
  %token MSGSTR
  %token NAME
+%token '[' ']'
  %token NUMBER
  %token STRING
  
@@ -94,11 +99,13 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
    char *string;
    long number;
    lex_pos_ty pos;
+  struct msgstr_def rhs;
  }
  
-%type <string> STRING COMMENT string_list
+%type <string> STRING COMMENT string_list msgid_pluralform
  %type <number> NUMBER
  %type <pos> msgid msgstr
+%type <rhs> pluralform pluralform_list
  
  %right MSGSTR
  
@@ -122,7 +129,25 @@ domain
  message
         : msgid string_list msgstr string_list
                 {
-                 po_callback_message ($2, &$1, $4, &$3);
+                 po_callback_message ($2, &$1, NULL,
+                                      $4, strlen ($4) + 1, &$3);
+               }
+       | msgid string_list msgid_pluralform pluralform_list
+               {
+                 po_callback_message ($2, &$1, $3,
+                                      $4.msgstr, $4.msgstr_len, &$4.pos);
+               }
+       | msgid string_list msgid_pluralform
+               {
+                 po_gram_error_at_line (&$1, _("missing `msgstr[]' section"));
+                 free ($2);
+                 free ($3);
+               }
+       | msgid string_list pluralform_list
+               {
+                 po_gram_error_at_line (&$1, _("missing `msgid_plural' section"));
+                 free ($2);
+                 free ($3.msgstr);
                 }
         | msgid string_list
                 {
@@ -131,6 +156,48 @@ message
                 }
         ;
  
+msgid_pluralform
+       : MSGID_PLURAL string_list
+               {
+                 plural_counter = 0;
+                 $$ = $2;
+               }
+       ;
+
+pluralform_list
+       : pluralform
+               {
+                 $$ = $1;
+               }
+       | pluralform_list pluralform
+               {
+                 $$.msgstr = (char *) xmalloc ($1.msgstr_len + $2.msgstr_len);
+                 memcpy ($$.msgstr, $1.msgstr, $1.msgstr_len);
+                 memcpy ($$.msgstr + $1.msgstr_len, $2.msgstr, $2.msgstr_len);
+                 $$.msgstr_len = $1.msgstr_len + $2.msgstr_len;
+                 $$.pos = $1.pos;
+                 free ($1.msgstr);
+                 free ($2.msgstr);
+               }
+       ;
+
+pluralform
+       : msgstr '[' NUMBER ']' string_list
+               {
+                 if ($3 != plural_counter)
+                   {
+                     if (plural_counter == 0)
+                       po_gram_error_at_line (&$1, _("first plural form has nonzero index"));
+                     else
+                       po_gram_error_at_line (&$1, _("plural form has wrong index"));
+                   }
+                 plural_counter++;
+                 $$.msgstr = $5;
+                 $$.msgstr_len = strlen ($5) + 1;
+                 $$.pos = $1;
+               }
+       ;
+
  msgid
         : MSGID
                 {
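Note on the grammar above: the pluralform_list rule joins the msgstr[N] strings into one buffer in which every form keeps its terminating NUL, and msgstr_len is the total byte count. The sketch below illustrates that layout only. struct forms, forms_append and forms_get are hypothetical names, and it uses realloc where the diff uses xmalloc plus memcpy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the msgstr/msgstr_len pair carried by
   struct msgstr_def in the diff (position information omitted).  */
struct forms
{
  char *msgstr;       /* form0 NUL form1 NUL ...  */
  size_t msgstr_len;  /* total bytes, counting every NUL  */
};

/* Append one NUL-terminated plural form to the buffer.
   Returns 0 on success, -1 if memory allocation fails.  */
static int
forms_append (struct forms *f, const char *form)
{
  size_t add = strlen (form) + 1;
  char *p = (char *) realloc (f->msgstr, f->msgstr_len + add);

  if (p == NULL)
    return -1;
  memcpy (p + f->msgstr_len, form, add);
  f->msgstr = p;
  f->msgstr_len += add;
  return 0;
}

/* Return the form with the given index, or NULL if the buffer holds
   fewer forms than that.  */
static const char *
forms_get (const struct forms *f, unsigned long index)
{
  const char *p;
  const char *end;

  if (f->msgstr == NULL)
    return NULL;
  assert (f->msgstr[f->msgstr_len - 1] == '\0');

  p = f->msgstr;
  end = f->msgstr + f->msgstr_len;
  while (p < end)
    {
      if (index == 0)
        return p;
      index--;
      p += strlen (p) + 1;  /* skip this form and its NUL  */
    }
  return NULL;
}
```

Because the forms are separated by NUL bytes, strlen alone cannot measure the whole buffer, which is presumably why the commit threads an explicit msgstr_len through po_callback_message and message_variant_append.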
diff --git a/src/po-hash-gen.y b/src/po-hash-gen.y

index 7b29427c4901b0e48c7fc5048322b209b747e731..68ce63db3d34649e1a813e28f7e9371faf8bf7d5 100644 (file)
--- a/src/po-hash-gen.y
+++ b/src/po-hash-gen.y
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1998 Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1998, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <pmiller@agso.gov.au>
  
@@ -40,6 +40,7 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
  #define yymaxdepth po_hash_maxdepth
  #define yyparse po_hash_parse
  #define yylex   po_hash_lex
+#define yyerror po_hash_error
  #define yylval  po_hash_lval
  #define yychar  po_hash_char
  #define yydebug po_hash_debug
diff --git a/src/po-lex.c b/src/po-lex.c

index 5e97946890a38d756a47c242b60015decaa04320..a3474b9eb529aca431536374719f671b4ff94753 100644 (file)
--- a/src/po-lex.c
+++ b/src/po-lex.c
@@ -66,7 +66,7 @@ static int pass_obsolete_entries = 0;
  /* Prototypes for local functions.  */
  static int lex_getc PARAMS ((void));
  static void lex_ungetc PARAMS ((int __ch));
-static int keyword_p PARAMS ((char *__s));
+static int keyword_p PARAMS ((const char *__s));
  static int control_sequence PARAMS ((void));
  
  
@@ -239,12 +239,14 @@ lex_ungetc (c)
  
  static int
  keyword_p (s)
-     char *s;
+     const char *s;
  {
    if (!strcmp (s, "domain"))
      return DOMAIN;
    if (!strcmp (s, "msgid"))
      return MSGID;
+  if (!strcmp (s, "msgid_plural"))
+    return MSGID_PLURAL;
    if (!strcmp (s, "msgstr"))
      return MSGSTR;
    po_gram_error (_("keyword \"%s\" unknown"), s);
@@ -546,6 +548,12 @@ po_gram_lex ()
           po_gram_lval.number = atol (buf);
           return NUMBER;
  
+       case '[':
+         return '[';
+
+       case ']':
+         return ']';
+
         default:
           /* This will cause a syntax error.  */
           return JUNK;
diff --git a/src/po-lex.h b/src/po-lex.h

index 82ae7e856a6342a902a2da7b4a9d59dad456cba4..f9cede0d37965b160ba117900fd7b7fa607c00f8 100644 (file)
--- a/src/po-lex.h
+++ b/src/po-lex.h
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1998, 2000, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <millerp@canb.auug.org.au>
  
@@ -122,4 +122,12 @@ extern void po_gram_error_at_line PARAMS ((const lex_pos_ty *__pos,
  #endif
  
  
+/* Contains information about the definition of one translation.  */
+struct msgstr_def
+{
+  char *msgstr;
+  size_t msgstr_len;
+  lex_pos_ty pos;
+};
+
  #endif
diff --git a/src/po.c b/src/po.c

index 2f959740cf2e32d6c1d4d2192cfde3a2002657b7..a74fda97ab4d3c98a8ce533998fe67f9ed7dea3b 100644 (file)
--- a/src/po.c
+++ b/src/po.c
@@ -41,7 +41,8 @@ static void po_parse_debrief PARAMS ((po_ty *__pop));
  static void po_directive_domain PARAMS ((po_ty *__pop, char *__name));
  static void po_directive_message PARAMS ((po_ty *__pop, char *__msgid,
                                           lex_pos_ty *__msgid_pos,
-                                         char *__msgstr,
+                                         char *__msgid_plural,
+                                         char *__msgstr, size_t __msgstr_len,
                                           lex_pos_ty *__msgstr_pos));
  static void po_comment PARAMS ((po_ty *__pop, const char *__s));
  static void po_comment_dot PARAMS ((po_ty *__pop, const char *__s));
@@ -138,27 +139,35 @@ po_callback_domain (name)
  
  
  static void
-po_directive_message (pop, msgid, msgid_pos, msgstr, msgstr_pos)
+po_directive_message (pop, msgid, msgid_pos, msgid_plural,
+                     msgstr, msgstr_len, msgstr_pos)
       po_ty *pop;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    if (pop->method->directive_message)
-    pop->method->directive_message (pop, msgid, msgid_pos, msgstr, msgstr_pos);
+    pop->method->directive_message (pop, msgid, msgid_pos, msgid_plural,
+                                   msgstr, msgstr_len, msgstr_pos);
  }
  
  
  void
-po_callback_message (msgid, msgid_pos, msgstr, msgstr_pos)
+po_callback_message (msgid, msgid_pos, msgid_plural,
+                    msgstr, msgstr_len, msgstr_pos)
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    /* assert(callback_arg); */
-  po_directive_message (callback_arg, msgid, msgid_pos, msgstr, msgstr_pos);
+  po_directive_message (callback_arg, msgid, msgid_pos, msgid_plural,
+                       msgstr, msgstr_len, msgstr_pos);
  }
  
  
diff --git a/src/po.h b/src/po.h

index 45e742c31277708a5d2d290752da56158288e664..0b77f8e85bdb170838a3531e5cc5f2de9c6b498c 100644 (file)
--- a/src/po.h
+++ b/src/po.h
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1998 Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1998, 2000, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <millerp@canb.auug.org.au>
  
@@ -55,8 +55,10 @@ struct po_method_ty
    void (*directive_domain) PARAMS ((struct po_ty *__pop, char *__name));
  
    /* what to do with a message directive */
-  void (*directive_message) PARAMS ((struct po_ty *__pop, char *__msgid,
-                                    lex_pos_ty *__msgid_pos, char *__msgstr,
+  void (*directive_message) PARAMS ((struct po_ty *__pop,
+                                    char *__msgid, lex_pos_ty *__msgid_pos,
+                                    char *__msgid_plural,
+                                    char *__msgstr, size_t __msgstr_len,
                                      lex_pos_ty *__msgstr_pos));
  
    /* This method is invoked before the parse, but after the file is
@@ -130,7 +132,8 @@ extern void po_free PARAMS ((po_ty *__pop));
  extern void po_callback_domain PARAMS ((char *__name));
  extern void po_callback_message PARAMS ((char *__msgid,
                                          lex_pos_ty *__msgid_pos,
-                                        char *__msgstr,
+                                        char *__msgid_plural,
+                                        char *__msgstr, size_t __msgstr_len,
                                          lex_pos_ty *__msgstr_pos));
  extern void po_callback_comment PARAMS ((const char *__s));
  extern void po_callback_comment_dot PARAMS ((const char *__s));
diff --git a/src/xget-lex.c b/src/xget-lex.c

index 9be4aa18eeda1ec89159dba256fd397d39661267..b86fbd041239dd3c00ea3bc72f2b4a5123bf477b 100644 (file)
--- a/src/xget-lex.c
+++ b/src/xget-lex.c
@@ -1204,6 +1204,9 @@ xgettext_lex (tp)
               xgettext_lex_keyword ("gettext");
               xgettext_lex_keyword ("dgettext:2");
               xgettext_lex_keyword ("dcgettext:2");
+             xgettext_lex_keyword ("ngettext:1,2");
+             xgettext_lex_keyword ("dngettext:2,3");
+             xgettext_lex_keyword ("dcngettext:2,3");
               xgettext_lex_keyword ("gettext_noop");
               default_keywords = 0;
             }
@@ -1213,7 +1216,8 @@ xgettext_lex (tp)
               == 0)
             {
               tp->type = xgettext_token_type_keyword;
-             tp->argnum = (int) (long) keyword_value;
+             tp->argnum1 = (int) (long) keyword_value & ((1 << 10) - 1);
+             tp->argnum2 = (int) (long) keyword_value >> 10;
               tp->line_number = token.line_number;
               tp->file_name = logical_file_name;
             }
@@ -1267,9 +1271,10 @@ xgettext_lex_keyword (name)
      default_keywords = 0;
    else
      {
-      int argnum;
+      int argnum1;
+      int argnum2;
        size_t len;
-      const char *sp;
+      char *sp;
  
        if (keywords.table == NULL)
         init_hash (&keywords, 100);
@@ -1287,16 +1292,26 @@ xgettext_lex_keyword (name)
           name_copy[len] = '\0';
           name = name_copy;
  
-         argnum = atoi (sp + 1);
+         sp++;
+         argnum1 = strtol (sp, &sp, 10);
+         if (*sp == ',')
+           {
+             sp++;
+             argnum2 = strtol (sp, &sp, 10);
+           }
+         else
+           argnum2 = 0;
         }
        else
         {
           len = strlen (name);
  
-         argnum = 1;
+         argnum1 = 1;
+         argnum2 = 0;
         }
  
-      insert_entry (&keywords, name, len + 1, (void *) (long) argnum);
+      insert_entry (&keywords, name, len + 1,
+                   (void *) (long) (argnum1 + (argnum2 << 10)));
      }
  }
  
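Note on the hunks above: a keyword spec such as "ngettext:1,2" is stored in the hash table as a single value, with argnum1 in the low 10 bits and argnum2 shifted left by 10, and xgettext_lex later splits it again. The following is a rough, self-contained sketch of that encoding. pack_keyword_spec and the unpack helpers are hypothetical names, and the sketch looks for the first ':' with strchr, which may differ from how xgettext_lex_keyword locates it:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ARGNUM_BITS 10
#define ARGNUM_MASK ((1L << ARGNUM_BITS) - 1)

/* Hypothetical helper (not part of gettext): parse "name", "name:N" or
   "name:N,M" and pack the two argument positions into one long.
   A missing N defaults to 1, a missing M to 0 (no plural argument).  */
static long
pack_keyword_spec (const char *spec)
{
  const char *colon = strchr (spec, ':');
  long argnum1 = 1;
  long argnum2 = 0;

  if (colon != NULL)
    {
      char *end;

      argnum1 = strtol (colon + 1, &end, 10);
      if (*end == ',')
        argnum2 = strtol (end + 1, &end, 10);
    }

  /* argnum1 must fit in the low bits, otherwise it would corrupt
     argnum2 when the value is unpacked.  */
  assert (argnum1 >= 0 && argnum1 <= ARGNUM_MASK);
  assert (argnum2 >= 0);

  return argnum1 + (argnum2 << ARGNUM_BITS);
}

/* Position of the msgid argument.  */
static int
unpack_argnum1 (long packed)
{
  return (int) (packed & ARGNUM_MASK);
}

/* Position of the msgid_plural argument, 0 if there is none.  */
static int
unpack_argnum2 (long packed)
{
  return (int) (packed >> ARGNUM_BITS);
}
```

Packing both numbers into one integer lets the existing hash table keep storing a single pointer-sized value per keyword, at the cost of limiting argnum1 to values below 1024.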
diff --git a/src/xget-lex.h b/src/xget-lex.h

index 086c4144289751ecc7761ddd68f0134f6471331b..660ebb70a6bca1b8aa62fc3a61c1a2fbf5532c87 100644 (file)
--- a/src/xget-lex.h
+++ b/src/xget-lex.h
@@ -1,5 +1,5 @@
  /* GNU gettext - internationalization aids
-   Copyright (C) 1995, 1996, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995, 1996, 1998, 2000, 2001 Free Software Foundation, Inc.
  
     This file was written by Peter Miller <millerp@canb.auug.org.au>
  
@@ -37,8 +37,9 @@ struct xgettext_token_ty
  {
    xgettext_token_type_ty type;
  
-  /* This field is used only for xgettext_token_type_keyword.  */
-  int argnum;
+  /* These fields are used only for xgettext_token_type_keyword.  */
+  int argnum1;
+  int argnum2;
  
    /* This field is used only for xgettext_token_type_string_literal.  */
    char *string;
diff --git a/src/xgettext.c b/src/xgettext.c

index 884ecb7824cdcf72f0cc80c7edab3805317af975..0ccabd44acbc6744b3f0d4c9953930f4f5892223 100644 (file)
--- a/src/xgettext.c
+++ b/src/xgettext.c
@@ -1,5 +1,5 @@
  /* Extracts strings from C source file to Uniforum style .po file.
-   Copyright (C) 1995, 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
+   Copyright (C) 1995-1998, 2000, 2001 Free Software Foundation, Inc.
     Written by Ulrich Drepper <drepper@gnu.ai.mit.edu>, April 1995.
  
     This program is free software; you can redistribute it and/or modify
@@ -164,18 +164,24 @@ static string_list_ty *read_name_from_file PARAMS ((const char *__file_name));
  static void exclude_directive_domain PARAMS ((po_ty *__pop, char *__name));
  static void exclude_directive_message PARAMS ((po_ty *__pop, char *__msgid,
                                                lex_pos_ty *__msgid_pos,
+                                              char *__msgid_plural,
                                                char *__msgstr,
+                                              size_t __msgstr_len,
                                                lex_pos_ty *__msgstr_pos));
  static void read_exclusion_file PARAMS ((char *__file_name));
-static void remember_a_message PARAMS ((message_list_ty *__mlp,
-                                       xgettext_token_ty *__tp));
+static message_ty *remember_a_message PARAMS ((message_list_ty *__mlp,
+                                              xgettext_token_ty *__tp));
+static void remember_a_message_plural PARAMS ((message_ty *__mp,
+                                              xgettext_token_ty *__tp));
  static void scan_c_file PARAMS ((const char *__file_name,
                                  message_list_ty *__mlp));
  static void extract_constructor PARAMS ((po_ty *__that));
  static void extract_directive_domain PARAMS ((po_ty *__that, char *__name));
  static void extract_directive_message PARAMS ((po_ty *__that, char *__msgid,
                                                lex_pos_ty *__msgid_pos,
+                                              char *__msgid_plural,
                                                char *__msgstr,
+                                              size_t __msgstr_len,
                                                lex_pos_ty *__msgstr_pos));
  static void extract_parse_brief PARAMS ((po_ty *__that));
  static void extract_comment PARAMS ((po_ty *__that, const char *__s));
@@ -666,11 +672,14 @@ exclude_directive_domain (pop, name)
  
  
  static void
-exclude_directive_message (pop, msgid, msgid_pos, msgstr, msgstr_pos)
+exclude_directive_message (pop, msgid, msgid_pos, msgid_plural,
+                          msgstr, msgstr_len, msgstr_pos)
       po_ty *pop;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    message_ty *mp;
@@ -683,7 +692,7 @@ exclude_directive_message (pop, msgid, msgid_pos, msgstr, msgstr_pos)
      free (msgid);
    else
      {
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, msgid_plural);
        /* Do not free msgid.  */
        message_list_append (exclude, mp);
      }
@@ -728,7 +737,7 @@ read_exclusion_file (file_name)
  }
  
  
-static void
+static message_ty *
  remember_a_message (mlp, tp)
       message_list_ty *mlp;
       xgettext_token_ty *tp;
@@ -748,7 +757,7 @@ remember_a_message (mlp, tp)
          message gets the correct comments.  */
        xgettext_lex_comment_reset ();
  
-      return;
+      return NULL;
      }
  
    /* See if we have seen this message before.  */
@@ -764,7 +773,7 @@ remember_a_message (mlp, tp)
        static lex_pos_ty pos = { __FILE__, __LINE__ };
  
        /* Allocate a new message and append the message to the list.  */
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, NULL);
        /* Do not free msgid.  */
        message_list_append (mlp, mp);
  
@@ -774,13 +783,14 @@ remember_a_message (mlp, tp)
         {
           msgstr = (char *) xmalloc (strlen (msgstr_prefix)
                                      + strlen (msgid)
-                                    + strlen(msgstr_suffix) + 1);
+                                    + strlen (msgstr_suffix) + 1);
           stpcpy (stpcpy (stpcpy (msgstr, msgstr_prefix), msgid),
                   msgstr_suffix);
         }
        else
         msgstr = "";
-      message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, &pos);
+      message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr,
+                             strlen (msgstr) + 1, &pos);
      }
  
    /* Ask the lexer for the comments it has seen.  Only do this for the
@@ -829,6 +839,55 @@ remember_a_message (mlp, tp)
    /* Tell the lexer to reset its comment buffer, so that the next
       message gets the correct comments.  */
    xgettext_lex_comment_reset ();
+
+  return mp;
+}
+
+
+static void
+remember_a_message_plural (mp, tp)
+     message_ty *mp;
+     xgettext_token_ty *tp;
+{
+  char *msgid_plural;
+  message_variant_ty *mvp;
+  char *msgstr1;
+  size_t msgstr1_len;
+  char *msgstr;
+
+  msgid_plural = tp->string;
+
+  /* See if the message is already a plural message.  */
+  if (mp->msgid_plural == NULL)
+    {
+      mp->msgid_plural = msgid_plural;
+
+      /* Construct the first plural form from the prefix and suffix,
+        otherwise use the empty string.  The translator will have to
+        provide additional plural forms.  */
+      mvp = message_variant_search (mp, MESSAGE_DOMAIN_DEFAULT);
+      if (mvp != NULL)
+       {
+         if (msgstr_prefix)
+           {
+             msgstr1 = (char *) xmalloc (strlen (msgstr_prefix)
+                                         + strlen (msgid_plural)
+                                         + strlen (msgstr_suffix) + 1);
+             stpcpy (stpcpy (stpcpy (msgstr1, msgstr_prefix), msgid_plural),
+                     msgstr_suffix);
+           }
+         else
+           msgstr1 = "";
+         msgstr1_len = strlen (msgstr1) + 1;
+         msgstr = (char *) xmalloc (mvp->msgstr_len + msgstr1_len);
+         memcpy (msgstr, mvp->msgstr, mvp->msgstr_len);
+         memcpy (msgstr + mvp->msgstr_len, msgstr1, msgstr1_len);
+         mvp->msgstr = msgstr;
+         mvp->msgstr_len = mvp->msgstr_len + msgstr1_len;
+       }
+    }
+  else
+    free (msgid_plural);
  }
  
  
@@ -839,13 +898,26 @@ scan_c_file(filename, mlp)
  {
    int state;
    int commas_to_skip = 0;      /* defined only when in states 1 and 2 */
+  int plural_commas = 0;       /* defined only when in states 1 and 2 */
+  message_ty *plural_mp = NULL;        /* defined only when in states 1 and 2 */
    int paren_nesting = 0;       /* defined only when in state 2 */
  
    /* The file is broken into tokens.  Scan the token stream, looking for
       a keyword, followed by a left paren, followed by a string.  When we
       see this sequence, we have something to remember.  We assume we are
       looking at a valid C or C++ program, and leave the complaints about
-     the grammar to the compiler.  */
+     the grammar to the compiler.
+
+     Normal handling: Look for
+       [A] keyword [B] ( ... [C] ... msgid ... ) [E]
+     Plural handling: Look for
+       [A] keyword [B] ( ... [C] ... msgid ... [D] ... msgid_plural ... ) [E]
+     At point [A]: state == 0.
+     At point [B]: state == 1, commas_to_skip set, plural_mp == NULL.
+     At point [C]: state == 2, commas_to_skip set, plural_mp == NULL.
+     At point [D]: state == 2, commas_to_skip set again, plural_mp != NULL.
+     At point [E]: state == 0.  */
+
    xgettext_lex_open (filename);
  
    /* Start state is 0.  */
@@ -881,7 +953,10 @@ scan_c_file(filename, mlp)
                     _("%s:%d: warning: keyword between outer keyword and its arg"),
                     token.file_name, token.line_number);
            }
-        commas_to_skip = token.argnum - 1;
+        commas_to_skip = token.argnum1 - 1;
+        plural_commas = (token.argnum2 > token.argnum1
+                         ? token.argnum2 - token.argnum1 : 0);
+        plural_mp = NULL;
          state = 1;
          continue;
  
@@ -916,8 +991,29 @@ scan_c_file(filename, mlp)
          continue;
  
         case xgettext_token_type_string_literal:
-        if (extract_all || (state == 2 && commas_to_skip == 0))
+        if (extract_all)
            remember_a_message (mlp, &token);
+        else if (state == 2 && commas_to_skip == 0)
+          {
+            if (plural_mp == NULL)
+              {
+                /* Seen an msgid.  */
+                if (plural_commas == 0)
+                  remember_a_message (mlp, &token);
+                else
+                  {
+                    plural_mp = remember_a_message (mlp, &token);
+                    commas_to_skip = plural_commas;
+                    plural_commas = 0;
+                  }
+              }
+            else
+              {
+                /* Seen an msgid_plural.  */
+                remember_a_message_plural (plural_mp, &token);
+                plural_mp = NULL;
+              }
+          }
          else
            {
              free (token.string);
@@ -994,11 +1090,14 @@ extract_directive_domain (that, name)
  
  
  static void
-extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
+extract_directive_message (that, msgid, msgid_pos, msgid_plural,
+                          msgstr, msgstr_len, msgstr_pos)
       po_ty *that;
       char *msgid;
       lex_pos_ty *msgid_pos;
+     char *msgid_plural;
       char *msgstr;
+     size_t msgstr_len;
       lex_pos_ty *msgstr_pos;
  {
    extract_class_ty *this = (extract_class_ty *)that;
@@ -1039,7 +1138,7 @@ extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
      free (msgid);
    else
      {
-      mp = message_alloc (msgid);
+      mp = message_alloc (msgid, msgid_plural);
        message_list_append (this->mlp, mp);
      }
  
@@ -1080,7 +1179,9 @@ extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
  
    /* See if this domain has been seen for this message ID.  */
    mvp = message_variant_search (mp, MESSAGE_DOMAIN_DEFAULT);
-  if (mvp != NULL && strcmp (msgstr, mvp->msgstr) != 0)
+  if (mvp != NULL
+      && (msgstr_len != mvp->msgstr_len
+         || memcmp (msgstr, mvp->msgstr, msgstr_len) != 0))
      {
        po_gram_error_at_line (msgid_pos, _("duplicate message definition"));
        po_gram_error_at_line (&mvp->pos, _("\
@@ -1088,7 +1189,8 @@ extract_directive_message (that, msgid, msgid_pos, msgstr, msgstr_pos)
        free (msgstr);
      }
    else
-    message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, msgstr_pos);
+    message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, msgstr_len,
+                           msgstr_pos);
  }
  
  
@@ -1236,7 +1338,7 @@ construct_header ()
    char tz_sign;
    long tz_min;
  
-  mp = message_alloc ("");
+  mp = message_alloc ("", NULL);
  
    if (foreign_user)
      message_comment_append (mp, "\
@@ -1279,7 +1381,8 @@ Content-Transfer-Encoding: ENCODING\n",
    if (msgstr == NULL)
      error (EXIT_FAILURE, errno, _("while preparing output"));
  
-  message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr, &pos);
+  message_variant_append (mp, MESSAGE_DOMAIN_DEFAULT, msgstr,
+                         strlen (msgstr) + 1, &pos);
  
    return mp;
  }
diff --git a/tests/ChangeLog b/tests/ChangeLog

index 2310d0dcfe92d7b9aa19e3349f51af7ad4896343..c6ff0aaa87dcc43543235db6407bafb5ad78e820 100644 (file)
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,14 @@
+2001-01-01  Bruno Haible  <haible@clisp.cons.org>
+
+       Implement plural form handling.
+       * plural-1: New file.
+       * plural-1-prg.c: New file.
+       * Makefile.am (TESTS): Add plural-1.
+       (INCLUDES, EXTRA_PROGRAMS, cake_SOURCES, cake_LDADD, CLEANFILES): New
+       macros.
+       (all-local): New target.
+       * xg-test1.ok.po: Regenerated.
+
  1997-08-01 15:46  Ulrich Drepper  <drepper@cygnus.com>
  
         * Makefile.am (AUTOMAKE_OPTIONS): Require version 1.2.
diff --git a/tests/Makefile.am b/tests/Makefile.am

index 98e970a6c791df8dfb80a5fdd70c0600e97541ea..4c92482e24e57dc0aae0cbbd545c37af7edc57be 100644 (file)
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -1,5 +1,5 @@
  ## Makefile for the check subdirectory of the GNU NLS Utilities
-## Copyright (C) 1995, 1996, 1997 Free Software Foundation, Inc.
+## Copyright (C) 1995-1997, 2001 Free Software Foundation, Inc.
  ##
  ## This program is free software; you can redistribute it and/or modify
  ## it under the terms of the GNU General Public License as published by
@@ -22,7 +22,7 @@ AUTOMAKE_OPTIONS = 1.2 gnits
  TESTS = gettext-1 gettext-2 msgcmp-1 msgcmp-2 msgfmt-1 msgfmt-2 msgfmt-3 \
         msgfmt-4 msgmerge-1 msgmerge-2 msgmerge-3 msgmerge-4 msgmerge-5 \
         msgunfmt-1 xgettext-1 xgettext-2 xgettext-3 xgettext-4 xgettext-5 \
-       xgettext-6 xgettext-7 xgettext-8 xgettext-9
+       xgettext-6 xgettext-7 xgettext-8 xgettext-9 plural-1
  
  EXTRA_DIST = $(TESTS) test.mo xg-test1.ok.po
  
@@ -34,10 +34,19 @@ TESTS_ENVIRONMENT = top_srcdir=$(top_srcdir) PATH=../src:$$PATH \
                     MSGFMT=`echo msgfmt|sed '$(transform)'` \
                     MSGCMP=`echo msgcmp|sed '$(transform)'` \
                     MSGMERGE=`echo msgmerge|sed '$(transform)'` \
-                   MSGUNFMT=`echo msgunfmt|sed '$(transform)'` $(SHELL)
+                   MSGUNFMT=`echo msgunfmt|sed '$(transform)'` \
+                   $(SHELL)
  
  xg-test1.ok.po: $(top_srcdir)/src/xgettext.c $(top_srcdir)/src/msgfmt.c \
                 $(top_srcdir)/src/gettextp.c
         $(XGETTEXT) -d xg-test1.ok -p $(srcdir) -k_ --omit-header \
           $(top_srcdir)/src/xgettext.c $(top_srcdir)/src/msgfmt.c \
           $(top_srcdir)/src/gettextp.c
+
+# An auxiliary program used by the plural-1 test.
+INCLUDES = -I${top_srcdir}/intl
+EXTRA_PROGRAMS = cake
+cake_SOURCES = plural-1-prg.c
+cake_LDADD = ../intl/libintl.a
+all-local: cake
+CLEANFILES = cake
diff --git a/tests/plural-1 b/tests/plural-1

new file mode 100644 (file)

index 0000000..52a186e
--- /dev/null
+++ b/tests/plural-1
@@ -0,0 +1,84 @@
+#! /bin/sh
+
+tmpfiles=""
+trap 'rm -fr $tmpfiles' 1 2 3 15
+
+tmpfiles="$tmpfiles cake.pot"
+: ${XGETTEXT=xgettext}
+${XGETTEXT} -o cake.pot --omit-header ${top_srcdir}/tests/plural-1-prg.c
+
+tmpfiles="$tmpfiles cake.ok"
+cat <<EOF > cake.ok
+msgid "a piece of cake"
+msgid_plural "%d pieces of cake"
+msgstr[0] ""
+msgstr[1] ""
+EOF
+
+: ${DIFF=diff}
+${DIFF} cake.ok cake.pot || exit 1
+
+tmpfiles="$tmpfiles fr.po"
+cat <<EOF > fr.po
+# Les gateaux allemands sont les meilleurs du monde.
+msgid "a piece of cake"
+msgid_plural "%d pieces of cake"
+msgstr[0] "un morceau de gateau"
+msgstr[1] "%d morceaux de gateau"
+EOF
+
+tmpfiles="$tmpfiles fr.po.new"
+: ${MSGMERGE=msgmerge}
+${MSGMERGE} -q -o fr.po.new fr.po cake.pot
+
+: ${DIFF=diff}
+${DIFF} fr.po fr.po.new || exit 1
+
+tmpfiles="$tmpfiles fr"
+test -d fr || mkdir fr
+test -d fr/LC_MESSAGES || mkdir fr/LC_MESSAGES
+
+: ${MSGFMT=msgfmt}
+${MSGFMT} -o fr/LC_MESSAGES/cake.mo fr.po
+
+tmpfiles="$tmpfiles fr.po.tmp"
+: ${MSGUNFMT=msgunfmt}
+${MSGUNFMT} fr/LC_MESSAGES/cake.mo -o fr.po.tmp
+
+tmpfiles="$tmpfiles fr.po.strip"
+sed 1d < fr.po > fr.po.strip
+
+: ${DIFF=diff}
+${DIFF} fr.po.strip fr.po.tmp || exit 1
+
+LANGUAGE=fr
+LC_ALL=
+LC_MESSAGES=
+LANG=
+export LANGUAGE LC_ALL LC_MESSAGES LANG
+
+tmpfiles="$tmpfiles cake.ok cake.out"
+: ${DIFF=diff}
+echo 'un morceau de gateau' > cake.ok
+./cake 1 > cake.out || exit 1
+${DIFF} cake.ok cake.out || exit 1
+echo '2 morceaux de gateau' > cake.ok
+./cake 2 > cake.out || exit 1
+${DIFF} cake.ok cake.out || exit 1
+echo '10 morceaux de gateau' > cake.ok
+./cake 10 > cake.out || exit 1
+${DIFF} cake.ok cake.out || exit 1
+
+rm -fr $tmpfiles
+
+exit 0
+
+# Preserve executable bits for this shell script.
+# Thanks to Noah Friedman for this great trick.
+Local Variables:
+eval:(defun frobme () (set-file-modes buffer-file-name file-mode))
+eval:(make-local-variable 'file-mode)
+eval:(setq file-mode (file-modes (buffer-file-name)))
+eval:(make-local-variable 'after-save-hook)
+eval:(add-hook 'after-save-hook 'frobme)
+End:
diff --git a/tests/plural-1-prg.c b/tests/plural-1-prg.c

new file mode 100644 (file)

index 0000000..5bfbafe
--- /dev/null
+++ b/tests/plural-1-prg.c
@@ -0,0 +1,28 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Make sure we use the included libintl, not the system's one. */
+#if 0
+#include <libintl.h>
+#else
+#define ENABLE_NLS 1
+#include "libgettext.h"
+#undef textdomain
+#define textdomain textdomain__
+#undef bindtextdomain
+#define bindtextdomain bindtextdomain__
+#undef ngettext
+#define ngettext ngettext__
+#endif
+
+int main (argc, argv)
+  int argc;
+  char *argv[];
+{
+  int n = atoi (argv[1]);
+  textdomain ("cake");
+  bindtextdomain ("cake", ".");
+  printf (ngettext ("a piece of cake", "%d pieces of cake", n), n);
+  printf ("\n");
+  return 0;
+}