From: Bruno Haible
+
+
+
+This manual is still in DRAFT state. Some sections are still
+empty, or almost. We keep merging material from other sources
+(essentially e-mail folders) while the proper integration of this
+material is delayed.
+
+In this manual, we use he when speaking of the programmer or
+maintainer, she when speaking of the translator, and they
+when speaking of the installers or end users of the translated program.
+This is only a convenience for clarifying the documentation. It is
+absolutely not meant to imply that some roles are more appropriate
+to males or females. Besides, as you might guess, GNU
+This chapter explains the goals sought in the creation
+of GNU
+Please send suggestions and corrections to:
+
+
+Please include the manual's edition number and update date in your messages.
+
+
+Usually, programs are written and documented in English, and use
+English at execution time to interact with users. This is true
+not only of GNU software, but also of a great deal of commercial
+and free software. Using a common language is quite handy for
+communication between developers, maintainers and users from all
+countries. On the other hand, most people are less comfortable with
+English than with their own native language, and would prefer to
+use their mother tongue for day-to-day work, as far as possible.
+Many would simply love to see their computer screen showing
+a lot less of English, and far more of their own language.
+
+
+However, to many people, this dream might appear so far-fetched that
+they may believe it is not even worth spending time thinking about
+it. They have no confidence at all that the dream might ever
+become true. Yet some have not lost hope, and have organized themselves.
+The Translation Project is a formalization of this hope into a
+workable structure, which has a good chance to get all of us nearer
+the achievement of a truly multi-lingual set of programs.
+
+
+GNU
+GNU
+The Translation Project also uses the GNU
+Two long words appear all the time when we discuss support of native
+language in programs, and these words have a precise meaning, worth
+being explained here, once and for all in this document. The words are
+internationalization and localization. Many people,
+tired of writing these long words over and over again, took the
+habit of writing i18n and l10n instead, quoting the first
+and last letter of each word, and replacing the run of intermediate
+letters by a number merely telling how many such letters there are.
+But in this manual, for the sake of clarity, we will patiently write
+the names in full, each time...
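+
+The abbreviation rule just described (first letter, count of the
+intermediate letters, last letter) can be sketched in C as follows.
+The function name abbreviate and its return convention are
+illustrative assumptions, not part of any GNU interface.

```c
#include <stdio.h>
#include <string.h>

/* Abbreviate a word as: first letter, number of intermediate letters,
   last letter.  The result is written into out (capacity out_size).
   Returns 0 on success, or -1 when the word has no intermediate
   letters to count or the buffer is too small. */
int abbreviate(const char *word, char *out, size_t out_size)
{
    size_t len = strlen(word);
    if (len < 3)
        return -1;

    int written = snprintf(out, out_size, "%c%zu%c",
                           word[0], len - 2, word[len - 1]);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

+Called on "internationalization" (20 letters, hence 18 intermediate
+ones) this yields "i18n"; called on "localization" it yields "l10n".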
+
+
+By internationalization, one refers to the operation by which a
+program, or a set of programs turned into a package, is made aware of and
+able to support multiple languages. This is a generalization process,
+by which the programs are untied from calling only English strings or
+other English specific habits, and connected to generic ways of doing
+the same, instead. Program developers may use various techniques to
+internationalize their programs. Some of these have been standardized.
+GNU
+By localization, one means the operation by which, in a set
+of programs already internationalized, one gives the program all
+needed information so that it can adapt itself to handle its input
+and output in a fashion which is correct for some native language and
+cultural habits. This is a particularisation process, by which generic
+methods already implemented in an internationalized program are used
+in specific ways. The programming environment places several functions
+at the programmer's disposal which allow this runtime configuration.
+The formal description of a specific set of cultural habits for some
+country, together with all associated translations targeted to the
+same native language, is called the locale for this language
+or country. Users achieve localization of programs by setting proper
+values to special environment variables, prior to executing those
+programs, identifying which locale should be used.
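+
+On the program side, honoring the user's choice takes one call to the
+ISO C function setlocale. The sketch below is a minimal illustration;
+the two wrapper names are hypothetical, and the environment variables
+consulted (typically LC_ALL, the LC_* family and LANG on POSIX
+systems) are named here as an assumption, since the text above does
+not list them.

```c
#include <locale.h>
#include <stddef.h>

/* Adopt the locale selected by the user's environment variables.
   Returns the name of the locale now in effect, or NULL when the
   requested locale is not available, in which case the program
   keeps the locale it had before the call. */
const char *adopt_user_locale(void)
{
    return setlocale(LC_ALL, "");
}

/* Report the locale currently in effect without changing it. */
const char *current_locale(void)
{
    return setlocale(LC_ALL, NULL);
}
```

+A program that never calls setlocale runs in the "C" locale, which is
+why an internationalized program normally makes the first call very
+early in its main function.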
+
+
+In fact, locale message support is only one component of the cultural
+data that makes up a particular locale. There is a whole host of
+routines and functions provided to aid programmers in developing
+internationalized software and which allow them to access the data
+stored in a particular locale. When someone presently refers to a
+particular locale, they are obviously referring to the data stored
+within that particular locale. Similarly, if a programmer is referring
+to "accessing the locale routines", they are referring to the
+complete suite of routines that access all of the locale's information.
+
+
+One uses the expression Native Language Support, or merely NLS,
+for speaking of the overall activity or feature encompassing both
+internationalization and localization, allowing for multi-lingual
+interactions in a program. In a nutshell, one could say that
+internationalization is the operation by which further localizations
+are made possible.
+
+
+Also, very roughly said, when it comes to multi-lingual messages,
+internationalization is usually taken care of by programmers, and
+localization is usually taken care of by translators.
+
+
+For a totally multi-lingual distribution, there are many things to
+translate beyond output messages.
+
+
+As we already stressed, translation is only one aspect of locales.
+Other internationalization aspects are system services and are handled
+in GNU
+There are a few major areas which may vary between countries and
+hence, define what a locale must describe. The following list helps
+put multi-lingual messages into the proper context of other tasks
+related to locales. See the GNU
+Components of locale outside of message handling are standardized in
+the ISO C standard and the SUSV2 specification. GNU
+The letters PO in `.po' files mean Portable Object, to
+distinguish them from `.mo' files, where MO stands for Machine
+Object. This paradigm, as well as the PO file format, is inspired
+by the NLS standard developed by Uniforum, and implemented by Sun
+in their Solaris system.
+
+
+PO files are meant to be read and edited by humans, and associate each
+original, translatable string of a given package with its translation
+in a particular target language. A single PO file is dedicated to
+a single target language. If a package supports many languages,
+there is one such PO file per language supported, and each package
+has its own set of PO files. These PO files are best created by
+the
+MO files are meant to be read by programs, and are binary in nature.
+A few systems already offer tools for creating and handling MO files
+as part of the Native Language Support coming with the system, but the
+format of these MO files is often different from system to system,
+and non-portable. The tools already provided with these systems don't
+support all the features of GNU
+The following diagram summarizes the relation between the files
+handled by GNU
+The indication `PO mode' appears in two places in this picture,
+and you may safely read it as merely meaning "hand editing", using
+any editor of your choice, really. However, for those of you being
+the lucky users of Emacs, PO mode has been specifically created
+for providing a cozy environment for editing or modifying PO files.
+While editing a PO file, PO mode allows for the easy browsing of
+auxiliary and compendium PO files, as well as for following references into
+the set of C program sources from which PO files have been derived.
+It has a few special features, among which are the interactive marking
+of program strings as translatable, and the validation of PO files
+with easy repositioning to PO file lines showing errors.
+
+
+As a programmer, the first step to bringing GNU
+For newly written software the strings of course can and should be
+marked while writing it. The
+Doing this allows you to prepare the sources for internationalization.
+Later when you feel ready for the step to use the
+and link against `libintl.a' or `libintl.so'. Note that on
+GNU systems, you don't need to link with
+Once the C sources have been modified, the
+The first time through, there is no `lang.po' yet, so the
+
+Then comes the initial translation of messages. Translation in
+itself is a whole matter, still exclusively meant for humans,
+and whose complexity far overwhelms the level of this manual.
+Nevertheless, a few hints are given in some other chapter of this
+manual (see section 10 The Translator's View). You will also find there indications
+about how to contact translating teams, or become part of them,
+for sharing your translating concerns with others who target the same
+native language.
+
+
+While adding the translated messages into the `lang.pox'
+PO file, if you do not have Emacs handy, you are on your own
+for ensuring that your efforts fully respect the PO file format, and quoting
+conventions (see section 2.2 The Format of PO Files). This is surely not an impossible task,
+as this is the way many people have handled PO files already for Uniforum or
+Solaris. On the other hand, by using PO mode in Emacs, most details
+of PO file format are taken care of for you, but you have to acquire
+some familiarity with PO mode itself. Besides main PO mode commands
+(see section 2.3 Main PO mode Commands), you should know how to move between entries
+(see section 2.4 Entry Positioning), and how to handle untranslated entries
+(see section 6.4 Untranslated Entries).
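+
+To give an idea of what respecting the quoting conventions involves
+when no PO mode is at hand, here is a hedged C sketch that writes one
+PO entry. It assumes the usual layout of an entry (a msgid line
+followed by a msgstr line, each holding a double-quoted string with
+C-style escapes) and handles only the most common escapes. The
+function names are hypothetical.

```c
#include <stdio.h>

/* Write s as a double-quoted PO string, escaping the characters that
   would otherwise end or corrupt the quoted text. */
static void write_po_string(FILE *out, const char *s)
{
    fputc('"', out);
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '"':  fputs("\\\"", out); break;
        case '\\': fputs("\\\\", out); break;
        case '\n': fputs("\\n", out);  break;
        case '\t': fputs("\\t", out);  break;
        default:   fputc(*s, out);     break;
        }
    }
    fputc('"', out);
}

/* Write one entry: the original string, then its translation.
   Pass "" as msgstr for an entry that is still untranslated.
   A blank line is written after the entry to separate it from
   the next one. */
void write_po_entry(FILE *out, const char *msgid, const char *msgstr)
{
    fputs("msgid ", out);
    write_po_string(out, msgid);
    fputs("\nmsgstr ", out);
    write_po_string(out, msgstr);
    fputs("\n\n", out);
}
```

+This sketch leaves out comments, long strings split over several
+lines, and character set concerns, all of which a real tool has to
+handle.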
+
+
+If some common translations have already been saved into a compendium
+PO file, translators may use PO mode for initializing untranslated
+entries from the compendium, and also save selected translations into
+the compendium, updating it (see section 6.11 Using Translation Compendiums). Compendium files
+are meant to be exchanged between members of a given translation team.
+
+
+Programs, or packages of programs, are dynamic in nature: users write
+bug reports and suggestions for improvement, maintainers react by
+modifying programs in various ways. The fact that a package has
+already been internationalized should not make maintainers shy
+of adding new strings, or modifying strings already translated.
+They just do their job the best they can. For the Translation
+Project to work smoothly, it is important that maintainers do not
+carry translation concerns on their already loaded shoulders, and that
+translators be kept as free as possible of programmatic concerns.
+
+
+The only concern maintainers should have is carefully marking new
+strings as translatable, when they should be; they need not otherwise
+worry about them being translated, as this will come in proper time.
+Consequently, when programs and their strings are adjusted in various
+ways by maintainers, and for matters usually unrelated to translation,
+
+It is important for translators (and even maintainers) to understand
+that package translation is a continuous process in the lifetime of a
+package, and not something which is done once and for all at the start.
+After an initial burst of translation activity for a given package,
+interventions are needed once in a while, because here and there,
+translated entries become obsolete, and new untranslated entries
+appear, needing translation.
+
+
+The
+Whatever route or means taken, the goal is to obtain an updated
+`lang.pox' file offering translations for all strings.
+When this is properly achieved, this file `lang.pox' may
+take the place of the previous official `lang.po' file.
+
+
+The temporal mobility, or fluidity of PO files, is an integral part of
+the translation game, and should be well understood, and accepted.
+People resisting it will have a hard time participating in the
+Translation Project, or will give a hard time to other participants! In
+particular, maintainers should relax and include all available official
+PO files in their distributions, even if these have not recently been
+updated, without banging or otherwise trying to exert pressure on the
+translator teams to get the job done. The pressure should rather come
+from the community of users speaking a particular language, and
+maintainers should consider themselves fairly relieved of any concern
+about the adequacy of translation files. On the other hand, translators
+should reasonably try updating the PO files they are responsible for,
+while the package is undergoing pretest, prior to an official
+distribution.
+
+
+Once the PO file is complete and dependable, the
+Finally, the modified and marked C sources are compiled and linked
+with the GNU
+The remainder of this manual has the purpose of explaining in depth the various
+steps outlined above.
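+
+As a compact illustration of the programmer's side of these steps,
+the following C sketch marks one string as translatable and prepares
+the program to look up translations at run time. It uses the
+setlocale, bindtextdomain, textdomain and gettext functions declared
+in the locale.h and libintl.h headers. The package name, the catalog
+directory and the two function names are hypothetical placeholders
+that a real package would replace with its own.

```c
#include <libintl.h>
#include <locale.h>

/* Conventional shorthand for marking a string as translatable. */
#define _(String) gettext(String)

/* Hypothetical values; a real package defines its own. */
#define PACKAGE   "example-package"
#define LOCALEDIR "/usr/share/locale"

/* Select the user's locale and tell the library where the compiled
   message catalogs for PACKAGE are installed. */
void init_nls(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
    textdomain(PACKAGE);
}

/* A marked string: returned translated when an installed catalog
   provides a translation for the current locale, and returned
   unchanged otherwise. */
const char *greeting(void)
{
    return _("Hello, world!");
}
```

+Because the original English string doubles as the lookup key, a
+program prepared this way keeps working unchanged when no translation
+is installed.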
+
+
+
+
diff --git a/doc/gettext_10.html b/doc/gettext_10.html
new file mode 100644
index 000000000..11294417c
--- /dev/null
+++ b/doc/gettext_10.html
@@ -0,0 +1,509 @@
+
+
+
+
+
+Free software is going international! The Translation Project is a way
+to get maintainers, translators and users all together, so free software
+will gradually become able to speak many native languages.
+
+
+The GNU
+To achieve the Translation Project, we need many interested
+people who like their own language and write it well, and who are also
+able to synergize with other translators speaking the same language.
+If you'd like to volunteer to work at translating messages,
+please send mail to your translating team.
+
+
+Each team has its own mailing list, courtesy of Linux
+International. You may reach your translating team at the address
+`ll@li.org', replacing ll by the two-letter ISO 639
+code for your language. Language codes are not the same as
+country codes given in ISO 3166. The following translating teams
+exist:
+
+
+Chinese
+For example, you may reach the Chinese translating team by writing to
+`zh@li.org'. When you become a member of the translating team
+for your own language, you may subscribe to its list. For example,
+Swedish people can send a message to `sv-request@li.org',
+having this message body:
+
+
+Keep in mind that team members should be interested in working
+at translations, or at solving translational difficulties, rather than
+merely lurking around. If your team does not exist yet and you want to
+start one, please write to `translation@iro.umontreal.ca';
+you will then reach the coordinator for all translator teams.
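+
+The addressing scheme described above, `ll@li.org' with ll replaced by
+a language code, is mechanical enough to sketch in C. The function
+name team_address is hypothetical, and the check only verifies the
+shape of the code (two lowercase ASCII letters); it does not confirm
+that the code is actually assigned by ISO 639 or that a team exists.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if c is a lowercase ASCII letter. */
static int is_ascii_lower(char c)
{
    return c >= 'a' && c <= 'z';
}

/* Build "ll@li.org" from a two-letter language code.
   Returns 0 on success, or -1 if code is not exactly two lowercase
   ASCII letters or if out (capacity out_size) is too small. */
int team_address(const char *code, char *out, size_t out_size)
{
    if (strlen(code) != 2
        || !is_ascii_lower(code[0])
        || !is_ascii_lower(code[1]))
        return -1;

    int written = snprintf(out, out_size, "%s@li.org", code);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

+Rejecting anything but two lowercase letters also catches the common
+mistake of passing an uppercase ISO 3166 country code instead of a
+language code.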
+
+
+A handful of GNU packages have already been adapted and provided
+with message translations for several languages. Translation
+teams have begun to organize, using these packages as a starting
+point. But there are many more packages and many languages for
+which we have no volunteer translators. If you would like to
+volunteer to work at translating messages, please send mail to
+`translation@iro.umontreal.ca' indicating what language(s)
+you can work on.
+
+
+This is now official, GNU is going international! Here is the
+announcement submitted for the January 1995 GNU Bulletin:
+
+
+A handful of GNU packages have already been adapted and provided
+with message translations for several languages. Translation
+teams have begun to organize, using these packages as a starting
+point. But there are many more packages and many languages
+for which we have no volunteer translators. If you'd like to
+volunteer to work at translating messages, please send mail to
+`translation@iro.umontreal.ca' indicating what language(s)
+you can work on.
+
+This document should answer many questions for those who are curious about
+the process or would like to contribute. Please at least skim over it,
+hoping to cut down a little of the high volume of e-mail generated by this
+collective effort towards internationalization of free software.
+
+
+Most free programming which is widely shared is done in English, and
+currently, English is used as the main communicating language between
+national communities collaborating on free software. This very document
+is written in English. This will not change in the foreseeable future.
+
+
+However, there is a strong appetite from national communities for
+having more software able to write using national language and habits,
+and there is an on-going effort to modify free software in such a way
+that it becomes able to do so. The experiments conducted so far raised
+an enthusiastic response from pretesters, so we believe that
+internationalization of free software is destined to succeed.
+
+
+To suggest clarifications, additions or corrections to this
+document, please e-mail to `translation@iro.umontreal.ca'.
+
+
+Facing this internationalization effort, a few users expressed their
+concerns. Some of these doubts are presented and discussed, here.
+
+
+On a larger scale, the true solution would be to organize some kind of
+fairly precise set up in which volunteers could participate. I gave
+some thought to this idea lately, and realize there will be some
+touchy points. I thought of writing to Richard Stallman to launch
+such a project, but feel it might be good to shake out the ideas
+between ourselves first. Most probably Linux International has
+some experience in the field already, or would like to orchestrate
+the volunteer work, maybe. Food for thought, in any case!
+
+
+I guess we have to set up something early, somehow, that will help
+many possible contributors of the same language to interlock and avoid
+work duplication, and further be put in contact for solving together
+problems particular to their tongue (in most languages, there are many
+difficulties peculiar to translating technical English). My Swedish
+contributor acknowledged these difficulties, and I'm well aware of
+them for French.
+
+
+This is surely not a technical issue, but we should manage so the
+effort of locale contributors be maximally useful, despite the national
+team layer interface between contributors and maintainers.
+
+
+The Translation Project needs some setup for coordinating language
+coordinators. Localizing evolving programs will surely
+become a permanent and continuous activity in the free software community,
+once well started.
+The setup should be minimally completed and tested before GNU
+
+I also think GNU will need, sooner than it thinks, someone to set up
+a way to organize and coordinate these groups. Some kind of group
+of groups. My opinion is that it would be good that GNU delegates
+this task to a small group of collaborating volunteers, shortly.
+Perhaps a list of these national committees can be published
+in `gnu.announce'.
+
+
+My role as coordinator would simply be to refer to Ulrich any German
+speaking volunteer interested in localization of free software packages, and
+maybe helping national groups to initially organize, while maintaining
+national registries until national groups are ready to take over.
+In fact, the coordinator should ease volunteers to get in contact with
+one another for creating national teams, which should then select
+one coordinator per language, or country (regionalized language).
+If well done, the coordination should be useful without being an
+overwhelming task, the time to put delegations in place.
+
+
+I suggest we look for volunteer coordinators/editors for individual
+languages. These people will scan contributions of translation files
+for various programs, for their own languages, and will ensure high
+and uniform standards of diction.
+
+
+From my current experience with other people in these days, those who
+provide localizations are very enthusiastic about the process, and are
+more interested in the localization process than in the program they
+localize, and want to do many programs, not just one. This seems
+to confirm that having a coordinator/editor for each language is a
+good idea.
+
+
+We need to choose someone who is good at writing clear and concise
+prose in the language in question. That is hard--we can't check
+it ourselves. So we need to ask a few people to judge each others'
+writing and select the one who is best.
+
+
+I announce my prerelease to a few dozen people, and you would not
+believe all the discussions it generated already. I shudder to think
+what will happen when this will be launched, for true, officially,
+world wide. Who am I to arbitrate between two Czechoslovak users
+contradicting each other, for example?
+
+
+I assume that your German is not much better than my French so that
+I would not be able to judge about these formulations. What I would
+suggest is that for each language there is a group for people who
+maintain the PO files and judge about changes. I suspect there will
+be cultural differences between how such groups of people will behave.
+Some will have relaxed ways, reach consensus easily, and have anyone
+of the group relate to the maintainers, while others will fight to
+death, organize heavy administrations up to national standards, and
+use strict channels.
+
+
+The German team is putting out a good example. Right now, they are
+maybe half a dozen people revising translations of each other and
+discussing the linguistic issues. I do not even have all the names.
+Ulrich Drepper is taking care of coordinating the German team.
+He subscribed to all my pretest lists, so I do not even have to warn
+him specifically of incoming releases.
+
+
+I'm sure that it is a good idea to get teams for each language working
+on translations. That will make the translations better and more
+consistent.
+
+
+Taking French for example, there are a few sub-cultures around computers
+which developed diverging vocabularies. Picking volunteers here and
+there without addressing this problem in an organized way, soon in the
+project, might produce a distasteful mix of internationalized programs,
+and possibly trigger endless quarrels among those who really care.
+
+
+Keeping some kind of unity in the way French localization of
+internationalized programs is achieved is a difficult (and delicate) job.
+Knowing the Latin character of French people (:-), if we take this
+the wrong way, we could end up nowhere, or spoil a lot of energies.
+Maybe we should begin to address this problem seriously before
+GNU
+I expect the next big changes after the official release. Please note
+that I use the German translation of the short GPL message. We need
+to set a few good examples before the localization goes out for true
+in the free software community. Here are a few points to discuss:
+
+
+If we get any inquiries about GNU
+The `*-pretest' lists are quite useful to me, maybe the idea could
+be generalized to many GNU, and non-GNU packages. But each maintainer
+his/her way!
+
+
+François, we have a mechanism in place here at
+`gnu.ai.mit.edu' to track teams, support mailing lists for
+them and log members. We have a slight preference that you use it.
+If this is OK with you, I can get you clued in.
+
+
+Things are changing! A few years ago, when Daniel Fekete and I
+asked for a mailing list for GNU localization, nested at the FSF, we
+were politely invited to organize it anywhere else, and so did we.
+For communicating with my pretesters, I later made a handful of
+mailing lists located at iro.umontreal.ca and administrated by
+
+I suspect that the German team will organize itself a mailing list
+located in Germany, and so forth for other countries. But before they
+organize for true, it could surely be useful to offer mailing lists
+located at the FSF to each national team. So yes, please explain to me
+how I should proceed to create and handle them.
+
+
+We should create temporary mailing lists, one per country, to help
+people organize. Temporary, because once regrouped and structured, it
+would be fair for the volunteers from each country to bring their list
+back there and manage it as they want. My feeling is that, in the long
+run, each team should run its own list, from within their country.
+There also should be some central list to which all teams could
+subscribe as they see fit, as long as each team is represented in it.
+
+
+There will surely be some discussion about these messages after the
+packages are finally released. If people now send you some proposals
+for better messages, how do you proceed? Jim, please note that
+right now, as I put forward nearly a dozen localizable programs, I
+receive both the translations and the coordination concerns about them.
+
+
+If I put one of my things to pretest, Ulrich receives the announcement
+and passes it on to the German team, who make last minute revisions.
+Then he submits the translation files to me as the maintainer.
+For free packages I do not maintain, I would not even hear about it.
+This scheme could be made to work for the whole Translation Project,
+I think. For security reasons, maybe Ulrich (national coordinators,
+in fact) should update the central registry kept at the Translation Project
+(Jim, me, or Len's recruits) once in a while.
+
+
+In December/January, I was aggressively ready to internationalize
+all of GNU, giving myself the duty of one small GNU package per week
+or so, taking many weeks or months for bigger packages. But it does
+not work this way. I first did all the things I'm responsible for.
+I've nothing against some missionary work on other maintainers, but
+I'm also losing a lot of energy over it--same debates over again.
+
+
+And when the first localized packages are released we'll get a lot of
+responses about ugly translations :-). Surely, and we need to have
+beforehand a fairly good idea about how to handle the information
+flow between the national teams and the package maintainers.
+
+
+Please start saving somewhere a quick history of each PO file. I know
+for sure that the file format will change, allowing for comments.
+It would be nice that each file has a kind of log, and references for
+those who want to submit comments or gripes, or otherwise contribute.
+I sent a proposal for a fast and flexible format, but it is not
+receiving acceptance yet by the GNU deciders. I'll tell you when I
+have more information about this.
+
+
+
+
diff --git a/doc/gettext_11.html b/doc/gettext_11.html
new file mode 100644
index 000000000..ea125d0e3
--- /dev/null
+++ b/doc/gettext_11.html
@@ -0,0 +1,707 @@
+
+
+
+
+
+The maintainer of a package has many responsibilities. One of them
+is ensuring that the package will install easily on many platforms,
+and that the magic we described earlier (see section 8 The User's View) will work
+for installers and end users.
+
+
+Of course, there are many possible ways by which GNU
+Nevertheless, GNU
+Even if
+Some free software packages are distributed as
+We cannot say much about flat distributions. A flat
+directory structure has the disadvantage of increasing the difficulty
+of updating to a new version of GNU
+Maybe because GNU
+There are some works which are required for using GNU
+It is worth adding here a few words about how the maintainer should
+ideally behave with PO files submissions. As a maintainer, your role is
+to authenticate the origin of the submission as being the representative
+of the appropriate translating teams of the Translation Project (forward
+the submission to `translation@iro.umontreal.ca' in case of doubt),
+to ensure that the PO file format is not severely broken and does not
+prevent successful installation, and for the rest, merely to put these
+PO files in `po/' for distribution.
+
+
+As a maintainer, you do not have to take on your shoulders the
+responsibility of checking if the translations are adequate or
+complete, and should avoid diving into linguistic matters. Translation
+teams drive themselves and are fully responsible for their linguistic
+choices for the Translation Project. Keep in mind that translator teams are not
+driven by maintainers. You can help by carefully redirecting all
+communications and reports from users about linguistic matters to the
+appropriate translation team, or by explaining to users how to reach or join
+their team. The simplest might be to send them the `ABOUT-NLS' file.
+
+
+Maintainers should never ever apply PO file bug reports
+themselves, short-cutting translation teams. If some translator has
+difficulty getting some of her points through her team, it should not be
+an option for her to directly negotiate translations with maintainers.
+Teams ought to settle their problems themselves, if any. If you, as
+a maintainer, ever think there is a real problem with a team, please
+never try to solve a team's problem on your own.
+
+
+Some files are consistently and identically needed in every package
+internationalized through GNU
+and accepts the following options:
+
+
+If directory is given, this is the top level directory of a
+package to prepare for using GNU
+The program
+If your site supports symbolic links,
+It is interesting to understand that most new files for supporting
+GNU
+Besides files which are automatically added through
+So, here comes a list of files, each one followed by a description of
+all alterations it needs. Many examples are taken out from the GNU
+
+The `po/' directory should receive a file named
+`POTFILES.in'. This file tells which files, among all program
+sources, have marked strings needing translation. Here is an example
+of such a file:
+
+
+Hash-marked comments and white lines are ignored. All other lines
+list those source files containing strings marked for translation
+(see section 3.2 How Marks Appear in Sources), in a notation relative to the top level
+of your whole distribution, rather than the location of the
+`POTFILES.in' file itself.
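+
+The rule for reading `POTFILES.in' stated above (hash-marked comments
+and white lines are ignored, every other line names a source file) can
+be sketched as a small C predicate. The function name is hypothetical,
+and treating a line whose first non-blank character is a hash mark as
+a comment is an assumption of this sketch rather than something the
+text specifies.

```c
#include <ctype.h>
#include <stddef.h>

/* Return nonzero if line names a source file, and zero if it is a
   hash-marked comment or a white (blank) line.  Leading white space
   is skipped before the decision is made. */
int potfiles_line_is_entry(const char *line)
{
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;
    return *line != '\0' && *line != '#';
}
```

+A tool reading the file would apply this predicate to each line and
+then resolve the surviving file names against the top level of the
+distribution, not against the `po/' directory.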
+
+
+You need to add the GNU `config.guess' and `config.sub' files
+to your distribution. They are needed because the `intl/' directory
+has platform dependent support for determining the locale's character
+encoding and therefore needs to identify the platform.
+
+
+You can obtain the newest version of `config.guess' and
+`config.sub' from `ftp://ftp.gnu.org/pub/gnu/config/'.
+Less recent versions are also contained in the GNU
+Normally, `config.guess' and `config.sub' are put at the
+top level of a distribution. But it is also possible to put them in a
+subdirectory, together with other configuration support files like
+`install-sh', `ltconfig', `ltmain.sh',
+`mkinstalldirs' or `missing'. All you need to do, other than
+moving the files, is to add the following line to your
+`configure.in'.
+
+
+If you do not have an `aclocal.m4' file in your distribution,
+the simplest is to concatenate the files `codeset.m4',
+`gettext.m4', `iconv.m4', `isc-posix.m4',
+`lcmessage.m4', `progtest.m4' from GNU
+If you already have an `aclocal.m4' file, then you will have
+to merge the said macro files into your `aclocal.m4'. Note that if
+you are upgrading from a previous release of GNU
+These macros check for the internationalization support functions
+and related information. Hopefully, once stabilized, these macros
+might be integrated in the standard Autoconf set, because this
+piece of
+Earlier GNU
+Here are a few modifications you need to make to your main, top-level
+`Makefile.in' file.
+
+
+Some of the modifications made in the main `Makefile.in' will
+also be needed in the `Makefile.in' from your package sources,
+which we assume here to be in the `src/' subdirectory. Here are
+all the modifications needed in `src/Makefile.in':
+
+
+
+
diff --git a/doc/gettext_12.html b/doc/gettext_12.html
new file mode 100644
index 000000000..e06983908
--- /dev/null
+++ b/doc/gettext_12.html
@@ -0,0 +1,160 @@
+
+
+
+
+
+We would like to conclude this GNU
+Internationalization concerns and algorithms have been informally
+and casually discussed for years in GNU, sometimes around GNU
+
+This all began in July 1994, when Patrick D'Cruze had the idea and
+initiative of internationalizing version 3.9.2 of GNU
+Jim implemented
+While Jim took some distance and time and became dad for a second
+time, Roland wanted to get GNU
+Let's summarize by saying that Ulrich Drepper wrote GNU
+While this was being done, François adapted half a dozen of
+GNU packages to
+François also wrote PO mode in June 1995 with the collaboration
+of Greg McGary, as a kind of contribution to Ulrich's package.
+He also gave a hand with the GNU
+Eugene H. Dorr (`dorre@well.com') maintains an interesting
+bibliography on internationalization matters, called
+Internationalization Reference List, which is available as:
+
+
+Michael Gschwind (`mike@vlsivie.tuwien.ac.at') maintains a
+Frequently Asked Questions (FAQ) list, entitled Programming for
+Internationalisation. This FAQ discusses writing programs which
+can handle different language conventions, character sets, etc.;
+and is applicable to all character set encodings, with particular
+emphasis on ISO 8859-1. It is regularly published in Usenet
+groups `comp.unix.questions', `comp.std.internat',
+`comp.software.international', `comp.lang.c',
+`comp.windows.x', `comp.std.c', `comp.answers'
+and `news.answers'. The home location of this document is:
+
+
+Patrick D'Cruze (`pdcruze@li.org') wrote a tutorial about NLS
+matters, and Jochen Hein (`Hein@student.tu-clausthal.de') took
+over the responsibility of maintaining it. It may be found as:
+
+
+This site is mirrored in:
+
+
+A French version of the same tutorial should be available at:
+
+
+together with French translations of many Linux-related documents.
+
+
+
+
diff --git a/doc/gettext_13.html b/doc/gettext_13.html
new file mode 100644
index 000000000..ee20244ae
--- /dev/null
+++ b/doc/gettext_13.html
@@ -0,0 +1,523 @@
+
+
+
+
+
+The ISO 639 standard defines two-letter codes for many languages.
+All abbreviations for languages used in the Translation Project should
+come from this standard.
+
+
+
+
diff --git a/doc/gettext_14.html b/doc/gettext_14.html
new file mode 100644
index 000000000..99ad0ca6d
--- /dev/null
+++ b/doc/gettext_14.html
@@ -0,0 +1,745 @@
+
+
+
+
+
+The ISO 3166 standard defines two-letter codes for many countries
+and territories. All abbreviations for countries used in the Translation
+Project should come from this standard.
+
+
+
+
diff --git a/doc/gettext_2.html b/doc/gettext_2.html
new file mode 100644
index 000000000..fcf1a0588
--- /dev/null
+++ b/doc/gettext_2.html
@@ -0,0 +1,685 @@
+
+
+
+
+
+The GNU
+Once you have received, unpacked, configured and compiled the GNU
+
+During the installation of the PO mode, you might want to modify your
+file `.emacs', once and for all, so it contains a few lines looking
+like:
+
+
+Later, whenever you edit some `.po', `.pot' or `.pox'
+file, or any file having the string `.po.' within its name,
+Emacs loads `po-mode.elc' (or `po-mode.el') as needed, and
+automatically activates PO mode commands for the associated buffer.
+The string PO appears in the mode line for any buffer for
+which PO mode is active. Many PO files may be active at once in a
+single Emacs session.
+
+
+If you are using Emacs version 20 or newer, and have already installed
+the appropriate international fonts on your system, you may also tell
+Emacs how to determine automatically the coding system of every PO file.
+This will often (but not always) cause the necessary fonts to be loaded
+and used for displaying the translations on your Emacs screen. For this
+to happen, add the lines:
+
+
+to your `.emacs' file. If, with this, you still see boxes instead
+of international characters, try a different font set (via Shift Mouse
+button 1).
+
+
+A PO file is made up of many entries, each entry holding the relation
+between an original untranslated string and its corresponding
+translation. All entries in a given PO file usually pertain
+to a single project, and all translations are expressed in a single
+target language. One PO file entry has the following schematic
+structure:
+
+
+The general structure of a PO file should be well understood by
+the translator. When using PO mode, very little has to be known
+about the format details, as PO mode takes care of them for her.
+
+
+Entries begin with some optional white space. Usually, when generated
+through GNU
+After white space and comments, entries show two strings, namely
+first the untranslated string as it appears in the original program
+sources, and then, the translation of this string. The original
+string is introduced by the keyword
+The
+The comment lines beginning with #, are special because they are
+not completely ignored by the programs as comments generally are. The
+comma separated list of flags is used by the
+A different kind of entries is used for translations which involve
+plural forms.
+
+
+It happens that some lines, usually whitespace or comments, follow the
+very last entry of a PO file. Such lines are not part of any entry,
+and PO mode is unable to take action on those lines. By using the
+PO mode function M-x po-normalize, the translator may get
+rid of those spurious lines. See section 2.5 Normalizing Strings in Entries.
+
+
+The remainder of this section may be safely skipped by those using
+PO mode, yet it may be interesting for everybody to have a better
+idea of the precise format of a PO file. On the other hand, those
+not having Emacs handy should carefully continue reading on.
+
+
+Each of untranslated-string and translated-string respects
+the C syntax for a character string, including the surrounding quotes
+and embedded backslashed escape sequences. When the time comes
+to write multi-line strings, one should not use escaped newlines.
+Instead, a closing quote should follow the last character on the
+line to be continued, and an opening quote should resume the string
+at the beginning of the following PO file line. For example:
+
+
+In this example, the empty string is used on the first line, to
+allow better alignment of the H from the word `Here'
+over the f from the word `for'. In this example, the
+
+One should carefully distinguish between end of lines marked as
+`\n' inside quotes, which are part of the represented
+string, and end of lines in the PO file itself, outside string quotes,
+which have no effect on the represented string.
+
+
+Outside strings, white lines and comments may be used freely.
+Comments start at the beginning of a line with `#' and extend
+until the end of the PO file line. Comments written by translators
+should have the initial `#' immediately followed by some white
+space. If the `#' is not immediately followed by white space,
+this comment is most likely generated and managed by specialized GNU
+tools, and might disappear or be replaced unexpectedly when the PO
+file is given to
+After setting up Emacs with something similar to the lines in
+section 2.1 Completing GNU
+When PO mode is active in a window, the letters `PO' appear
+in the mode line for that window. The mode line also displays how
+many entries of each kind are held in the PO file. For example,
+the string `132t+3f+10u+2o' would tell the translator that the
+PO file contains 132 translated entries (see section 6.2 Translated Entries),
+3 fuzzy entries (see section 6.3 Fuzzy Entries), 10 untranslated entries
+(see section 6.4 Untranslated Entries) and 2 obsolete entries
+(see section 6.5 Obsolete Entries). Items with a zero count are not shown. So, in this example, if
+the fuzzy entries were unfuzzied, the untranslated entries were translated
+and the obsolete entries were deleted, the mode line would merely display
+`145t' for the counters.
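+
+The counter format just described can be sketched in C. This is only
+an illustration of the notation, not the code PO mode itself uses (PO
+mode is written in Emacs Lisp); the function name
+format_po_counters is hypothetical.
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a counter string such as "132t+3f+10u+2o" into buf.
   Kinds with a zero count are left out.  Returns buf. */
char *format_po_counters(char *buf, size_t size,
                         int translated, int fuzzy,
                         int untranslated, int obsolete)
{
    const int counts[4] = { translated, fuzzy, untranslated, obsolete };
    const char tags[4] = { 't', 'f', 'u', 'o' };

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (int i = 0; i < 4; i++) {
        if (counts[i] == 0)
            continue;                   /* zero counts are not shown */
        size_t used = strlen(buf);
        /* snprintf never writes past the buffer and always terminates it. */
        snprintf(buf + used, size - used, "%s%d%c",
                 used > 0 ? "+" : "", counts[i], tags[i]);
    }
    return buf;
}
```

+
+With the counts 132, 3, 10 and 2 this yields `132t+3f+10u+2o'; once
+all 145 entries are translated and the obsolete ones deleted, it
+yields `145t'.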
+
+
+The main PO commands are those which do not fit into the other categories of
+subsequent sections. These allow for quitting PO mode or for managing windows
+in special ways.
+
+
+The command U (
+The commands Q (
+The command O (
+The command h (
+The command = (
+The command V (
+The program
+The cursor in a PO file window is almost always part of
+an entry. The only exceptions are the special case when the cursor
+is after the last entry in the file, or when the PO file is
+empty. The entry where the cursor is found to be is said to be the
+current entry. Many PO mode commands operate on the current entry,
+so moving the cursor does more than allow the translator to browse
+the PO file: it also selects the entry on which commands operate.
+
+
+Some PO mode commands alter the position of the cursor in a specialized
+way. A few of these special-purpose positioning commands are described here,
+the others are described in following sections.
+
+
+Any Emacs command able to reposition the cursor may be used
+to select the current entry in PO mode, including commands which
+move by characters, lines, paragraphs, screens or pages, and search
+commands. However, there is a kind of standard way to display the
+current entry in PO mode, which usual Emacs commands moving
+the cursor do not especially try to enforce. The command .
+(
+It is yet to be decided if PO mode helps the translator, or otherwise
+irritates her, by forcing a rigid window disposition while she
+is doing her work. We originally had quite precise ideas about
+how windows should behave, but on the other hand, anyone used to
+Emacs is often happy to keep full control. Maybe a fixed window
+disposition might be offered as a PO mode option that the translator
+might activate or deactivate at will, so it could be offered on an
+experimental basis. If nobody feels a real need for using it, or
+a compulsion for writing it, we should drop this whole idea.
+The incentive for doing it should come from translators rather than
+programmers, as opinions from an experienced translator are surely
+worth more to me than opinions from programmers thinking about
+how others should do translation.
+
+
+The commands n (
+The commands < (
+The translator may decide, before working at the translation of
+a particular entry, that she needs to browse the remainder of the
+PO file, maybe for finding the terminology or phraseology used
+in related entries. She can of course use the standard Emacs idioms
+for saving the current cursor location in some register, and use that
+register for getting back, or else, use the location ring.
+
+
+PO mode offers another approach, by which cursor locations may be saved
+onto a special stack. The command m (
+If the translator wants the position to be kept on the location stack,
+maybe for taking a look at the entry associated with the top
+element, then go elsewhere with the intent of getting back later, she
+ought to use m immediately after r.
+
+
+The command x (
+There are many different ways for encoding a particular string into a
+PO file entry, because there are so many different ways to split and
+quote multi-line strings, and even, to represent special characters
+by backslash escape sequences. Some features of PO mode rely on
+the ability for PO mode to scan an already existing PO file for a
+particular string encoded into the
+A conventional representation of strings in a PO file is currently
+under discussion, and PO mode experiments with a canonical representation.
+Having both
+So, for achieving normalization of at least the strings of a given
+PO file needing a canonical representation, the following PO mode
+command is available:
+
+
+The special command M-x po-normalize, which has no associated
+keys, revises all entries, ensuring that strings of both original
+and translated entries use uniform internal quoting in the PO file.
+It also removes any crumb after the last entry. This command may be
+useful for PO files freshly imported from elsewhere, or if we ever
+improve on the canonical quoting format we use. This canonical format
+is not only meant for getting cleaner PO files, but also for greatly
+speeding up
+M-x po-normalize presently makes three passes over the entries.
+The first implements heuristics for converting PO files for GNU
+
+Having such an explicit normalizing command allows for importing PO
+files from other sources, but also eases the evolution of the current
+convention, evolution driven mostly by aesthetic concerns, as of now.
+It is easy to make suggested adjustments at a later time, as the
+normalizing command and eventually, other GNU
+Right now, in PO mode, strings are single line or multi-line. A string
+goes multi-line if and only if it has embedded newlines, that
+is, if it matches `[^\n]\n+[^\n]'. So, we would have:
+
+
+but, replacing the space by a newline, this becomes:
+
+
+We are deliberately using a caricatural example, here, to make the
+point clearer. Usually, multi-lines are not that bad looking.
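+
+The multi-line rule stated above, that a string goes multi-line if
+and only if it matches `[^\n]\n+[^\n]', can be sketched in C as
+follows. This is a minimal illustration written without a regular
+expression library; the function name has_embedded_newline is
+hypothetical and is not part of PO mode.
+

```c
#include <assert.h>
#include <string.h>

/* Return 1 if s matches [^\n]\n+[^\n], i.e. it has an embedded newline. */
int has_embedded_newline(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    for (size_t i = 1; i + 1 < len; i++) {
        if (s[i] != '\n' || s[i - 1] == '\n')
            continue;           /* look for the first newline of a run */
        size_t j = i;
        while (s[j] == '\n')
            j++;                /* skip the whole run of newlines */
        if (s[j] != '\0')
            return 1;           /* a non-newline character follows the run */
    }
    return 0;
}
```

+
+Under this rule, leading or trailing newlines alone do not make a
+string multi-line; only a newline with other characters on both sides
+does.
+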
+It is probable that we will implement the following suggestion.
+We might lump together all initial newlines into the empty string,
+and also all newlines introducing empty lines (that is, for n
+> 1, the n-1'th last newlines would go together on a separate
+string), so making the previous example appear:
+
+
+There are a few yet undecided little points about string normalization,
+to be documented in this manual, once these questions settle.
+
+
+
+
diff --git a/doc/gettext_3.html b/doc/gettext_3.html
new file mode 100644
index 000000000..ee3715d13
--- /dev/null
+++ b/doc/gettext_3.html
@@ -0,0 +1,620 @@
+
+
+
+
+
+For the programmer, changes to the C source code fall into three
+categories. First, you have to make the localization functions
+known to all modules needing message translation. Second, you should
+properly trigger the operation of GNU
+Presuming that your set of programs, or package, has been adjusted
+so all needed GNU
+The remaining changes to your C sources are discussed in the further
+sections of this chapter.
+
+
+The initialization of locale data should be done with more or less
+the same code in every program, as demonstrated below:
+
+
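+A minimal sketch of such initialization code follows. It assumes the
+standard setlocale function and the bindtextdomain and
+textdomain functions declared in `libintl.h'. The wrapper
+name init_i18n and the fallback values given to PACKAGE and
+LOCALEDIR are placeholders for illustration only.
+

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Placeholder values: a real package gets these from config.h
   or from the Makefile. */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* Set up message translation; returns 0 on success, -1 on failure. */
int init_i18n(void)
{
    /* Adopt the locale requested by the user's environment.  If that
       locale is unavailable, the program keeps running in the "C" locale. */
    setlocale(LC_ALL, "");

    /* Tell gettext where the message catalogs of PACKAGE are installed. */
    if (bindtextdomain(PACKAGE, LOCALEDIR) == NULL)
        return -1;

    /* Make PACKAGE the default domain for subsequent gettext() calls. */
    if (textdomain(PACKAGE) == NULL)
        return -1;

    return 0;
}
```

+
+A program would typically call such a function once, at the very
+beginning of main, before producing any output.
+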
+PACKAGE and LOCALEDIR should be provided either by
+`config.h' or by the Makefile. For now consult the
+The use of
+Some systems also have problems with parsing numbers using the
+
+So it is sometimes necessary to replace the
+On all POSIX conformant systems the locale categories
+Note that changing the
+It is also possible to switch the locale forth and back between the
+environment dependent locale and the C locale, but this approach is
+normally avoided because a
+All strings requiring translation should be marked in the C sources. Marking
+is done in such a way that each translatable string appears to be
+the sole argument of some function or preprocessor macro. There are
+only a few such possible functions or macros meant for translation,
+and their names are said to be marking keywords. The marking is
+attached to strings themselves, rather than to what we do with them.
+This approach has more uses. A blatant example is an error message
+produced by formatting. The format string needs translation, as
+well as some strings inserted through some `%s' specification
+in the format, while the result from
+This marking operation has two goals. The first goal of marking
+is for triggering the retrieval of the translation, at run time.
+The keywords are possibly resolved into a routine able to dynamically
+return the proper translation, as far as possible or wanted, for the
+argument string. Most localizable strings are found in executable
+positions, that is, attached to variables or given as parameters to
+functions. But this is not universal usage, and some translatable
+strings appear in structured initializations. See section 3.5 Special Cases of Translatable Strings.
+
+
+The second goal of the marking operation is to help
+The canonical keyword for marking translatable strings is
+`gettext', it gave its name to the whole GNU
+Many packages use `_' (a simple underline) as a keyword,
+and write `_("Translatable string")' instead of `gettext
+("Translatable string")'. Further, the rule from the GNU coding standards
+requiring a space between the keyword and the opening
+parenthesis is relaxed, in practice, for this particular usage.
+So, the textual overhead per translatable string is reduced to
+only three characters: the underline and the two parentheses.
+However, even if GNU
+instead of merely using `#include <libintl.h>'.
+
+
+Later on, the maintenance is relatively easy. If, as a programmer,
+you add or modify a string, you will have to ask yourself if the
+new or altered string requires translation, and include it within
+`_()' if you think it should be translated. `"%s: %d"' is
+an example of a string not requiring translation!
+
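+As an illustration of the `_' convention, here is a small sketch. The
+macro definition is the conventional one; the function names
+greeting and print_pair and the message text are
+hypothetical.
+

```c
#include <libintl.h>
#include <stdio.h>

/* Conventional shorthand: `_' marks a string and translates it at run time. */
#define _(String) gettext (String)

/* Return the greeting, translated if a catalog provides a translation. */
const char *greeting(void)
{
    return _("Hello, world!");
}

/* Print a key and a value.  The format holds no words to translate,
   so it is deliberately left unmarked. */
void print_pair(const char *key, int value)
{
    printf("%s: %d\n", key, value);
}
```

+
+When no translation is available, gettext returns its argument
+unchanged, so greeting simply yields the English text.
+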
+
+In PO mode, one set of features is meant more for the programmer than
+for the translator, and allows him to interactively mark which strings,
+in a set of program sources, are translatable, and which are not.
+Even if it is a fairly easy job for a programmer to find and mark
+such strings by other means, using any editor of his choice, PO mode
+makes this work more comfortable. Further, this gives translators
+who feel a little like programmers, or programmers who feel a little
+like translators, a tool letting them work at marking translatable
+strings in the program sources, while simultaneously producing a set of
+translations in some language, for the package being internationalized.
+
+
+The set of program sources, targeted by the PO mode commands described
+here, should have an Emacs tags table constructed for your project,
+prior to using these PO file commands. This is easy to do. In any
+shell window, change the directory to the root of your project, then
+execute a command resembling:
+
+
+presuming here you want to process all `.h' and `.c' files
+from the `src/' and `lib/' directories. This command will
+explore all said files and create a `TAGS' file in your root
+directory, somewhat summarizing the contents using a special file
+format Emacs can understand.
+
+
+For packages following the GNU coding standards, there is
+a make goal
+Once your `TAGS' file is ready, the following commands assist
+the programmer at marking translatable strings in his set of sources.
+But these commands are necessarily driven from within a PO file
+window, and it is likely that you do not even have such a PO file yet.
+This is not a problem at all, as you may safely open a new, empty PO
+file, mainly for using these commands. This empty PO file will slowly
+fill in while you mark strings as translatable in your program sources.
+
+
+The , (
+A string is a good candidate for translation if it contains a sequence
+of three or more letters. A string containing at most two letters in
+a row will be considered as a candidate if it has more letters than
+non-letters. The command disregards strings containing no letters,
+or isolated letters only. It also disregards strings within comments,
+or strings already marked with some keyword PO mode knows (see below).
+
+
+If you have never told Emacs about some `TAGS' file to use, the
+command will request that you specify one from the minibuffer, the
+first time you use the command. You may later change your `TAGS'
+file by using the regular Emacs command M-x visit-tags-table,
+which will ask you to name the precise `TAGS' file you want
+to use. See section `Tag Tables' in The Emacs Editor.
+
+
+Each time you use the , command, the search resumes from where it was
+left by the previous search, and goes through all program sources,
+obeying the `TAGS' file, until all sources have been processed.
+However, by giving a prefix argument to the command (C-u
+,), you may request that the search be restarted all over again
+from the first program source; but in this case, strings that you
+recently marked as translatable will be automatically skipped.
+
+
+Using this , command does not prevent the use of other regular
+Emacs tags commands. For example, regular
+The M-, (
+The M-. command has a few built-in speedups, so you do not
+have to explicitly type all keywords all the time. The first such
+speedup is that you are presented with a preferred keyword,
+which you may accept by merely typing RET at the prompt.
+The second speedup is that you may type any non-ambiguous prefix of the
+keyword you really mean, and the command will complete it automatically
+for you. This also means that PO mode has to know all
+your possible keywords, and that it will not accept mistyped keywords.
+
+
+If you reply ? to the keyword request, the command gives a
+list of all known keywords, from which you may choose. When the
+command is prefixed by an argument (C-u M-.), it inhibits
+updating any program source or PO file buffer, and does some simple
+keyword management instead. In this case, the command asks for a
+keyword, written in full, which becomes a new allowed keyword for
+later M-. commands. Moreover, this new keyword automatically
+becomes the preferred keyword for later commands. By typing
+an already known keyword in response to C-u M-., one merely
+changes the preferred keyword and does nothing more.
+
+
+All keywords known for M-. are recognized by the , command
+when scanning for strings, and strings already marked by any of those
+known keywords are automatically skipped. If many PO files are opened
+simultaneously, each one has its own independent set of known keywords.
+There is no provision in PO mode, currently, for deleting a known
+keyword; you have to quit the file (maybe using q) and reopen
+it afresh. When a PO file is newly brought up in an Emacs window, only
+`gettext' and `_' are known as keywords, and `gettext'
+is preferred for the M-. command. In fact, it is not useful to
+prefer `_', as this one is already built into the M-, command.
+
+
+In C programs strings are often used within calls of functions from the
+
+A possible German translation for the above string might be:
+
+
+A C programmer, even if he cannot speak German, will recognize that
+there is something wrong here. The order of the two format specifiers
+is changed but of course the arguments in the
+To prevent errors at runtime caused by translations the
+If the word order in the above German translation were correct, one
+would have to write
+
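+The effect of numbered format arguments can be tried with the small
+sketch below. It assumes a C library whose printf family accepts the
+POSIX `%1$s' notation. The function name format_length_message
+is hypothetical, and the English and German formats shown afterwards
+are illustrative strings only.
+

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a message about s using fmt, which may be a translation.
   The arguments are always passed in the order (string, length),
   whatever order the format mentions them in. */
int format_length_message(char *buf, size_t size,
                          const char *fmt, const char *s)
{
    assert(buf != NULL && fmt != NULL && s != NULL);
    return snprintf(buf, size, fmt, s, (int) strlen(s));
}
```

+
+Called with the format "String '%s' has %d characters" and the string
+"abc", this produces "String 'abc' has 3 characters". Called with the
+reordered format "%2$d Zeichen lang ist die Zeichenkette '%1$s'", the
+same arguments produce "3 Zeichen lang ist die Zeichenkette 'abc'",
+without any change to the C source.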
+
+The routines in
+Because not all strings in a program must be format strings it is not
+useful for
+Therefore the
+The careful reader now might say that this again can cause problems.
+The heuristic might guess it wrong. This is true and therefore
+
+This situation happens quite often. The
+
+If a string is marked with c-format and this is not correct the
+user can find out who is responsible for the decision. See
+section 4.1 Invoking the
+The attentive reader might now point out that it is not always possible
+to mark translatable strings with
+While it is no problem to mark the string
+The first task can be fulfilled by creating a new keyword, which names a
+no-op. For the second we have to mark all access points to a string
+from the array. So one solution can look like this:
+
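+A sketch of this approach, under stated assumptions: the no-op
+keyword is called gettext_noop here, and the array contents and the
+function names message_text and print_message are made up
+for illustration.
+

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* A no-op keyword: it marks a string so that it can be found in the
   sources, but does not translate it. */
#define gettext_noop(String) (String)

static const char *const messages[] = {
    gettext_noop("some very meaningful message"),
    gettext_noop("and another one")
};

/* Translate at the access point, where the string is actually used. */
const char *message_text(size_t index)
{
    assert(index < sizeof messages / sizeof messages[0]);
    return gettext(messages[index]);
}

/* Write the translated message to standard output. */
void print_message(size_t index)
{
    fputs(message_text(index), stdout);
}
```

+
+The initializer cannot call a function, which is why the marking and
+the run time translation are separated in this way.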
+
+Please convince yourself that the string which is written by
+
+The above is of course not the only solution. You could also come up
+with the following one:
+
+
+But this has some drawbacks. First the programmer has to take care that
+he uses
+One advantage is that you need not perform control flow analysis to make
+sure the output is really translated in every case. But this analysis is
+generally not very difficult. If it does prove difficult in some situation,
+you can use this second method there.
+
+
+
+
diff --git a/doc/gettext_4.html b/doc/gettext_4.html
new file mode 100644
index 000000000..daf1fc2ee
--- /dev/null
+++ b/doc/gettext_4.html
@@ -0,0 +1,208 @@
+
+
+
+
+
+After preparing the sources, the programmer creates a PO template file.
+This section explains how to use
+Search path for supplementary PO files is:
+`/usr/local/share/nls/src/'.
+
+
+If inputfile is `-', standard input is read.
+
+
+This implementation of
+
+
diff --git a/doc/gettext_5.html b/doc/gettext_5.html
new file mode 100644
index 000000000..3723f1c0e
--- /dev/null
+++ b/doc/gettext_5.html
@@ -0,0 +1,188 @@
+
+
+
+
+
+When starting a new translation, the translator copies the
+`package.pot' template file to a file called
+`LANG.po'. Then she modifies the initial comments and
+the header entry of this file.
+
+
+The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and
+"FIRST AUTHOR <EMAIL@ADDRESS>, YEAR" ought to be replaced by sensible
+information. This can be done in any text editor; if Emacs is used
+and it switched to PO mode automatically (because it has recognized
+the file's suffix), you can disable it by typing M-x fundamental-mode.
+
+
+Modifying the header entry can already be done using PO mode: in Emacs,
+type M-x po-mode RET and then RET again to start editing the
+entry. You should fill in the following fields.
+
+
+
+
diff --git a/doc/gettext_6.html b/doc/gettext_6.html
new file mode 100644
index 000000000..363d5670f
--- /dev/null
+++ b/doc/gettext_6.html
@@ -0,0 +1,930 @@
+
+
+
+
+
+Each PO file entry for which the
+Some commands are more specifically related to translated entry processing.
+
+
+The commands t (
+Translated entries usually result from the translator having edited in
+a translation for them (see section 6.6 Modifying Translations). However, if the
+variable
+Each PO file entry may have a set of attributes, which are
+qualities given a name and explicitly associated with the translation,
+using a special system comment. One of these attributes
+has the name
+Fuzzy entries, even if they account for translated entries for
+most other purposes, usually call for revision by the translator.
+Those may be produced by applying the program
+Also, the translator may decide herself to mark an entry as fuzzy
+for her own convenience, when she wants to remember that the entry
+has to be later revisited. So, some commands are more specifically
+related to fuzzy entry processing.
+
+
+The commands f (
+The command TAB (
+The initial value of
+The translator may also use the DEL command
+(
+Also, when the time comes to quit working on a PO file buffer with the q
+command, the translator is asked for confirmation if some fuzzy string
+still exists.
+
+
+When
+The usual commands moving from entry to entry consider untranslated
+entries on the same level as active entries. Untranslated entries
+are easily recognizable by the fact they end with `msgstr ""'.
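+
+For readers working outside Emacs, the recognition rule just stated
+can be sketched in C. This is a simplification that only looks at the
+text of one entry; the function name entry_is_untranslated is
+hypothetical.
+

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if the text of a PO entry ends with `msgstr ""'. */
int entry_is_untranslated(const char *entry)
{
    static const char tail[] = "msgstr \"\"";
    const size_t tail_len = sizeof tail - 1;

    assert(entry != NULL);
    size_t len = strlen(entry);
    while (len > 0 && isspace((unsigned char) entry[len - 1]))
        len--;                      /* ignore trailing white space */
    return len >= tail_len
        && memcmp(entry + len - tail_len, tail, tail_len) == 0;
}
```

+
+An entry whose translation starts with an empty string and continues
+on following lines does not end with `msgstr ""', so it is not
+reported as untranslated.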
+
+
+The work of the translator might be (quite naively) seen as the process
+of seeking an untranslated entry, editing a translation for
+it, and repeating these actions until no untranslated entries remain.
+Some commands are more specifically related to untranslated entry
+processing.
+
+
+The commands u (
+An entry can be turned back into an untranslated entry by
+merely emptying its translation, using the command k
+(
+Also, when the time comes to quit working on a PO file buffer
+with the q command, the translator is asked for confirmation,
+if some untranslated string still exists.
+
+
+By obsolete PO file entries, we mean those entries which are
+commented out, usually by
+The usual commands moving from entry to entry consider obsolete
+entries on the same level as active entries. Obsolete entries are
+easily recognizable by the fact that all their lines start with
+#, even those lines containing
+Commands exist for emptying the translation or reinitializing it
+to the original untranslated string. Commands interfacing with the
+kill ring may force some previously saved text into the translation.
+The user may interactively edit the translation. All these commands
+may apply to obsolete entries, carefully leaving the entry obsolete
+after the fact.
+
+
+Moreover, some commands are more specifically related to obsolete
+entry processing.
+
+
+The commands o (
+PO mode does not provide ways for un-commenting an obsolete entry
+and making it active, because this would reintroduce an original
+untranslated string which does not correspond to any marked string
+in the program sources. This goes with the philosophy of never
+introducing useless
+However, it is possible to comment out an active entry, so making
+it obsolete. GNU
+Here is a quite interesting problem to solve for later development of
+PO mode, for those nights you are not sleepy. The idea would be that
+PO mode might become bright enough, one of these days, to make good
+guesses at retrieving the most probable candidate, among all obsolete
+entries, for initializing the translation of a newly appeared string.
+I think it might be a quite hard problem to do this algorithmically, as
+we have to develop good and efficient measures of string similarity.
+Right now, PO mode leaves the decision entirely to the translator
+when the time comes to find the adequate obsolete translation; it
+merely tries to provide handy tools to help her do so.
+
+
+PO mode prevents direct editing of the PO file by the usual
+means Emacs gives for altering a buffer's contents. By doing so,
+it aims to help the translator avoid little clerical errors
+about the overall file format, or the proper quoting of strings,
+as those errors would be easily made. Other kinds of errors are
+still possible, but some may be caught and diagnosed by the batch
+validation process, which the translator may always trigger by the
+V command. For all other errors, the translator has to rely on
+her own judgment, and also on the linguistic reports submitted to her
+by the users of the translated package, having the same mother tongue.
+
+
+When the time comes to create a translation, correct an error diagnosed
+mechanically or reported by a user, the translators have to resort to
+using the following commands for modifying the translations.
+
+
+The command RET (
+The command LFD (
+It is possible to arrange things so that, whenever an untranslated
+entry is edited, the LFD command is automatically executed. If you set
+
+In fact, whether it is best to start a translation with an empty
+string, or rather with a copy of the original string, is a matter of
+taste or habit. Sometimes, the source language and the
+target language are so different that it is simply best to start writing
+on an empty page. At other times, the source and target languages
+are so close that it would be a waste to retype a number of words
+already being written in the original string. A translator may also
+like having the original string right under her eyes, as she will
+progressively overwrite the original text with the translation, even
+if this requires some extra editing work to get rid of the original.
+
+
+The command k (
+The translator may use k or w many times in the course
+of her work, as the kill ring may hold several saved translations.
+From the kill ring, strings may later be reinserted in various
+Emacs buffers. In particular, the kill ring may be used for moving
+translation strings between different entries of a single PO file
+buffer, or if the translator is handling many such buffers at once,
+even between PO files.
+
+
+To facilitate exchanges with buffers which are not in PO mode, the
+translation string put on the kill ring by the k command is fully
+unquoted before being saved: external quotes are removed, multi-line
+strings are concatenated, and backslash escape sequences are turned
+into their corresponding characters. In the special case of obsolete
+entries, the translation is also uncommented prior to saving.
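+
+The unquoting steps listed above can be sketched in C as follows.
+This is an illustration of the idea rather than the Emacs Lisp code of
+PO mode; the function name po_unquote is hypothetical, and only
+the `\n', `\t', `\"' and `\\' escapes are handled.
+

```c
#include <assert.h>
#include <stddef.h>

/* Decode the quoted segments found in src into dst.
   Returns the number of characters stored, not counting the final NUL. */
size_t po_unquote(char *dst, size_t size, const char *src)
{
    size_t n = 0;
    int in_quotes = 0;

    assert(dst != NULL && src != NULL && size > 0);
    for (; *src != '\0'; src++) {
        char c = *src;
        if (!in_quotes) {
            if (c == '"')
                in_quotes = 1;      /* opening quote: a segment starts */
            continue;               /* text outside quotes is only layout */
        }
        if (c == '"') {
            in_quotes = 0;          /* closing quote: the segment ends */
            continue;
        }
        if (c == '\\' && src[1] != '\0') {
            src++;
            switch (*src) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            default:  c = *src; break;  /* \" and \\ stand for themselves */
            }
        }
        if (n + 1 < size)
            dst[n++] = c;           /* segments are concatenated in order */
    }
    dst[n] = '\0';
    return n;
}
```

+
+Given the three PO file lines `""', `"Here is\n"' and `"an example"',
+the result is the 18 character string made of `Here is', a newline,
+and `an example'.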
+
+
+The command y (
+When a string is yanked into a PO file entry, it is fully and
+automatically requoted for complying with the format PO files should
+have. Further, if the entry is obsolete, PO mode then appropriately
+pushes the inserted string inside comments. Once again, translators
+should not burden themselves with quoting considerations beyond, of
+course, what the translated string itself requires with respect to
+the program using it.
+
+
+Note that k or w are not the only commands pushing strings
+on the kill ring, as almost any PO mode command replacing translation
+strings (or the translator comments) automatically saves the old string
+on the kill ring. The main exceptions to this general rule are the
+yanking commands themselves.
+
+
+To better illustrate the operation of killing and yanking, let's
+use an actual example, taken from a common situation. When the
+programmer slightly modifies some string right in the program, his
+change is later reflected in the PO file by the appearance
+of a new untranslated entry for the modified string, and the fact
+that the entry translating the original or unmodified string becomes
+obsolete. In many cases, the translator might spare herself some work
+by retrieving the unmodified translation from the obsolete entry,
+then initializing the untranslated entry
+When the translator finds an untranslated entry and suspects that a
+slight variant of the translation exists, she immediately uses m
+to mark the current entry location, then starts chasing obsolete
+entries with o, hoping to find some translation corresponding
+to the unmodified string. Once found, she uses the DEL command
+for deleting the obsolete entry, knowing that DEL also kills
+the translation, that is, pushes the translation on the kill ring.
+Then, r returns to the initial untranslated entry, and y
+then yanks the saved translation right into the
+When some sequence of keys has to be typed over and over again, the
+translator may find it useful to become better acquainted with the Emacs
+capability of learning these sequences and playing them back on request.
+See section `Keyboard Macros' in The Emacs Editor.
+
+
+Any translation work done seriously will raise many linguistic
+difficulties, for which decisions have to be made, and the choices
+further documented. These documents may be saved within the
+PO file in the form of translator comments, which the translator
+is free to create, delete, or modify at will. These comments may
+be useful to herself when she returns to this PO file after a while.
+
+
+Comments not having whitespace after the initial `#', for example,
+those beginning with `#.' or `#:', are not translator
+comments, they are exclusively created by other
+The following commands are somewhat similar to those modifying translations,
+so the general indications given for those apply here. See section 6.6 Modifying Translations.
+
+
+These commands parallel PO mode commands for modifying the translation
+strings, and behave much the same way as they do, except that they handle
+this part of PO file comments meant for translator usage, rather
+than the translation strings. So, if the descriptions given below are
+slightly succinct, it is because the full details have already been given.
+See section 6.6 Modifying Translations.
+
+
+The command # (
+Functions found on
+The command K (
+On the kill ring, all strings have the same nature. There is no
+distinction between translation strings and translator
+comment strings. So, for example, let's presume the translator
+has just finished editing a translation, and wants to create a new
+translator comment to document why the previous translation was
+not good, just to remember what the problem was. Foreseeing that she
+will do that in her documentation, the translator may want to quote
+the previous translation in her translator comments. To do so, she
+may initialize the translator comments with the previous translation,
+still at the head of the kill ring. Because editing already pushed the
+previous translation on the kill ring, she merely has to type M-w
+prior to #, and the previous translation will be right there,
+all ready for being introduced by some explanatory text.
+
+
+On the other hand, presume there are some translator comments already
+and that the translator wants to add to those comments, instead
+of wholly replacing them. Then, she should edit the comment right
+away with #. Once inside the editing window, she can use the
+regular Emacs commands C-y (
+The PO subedit minor mode has a few peculiarities worth being described
+in fuller detail. It installs a few commands over the usual editing set
+of Emacs, which are described below.
+
+
+The window's contents represent a translation for a given message,
+or a translator comment. The translator may modify this window to
+her heart's content. Once this is done, the command C-c C-c
+(
+If the translator becomes unsatisfied with her translation or comment,
+to the extent she prefers keeping what was existent prior to the
+RET or # command, she may use the command C-c C-k
+(
+The command C-c C-a allows for glancing through translations
+already achieved in other languages, directly while editing the current
+translation. This may be quite convenient when the translator is fluent
+in many languages, but of course, only makes sense when such completed
+auxiliary PO files are already available to her (see section 6.10 Consulting Auxiliary PO Files).
+
+
+Functions found on
+While editing her translation, the translator should pay attention to not
+inserting unwanted RET (newline) characters at the end of
+the translated string if those are not meant to be there, or to removing
+such characters when they are required. Since these characters are not
+visible in the editing buffer, they are easily introduced by mistake.
+To help her, RET automatically puts the character <
+at the end of the string being edited, but this < is not really
+part of the string. On exiting the editing window with C-c C-c,
+PO mode automatically removes such < and all whitespace added after
+it. If the translator adds characters after the terminating <, it
+loses its delimiting property and becomes an integral part of the string.
+If she removes the delimiting <, then the edited string is taken
+as is, with all trailing newlines, even if invisible. Also, if
+the translated string ought to end itself with a genuine <, then
+the delimiting < may not be removed; so the string should appear,
+in the editing window, as ending with two < in a row.
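+
+The rule for the terminating < can be summarized by a small C sketch.
+It is an illustration only: PO mode implements this in Emacs Lisp,
+and the function name strip_end_marker is hypothetical.
+
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Apply the end-of-string marker rule to an edited buffer, in place:
   - if the last '<' is followed only by whitespace, cut the text there;
   - if other characters follow it, the '<' is ordinary text;
   - if there is no '<' at all, the text is taken as is.  */
static void strip_end_marker(char *text)
{
    assert(text != NULL);
    char *marker = strrchr(text, '<');
    if (marker == NULL)
        return;                 /* marker was deleted: keep trailing newlines */
    for (const char *p = marker + 1; *p != '\0'; p++)
        if (!isspace((unsigned char)*p))
            return;             /* text follows: '<' belongs to the string */
    *marker = '\0';             /* drop the marker and the whitespace after it */
}
```
+
+With this rule, a buffer ending in two < in a row keeps exactly one of
+them, which is how a translation ending with a genuine < is entered.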
+
+
+When a translation (or a comment) is being edited, the translator may move
+the cursor back into the PO file buffer and freely move to other entries,
+browsing at will. If, with an edit pending, the translator wanders in the
+PO file buffer, she may decide to start modifying another entry. Each entry
+being edited has its own subedit buffer. It is possible to simultaneously
+edit the translation and the comment of a single entry, or to
+edit entries in different PO files, all at once. Typing RET
+on a field already being edited merely resumes that particular edit. Yet,
+the translator had better be comfortable handling many Emacs windows!
+
+
+Pending subedits may be completed or aborted in any order, regardless
+of how or when they were started. When many subedits are pending and the
+translator asks for quitting the PO file (with the q command), subedits
+are automatically resumed one at a time, so she may decide for each of them.
+
+
+PO mode is particularly powerful when used with PO files
+created through GNU
+When the translator gets to an untranslated entry, she is fairly
+often faced with an original string which is not as informative as
+it normally should be, being succinct, cryptic, or otherwise ambiguous.
+Before choosing how to translate the string, she needs to understand
+better what the string really means and how tight the translation has
+to be. Most of the time, when problems arise, the only way left to make
+her judgment is looking at the true program sources from where this
+string originated, searching for surrounding comments the programmer
+might have put in there, and looking around for helping clues of
+any kind.
+
+
+Surely, when looking at program sources, the translator will receive
+more help if she is a fluent programmer. However, even if she is
+not versed in programming and feels a little lost in C code, the
+translator should not be shy about taking a look, once in a while.
+It is most probable that she will still be able to find some of the
+hints she needs. She will learn quickly to not feel uncomfortable
+in program code, paying more attention to programmer's comments,
+variable and function names (if he dared to choose them well), and
+overall organization, than to programming itself.
+
+
+The following commands are meant to help the translator get
+program source context for a PO file entry.
+
+
+The commands s (
+Even if s (or M-s) opens a new window, the cursor stays
+in the PO file window. If the translator really wants to
+get into the program source window, she ought to do it explicitly,
+maybe by using command O.
+
+
+When s is typed for the first time, or for a PO file entry which
+is different from the last one used for getting source context, then the
+command reacts by giving the first context available for this entry,
+if any. If some context has already been recently displayed for the
+current PO file entry, and the translator wandered off to do other
+things, typing s again will merely resume, in another window,
+the context last displayed. In particular, if the translator moved
+the cursor away from the context in the source file, the command will
+bring the cursor back to the context. By using s many times
+in a row, with no other commands intervening, PO mode will cycle to
+the next available contexts for this particular entry, getting back
+to the first context once the last has been shown.
+
+
+The command M-s behaves differently. Instead of cycling through
+references, it lets the translator choose a particular reference among
+many, and displays that reference. It is best used with completion:
+if the translator types TAB immediately after M-s, in
+response to the question, she will be offered a menu of all possible
+references, as a reminder of which are the acceptable answers.
+This command is useful only where there are really many contexts
+available for a single string to translate.
+
+
+Program source files are usually found relative to where the PO
+file stands. As a special provision, when this fails, the file is
+also looked for, but relative to the directory immediately above it.
+Those two cases take proper care of most PO files. However, it might
+happen that a PO file has been moved, or is edited in a different
+place than its normal location. When this happens, the translator
+should tell PO mode in which directory normally sits the genuine PO
+file. Many such directories may be specified, and all together, they
+constitute what is called the search path for program sources.
+The command S (
+PO mode is able to help the knowledgeable translator, being fluent in
+many languages, at taking advantage of translations already achieved
+in other languages she just happens to know. It provides these other
+language translations as additional context for her own work. Moreover,
+it has features to ease the production of translations for many languages
+at once, for translators preferring to work in this way.
+
+
+An auxiliary PO file is an existing PO file meant for the same
+package the translator is working on, but targeted to a different mother
+tongue language. Commands exist for declaring and handling auxiliary
+PO files, and also for showing contexts for the entry under work.
+
+
+Here are the auxiliary file commands available in PO mode.
+
+
+Command A (
+The command a (
+The command M-a (
+For all this to work fully, auxiliary PO files will have to be normalized,
+in that way that
+However, PO files initially created by PO mode itself, while marking
+strings in source files, are normalized differently. So are PO
+files resulting from the `M-x normalize' command. Until these
+discrepancies between PO mode and other GNU
+Compendiums are yet to be implemented.
+
+
+An upcoming PO mode feature will let the translator maintain a
+compendium of already achieved translations. A compendium
+is a special PO file containing a set of translations recurring in
+many different packages. The translator will be given commands for
+adding entries to her compendium, and later initializing untranslated
+entries, or updating already translated entries, from translations
+kept in the compendium. For this to work, however, the compendium
+would have to be normalized. See section 2.5 Normalizing Strings in Entries.
+
+
+
+
diff --git a/doc/gettext_7.html b/doc/gettext_7.html
new file mode 100644
index 000000000..74b8829b0
--- /dev/null
+++ b/doc/gettext_7.html
@@ -0,0 +1,268 @@
+
+
+
+
+
+If input file is `-', standard input is read. If output file
+is `-', output is written to standard output.
+
+
+The format of the generated MO files is best described by a picture,
+which appears below.
+
+
+The first two words serve the identification of the file. The magic
+number will always signal GNU MO files. The number is stored in the
+byte order of the generating machine, so the magic number really is
+two numbers:
+Follow a number of pointers to later tables in the file, allowing
+for the extension of the prefix part of MO files without having to
+recompile programs reading them. This might become useful for later
+inserting a few flag bits, indication about the charset used, new
+tables, or other things.
+
+
+Then, at offset O and offset T in the picture, two tables
+of string descriptors can be found. In both tables, each string
+descriptor uses two 32-bit integers, one for the string length,
+another for the offset of the string in the MO file, counting in bytes
+from the start of the file. The first table contains descriptors
+for the original strings, and is sorted so the original strings
+are in increasing lexicographical order. The second table contains
+descriptors for the translated strings, and is parallel to the first
+table: to find the corresponding translation one has to access the
+array slot in the second array with the same index.
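+
+The two parallel descriptor tables, and the fact that the first one is
+sorted, can be illustrated with a short C sketch that looks up a
+translation by binary search. It assumes the GNU MO header layout from
+the format picture, which is not reproduced in the text above: the magic
+number 0x950412de at byte 0, the number of strings at byte 8, offset O
+at byte 12 and offset T at byte 16. All names (mo_file, mo_open,
+mo_lookup) are hypothetical, and the sketch trusts each string to be
+NUL terminated; it is not the library's own code.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC         0x950412deu
#define MO_MAGIC_SWAPPED 0xde120495u   /* same number, other byte order */

struct mo_file {                 /* hypothetical in-memory view of an MO file */
    const unsigned char *data;
    size_t size;
    int swapped;                 /* nonzero if the file uses the other byte order */
    uint32_t count;              /* number of strings */
    uint32_t orig_tab;           /* offset O: descriptors of original strings */
    uint32_t trans_tab;          /* offset T: descriptors of translations */
};

/* Read one 32-bit word at byte offset `off`, swapping bytes if needed. */
static uint32_t mo_u32(const struct mo_file *mo, size_t off)
{
    uint32_t v;
    memcpy(&v, mo->data + off, sizeof v);
    if (mo->swapped)
        v = (v >> 24) | ((v >> 8) & 0xff00u) | ((v << 8) & 0xff0000u) | (v << 24);
    return v;
}

/* Check the magic number and both descriptor tables; return 0 on success. */
static int mo_open(struct mo_file *mo, const void *data, size_t size)
{
    assert(mo != NULL && data != NULL);
    if (size < 28)
        return -1;
    mo->data = data;
    mo->size = size;
    mo->swapped = 0;
    uint32_t magic = mo_u32(mo, 0);
    if (magic == MO_MAGIC_SWAPPED)
        mo->swapped = 1;
    else if (magic != MO_MAGIC)
        return -1;
    mo->count = mo_u32(mo, 8);
    mo->orig_tab = mo_u32(mo, 12);
    mo->trans_tab = mo_u32(mo, 16);
    if (mo->orig_tab > size || (size - mo->orig_tab) / 8 < mo->count)
        return -1;
    if (mo->trans_tab > size || (size - mo->trans_tab) / 8 < mo->count)
        return -1;
    return 0;
}

/* Return string number `i` of the table at `table`: each descriptor is a
   (length, offset) pair of 32-bit words.  NULL if it points outside the file. */
static const char *mo_string(const struct mo_file *mo, uint32_t table, uint32_t i)
{
    size_t desc = (size_t)table + 8 * (size_t)i;
    uint32_t len = mo_u32(mo, desc);
    uint32_t off = mo_u32(mo, desc + 4);
    if (off >= mo->size || len >= mo->size - off)
        return NULL;
    return (const char *)mo->data + off;
}

/* Binary search in the sorted original strings; the translation sits at
   the same index in the second table.  NULL if `msgid` is not found. */
static const char *mo_lookup(const struct mo_file *mo, const char *msgid)
{
    uint32_t lo = 0, hi = mo->count;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *orig = mo_string(mo, mo->orig_tab, mid);
        if (orig == NULL)
            return NULL;
        int cmp = strcmp(msgid, orig);
        if (cmp == 0)
            return mo_string(mo, mo->trans_tab, mid);
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```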
+
+
+Having the original strings sorted enables the use of simple binary
+search, for when the MO file does not contain a hashing table, or
+for when it is not practical to use the hashing table provided in
+the MO file. This also has another advantage, as the empty string
+in a PO file GNU
+The size S of the hash table can be zero. In this case, the
+hash table itself is not contained in the MO file. Some people might
+prefer this because a precomputed hashing table takes disk space, and
+does not win that much speed. The hash table contains indices
+to the sorted array of strings in the MO file. Conflict resolution is
+done by double hashing. The precise hashing algorithm used is fairly
+dependent on GNU
+As for the strings themselves, they follow the hash table, and each
+is terminated with a NUL, and this NUL is not counted in
+the length which appears in the string descriptor. The
+Plural forms are stored by letting the plural of the original string
+follow the singular of the original string, separated by a
+NUL byte. The length which appears in the string descriptor
+includes both. However, only the singular of the original string
+takes part in the hash table lookup. The plural variants of the
+translation are all stored consecutively, separated by a
+NUL byte. Here also, the length in the string descriptor
+includes all of them.
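+
+Since the descriptor length covers all the NUL-separated pieces, a
+reader has to walk the NUL bytes to reach a given plural variant. The
+following C sketch shows one way to do so; the function name
+nul_separated_item is hypothetical, and choosing which index to ask for
+(the plural rule of the language) is outside the scope of this sketch.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return piece number `index` (counting from 0) of a stored string made
   of NUL-separated pieces.  `len` is the length from the string
   descriptor, so it does not count the final terminating NUL.
   Returns NULL if there are not that many pieces.  */
static const char *nul_separated_item(const char *s, size_t len, unsigned index)
{
    assert(s != NULL);
    const char *end = s + len;
    while (index > 0) {
        const char *nul = memchr(s, '\0', (size_t)(end - s));
        if (nul == NULL)
            return NULL;        /* no further separator inside the string */
        s = nul + 1;            /* next piece starts after the separator */
        index--;
    }
    return s;
}
```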
+
+
+Nothing prevents an MO file from having embedded NULs in strings.
+However, the program interface currently used already presumes
+that strings are NUL terminated, so embedded NULs are
+somewhat useless. But the MO file format is general enough that other
+interfaces would be possible later, if for example, we ever want to
+implement wide characters right in MO files, where NUL bytes may
+accidentally appear. (No, we don't want to have wide characters in MO
+files. They would make the file unnecessarily large, and the
+`wchar_t' type being platform dependent, MO files would be
+platform dependent as well.)
+
+
+This particular issue has been strongly debated in the GNU
+
+
+
diff --git a/doc/gettext_8.html b/doc/gettext_8.html
new file mode 100644
index 000000000..4a18aa4a8
--- /dev/null
+++ b/doc/gettext_8.html
@@ -0,0 +1,119 @@
+
+
+
+
+
+When GNU
+So, let's try to describe here how we would like the magic to operate,
+as we want the users' view to be the simplest, among all ways one
+could look at GNU
+When a package is distributed, there are two kinds of users:
+installers who fetch the distribution, unpack it, configure
+it, compile it and install it for themselves or others to use; and
+end users that call programs of the package, once these have
+been installed at their site. GNU
+Languages are not equally supported in all packages using GNU
+
+More generally, a matrix is available for showing the current state
+of the Translation Project, listing which packages are prepared for
+multi-lingual messages, and which languages are supported by each.
+Because this information changes often, this matrix is not kept within
+this GNU
+By default, packages fully using GNU
+Internationalized packages usually have many `ll.po'
+files. Unless
+translations are disabled, all those available are installed together
+with the package. However, the environment variable
+We consider here those packages using GNU
+
+
diff --git a/doc/gettext_9.html b/doc/gettext_9.html
new file mode 100644
index 000000000..00592a1ed
--- /dev/null
+++ b/doc/gettext_9.html
@@ -0,0 +1,1410 @@
+
+
+
+
+
+One aim of the current message catalog implementation provided by
+GNU
+The
+Another, personal comment on this: only a bunch of committee members
+could have made this interface. They never really tried to program
+using this interface. It is a fast, memory-saving implementation; a
+user can happily live with it. But programmers hate it (at least me and
+some others do...)
+
+
+But we must not forget one point: after all the trouble with transferring
+the rights on Unix(tm) they at last came to X/Open, the very same who
+published this specification. This leads me to making the prediction
+that this interface will be in future Unix standards (e.g. Spec1170) and
+therefore part of all Unix implementations (implementations which are
+allowed to bear this name).
+
+
+The interface to the
+
+The function takes as the argument the name of the catalog. This usually
+refers to the name of the program or the package. The second parameter
+is not further specified in the standard. I don't even know whether it
+is implemented consistently among various systems. So the common advice
+is to use
+This handle is of course used in the
+The first parameter is this catalog descriptor. The second parameter
+specifies the set of messages in this catalog, in which the message
+described by
+The fourth argument is not used to address the translation. It is given
+as a default value in case one of the addressing stages fails. One
+important thing to remember is that although the return type of catgets
+is
+The last of these functions is used and behaves as expected:
+
+
+After this no
+Now that this description seemed to be really easy -- where are the
+problems we speak of? In fact the interface could be used in a
+reasonable way, but constructing the message catalogs is a pain. The
+reason for this lies in the third argument of
+The definition of the
+The main points about this solution are that it does not follow the
+method of normal file handling (open-use-close) and that it does not
+burden the programmer with so many tasks, especially the unique key handling.
+Of course a unique key is also needed here, but this key is the message
+itself (however long or short it is). See section 9.3 Comparing the Two Interfaces for a more
+detailed comparison of the two methods.
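+
+For contrast, the open-use-close pattern of the older interface can be
+sketched in C as below. The functions catopen, catgets and catclose
+come from <nl_types.h>; the wrapper fetch_message, its parameters, and
+the choice of 0 for the second argument of catopen are assumptions made
+for this illustration. The programmer must supply the set number, the
+message number and a default string for every single message.
+
```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/* Copy message (set_id, msg_id) of `catalog` into `out`, or `fallback`
   when the catalog or the message cannot be found.  The text is copied
   before the catalog is closed, since the pointer returned by catgets
   may not stay valid afterwards.  Hypothetical helper.  */
static void fetch_message(const char *catalog, int set_id, int msg_id,
                          const char *fallback, char *out, size_t cap)
{
    assert(catalog != NULL && fallback != NULL && out != NULL && cap > 0);
    const char *text = fallback;
    nl_catd catd = catopen(catalog, 0);                   /* open */
    if (catd != (nl_catd)-1)
        text = catgets(catd, set_id, msg_id, fallback);   /* use */
    snprintf(out, cap, "%s", text);
    if (catd != (nl_catd)-1)
        catclose(catd);                                   /* close */
}
```
+
+With the other approach, the message itself is the key, so no set or
+message numbers have to be invented and kept unique by hand.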
+
+
+The following section contains a rather detailed description of the
+interface. We make it that detailed because this is the interface
+we chose for the GNU
+The minimal functionality an interface must have is a) to select a
+domain the strings are coming from (a single domain for all programs is
+not reasonable because its construction and maintenance is difficult,
+perhaps impossible) and b) to access a string in a selected domain.
+
+
+This is principally the description of the
+This provides the possibility to change or query the current status of
+the current global domain of the
+To use a domain set by
+is to be used. This is the simplest reasonable form one can imagine.
+The translation of the string msgid is returned if it is available
+in the current domain. If not available the argument itself is
+returned. If the argument is
+One thing which should come to mind is that no explicit dependency on
+the used domain is given. The current value of the domain for the
+
+For the easiest case, which is normally used in internationalized
+packages, once at the beginning of execution a call to
+While this single name domain works well for most applications there
+might be the need to get translations from more than one domain. Of
+course one could switch between different domains with calls to
+
+For this reason there are two more functions to retrieve strings:
+
+
+Both take an additional argument at the first place, which corresponds
+to the argument of
+A second ambiguity can arise by the fact, that perhaps more than one
+domain has the same name. This can be solved by specifying where the
+needed message catalog files can be found.
+
+
+Calling this function binds the given domain to a file in the specified
+directory (how this file is determined follows below). Especially a
+file in the system's default place is not favored against the specified
+file anymore (as it would be by solely using
+It is important to remember that relative path names for the
+dir_name parameter can be trouble. Since the path is always
+computed relative to the current directory different results will be
+achieved when the program executes a
+Because many different languages for many different packages have to be
+stored we need some way to add this information to message catalog
+files. The way usually used in Unix environments is to have this encoding
+in the file name. This is also done here. The directory name given in
+
+The default value for dir_name is system specific. For the GNU
+library, and for packages adhering to its conventions, it's:
+
+
+locale is the value of the locale whose name is this
+
+
+The output character set is, by default, the value of
+Note that the msgid argument to
+1 Introduction
+
+
+
+
+
+gettext
+is meant to be useful for people using computers, whatever their sex,
+race, religion or nationality!
+
+gettext
and the free Translation Project.
+Then, it explains a few broad concepts around
+Native Language Support, and positions message translation with regard
+to other aspects of national and cultural variance, as they apply
+to programs. It also surveys those files used to convey the
+translations. It explains how the various tools interact in the
+initial generation of these files, and later, how the maintenance
+cycle should usually operate.
+
+
+Internet address:
+ bug-gnu-utils@gnu.org
+
+
+1.1 The Purpose of GNU
+
+gettext
gettext
is an important step for the Translation Project,
+as it is an asset on which we may build many other steps. This package
+offers to programmers, translators and even users, a well integrated
+set of tools and documentation. Specifically, the GNU gettext
+utilities are a set of tools that provides a framework within which
+other free packages may produce multi-lingual messages. These tools
+include
+
+
+
+
+gettext
is designed to minimize the impact of
+internationalization on program sources, keeping this impact as small
+and hardly noticeable as possible. Internationalization has better
+chances of succeeding if it is very lightweight, or at least,
+appears to be so, when looking at program sources.
+
+gettext
distribution
+as a vehicle for documenting its structure and methods. This goes
+beyond the strict technicalities of documenting the GNU gettext
+proper. By so doing, translators will find in a single place, as
+far as possible, all they need to know for properly doing their
+translating work. This supplemental documentation might also
+help programmers, and even curious users, in understanding how GNU
+gettext
is related to the remainder of the Translation
+Project, and consequently, have a glimpse at the big picture.
+
+1.2 I18n, L10n, and Such
+
+gettext
offers one of these standards. See section 9 The Programmer's View.
+
+1.3 Aspects in Native Language Support
+
+
+
+
+gettext
offers a complete toolset for
+translating messages output by C programs. Perl scripts and shell
+scripts will also need to be translated. Even if there are today some hooks
+by which this can be done, these hooks are not integrated as well as they
+should be.
+
+autoconf
or bison
, are able
+to produce other programs (or scripts). Even if the generating
+programs themselves are internationalized, the generated programs they
+produce may need internationalization on their own, and this indirect
+internationalization could be automated right from the generating
+program. In fact, quite usually, generating and generated programs
+could be internationalized independently, as the effort needed is
+fairly orthogonal.
+
+recode
program is able to reconstruct at execution.
+Since these descriptions are extracted from the RFC by mechanical means,
+translating them properly would require a prior translation of the RFC
+itself.
+
+gcc
to allow diacriticized characters in identifiers or use
+translated keywords; `rm -i' might accept something other than
+`y' or `n' for replies, etc. Even if the program will
+eventually make most of its output in the foreign languages, one has
+to decide whether the input syntax, option values, etc., are to be
+localized or not.
+
+libc
. There
+are many attributes that are needed to define a country's cultural
+conventions. These attributes include, besides the country's native
+language, the formatting of the date and time, the representation of
+numbers, the symbols for currency, etc. These local rules are
+termed the country's locale. The locale represents the knowledge
+needed to support the country's native attributes.
+
+libc
manual for details.
+
+
+
+
+
+
+12,345.67 English
+12.345,67 French
+1,2345.67 Asia
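+
+The three renderings above differ only in the digit group size and in
+the two separator characters, which the following C sketch makes
+explicit. The function format_grouped and its parameters are
+hypothetical; a real program would rather take these values from the
+current locale (for instance through localeconv in <locale.h>) than
+hard-code them.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format `whole` and two decimal digits `cents` into `out`, inserting
   `group_sep` between groups of `group` digits and using `decimal_sep`
   before the decimals.  Returns 0 on success, -1 if `out` is too small. */
static int format_grouped(unsigned long whole, unsigned cents, size_t group,
                          char group_sep, char decimal_sep,
                          char *out, size_t cap)
{
    char digits[32];
    size_t n = 0;

    assert(out != NULL && group > 0 && cents < 100);
    snprintf(digits, sizeof digits, "%lu", whole);
    size_t ndigits = strlen(digits);
    for (size_t i = 0; i < ndigits; i++) {
        /* A separator goes wherever the remaining digit count is a
           multiple of the group size, except before the first digit. */
        if (i > 0 && (ndigits - i) % group == 0) {
            if (n + 1 >= cap)
                return -1;
            out[n++] = group_sep;
        }
        if (n + 1 >= cap)
            return -1;
        out[n++] = digits[i];
    }
    int m = snprintf(out + n, cap - n, "%c%02u", decimal_sep, cents);
    return (m < 0 || (size_t)m >= cap - n) ? -1 : 0;
}
```
+
+Calling it with a group size of 3 and the separators `,' and `.' gives
+the first line of the table, swapping the separators gives the second,
+and a group size of 4 gives the third.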
+
+
+Some programs could go further and use different unit systems, like
+English units or Metric units, or even take into account variants
+about how numbers are spelled in full.
+
+gettext
provides the means for developers and users to
+easily change the language that the software uses to communicate to
+the user.
+
+libc
+fully implements this, and most other modern systems provide a more
+or less reasonable support for at least some of the missing components.
+
+1.4 Files Conveying Translations
+
+xgettext
program, and later updated or refreshed through
+the msgmerge
program. Program xgettext
extracts all
+marked messages from a set of C files and initializes a PO file with
+empty translations. Program msgmerge
takes care of adjusting
+PO files between releases of the corresponding sources, commenting
+obsolete entries, initializing new ones, and updating all source
+line references. Files ending with `.pot' are kind of base
+translation files found in distributions, in PO file format, and
+`.pox' files are often temporary PO files.
+
+gettext
. Therefore GNU
+gettext
uses its own format for MO files. Files ending with
+`.gmo' are really MO files, when it is known that these files use
+the GNU format.
+
+1.5 Overview of GNU
+
+gettext
gettext
and the tools acting on these files.
+It is followed by somewhat detailed explanations, which you should
+read while keeping an eye on the diagram. Having a clear understanding
+of these interrelations would surely help programmers, translators
+and maintainers.
+
+
+Original C Sources ---> PO mode ---> Marked C Sources ---.
+ |
+ .---------<--- GNU gettext Library |
+.--- make <---+ |
+| `---------<--------------------+-----------'
+| |
+| .-----<--- PACKAGE.pot <--- xgettext <---' .---<--- PO Compendium
+| | | ^
+| | `---. |
+| `---. +---> PO mode ---.
+| +----> msgmerge ------> LANG.pox --->--------' |
+| .---' |
+| | |
+| `-------------<---------------. |
+| +--- LANG.po <--- New LANG.pox <----'
+| .--- LANG.gmo <--- msgfmt <---'
+| |
+| `---> install ---> /.../LANG/PACKAGE.mo ---.
+| +---> "Hello world!"
+`-------> install ---> /.../bin/PROGRAM -------'
+
+
+gettext
+into your package is identifying, right in the C sources, those strings
+which are meant to be translatable, and those which are untranslatable.
+This tedious job can be done a little more comfortably using emacs PO
+mode, but you can use any means familiar to you for modifying your
+C sources. Besides this, some other simple, standard changes are needed to
+properly initialize the translation library. See section 3 Preparing Program Sources, for
+more information about all this.
+
+gettext
approach makes this
+very easy. Simply put the following lines at the beginning of each file
+or in a central header file:
+
+
+#define _(String) (String)
+#define N_(String) (String)
+#define textdomain(Domain)
+#define bindtextdomain(Package, Directory)
+
+
+gettext
library
+simply replace these definitions by the following:
+
+
+#include <libintl.h>
+#define _(String) gettext (String)
+#define gettext_noop(String) (String)
+#define N_(String) gettext_noop (String)
+
+
+libintl
because the
+gettext
library functions are already contained in GNU libc.
+That is all you have to change.
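+
+Put together, the two sets of definitions might be used as in the C
+sketch below. It assumes that the build system defines ENABLE_NLS when
+native language support is wanted; PACKAGE, LOCALEDIR, init_i18n and
+weekday_name are hypothetical names chosen for this example. Without
+ENABLE_NLS the macros expand to the plain strings, so the same source
+compiles and runs with no translation library at all.
+
```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

#ifdef ENABLE_NLS
# include <libintl.h>
# define _(String) gettext (String)
# define gettext_noop(String) (String)
# define N_(String) gettext_noop (String)
#else
# define _(String) (String)
# define N_(String) (String)
# define textdomain(Domain)
# define bindtextdomain(Package, Directory)
#endif

#define PACKAGE   "hello"                     /* hypothetical domain name */
#define LOCALEDIR "/usr/local/share/locale"   /* hypothetical catalog directory */

/* N_() only marks these strings for extraction; a static initializer
   cannot call a function, so translation happens later through _(). */
static const char *const weekday_names[] = { N_("Monday"), N_("Tuesday") };

/* Call once at program start, before any message is output. */
static void init_i18n(void)
{
    setlocale(LC_ALL, "");
    bindtextdomain(PACKAGE, LOCALEDIR);
    textdomain(PACKAGE);
}

/* Return the name of weekday `i`, translated when a catalog is available. */
static const char *weekday_name(size_t i)
{
    assert(i < sizeof weekday_names / sizeof weekday_names[0]);
    return _(weekday_names[i]);
}
```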
+
+xgettext
program
+is used to find and extract all translatable strings, and create a
+PO template file out of all these. This `package.pot' file
+contains all original program strings. It has sets of pointers to
+exactly where in C sources each string is used. All translations
+are set to empty. The letter t in `.pot' marks this as
+a Template PO file, not yet oriented towards any particular language.
+See section 4.1 Invoking the xgettext
Program, for more details about how one calls the
+xgettext
program. If you are really lazy, you might
+be interested in working a lot more right away, and preparing the
+whole distribution setup (see section 11 The Maintainer's View). By doing so, you
+spare yourself typing the xgettext
command, as make
+should now generate the proper things automatically for you!
+
+msgmerge
step may be skipped and replaced by a mere copy of
+`package.pot' to `lang.pox', where lang
+represents the target language.
+
+xgettext
would construct `package.pot' files which are
+evolving over time, so the translations carried by `lang.po'
+are slowly fading out of date.
+
+msgmerge
program has the purpose of refreshing an already
+existing `lang.po' file, by comparing it with a newer
+`package.pot' template file, extracted by xgettext
+out of recent C sources. The refreshing operation adjusts all
+references to C source locations for strings, since these strings
+move as programs are modified. Also, msgmerge
comments out as
+obsolete, in `lang.pox', those already translated entries
+which are no longer used in the program sources (see section 6.5 Obsolete Entries). It finally discovers new strings and inserts them in
+the resulting PO file as untranslated entries (see section 6.4 Untranslated Entries). See section 6.1 Invoking the msgmerge
Program, for more information about what
+msgmerge
really does.
+
+The msgfmt program is used for turning the PO file into a
+machine-oriented format, which may yield efficient retrieval of
+translations by the programs of the package, whenever needed at runtime
+(see section 7.2 The Format of GNU MO Files). See section 7.1 Invoking
+the msgfmt Program, for more information about all modalities of
+execution for the msgfmt program.
+
+gettext library, usually through the operation of make, given a
+suitable `Makefile' exists for the project, and the resulting
+executable is installed somewhere users will find it.
+The MO files themselves should also be properly installed. Given the
+appropriate environment variables are set (see section 8.3 Magic for End Users), the
+program should localize itself automatically, whenever it executes.
+
+10 The Translator's View
+
+
+
+10.1 Introduction 0
+
+gettext tool set contains everything maintainers
+need for internationalizing their packages for messages. It also
+contains quite useful tools for helping translators localize
+messages to their native language, once a package has already been
+internationalized.
+
+
+
+
+zh, Czech cs, Danish da, Dutch nl, Esperanto eo, Finnish fi,
+French fr, Irish ga, German de, Greek el, Italian it, Japanese ja,
+Indonesian in, Norwegian no, Polish pl, Portuguese pt, Russian ru,
+Spanish es, Swedish sv and Turkish tr.
+
+subscribe
+
+
+10.2 Introduction 1
+
+
+
+
+10.3 Discussions
+
+
+
+
+
+
+gettext necessarily brings their package
+under the protective wing of the GNU General Public License, when they
+do not want to make their program free, or want other kinds of freedom.
+The simplest answer is yes.
+
+The mere marking of localizable strings in a package, or conditional
+inclusion of a few lines for initialization, is not really including
+GPL'ed code. However, the localization routines themselves are under
+the GPL and would bring the remainder of the package under the GPL
+if they were distributed with it. So, I presume that, for those
+for whom this is a problem, it could be circumvented by leaving to
+the end installers the burden of assembling a package prepared for
+localization, but not providing the localization routines themselves.
+
+10.4 Organization
+
+gettext becomes an official reality. The e-mail address
+`translation@iro.umontreal.ca' has been set up for receiving
+offers from volunteers and general e-mail on these topics. This address
+reaches the Translation Project coordinator.
+
+10.4.1 Central Coordination
+
+10.4.2 National Teams
+
+10.4.2.1 Sub-Cultures
+
+gettext become officially published. And I suspect that this
+means soon!
+
+10.4.2.2 Organizational Ideas
+
+
+
+
+
+
+10.4.3 Mailing Lists
+
+gettext, send them on to:
+
+
+`translation@iro.umontreal.ca'
+
+
+majordomo. These lists have been very dependable
+so far...
+
+10.5 Information Flow
+
+11 The Maintainer's View
+
+gettext
+might be integrated in a distribution, and this chapter does not cover
+them in all generality. Instead, it details one possible approach which
+is especially adequate for many free software distributions following GNU
+standards, or even better, Gnits standards, because GNU gettext
+is purposely for helping the internationalization of the whole GNU
+project, and as many other good free packages as possible. So, the
+maintainer's view presented here presumes that the package already has
+a `configure.in' file and uses GNU Autoconf.
+
+gettext may surely be useful for free packages
+not following GNU standards and conventions, but the maintainers of such
+packages might have to show imagination and initiative in organizing
+their distributions so that gettext works for them in all situations.
+There are surely many, out there.
+
+gettext methods are now stabilizing, slight adjustments
+might be needed between successive gettext versions, so you
+should ideally review this chapter in subsequent releases, looking
+for changes.
+
+11.1 Flat or Non-Flat Directory Structures
+
+tar files which unpack
+in a single directory, these are said to be flat distributions.
+Other free software packages have a one level hierarchy of subdirectories, using
+for example a subdirectory named `doc/' for the Texinfo manual and
+man pages, another called `lib/' for holding functions meant to
+replace or complement C libraries, and a subdirectory `src/' for
+holding the proper sources for the package. These other distributions
+are said to be non-flat.
+
+gettext. Also, if you have
+many PO files, this could somewhat pollute your single directory.
+Also, GNU gettext's libintl sources consist of C sources, shell
+scripts, sed scripts and complicated Makefile rules, which don't
+fit well into an existing flat structure. For these reasons, we
+recommend using the non-flat approach in this case as well.
+
+gettext itself has a non-flat structure,
+we have more experience with this approach, and this is what will be
+described in the remainder of this chapter. Some maintainers might
+use this as an opportunity to unflatten their package structure.
+
+11.2 Prerequisite Works
+
+gettext
+in one of your packages. These works have some kind of generality
+that escapes the point by point descriptions used in the remainder
+of this chapter. So, we describe them here.
+
+
+
+
+gettextize you should install some
+other packages first.
+Ensure that recent versions of GNU m4, GNU Autoconf and GNU
+gettext are already installed at your site, and if not, proceed
+to do this first. If you have to install these things, beware that
+GNU m4 must be fully installed before GNU Autoconf is even
+configured.
+
+To further ease the task of a package maintainer the automake
+package was designed and implemented. GNU gettext now uses this
+tool and the `Makefile's in the `intl/' and `po/' directories
+therefore know about all the goals necessary for using automake
+and `libintl' in one project.
+
+Those four packages are needed only by you, as a maintainer; the
+installers of your own package and end users do not really need any of
+GNU m4, GNU Autoconf, GNU gettext, or GNU automake
+for successfully installing and running your package, with messages
+properly translated. But this is not completely true if you provide
+internationalized shell scripts within your own package: GNU
+gettext shall then be installed at the user site if the end users
+want to see the translation of shell script messages.
+
+11.3 Invoking the gettextize Program
+
+gettext. As a matter of
+convenience, the gettextize program puts all these files right
+in your package. This program has the following synopsis:
+
+
+gettextize [ option... ] [ directory ]
+
+
+
+
+
+
+gettext code
+available on the system, but it might disturb some mechanism the
+maintainer is used to applying to the sources. Because running
+gettextize is easy there shouldn't be problems with using copies.
+
+gettext. If not given, it
+is assumed that the current directory is the top level directory of
+such a package.
+
+gettextize provides the following files. However,
+no existing file will be replaced unless the option --force
+(-f) is specified.
+
+
+
+
+gettextize,
+if you have one handy. You may also fetch a more recent copy of file
+`ABOUT-NLS' from Translation Project sites, and from most GNU
+archive sites.
+
+gettext distribution.
+(beware the double `.in' in the file name). If the `po/'
+directory already exists, it will be preserved along with the files
+it contains, and only `Makefile.in.in' will be overwritten.
+
+gettext
+distribution. Also, if option --force (-f) is given,
+the `intl/' directory is emptied first.
+
+gettextize will not
+actually copy the files into your package, but establish symbolic
+links instead. This avoids duplicating the disk space needed in
+all packages. Merely using the `-h' option while creating the
+tar archive of your distribution will resolve each link by an
+actual copy in the distribution archive. So, to insist, you really
+should use the `-h' option with tar within the dist
+goal of your main `Makefile.in'.
+
+gettext facilities in one package go in `intl/'
+and `po/' subdirectories. One distinction between these two
+directories is that `intl/' is meant to be completely identical
+in all packages using GNU gettext, while all newly created
+files, which have to be different, go into `po/'. There is a
+common `Makefile.in.in' in `po/', because the `po/'
+directory needs its own `Makefile', and it has been designed so
+it can be identical in all packages.
+
+11.4 Files You Must Create or Alter
+
+gettextize,
+there are many files needing revision for properly interacting with
+GNU gettext. If you are closely following GNU standards for
+Makefile engineering and auto-configuration, the adaptations should
+be easier to achieve. Here is a point by point description of the
+changes needed in each.
+
+gettext 0.10.37 distribution itself. You may indeed
+refer to the source code of the GNU gettext package, as it
+is intended to be a good example and master implementation for using
+its own functionality.
+
+11.4.1 `POTFILES.in' in `po/'
+
+
+# List of source files containing translatable strings.
+# Copyright (C) 1995 Free Software Foundation, Inc.
+
+# Common library files
+lib/error.c
+lib/getopt.c
+lib/xmalloc.c
+
+# Package source files
+src/gettext.c
+src/msgfmt.c
+src/xgettext.c
+
+
+11.4.2 `configure.in' at top level
+
+
+
+
+
+
+
+
+PACKAGE=gettext
+VERSION=0.10.37
+AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE")
+AC_DEFINE_UNQUOTED(VERSION, "$VERSION")
+AC_SUBST(PACKAGE)
+AC_SUBST(VERSION)
+
+
+Of course, you replace `gettext' with the name of your package,
+and `0.10.37' by its version numbers, exactly as they
+should appear in the packaged tar file name of your distribution
+(`gettext-0.10.37.tar.gz', here).
+
+ALL_LINGUAS to the whitespace-separated,
+quoted list of available languages, in a single line, like this:
+
+
+
+ALL_LINGUAS="de fr"
+
+
+This example means that German and French PO files are available, so
+that these languages are currently supported by your package. If you
+want to further restrict, at installation time, the set of installed
+languages, this should not be done by modifying ALL_LINGUAS in
+`configure.in', but rather by using the LINGUAS environment
+variable (see section 8.2 Magic for Installers).
+
+m4 macro for triggering internationalization
+support. Just add this line to `configure.in':
+
+
+
+AM_GNU_GETTEXT
+
+
+This call is purposely simple, even if it generates a lot of configure
+time checking and actions.
+
+AC_OUTPUT directive, at the end of your `configure.in'
+file, needs to be modified in two ways:
+
+
+
+AC_OUTPUT([existing configuration files intl/Makefile po/Makefile.in],
+[existing additional actions])
+
+
+The modification to the first argument to AC_OUTPUT asks
+for substitution in the `intl/' and `po/' directories.
+Note the `.in' suffix used for `po/' only. This is because
+the distributed file is really `po/Makefile.in.in'.
+
+11.4.3 `config.guess', `config.sub' at top level
+
+automake and
+GNU libtool packages.
+
+
+AC_CONFIG_AUX_DIR([subdir])
+
+
+
+
+11.4.4 `aclocal.m4' at top level
+
+gettext's
+`m4/' directory into a single file.
+
+gettext, you
+should most probably replace the macros (AM_GNU_GETTEXT,
+AM_WITH_NLS, etc.), as they usually
+change a little from one release of GNU gettext to the next.
+Their contents may vary as we get more experience with strange systems
+out there.
+
+m4 code will be the same for all projects using GNU
+gettext.
+
+11.4.5 `acconfig.h' at top level
+
+gettext releases required to put definitions for
+ENABLE_NLS, HAVE_GETTEXT, HAVE_LC_MESSAGES,
+HAVE_STPCPY, PACKAGE and VERSION into an
+`acconfig.h' file. This is not needed any more; you can remove
+them from your `acconfig.h' file unless your package uses them
+independently from the `intl/' directory.
+
+11.4.6 `Makefile.in' at top level
+
+
+
+
+
+
+
+PACKAGE = @PACKAGE@
+VERSION = @VERSION@
+
+
+DISTFILES definition, so the file gets
+distributed.
+
+SUBDIRS in Makefile.in for it
+to be further used in the `dist:' goal.
+
+
+
+SUBDIRS = doc intl lib src @POSUB@
+
+
+Note that you must arrange for `make' to descend into the
+intl directory before descending into other directories containing
+code which makes use of the libintl.h header file. For this
+reason, here we mention intl before lib and src.
+
+that you will have to adapt to your own package.
+
+
+distdir = $(PACKAGE)-$(VERSION)
+dist: Makefile
+ rm -fr $(distdir)
+ mkdir $(distdir)
+ chmod 777 $(distdir)
+ for file in $(DISTFILES); do \
+ ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
+ done
+ for subdir in $(SUBDIRS); do \
+ mkdir $(distdir)/$$subdir || exit 1; \
+ chmod 777 $(distdir)/$$subdir; \
+ (cd $$subdir && $(MAKE) $@) || exit 1; \
+ done
+ tar chozf $(distdir).tar.gz $(distdir)
+ rm -fr $(distdir)
+
+
+11.4.7 `Makefile.in' in `src/'
+
+
+
+
+
+PACKAGE = @PACKAGE@
+VERSION = @VERSION@
+
+
+top_srcdir
+gets defined. This will serve for cpp include files. Just add
+the line:
+
+
+
+top_srcdir = @top_srcdir@
+
+
+subdir as `src', later
+allowing for almost uniform `dist:' goals in all your
+`Makefile.in'. At least, the `dist:' goal below assumes that
+you used:
+
+
+
+subdir = src
+
+
+main function of your program will normally call
+bindtextdomain (see section 3.1 Triggering gettext Operations), like this:
+
+
+
+bindtextdomain (PACKAGE, LOCALEDIR);
+
+
+To make LOCALEDIR known to the program, add the following lines to
+Makefile.in:
+
+
+
+datadir = @datadir@
+localedir = $(datadir)/locale
+DEFS = -DLOCALEDIR=\"$(localedir)\" @DEFS@
+
+
+Note that @datadir@ defaults to `$(prefix)/share', thus
+$(localedir) defaults to `$(prefix)/share/locale'.
+
+@INTLLIBS@ as
+a library. An easy way to achieve this is to arrange for it to get into
+LIBS, like this:
+
+
+
+LIBS = @INTLLIBS@ @LIBS@
+
+
+In most packages internationalized with GNU gettext, one will
+find a directory `lib/' in which a library containing some helper
+functions will be built. (You need at least the few functions which the
+GNU gettext Library itself needs.) However some of the functions
+in the `lib/' also give messages to the user which of course should be
+translated, too. To take care of this, it is not enough to place the support
+library (say `libsupport.a') just between the @INTLLIBS@
+and @LIBS@ in the above example. Instead one has to write this:
+
+
+
+LIBS = ../lib/libsupport.a @INTLLIBS@ ../lib/libsupport.a @LIBS@
+
+
+
+distdir = ../$(PACKAGE)-$(VERSION)/$(subdir)
+dist: Makefile $(DISTFILES)
+ for file in $(DISTFILES); do \
+ ln $$file $(distdir) 2>/dev/null || cp -p $$file $(distdir); \
+ done
+
+
+12 Concluding Remarks
+
+gettext manual by presenting
+a history of the Translation Project so far. We finally give
+a few pointers for those who want to do further research or readings
+about Native Language Support matters.
+
+12.1 History of GNU gettext
+
+libc, maybe around the incoming Hurd, or otherwise
+(nobody clearly remembers). And even then, when the work started for
+real, this was somewhat independently of these previous discussions.
+
+fileutils.
+He then asked Jim Meyering, the maintainer, how to get those changes
+folded into an official release. That first draft was full of
+#ifdefs and somewhat disconcerting, and Jim wanted to find
+nicer ways. Patrick and Jim shared some tries and experimentations
+in this area. Then, feeling that this might eventually have a deeper
+impact on GNU, Jim wanted to know what standards were, and contacted
+Richard Stallman, who very quickly and verbally described an overall
+design for what was meant to become glocale, at that time.
+
+glocale and got a lot of exhausting feedback
+from Patrick and Richard, of course, but also from Mitchum DSouza
+(who wrote a catgets-like package), Roland McGrath, maybe David
+MacKenzie, François Pinard, and Paul Eggert, all pushing and
+pulling in various directions, not always compatible, to the extent
+that after a couple of test releases, glocale was torn apart.
+
+libc internationalized, and
+got Ulrich Drepper involved in that project. Instead of starting
+from glocale, Ulrich rewrote something from scratch, but
+more conformant to the set of guidelines that emerged out of the
+glocale effort. Then, Ulrich got people from the previous
+forum to involve themselves in this new project, and the switch
+from glocale to what was first named msgutils, renamed
+nlsutils, and later gettext, became officially accepted
+by Richard in May 1995 or so.
+
+gettext
+in April 1995. The first official release of the package, including
+PO mode, occurred in July 1995, and was numbered 0.7. Other people
+contributed to the effort by providing a discussion forum around
+Ulrich, writing little pieces of code, or testing. These are listed
+in the THANKS file which comes with the GNU gettext
+distribution.
+
+glocale first, then later to gettext,
+putting them in pretest, so providing along the way an effective
+user environment for fine tuning the evolving tools. He also took
+the responsibility of organizing and coordinating the Translation
+Project. After nearly a year of informal exchanges between people from
+many countries, translator teams started to exist in May 1995, through
+the creation and support by Patrick D'Cruze of twenty unmoderated
+mailing lists for that many native languages, and two moderated
+lists: one for reaching all teams at once, the other for reaching
+all willing maintainers of internationalized free software packages.
+
+gettext Texinfo manual.
+
+12.2 Related Readings
+
+
+ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/i18n-books.txt
+
+
+
+ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-programming
+
+
+
+ftp://sunsite.unc.edu/pub/Linux/utils/nls/catalogs/Incoming/...
+ ...locale-tutorial-0.8.txt.gz
+
+
+
+ftp://ftp.ibp.fr/pub/linux/sunsite/
+
+
+
+ftp://ftp.ibp.fr/pub/linux/french/docs/
+
+
+A Language Codes
+
+
+
+
+
+B Country Codes
+
+
+
+
+
+2 PO Files and PO Mode Basics
+
+gettext toolset helps programmers and translators
+in producing, updating and using translation files, mainly those
+PO files which are textual, editable files. This chapter stresses
+the format of PO files, and contains a PO mode starter. PO mode
+description is spread throughout this manual instead of being concentrated
+in one place. Here we present only the basics of PO mode.
+
+2.1 Completing GNU gettext Installation
+
+gettext distribution, the `make install' command puts in
+place the programs xgettext, msgfmt, gettext, and
+msgmerge, as well as their available message catalogs. To
+top off a comfortable installation, you might also want to make the
+PO mode available to your Emacs users.
+
+
+(setq auto-mode-alist
+ (cons '("\\.po[tx]?\\'\\|\\.po\\." . po-mode) auto-mode-alist))
+(autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t)
+
+
+
+(modify-coding-system-alist 'file "\\.po[tx]?\\'\\|\\.po\\."
+ 'po-find-file-coding-system)
+(autoload 'po-find-file-coding-system "po-mode")
+
+
+2.2 The Format of PO Files
+
+
+white-space
+# translator-comments
+#. automatic-comments
+#: reference...
+#, flag...
+msgid untranslated-string
+msgstr translated-string
+
+
+gettext tools, there is exactly one blank line
+between entries. Then comments follow, on lines all starting with the
+character #. There are two kinds of comments: those which have
+some white space immediately following the #, which
+are created and maintained exclusively by the translator, and those which
+have some non-white character just after the #, which
+are created and maintained automatically by GNU gettext tools.
+All comments, of either kind, are optional.
+
+msgid, and the translation,
+by msgstr. The two strings, untranslated and translated,
+are quoted in various ways in the PO file, using "
+delimiters and \ escapes, but the translator does not really
+have to pay attention to the precise quoting format, as PO mode fully
+takes care of quoting for her.
+
+msgid strings, as well as automatic comments, are produced
+and managed by other GNU gettext tools, and PO mode does not
+provide means for the translator to alter these. The most she can
+do is merely deleting them, and only by deleting the whole entry.
+On the other hand, the msgstr string, as well as translator
+comments, are really meant for the translator, and PO mode gives her
+the full control she needs.
+
+msgfmt
+program to give the user some better diagnostic messages. Currently
+there are two forms of flags defined:
+
+
+
+
+
+msgmerge program or it can be
+inserted by the translator herself. It shows that the msgstr
+string might not be a correct translation (anymore). Only the translator
+can judge if the translation requires further modification, or is
+acceptable as is. Once satisfied with the translation, she then removes
+this fuzzy attribute. The msgmerge program inserts this
+when it combined the msgid and msgstr entries after fuzzy
+search only. See section 6.3 Fuzzy Entries.
+
+xgettext program adds them. In an automated PO file processing
+system as proposed here the user changes would be thrown away again as
+soon as the xgettext program generates a new template file.
+
+In case the c-format flag is given for a string the msgfmt program
+does some more tests to check the validity of the translation.
+See section 7.1 Invoking the msgfmt Program.
+
+
+white-space
+# translator-comments
+#. automatic-comments
+#: reference...
+#, flag...
+msgid untranslated-string-singular
+msgid_plural untranslated-string-plural
+msgstr[0] translated-string-case-0
+...
+msgstr[N] translated-string-case-n
+
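As a rough illustration of where such a plural entry comes from, the
sketch below marks a message with ngettext, whose two string arguments
correspond to the msgid and msgid_plural fields. The function name
format_file_count and the message text are hypothetical, and the sketch
assumes that ngettext is provided by the installed <libintl.h>.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write "N file" or "N files" into buf.
   ngettext picks the form matching n; when no catalog is loaded it
   falls back to the English singular for n == 1 and plural otherwise. */
int
format_file_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size, ngettext ("%lu file", "%lu files", n), n);
}
```

A translator would then see the first string as msgid, the second as
msgid_plural, and supply one msgstr[N] line per plural form of her
language.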
+
+
+msgid ""
+"Here is an example of how one might continue a very long string\n"
+"for the common case the string represents multi-line output.\n"
+
+
+msgid keyword is followed by three strings, which are meant
+to be concatenated. Concatenating the empty string does not change
+the resulting overall string, but it is a way for us to comply with
+the necessity of msgid to be followed by a string on the same
+line, while keeping the multi-line presentation left-justified, as
+we find this to be a cleaner disposition. The empty string could have
+been omitted, but only if the string starting with `Here' was
+promoted on the first line, right after msgid. It was not really necessary
+either to switch between the two last quoted strings immediately after
+the newline `\n'; the switch could have occurred after any
+other character. We just did it this way because it is neater.
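The concatenation rule can be pictured with adjacent string literals
in C, which are joined in the same way. The following is only an
analogy, not code from the gettext tools, and the names split_form,
joined_form and same_string are made up for the illustration.

```c
#include <string.h>

/* The text written as an empty piece followed by two pieces,
   mirroring the layout of the PO entry. */
static const char split_form[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* The same text with the empty piece dropped and the first piece
   promoted to the first line. */
static const char joined_form[] =
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* Return 1 when both spellings denote the same string. */
int
same_string (void)
{
  return strcmp (split_form, joined_form) == 0;
}
```

Both spellings yield the same bytes, which is why the leading empty
string is purely a matter of presentation.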
+
+msgmerge.
+
+2.3 Main PO mode Commands
+
+gettext Installation, PO mode is activated for a window when Emacs finds a
+PO file in that window. This puts the window read-only and establishes a
+po-mode-map, which is a genuine Emacs mode, in a way that is not derived
+from text mode in any way. Functions found on po-mode-hook,
+if any, will be executed.
+
+
+
+
+
+po-undo) interfaces to the Emacs undo facility. See section
+`Undoing Changes' in The Emacs Editor. Each time U is typed,
+modifications which the translator made to the PO file are undone a
+little more. For the purpose of undoing, each PO mode command is atomic.
+This is especially true for the RET command: the whole edit made by
+a single use of this command is undone at once, even if the edit itself
+implied several actions. However, while in the editing window, one
+can undo the editing work quite parsimoniously.
+
+po-quit) and q
+(po-confirm-and-quit) are used when the translator is done with the
+PO file. The former is a bit less verbose than the latter. If the file
+has been modified, it is saved to disk first. In both cases, and prior to
+all this, the commands check if some untranslated message remains in the
+PO file and, if yes, the translator is asked if she really wants to leave
+off working with this PO file. This is the preferred way of getting rid
+of an Emacs PO file buffer. Merely killing it through the usual command
+C-x k (kill-buffer) is not the tidiest way to proceed.
+
+po-other-window) is another, softer way,
+to leave PO mode, temporarily. It just moves the cursor to some other
+Emacs window, and pops one if necessary. For example, if the translator
+just got PO mode to show some source context in some other window, she might
+discover some apparent bug in the program source that needs correction.
+This command allows the translator to change sex, become a programmer,
+and have the cursor right into the window containing the program she
+(or rather he) wants to modify. By later getting the cursor back
+in the PO file window, or by asking Emacs to edit this file once again,
+PO mode is then recovered.
+
+po-help) displays a summary of all available PO
+mode commands. The translator should then type any character to resume
+normal PO mode operations. The command ? has the same effect
+as h.
+
+po-statistics) computes the total number of
+entries in the PO file, the ordinal of the current entry (counted from
+1), the number of untranslated entries, the number of obsolete entries,
+and displays all these numbers.
+
+po-validate) launches msgfmt in verbose
+mode over the current PO file. This command first offers to save the
+current PO file on disk. The msgfmt tool, from GNU gettext,
+has the purpose of creating a MO file out of a PO file, and PO mode uses
+the features of this program for checking the overall format of a PO file,
+as well as all individual entries.
+
+msgfmt runs asynchronously with Emacs, so the
+translator regains control immediately while her PO file is being studied.
+Error output is collected in the Emacs `*compilation*' buffer,
+displayed in another window. The regular Emacs command C-x`
+(next-error), as well as other usual compile commands, allow the
+translator to reposition quickly to the offending parts of the PO file.
+Once the cursor is on the line in error, the translator may decide on
+any PO mode action which would help correct the error.
+
+2.4 Entry Positioning
+
+
+
+
+
+po-current-entry) has the sole purpose of redisplaying the
+current entry properly, after the current entry has been changed by
+means external to PO mode, or the Emacs screen otherwise altered.
+
+po-next-entry) and p
+(po-previous-entry) move the cursor to the entry following,
+or preceding, the current one. If n is given while the
+cursor is on the last entry of the PO file, or if p
+is given while the cursor is on the first entry, no move is done.
+
+po-first-entry) and >
+(po-last-entry) move the cursor to the first entry, or last
+entry, of the PO file. When the cursor is located past the last
+entry in a PO file, most PO mode commands will return an error saying
+`After last entry'. Moreover, the commands < and >
+have the special property of being able to work even when the cursor
+is not in some PO file entry, and one may use them for nicely
+correcting this situation. But even these commands will fail on a
+truly empty PO file. There are development plans for the PO mode for it
+to interactively fill an empty PO file from sources. See section 3.3 Marking Translatable Strings.
+
+po-push-location)
+merely adds the location of current entry to the stack, pushing
+the already saved locations under the new one. The command
+r (po-pop-location) consumes the top stack element and
+repositions the cursor to the entry associated with that top element.
+This position is then lost, for the next r will move the cursor
+to the previously saved location, and so on until no locations remain
+on the stack.
+
+po-exchange-location) simultaneously
+repositions the cursor to the entry associated with the top element of
+the stack of saved locations, and replaces that top element with the
+location of the current entry before the move. Consequently, repeating
+the x command toggles alternatively between two entries.
+For achieving this, the translator will position the cursor on the
+first entry, use m, then position to the second entry, and
+merely use x for making the switch.
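The behaviour of the m, r and x commands can be modelled as three
operations on a small stack. The sketch below is written in C purely
for illustration; PO mode itself is implemented in Emacs Lisp, and the
names loc_stack, loc_push, loc_pop and loc_exchange, as well as the
fixed capacity, are assumptions made for this example.

```c
#include <stddef.h>

#define LOC_STACK_MAX 32        /* hypothetical capacity */

struct loc_stack
{
  size_t items[LOC_STACK_MAX];  /* saved entry positions */
  size_t depth;                 /* number of saved positions */
};

/* Like m: save the current position.  Returns 0, or -1 if full. */
int
loc_push (struct loc_stack *s, size_t current)
{
  if (s->depth == LOC_STACK_MAX)
    return -1;
  s->items[s->depth++] = current;
  return 0;
}

/* Like r: move to the top saved position and forget it.
   Returns 0, or -1 if nothing is saved. */
int
loc_pop (struct loc_stack *s, size_t *current)
{
  if (s->depth == 0)
    return -1;
  *current = s->items[--s->depth];
  return 0;
}

/* Like x: move to the top saved position and save the position
   held before the move in its place.  Returns 0, or -1 if empty. */
int
loc_exchange (struct loc_stack *s, size_t *current)
{
  size_t previous;

  if (s->depth == 0)
    return -1;
  previous = *current;
  *current = s->items[s->depth - 1];
  s->items[s->depth - 1] = previous;
  return 0;
}
```

Calling loc_exchange twice in a row returns to the starting position,
which is the toggling behaviour described for the x command.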
+
+2.5 Normalizing Strings in Entries
+
+msgid field of some entry.
+Even if PO mode has internally all the built-in machinery for
+implementing this recognition easily, doing it fast is technically
+difficult. To facilitate a solution to this efficiency problem,
+we decided on a canonical representation for strings.
+
+xgettext and PO mode converging towards a uniform
+way of representing equivalent strings would be useful, as the internal
+normalization needed by PO mode could be automatically satisfied
+when using xgettext from GNU gettext. An explicit
+PO mode normalization should then be only necessary for PO files
+imported from elsewhere, or for when the convention itself evolves.
+
+
+
+
+
+msgid string lookup for some other PO mode commands.
+
+gettext 0.6 and earlier, in which msgid and msgstr
+fields were using K&R style C string syntax for multi-line strings.
+These heuristics may fail for comments not related to obsolete
+entries and ending with a backslash; they also depend on subsequent
+passes for finalizing the proper commenting of continued lines for
+obsolete entries. This first pass might disappear once all oldish PO
+files have been adjusted. The second and third pass normalize
+all msgid and msgstr strings respectively. They also
+clean out those trailing backslashes used by XView's msgfmt
+for continued lines.
+
+gettext tools
+should greatly automate conformance. A description of the canonical
+string format is given below, for the particular benefit of those not
+having Emacs handy, and who would nevertheless want to handcraft
+their PO files in nice ways.
+
+
+msgstr "\n\nHello, world!\n\n\n"
+
+
+
+msgstr ""
+"\n"
+"\n"
+"Hello,\n"
+"world!\n"
+"\n"
+"\n"
+
+
+
+msgstr "\n\n"
+"Hello,\n"
+"world!\n"
+"\n\n"
+
+
+3 Preparing Program Sources
+
+gettext when the program
+initializes, usually from the main function. Last, you should
+identify and especially mark all constant strings in your program
+needing translation.
+
+gettext files are available, and your
+`Makefile' files are adjusted (see section 11 The Maintainer's View), each C module
+having translated C strings should contain the line:
+
+
+#include <libintl.h>
+
+
+3.1 Triggering gettext Operations
+
+int
+main (argc, argv)
+ int argc;
+ char **argv;
+{
+ ...
+ setlocale (LC_ALL, "");
+ bindtextdomain (PACKAGE, LOCALEDIR);
+ textdomain (PACKAGE);
+ ...
+}
+
+
+gettext
+sources for more information.
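The three initialization calls can be gathered in one small function,
as in the sketch below. The function name init_i18n and the fallback
values given to PACKAGE and LOCALEDIR are assumptions made so that the
example compiles on its own; in a real package both macros come from
the build system.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical fallbacks, normally supplied by configure and make. */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Select the user's locale, tell gettext where the catalogs of this
   package live, and make the package the default text domain.
   Returns the active domain name, or NULL on failure. */
const char *
init_i18n (void)
{
  setlocale (LC_ALL, "");
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return NULL;
  return textdomain (PACKAGE);
}
```

Calling init_i18n once, early in main, is enough; later gettext calls
then look up messages in the PACKAGE domain.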
+
+LC_ALL might not be appropriate for you.
+LC_ALL includes all locale categories and especially
+LC_CTYPE. This latter category is responsible for determining
+character classes with the isalnum etc. functions from
+`ctype.h', which could be wrong, especially for programs which
+process some kind of input language. For example this would mean that a
+source code using the ç (c-cedilla character) is runnable in
+France but not in the U.S.
+
+scanf functions if a locale other than the LC_ALL one is used.
+The standards say that formats other than the one known in the
+"C" locale might be recognized. But some systems seem to reject
+numbers in the "C" locale format. In some situations, it might
+also be a problem with the notation itself which makes it impossible to
+recognize whether the number is in the "C" locale or the local
+format. This can happen if thousands separator characters are used.
+Some locales define this character according to the national
+conventions to '.' which is the same character used in the
+"C" locale to denote the decimal point.
+
+LC_ALL line in the
+code above by a sequence of setlocale lines
+
+
+{
+ ...
+ setlocale (LC_CTYPE, "");
+ setlocale (LC_MESSAGES, "");
+ ...
+}
+
+
+LC_CTYPE,
+LC_COLLATE, LC_MONETARY, LC_NUMERIC, and
+LC_TIME are available. On some modern systems there is also a
+locale LC_MESSAGES which is called on some old, XPG2 compliant
+systems LC_RESPONSES.
+
+LC_CTYPE also affects the functions
+declared in the <ctype.h> standard header. If this is not
+desirable in your application (for example in a compiler's parser),
+you can use a set of substitute functions which hardwire the C locale,
+such as found in the <c-ctype.h> and <c-ctype.c> files
+in the gettext source distribution.
+
+setlocale call is expensive,
+because it is tedious to determine the places where a locale switch
+is needed in a large program's source, and because switching a locale
+is not multithread-safe.
+
+3.2 How Marks Appear in Sources
+
+sprintf
may have so many
+different instances that it is impractical to list them all in some
+`error_string_out()' routine, say.
+
+xgettext
+at properly extracting all translatable strings when it scans a set
+of program sources and produces PO file templates.
+
+gettext
+package. For packages making only light use of the `gettext'
+keyword, macro or function, it is easily used as is. However,
+for packages using the gettext
interface more heavily, it
+is usually more convenient to give the main keyword a shorter, less
+obtrusive name. Indeed, the keyword might appear on a lot of strings
+all over the package, and programmers usually do not want nor need
+their program sources to remind them forcefully, all the time, that they
+are internationalized. Further, a long keyword has the disadvantage
+of using more horizontal space, forcing more indentation work on
+sources for those trying to keep them within 79 or 80 columns.
+
+gettext
uses this convention internally,
+it does not offer it officially. The real, genuine keyword is truly
+`gettext' indeed. It is fairly easy for those wanting to use
+`_' instead of `gettext' to declare:
+
+
+#include <libintl.h>
+#define _(String) gettext (String)
+
+
+3.3 Marking Translatable Strings
+
+
+etags src/*.[hc] lib/*.[hc]
+
+
+tags
or TAGS
which constructs the tag files in
+all directories and for all files containing source code.
+
+
+
+
+
+po-tags-search
) command searches for the next
+occurrence of a string which looks like a possible candidate for
+translation, and displays the program source in another Emacs window,
+positioned in such a way that the string is near the top of this other
+window. If the string is too big to fit whole in this window, it is
+positioned so only its end is shown. In any case, the cursor
+is left in the PO file window. If the shown string would be better
+presented differently in different native languages, you may mark it
+using M-, or M-.. Otherwise, you might rather ignore it
+and skip to the next string by merely repeating the , command.
+
+tags-search
or
+tags-query-replace
commands may be used without disrupting the
+independent , search sequence. However, as implemented, the
+initial , command (or the , command is used with a
+prefix) might also reinitialize the regular Emacs tags searching to the
+first tags file; this reinitialization might be considered spurious.
+
+po-mark-translatable
) command will mark the
+recently found string with the `_' keyword. The M-.
+(po-select-mark-and-mark
) command will request that you type
+one keyword from the minibuffer and use that keyword for marking
+the string. Both commands will automatically create a new PO file
+untranslated entry for the string being marked, and make it the
+current entry (making it easy for you to immediately proceed to its
+translation, if you feel like doing it right away). It is possible
+that the modifications made to the program source by M-, or
+M-. render some source line longer than 80 columns, forcing you
+to break and re-indent this line differently. You may use the O
+command from PO mode, or any other window changing command from
+Emacs, to break out into the program source window, and do any
+needed adjustments. You will have to use some regular Emacs command
+to return the cursor to the PO file window, if you want command
+, for the next string, say.
+
+3.4 Special Comments preceding Keywords
+
+printf
family. The special thing about these format strings is
+that they can contain format specifiers introduced with %. Assume
+we have the code
+
+
+printf (gettext ("String `%s' has %d characters\n"), s, (int) strlen (s));
+
+
+
+"%d Zeichen lang ist die Zeichenkette `%s'"
+
+
+printf
don't have.
+This will most probably lead to problems because now the length of the
+string is regarded as the address.
+
+msgfmt
+tool can check statically whether the arguments in the original and the
+translation string match in type and number. If this is not the case a
+warning will be given and the error cannot cause problems at runtime.
+
+
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'"
+
+
+msgfmt
know about this special notation.
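+
+The effect of the positional notation can be tried with a short sketch.
+The helper describe_string is a hypothetical name; the format string is
+passed as a parameter here in place of a call to gettext, and the
+sketch assumes a printf implementation that supports the POSIX `%n$'
+notation.
+
```c
#include <stdio.h>
#include <string.h>

/* Formats a description of `s' into `buf'.  The arguments are always
   passed in the same order (string first, length second); only the
   format decides in which order they are printed.  */
int
describe_string (char *buf, size_t size, const char *format, const char *s)
{
  return snprintf (buf, size, format, s, (int) strlen (s));
}
```
+
+Called with the original format "String `%s' has %d characters\n", the
+arguments are consumed in order. Called with the reordered translation
+"%2$d Zeichen lang ist die Zeichenkette `%1$s'", the length is printed
+first although it is still passed second.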
+
+msgfmt
to test all the strings in the `.po' file.
+This might cause problems because the string might contain what looks
+like a format specifier, but the string is not used in printf
.
+
+xgettext
adds a special tag to those messages it
+thinks might be a format string. There is no absolute rule for this,
+only a heuristic. In the `.po' file the entry is marked using the
+c-format
flag in the #, comment line (see section 2.2 The Format of PO Files).
+
+xgettext
knows about a special kind of comment which lets
+the programmer take over the decision. If in the same line or
+the immediately preceding line of the gettext
keyword
+the xgettext
program finds a comment containing the words
+xgettext:c-format it will mark the string in any case with
+the c-format flag. This kind of comment should be used when
+xgettext
does not recognize the string as a format string but
+it really is one and it should be tested. Please note that when the
+comment is in the same line of the gettext
keyword, it must be
+before the string to be translated.
+
+printf
function is often
+called with strings which do not contain a format specifier. Of course
+one would normally use fputs
but it does happen. In this case
+xgettext
does not recognize this as a format string but what
+happens if the translation introduces a valid format specifier? The
+printf
function will try to access one of the parameters but none
+exists because the original code does not refer to any parameter.
+
+xgettext
of course could make a wrong decision the other way
+round, i.e. a string marked as a format string actually is not a format
+string. In this case the msgfmt
might give too many warnings and
+would prevent translating the `.po' file. The method to prevent
+this wrong decision is similar to the one used above, only the comment
+to use must contain the string xgettext:no-c-format.
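+
+The placement of both comments can be sketched as follows. The two
+functions are hypothetical, and gettext is replaced by a do-nothing
+macro so that the fragment compiles on its own; in a real program the
+declaration from <libintl.h> would be used instead.
+
```c
#include <stdio.h>

/* Stand-in for the real gettext, only to keep the sketch standalone.  */
#define gettext(String) (String)

/* The string contains "% d", which looks like a format directive, but
   it is printed through "%s" and never used as a format itself.  */
int
format_progress_note (char *buf, size_t size)
{
  /* xgettext:no-c-format */
  return snprintf (buf, size, "%s", gettext ("100% done"));
}

/* The string has no directive, yet it is used as a format, so a
   translation must not introduce one.  */
int
format_banner (char *buf, size_t size)
{
  /* xgettext:c-format */
  return snprintf (buf, size, gettext ("Ready.\n"));
}
```
+
+In both functions the comment sits on the line immediately preceding
+the gettext keyword, which is one of the two placements described
+above.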
+
+xgettext
Program to see how the --debug option can be
+used for solving this problem.
+
+3.5 Special Cases of Translatable Strings
+
+gettext
or something like this.
+Consider the following case:
+
+
+{
+ static const char *messages[] = {
+ "some very meaningful message",
+ "and another one"
+ };
+ const char *string;
+ ...
+ string
+ = index > 1 ? "a default message" : messages[index];
+
+ fputs (string, stdout);
+ ...
+}
+
+
+"a default message"
it
+is not possible to mark the string initializers for messages
.
+What is to be done? We have to fulfill two tasks. First we have to mark the
+strings so that the xgettext
program (see section 4.1 Invoking the xgettext
Program)
+can find them, and second we have to translate the string at runtime
+before printing them.
+
+
+#define gettext_noop(String) (String)
+
+{
+ static const char *messages[] = {
+ gettext_noop ("some very meaningful message"),
+ gettext_noop ("and another one")
+ };
+ const char *string;
+ ...
+ string
+ = index > 1 ? gettext ("a default message") : gettext (messages[index]);
+
+ fputs (string, stdout);
+ ...
+}
+
+
+fputs
is translated in any case. How to make xgettext
aware of
+the additional keyword gettext_noop
is explained in section 4.1 Invoking the xgettext
Program.
+
+
+#define gettext_noop(String) (String)
+
+{
+ static const char *messages[] = {
+ gettext_noop ("some very meaningful message"),
+ gettext_noop ("and another one")
+ };
+ const char *string;
+ ...
+ string
+ = index > 1 ? gettext_noop ("a default message") : messages[index];
+
+ fputs (gettext (string), stdout);
+ ...
+}
+
+
+gettext_noop
for the string "a default message"
.
+A use of gettext
could have in rare cases unpredictable results.
+The second reason is found in the internals of the GNU gettext
+Library which will make this solution less efficient.
+
+4 Making the PO Template File
+
+xgettext
for this purpose.
+
+4.1 Invoking the
+
+
+xgettext
Program
+xgettext [option] inputfile ...
+
+
+
+
+
+
+xgettext
program decided, the format form is used if
+the programmer prescribed it.
+
+By default only the c-format form is used. The translator should
+not have to care about these details.
+
+xgettext
looks
+for strings in the first argument of each call to the function or macro
+id. If keywordspec is of the form
+`id:argnum', xgettext
looks for strings in the
+argnumth argument of the call. If keywordspec is of the form
+`id:argnum1,argnum2', xgettext
looks for
+strings in the argnum1st argument and in the argnum2nd argument
+of the call, and treats them as singular/plural variants for a message
+with plural handling.
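+
+A case where the `id:argnum' form is needed might look like the sketch
+below. The wrapper report is a hypothetical function, and gettext is
+again replaced by a do-nothing macro so that the fragment stands alone.
+Because the translatable string is the second argument, xgettext would
+have to be told so with a keyword specification such as `report:2'
+(for instance through its --keyword option).
+
```c
#include <stdio.h>

/* Stand-in for the real gettext, only to keep the sketch standalone.  */
#define gettext(Msgid) (Msgid)

/* Hypothetical wrapper: the severity comes first, the translatable
   message second.  The message is translated at runtime here, so call
   sites only need to be visible to xgettext, not to call gettext.  */
int
report (int severity, const char *msgid, char *buf, size_t size)
{
  return snprintf (buf, size, "[%d] %s", severity, gettext (msgid));
}
```
+
+A call such as `report (2, "disk full", buf, sizeof buf)' then carries
+its message in the position named by the keyword specification.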
+
+The default keyword specifications, which are always looked for if not
+explicitly disabled, are gettext
, dgettext:2
,
+dcgettext:2
, ngettext:1,2
, dngettext:2,3
,
+dcngettext:2,3
, and gettext_noop
.
+
+.gmo
files. We can ship some of
+these files in the GNU gettext
package, and the result of
+regenerating them through msgfmt
should yield the same values.
+
+xgettext
is able to process a few awkward
+cases, like strings in preprocessor macros, ANSI concatenation of
+adjacent strings, and escaped end of lines for continued strings.
+
+5 Creating a New PO File
+
+
+
+
+
+xgettext
.
+
+msgmerge
and msgfmt
programs, as well as for users whose
+locale's character encoding differs from yours (see section 9.2.4 How to specify the output character set gettext
uses).
+
+You get the character encoding of your locale by running the shell command
+`locale charmap'. If the result is `C' or `ANSI_X3.4-1968',
+which is equivalent to `ASCII' (= `US-ASCII'), it means that your
+locale is not correctly configured. In this case, ask your translation
+team which charset to use. `ASCII' is not usable for any language
+except Latin.
+
+Because the PO files must be portable to operating systems with less advanced
+internationalization facilities, the character encodings that can be used
+are limited to those supported by both GNU libc
and GNU
+libiconv
. These are:
+ASCII
, ISO-8859-1
, ISO-8859-2
, ISO-8859-3
,
+ISO-8859-4
, ISO-8859-5
, ISO-8859-6
, ISO-8859-7
,
+ISO-8859-8
, ISO-8859-9
, ISO-8859-13
, ISO-8859-15
,
+KOI8-R
, KOI8-U
, CP850
, CP866
, CP874
,
+CP932
, CP949
, CP950
, CP1250
, CP1251
,
+CP1252
, CP1253
, CP1254
, CP1255
, CP1256
,
+CP1257
, GB2312
, EUC-JP
, EUC-KR
, EUC-TW
,
+BIG5
, BIG5HKSCS
, GBK
, GB18030
, SJIS
,
+JOHAB
, TIS-620
, VISCII
, UTF-8
.
+
+In the GNU system, the following encodings are frequently used for the
+corresponding languages.
+
+ISO-8859-1 for
+ Afrikaans, Albanian, Basque, Catalan, Dutch, English, Estonian, Faroese,
+ Finnish, French, Galician, German, Greenlandic, Icelandic, Indonesian,
+ Irish, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish,
+ISO-8859-2 for
+ Croatian, Czech, Hungarian, Polish, Romanian, Serbian, Slovak, Slovenian,
+ISO-8859-3 for Maltese,
+ISO-8859-5 for Macedonian, Serbian,
+ISO-8859-6 for Arabic,
+ISO-8859-7 for Greek,
+ISO-8859-8 for Hebrew,
+ISO-8859-9 for Turkish,
+ISO-8859-13 for Latvian, Lithuanian,
+ISO-8859-15 for
+ Basque, Catalan, Dutch, English, Finnish, French, Galician, German, Irish,
+ Italian, Portuguese, Spanish, Swedish,
+KOI8-R for Russian,
+KOI8-U for Ukrainian,
+CP1251 for Bulgarian, Byelorussian,
+GB2312, GBK, GB18030
+ for simplified writing of Chinese,
+BIG5, BIG5HKSCS
+ for traditional writing of Chinese,
+EUC-JP for Japanese,
+EUC-KR for Korean,
+TIS-620 for Thai,
+UTF-8 for any language, including those listed above.
+
+When single quote characters or double quote characters are used in
+translations for your language, and your locale's encoding is one of the
+ISO-8859-* charsets, it is best if you create your PO files in UTF-8
+encoding, instead of your locale's encoding. This is because in UTF-8
+the real quote characters can be represented (single quote characters:
+U+2018, U+2019, double quote characters: U+201C, U+201D), whereas none of
+ISO-8859-* charsets has them all. Users in UTF-8 locales will see the
+real quote characters, whereas users in ISO-8859-* locales will see the
+vertical apostrophe and the vertical double quote instead (because that's
+what the character set conversion will transliterate them to).
+
+To enter such quote characters under X11, you can change your keyboard
+mapping using the xmodmap program. The X11 names of the quote
+characters are "leftsinglequotemark", "rightsinglequotemark",
+"leftdoublequotemark", "rightdoublequotemark", "singlelowquotemark",
+"doublelowquotemark".
+
+Note that only recent versions of GNU Emacs support the UTF-8 encoding:
+Emacs 20 with Mule-UCS, and Emacs 21. As of January 2001, XEmacs doesn't
+support the UTF-8 encoding.
+
+The character encoding name can be written in either upper or lower case.
+Usually upper case is preferred.
+
+8-bit
.
+
+6 Updating Existing PO Files
+
+
+
+6.1 Invoking the
+
+
+
+msgmerge
Program
+
+6.2 Translated Entries
+
+msgstr
field has been filled with
+a translation, and which is not marked as fuzzy (see section 6.3 Fuzzy Entries),
+is said to be a translated entry. Only translated entries will
+later be compiled by GNU msgfmt
and become usable in programs.
+Other entry types will be excluded; translation will not occur for them.
+
+
+
+
+
+po-next-translated-entry
) and M-t
+(po-previous-translated-entry
) move forwards or backwards, chasing
+for a translated entry. If none is found, the search is extended and
+wraps around in the PO file buffer.
+
+po-auto-fuzzy-on-edit
is not nil
, the entry having
+received a new translation first becomes a fuzzy entry, which ought to
+be later unfuzzied before becoming an official, genuine translated entry.
+See section 6.3 Fuzzy Entries.
+
+6.3 Fuzzy Entries
+
+fuzzy
, and entries having this attribute are said
+to have a fuzzy translation. They are called fuzzy entries, for short.
+
+msgmerge
to
+update older translated PO files according to a new PO template
+file, when this tool hypothesises that some new msgid
has
+been modified only slightly out of an older one, and chooses to pair
+what it thinks to be the old translation for the new modified entry.
+The slight alteration in the original string (the msgid
string)
+should often be reflected in the translated string, and this requires
+the intervention of the translator. For this reason, msgmerge
+might mark some entries as being fuzzy.
+
+
+
+
+
+po-next-fuzzy
) and M-f
+(po-previous-fuzzy
) move forwards or backwards, chasing for
+a fuzzy entry. If none is found, the search is extended and wraps
+around in the PO file buffer.
+
+po-unfuzzy
) removes the fuzzy
+attribute associated with an entry, usually leaving it translated.
+Further, if the variable po-auto-select-on-unfuzzy
has not
+the nil
value, the TAB command will automatically chase
+for another interesting entry to work on. The initial value of
+po-auto-select-on-unfuzzy
is nil
.
+
+po-auto-fuzzy-on-edit
is nil
. However,
+if the variable po-auto-fuzzy-on-edit
is set to t
, any entry
+edited through the RET command is marked fuzzy, as a way to
+ensure some kind of double check, later. In this case, the usual paradigm
+is that an entry becomes fuzzy (if not already) whenever the translator
+modifies it. If she is satisfied with the translation, she then uses
+TAB to pick another entry to work on, clearing the fuzzy attribute
+at the same time. If she is not satisfied yet, she merely uses SPC
+to chase another entry, leaving the entry fuzzy.
+
+po-fade-out-entry
) over any translated entry to mark it as being
+fuzzy, when she wants to leave an easy-to-find trace that she intends
+to return to this entry later.
+
+6.4 Untranslated Entries
+
+xgettext
originally creates a PO file, unless told
+otherwise, it initializes the msgid
field with the untranslated
+string, and leaves the msgstr
string to be empty. Such entries,
+having an empty translation, are said to be untranslated entries.
+Later, when the programmer slightly modifies some string right in
+the program, this change is then reflected in the PO file
+by the appearance of a new untranslated entry for the modified string.
+
+
+
+
+
+po-next-untranslated-entry
) and M-u
+(po-previous-untranslated-entry
) move forwards or backwards,
+chasing for an untranslated entry. If none is found, the search is
+extended and wraps around in the PO file buffer.
+
+po-kill-msgstr
). See section 6.6 Modifying Translations.
+
+6.5 Obsolete Entries
+
+msgmerge
when it found that the
+translation is not needed anymore by the package being localized.
+
+msgid
or msgstr
.
+
+
+
+
+
+po-next-obsolete-entry
) and M-o
+(po-previous-obsolete-entry
) move forwards or backwards,
+chasing for an obsolete entry. If none is found, the search is
+extended and wraps around in the PO file buffer.
+
+msgid
values.
+
+gettext
utilities will later react to the
+disappearance of a translation by using the untranslated string.
+The command DEL (po-fade-out-entry
) pushes the current entry
+a little further towards annihilation. If the entry is active (it is a
+translated entry), then it is first made fuzzy. If it is already fuzzy,
+then the entry is merely commented out, with confirmation. If the entry
+is already obsolete, then it is completely deleted from the PO file.
+It is easy to recycle the translation so deleted into some other PO file
+entry, usually one which is untranslated. See section 6.6 Modifying Translations.
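+
+The successive effects of DEL described above form a small state
+machine, which can be modelled as follows. This is only a C model for
+illustration, not PO mode's Emacs Lisp code; the enum and function
+names are hypothetical, and only the three cases named in the text are
+covered.
+
```c
/* States an entry goes through when DEL is applied repeatedly.  */
enum entry_state
{
  ENTRY_TRANSLATED,
  ENTRY_FUZZY,
  ENTRY_OBSOLETE,
  ENTRY_DELETED
};

/* Returns the state reached after one more DEL on an entry.  */
enum entry_state
fade_out (enum entry_state state)
{
  switch (state)
    {
    case ENTRY_TRANSLATED:
      return ENTRY_FUZZY;       /* first made fuzzy */
    case ENTRY_FUZZY:
      return ENTRY_OBSOLETE;    /* commented out, after confirmation */
    case ENTRY_OBSOLETE:
    default:
      return ENTRY_DELETED;     /* removed from the PO file */
    }
}
```
+
+Three applications of fade_out thus take a translated entry all the
+way to deletion, one step at a time.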
+
+6.6 Modifying Translations
+
+
+
+
+
+po-edit-msgstr
) opens a new Emacs
+window meant to edit in a new translation, or to modify an already existing
+translation. The new window contains a copy of the translation taken from
+the current PO file entry, all ready for edition, expunged of all quoting
+marks, fully modifiable and with the complete extent of Emacs modifying
+commands. When the translator is done with her modifications, she may use
+C-c C-c to close the subedit window with the automatically requoted
+results, or C-c C-k to abort her modifications. See section 6.8 Details of Sub Edition,
+for more information.
+
+po-msgid-to-msgstr
) initializes, or
+reinitializes the translation with the original string. This command is
+normally used when the translator wants to redo a fresh translation of
+the original string, disregarding any previous work.
+
+po-auto-edit-with-msgid
to t
, the translation gets
+initialised with the original string, in case none exists already.
+The default value for po-auto-edit-with-msgid
is nil
.
+
+po-kill-msgstr
) merely empties the
+translation string, so turning the entry into an untranslated
+one. But while doing so, its previous contents are set aside in
+a special place, known as the kill ring. The command w
+(po-kill-ring-save-msgstr
) has also the effect of taking a
+copy of the translation onto the kill ring, but it otherwise leaves
+the entry alone, and does not remove the translation from the
+entry. Both commands use exactly the Emacs kill ring, which is shared
+between buffers, and which is well known already to Emacs lovers.
+
+po-yank-msgstr
) completely replaces the
+translation of the current entry by a string taken from the kill ring.
+Following Emacs terminology, we then say that the replacement
+string is yanked into the PO file buffer.
+See section `Yanking' in The Emacs Editor.
+The first time y is used, the translation receives the value of
+the most recent addition to the kill ring. If y is typed once
+again, immediately, without intervening keystrokes, the translation
+just inserted is taken away and replaced by the second most recent
+addition to the kill ring. By repeating y many times in a row,
+the translator may travel along the kill ring for saved strings,
+until she finds the string she really wanted.
+
+msgstr
field with
+this retrieved translation. Once this is done, the obsolete entry is
+not wanted anymore, and may be safely deleted.
+
+msgstr
+field. The translator is then free to use RET for fine
+tuning the translation contents, and maybe to later use u,
+then m again, for going on with the next untranslated string.
+
+6.7 Modifying Comments
+
+gettext
tools.
+So, the commands below will never alter such system-added comments;
+they are not meant for the translator to modify. See section 2.2 The Format of PO Files.
+
+
+
+
+
+po-edit-comment
) opens a new Emacs window
+containing a copy of the translator comments on the current PO file entry.
+If there are no such comments, PO mode understands that the translator wants
+to add a comment to the entry, and she is presented with an empty screen.
+Comment marks (#) and the space following them are automatically
+removed before edition, and reinstated after. For translator comments
+pertaining to obsolete entries, the uncommenting and recommenting operations
+are done twice. Once in the editing window, the keys C-c C-c
+allow the translator to tell she is finished with editing the comment.
+See section 6.8 Details of Sub Edition, for further details.
+
+po-subedit-mode-hook
, if any, are executed after
+the string has been inserted in the edit buffer.
+
+po-kill-comment
) gets rid of all
+translator comments, while saving those comments on the kill ring.
+The command W (po-kill-ring-save-comment
) takes
+a copy of the translator comments on the kill ring, but leaves
+them undisturbed in the current entry. The command Y
+(po-yank-comment
) completely replaces the translator comments
+by a string taken at the front of the kill ring. When this command
+is immediately repeated, the comments just inserted are withdrawn,
+and replaced by other strings taken along the kill ring.
+
+yank
) and M-y
+(yank-pop
) to get the previous translation where she likes.
+
+6.8 Details of Sub Edition
+
+
+
+
+
+po-subedit-exit
) may be used to return the edited translation into
+the PO file, replacing the original translation, even if it moved out of
+sight or if buffers were switched.
+
+po-subedit-abort
) to merely get rid of edition, while preserving
+the original translation or comment. Another way would be for her to exit
+normally with C-c C-c, then type U
once for undoing the
+whole effect of last edition.
+
+po-subedit-mode-hook
, if any, are executed after
+the string has been inserted in the edit buffer.
+
+6.9 C Sources Context
+
+gettext
utilities, as those utilities
+insert special comments in the PO files they generate.
+Some of these special comments relate the PO file entry to
+exactly where the untranslated string appears in the program sources.
+
+
+
+
+
+po-cycle-reference
) and M-s
+(po-select-source-reference
) both open another window displaying
+some source program file, and already positioned in such a way that
+it shows an actual use of the string to be translated. By doing
+so, the command gives source program context for the string. But if
+the entry has no source context references, or if all references
+are unresolved along the search path for program sources, then the
+command diagnoses this as an error.
+
+po-consider-source-path
) is used to interactively
+enter a new directory at the front of the search path, and the command
+M-S (po-ignore-source-path
) is used to select, with completion,
+one of the directories she does not want anymore on the search path.
+
+6.10 Consulting Auxiliary PO Files
+
+
+
+
+
+po-consider-as-auxiliary
) adds the current
+PO file to the list of auxiliary files, while command M-A
+(po-ignore-as-auxiliary
) just removes it.
+
+po-cycle-auxiliary
) seeks all auxiliary PO
+files, round-robin, searching for a translated entry in some other language
+having an msgid
field identical to the one for the current entry.
+The found PO file, if any, takes the place of the current PO file in
+the display (its window gets on top). Before doing so, the current PO
+file is also made into an auxiliary file, if not already. So, a
+in this newly displayed PO file will seek another PO file, and so on,
+so repeating a will eventually yield back the original PO file.
+
+po-select-auxiliary
) asks the translator
+for her choice of a particular auxiliary file, with completion, and
+then switches to that selected PO file. The command also checks if
+the selected file has an msgid
field identical to the one for
+the current entry, and if yes, this entry becomes current. Otherwise,
+the cursor of the selected file is left undisturbed.
+
+msgid
fields should be written exactly
+the same way. It is possible to write msgid
fields in various
+ways for representing the same string; different writings would break the
+proper behaviour of the auxiliary file commands of PO mode. This is not
+expected to be much of a problem in practice, as most existing PO files have
+their msgid
entries written by the same GNU gettext
tools.
+
+gettext
tools get
+fully resolved, the translator should stay aware of normalisation issues.
+
+6.11 Using Translation Compendiums
+
+7 Producing Binary MO Files
+
+
+
+7.1 Invoking the
+
+
+msgfmt
Program
+Usage: msgfmt [option] filename.po ...
+
+
+
+
+
+
+msgid
and msgstr
strings are
+studied and compared. It is considered abnormal that one string
+starts or ends with a newline while the other does not.
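+
+The newline comparison can be pictured with the following sketch. It
+is a simplified illustration, not the code used by msgfmt, and
+newline_mismatch is a hypothetical name.
+
```c
#include <string.h>

/* Returns 1 if exactly one of the two strings starts with a newline,
   or exactly one of them ends with a newline; returns 0 otherwise.  */
int
newline_mismatch (const char *msgid, const char *msgstr)
{
  size_t id_len = strlen (msgid);
  size_t str_len = strlen (msgstr);
  int id_starts = id_len > 0 && msgid[0] == '\n';
  int str_starts = str_len > 0 && msgstr[0] == '\n';
  int id_ends = id_len > 0 && msgid[id_len - 1] == '\n';
  int str_ends = str_len > 0 && msgstr[str_len - 1] == '\n';

  return id_starts != str_starts || id_ends != str_ends;
}
```
+
+A caller would presumably skip entries whose msgstr is empty, since an
+untranslated entry would otherwise be flagged whenever its msgid
+starts or ends with a newline.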
+
+Also, if the string represents a format string used in a
+printf
-like function both strings should have the same number of
+`%' format specifiers, with matching types. If the flag
+c-format
or possible-c-format
appears in the special
+comment #, for this entry a check is performed. For example, the
+check will diagnose using `%.*s' against `%s', or `%d'
+against `%s', or `%d' against `%x'. It can even handle
+positional parameters.
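+
+A much reduced version of such a check is sketched below. It only
+compares the number of directives and does not look at their types, so
+it is far weaker than what msgfmt is described as doing above;
+count_directives is a hypothetical name.
+
```c
/* Counts printf-style directives in `format', treating "%%" as a
   literal percent sign.  Deliberately simplified: flags, widths,
   positions and conversion types are not parsed.  */
int
count_directives (const char *format)
{
  int count = 0;

  for (; *format != '\0'; format++)
    {
      if (*format != '%')
        continue;
      if (format[1] == '%')
        format++;               /* skip the escaped percent sign */
      else if (format[1] != '\0')
        count++;                /* a directive starts here */
    }
  return count;
}
```
+
+A msgid and msgstr pair whose counts differ is certainly suspicious;
+equal counts, on the other hand, prove nothing about matching types,
+which is why a real check has to parse each directive.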
+
+Normally the xgettext
program automatically decides whether a
+string is a format string or not. This algorithm is not perfect,
+though. It might regard a string as a format string though it is not
+used in a printf
-like function and so msgfmt
might report
+errors where there are none. Or the other way round: a string is not
+regarded as a format string but it is used in a printf
-like
+function.
+
+To solve this problem the programmer can dictate the decision to the
+xgettext
program (see section 3.4 Special Comments preceding Keywords). The translator should not
+consider removing the flag from the #, line. This "fix" would be
+reversed again as soon as msgmerge
is called the next time.
+
+7.2 The Format of GNU MO Files
+
+0x950412de
and 0xde120495
. The second
+word describes the current revision of the file format. For now the
+revision is 0. This might change in future versions, and ensures
+that the readers of MO files can distinguish new formats from old
+ones, so that both can be handled correctly. The version is kept
+separate from the magic number, instead of using different magic
+numbers for different formats, mainly because `/etc/magic' is
+not updated often. It might be better to have magic separated from
+internal format version identification.
+
+gettext
is usually translated into
+some system information attached to that particular MO file, and the
+empty string necessarily becomes the first in both the original and
+translated tables, making the system information very easy to find.
+
+gettext
code, and is not documented here.
+
+msgfmt
+program has an option selecting the alignment for MO file strings.
+With this option, each string is separately aligned so it starts at
+an offset which is a multiple of the alignment value. On some RISC
+machines, a correct alignment will speed things up.
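+
+Aligning a string in this sense amounts to rounding its start offset
+up to the next multiple of the alignment value. The helper below is a
+hypothetical illustration of that arithmetic, not code taken from
+msgfmt.
+
```c
#include <stddef.h>

/* Rounds `offset' up to the next multiple of `alignment', which must
   be greater than zero.  */
size_t
align_up (size_t offset, size_t alignment)
{
  return (offset + alignment - 1) / alignment * alignment;
}
```
+
+For example, with an alignment of 8 a string that would start at
+offset 13 is moved to offset 16, while one that would start at offset
+16 stays where it is.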
+
+gettext
development forum, and it is to be expected that the MO file
+format will evolve or change over time. It is even possible that many
+formats may later be supported concurrently. But surely, we have to
+start somewhere, and the MO file format described here is a good start.
+Nothing is cast in concrete, and the format may later evolve fairly
+easily, so we should feel comfortable with the current approach.
+
+
+        byte
+             +------------------------------------------+
+          0  | magic number = 0x950412de                |
+             |                                          |
+          4  | file format revision = 0                 |
+             |                                          |
+          8  | number of strings                        |  == N
+             |                                          |
+         12  | offset of table with original strings    |  == O
+             |                                          |
+         16  | offset of table with translation strings |  == T
+             |                                          |
+         20  | size of hashing table                    |  == S
+             |                                          |
+         24  | offset of hashing table                  |  == H
+             |                                          |
+             .                                          .
+             .    (possibly more entries later)         .
+             .                                          .
+             |                                          |
+          O  | length & offset 0th string  ----------------.
+      O + 8  | length & offset 1st string  ------------------.
+              ...                                    ...   | |
+O + ((N-1)*8)| length & offset (N-1)th string           |  | |
+             |                                          |  | |
+          T  | length & offset 0th translation  ---------------.
+      T + 8  | length & offset 1st translation  -----------------.
+              ...                                    ...   | | | |
+T + ((N-1)*8)| length & offset (N-1)th translation      |  | | | |
+             |                                          |  | | | |
+          H  | start hash table                         |  | | | |
+              ...                                    ...   | | | |
+  H + S * 4  | end hash table                           |  | | | |
+             |                                          |  | | | |
+             | NUL terminated 0th string  <----------------' | | |
+             |                                          |    | | |
+             | NUL terminated 1st string  <------------------' | |
+             |                                          |      | |
+              ...                                    ...       | |
+             |                                          |      | |
+             | NUL terminated 0th translation  <---------------' |
+             |                                          |        |
+             | NUL terminated 1st translation  <-----------------'
+             |                                          |
+              ...                                    ...
+             |                                          |
+             +------------------------------------------+
+
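+Reading the fixed part of this layout can be sketched as follows. The
+struct and function names are hypothetical, and this is not the reader
+found in the gettext sources; the sketch only relies on the seven
+32-bit words shown in the diagram and on the two magic values
+0x950412de and 0xde120495, the latter being what the magic number
+looks like when the file was written with the opposite byte order.
+
```c
#include <stddef.h>
#include <stdint.h>

#define MO_MAGIC         0x950412deU
#define MO_MAGIC_SWAPPED 0xde120495U

struct mo_header
{
  uint32_t revision;
  uint32_t nstrings;            /* N */
  uint32_t orig_tab_offset;     /* O */
  uint32_t trans_tab_offset;    /* T */
  uint32_t hash_tab_size;       /* S */
  uint32_t hash_tab_offset;     /* H */
};

/* Reads one 32-bit word at `p', least significant byte first unless
   `big_endian' is nonzero.  */
static uint32_t
read_word (const unsigned char *p, int big_endian)
{
  if (big_endian)
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Fills `out' from the first 28 bytes of `data'.  Returns 0 on
   success, -1 if the buffer is too short or the magic number is not
   recognized.  */
int
parse_mo_header (const unsigned char *data, size_t size,
                 struct mo_header *out)
{
  uint32_t magic;
  int big_endian;

  if (size < 28)
    return -1;
  magic = read_word (data, 0);
  if (magic == MO_MAGIC)
    big_endian = 0;
  else if (magic == MO_MAGIC_SWAPPED)
    big_endian = 1;
  else
    return -1;

  out->revision = read_word (data + 4, big_endian);
  out->nstrings = read_word (data + 8, big_endian);
  out->orig_tab_offset = read_word (data + 12, big_endian);
  out->trans_tab_offset = read_word (data + 16, big_endian);
  out->hash_tab_size = read_word (data + 20, big_endian);
  out->hash_tab_offset = read_word (data + 24, big_endian);
  return 0;
}
```
+
+Decoding byte by byte, instead of casting the buffer to a struct,
+keeps the sketch independent of the host's own byte order and
+alignment rules.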
+
+8 The User's View
+
+gettext
will truly have reached its goal, average users
+should feel some kind of astonished pleasure, seeing the effect of
+that strange kind of magic that just makes their own native language
+appear everywhere on their screens. As for naive users, they would
+ideally have no special pleasure about it, merely taking their own
+language for granted, and becoming rather unhappy otherwise.
+
+gettext
. All other software engineers:
+programmers, translators, maintainers, should work together in such a
+way that the magic becomes possible. This is a long and progressive
+undertaking, and information is available about the progress of the
+Translation Project.
+
+gettext
is offering magic
+for both installers and end users.
+
+8.1 The Current `ABOUT-NLS' Matrix
+
+gettext
. To know if some package uses GNU gettext
, one
+may check the distribution for the `ABOUT-NLS' information file, for
+some `ll.po' files, often kept together into some `po/'
+directory, or for an `intl/' directory. Internationalized packages
+have usually many `ll.po' files, where ll represents
+the language. See section 8.3 Magic for End Users for a complete description of the format
+for ll.
+
+gettext
manual. This information is often found in
+file `ABOUT-NLS' from various distributions, but is also as old as
+the distribution itself. A recent copy of this `ABOUT-NLS' file,
+containing up-to-date information, should generally be found on the
+Translation Project sites, and also on most GNU archive sites.
+
+8.2 Magic for Installers
+
+gettext
, internally,
+are installed in such a way that they allow translation of
+messages. At configuration time, those packages should
+automatically detect whether the underlying host system already provides
+the GNU gettext
functions. If not,
+the GNU gettext
library should be automatically prepared
+and used. Installers may use special options at configuration
+time for changing this behavior. The command `./configure
+--with-included-gettext' bypasses system gettext
to
+use the included GNU gettext
instead,
+while `./configure --disable-nls'
+produces programs totally unable to translate messages.
+
+LINGUAS
+may be set, prior to configuration, to limit the installed set.
+LINGUAS
should then contain a space separated list of two-letter
+codes, stating which languages are allowed.
+
+8.3 Magic for End Users
+
+gettext
internally,
+and for which the installers did not disable translation at
+configure time. Then, users only have to set the LANG
+environment variable to the appropriate `ll_CC'
+combination prior to using the programs in the package. See section 8.1 The Current `ABOUT-NLS' Matrix.
+For example, let's presume a German site. At the shell prompt, users
+merely have to execute `setenv LANG de_DE' (in csh
) or
+`export LANG; LANG=de_DE' (in sh
). They could even do
+this from their `.login' or `.profile' file.
+
+9 The Programmer's View
+
+gettext
was to use the systems message catalog handling, if the
+installer wishes to do so. So we perhaps should first take a look at
+the solutions we know about. The people in the POSIX committee did not
+manage to agree on one of the semi-official standards which we'll
+describe below. In fact they couldn't agree on anything, so they decided
+only to include an example of an interface. The major Unix vendors
+are split in the usage of the two most important specifications: X/Open's
+catgets vs. Uniforum's gettext interface. We'll describe them both and
+later explain our solution of this dilemma.
+
+9.1 About
+
+catgets
catgets
implementation is defined in the X/Open Portability
+Guide, Volume 3, XSI Supplementary Definitions, Chapter 5. But the
+process of creating this standard seemed to be too slow for some of
+the Unix vendors so they created their implementations on preliminary
+versions of the standard. Of course this leads again to problems while
+writing platform independent programs: even the usage of catgets
+does not guarantee a unique interface.
+
+9.1.1 The Interface
+
+catgets
implementation consists of three
+functions which correspond to those used in file access: catopen
+to open the catalog for using, catgets
for accessing the message
+tables, and catclose
for closing after work is done. Prototypes
+for the functions and the needed definitions are in the
+<nl_types.h>
header file.
+
+catopen
is used like this:
+
+
+nl_catd catd = catopen ("catalog_name", 0);
+
+
+0
as the value. The return value is a handle to the
+message catalog, equivalent to the file handles returned by open
.
+
+catgets
function which can
+be used like this:
+
+
+char *translation = catgets (catd, set_no, msg_id, "original string");
+
+
+msg_id
is obtained. catgets
therefore uses a
+three-stage addressing:
+
+
+catalog name => set number => message ID => translation
+
+
+char *
the resulting string must not be changed. It
+should really be const char *
, but the standard was published in
+1988, one year before ANSI C.
+
+
+catclose (catd);
+
+
+catgets
call using the descriptor is legal anymore.
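
The three calls described above can be combined into one small helper. The following is only a sketch: the function name catalog_message and its parameters are invented for illustration, and the fallback handling assumes that catopen reports failure by returning (nl_catd) -1.

```c
#include <nl_types.h>
#include <stdio.h>

/* Copy message (set_no, msg_no) of the named catalog into out.
   Falls back to the untranslated text when the catalog or the
   message is missing.  The function name and parameters are
   hypothetical; they are not part of any standard interface.  */
void
catalog_message (const char *catalog_name, int set_no, int msg_no,
                 const char *fallback, char *out, size_t out_size)
{
  const char *text = fallback;
  nl_catd catd = catopen (catalog_name, 0);

  if (catd != (nl_catd) -1)
    /* catgets returns its last argument when the lookup fails.  */
    text = catgets (catd, set_no, msg_no, fallback);

  /* Copy before closing: the returned pointer may refer to storage
     owned by the catalog, and must not be modified in place.  */
  snprintf (out, out_size, "%s", text);

  if (catd != (nl_catd) -1)
    catclose (catd);
}
```

Copying the text before catclose is a design choice of this sketch; it avoids using the descriptor, or anything obtained through it, after the catalog has been closed.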
+
+9.1.2 Problems with the
+
+catgets
Interface?!catgets
: the unique
+message ID. This has to be a numeric value for all messages in a single
+set. Perhaps you can imagine the problems of keeping such a list while
+changing the source code: add a new message here, remove one there. Of
+course many tools have been developed to help organize this
+chaos, but each of them fails in one aspect or another. We don't
+want to say that the other approach has no problems, but they are far
+easier to manage.
+
+9.2 About
+
+gettext
gettext
interface comes from a Uniforum
+proposal and it is followed by at least one major Unix vendor
+(Sun) in its latest developments. It is not specified in any official
+standard, though.
+
+gettext
Library. Programmers interested
+in using this library will be interested in this description.
+
+9.2.1 The Interface
+
+gettext
interface. It
+has a global domain which unqualified usages reference. Of course this
+domain is selectable by the user.
+
+
+char *textdomain (const char *domain_name);
+
+
+LC_MESSAGES
category. The
+argument is a null-terminated string, whose characters must be legal
+in filenames. If the domain_name argument is NULL
,
+the function returns the current value. If no value has been set
+before, the name of the default domain is returned: messages.
+Please note that although the return value of textdomain
is of
+type char *
the string must not be modified. It is also important to know
+that the availability of the domain is not checked. If the name is not
+available, you will notice this only because no translations are provided.
+
+textdomain
the function
+
+
+char *gettext (const char *msgid);
+
+
+NULL
the result is undefined.
+
+LC_MESSAGES
locale is used. If this changes between two
+executions of the same gettext
call in the program, the two calls
+reference different message catalogs.
+
+textdomain
+is issued, setting the domain to a unique name, normally the package
+name. In the following code all strings which have to be translated are
+filtered through the gettext function. That's all, the package speaks
+your language.
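
As a rough sketch of the usage just described: the package name "hello-sketch" and both function names below are invented for illustration, and the call to setlocale is an assumption about how the program selects the user's locale before any lookup happens.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical package name, used as the text domain.  */
#define PACKAGE "hello-sketch"

/* Select the user's locale and make PACKAGE the global domain.
   Returns the domain name now in effect (NULL on failure).
   The returned string must not be modified.  */
const char *
init_translation (void)
{
  setlocale (LC_ALL, "");
  return textdomain (PACKAGE);
}

/* Every user-visible string is filtered through gettext.  When no
   catalog is available, gettext returns its argument unchanged.  */
const char *
greeting (void)
{
  return gettext ("Hello, world!");
}
```

A program would call init_translation once at startup and then use greeting (or gettext directly) wherever a message is produced.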
+
+9.2.2 Solving Ambiguities
+
+textdomain
, but this is neither convenient nor fast. One
+possible situation is a case under discussion at the time of this
+writing: all
+error messages of functions in the set of commonly used functions should
+go into a separate domain error
. By this means we would only need
+to translate them once.
+Another case is messages from a library, as these have to be
+independent of the current domain set by the application.
+
+
+char *dgettext (const char *domain_name, const char *msgid);
+char *dcgettext (const char *domain_name, const char *msgid,
+ int category);
+
+
+textdomain
. The third argument of
+dcgettext
allows the use of a locale category other than LC_MESSAGES
,
+although it is not obvious where this is useful. If the
+domain_name is NULL
or category has a value other than
+the known ones, the result is undefined. It should also be noted that
+this function is not part of the second known implementation of this
+function family, the one found in Solaris.
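
The library case mentioned above might look roughly like this. The domain name "libsketch", the function name and the message texts are all invented for illustration; the point is only that the lookup names its domain explicitly instead of relying on the application's textdomain setting.

```c
#include <libintl.h>

/* Hypothetical domain owned by the library.  It is independent of
   whatever domain the application selected with textdomain.  */
#define LIB_DOMAIN "libsketch"

/* Return a (possibly translated) description of an error code.
   The codes and texts are made up for this sketch.  */
const char *
lib_error_text (int code)
{
  switch (code)
    {
    case 0:
      return dgettext (LIB_DOMAIN, "no error");
    case 1:
      return dgettext (LIB_DOMAIN, "file not found");
    default:
      return dgettext (LIB_DOMAIN, "unknown error");
    }
}
```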
+
+
+char *bindtextdomain (const char *domain_name,
+ const char *dir_name);
+
+
+textdomain
). A
+NULL
pointer for the dir_name parameter returns the binding
+associated with domain_name. If domain_name itself is
+NULL
nothing happens and a NULL
pointer is returned. As
+with all the other functions, the returned
+value must not be modified!
+
+chdir
command. Relative
+paths should always be avoided, since they make the lookup depend on
+the current working directory.
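
A minimal sketch of binding a domain to an absolute directory and then querying the binding, as described above. The package name, the directory and the function name are hypothetical; the directory does not need to exist for the binding itself to be recorded.

```c
#include <libintl.h>
#include <stddef.h>

#define PACKAGE "hello-sketch"                      /* hypothetical */
#define LOCALEDIR "/opt/hello-sketch/share/locale"  /* hypothetical */

/* Bind PACKAGE to an absolute catalog directory, then query the
   binding by passing NULL as dir_name.  The returned string must
   not be modified.  */
const char *
bind_and_query (void)
{
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return NULL;                /* the binding could not be stored */
  return bindtextdomain (PACKAGE, NULL);
}
```

Using an absolute path here means that a later chdir in the program cannot change where the catalogs are searched.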
+
+9.2.3 Locating Message Catalog Files
+
+bindtextdomain
's second argument (or the default directory),
+followed by the value and name of the locale and the domain name are
+concatenated:
+
+
+dir_name/locale/LC_category/domain_name.mo
+
+
+
+/usr/local/share/locale
+
+
+LC_category
. For gettext
and dgettext
this
+LC_category
is always LC_MESSAGES
.(3)
+The value of the locale is determined through
+setlocale (LC_category, NULL)
.
+(4)
+dcgettext
specifies the locale category by the third argument.
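
The concatenation described above can be written down directly. This is only an illustration of the naming scheme: the helper name catalog_path is invented, and a real implementation may additionally try several spellings of the locale name before giving up.

```c
#include <stdio.h>

/* Build dir_name/locale/LC_category/domain_name.mo into out.
   Returns the length snprintf would have written, so the caller
   can detect truncation.  The function name is hypothetical.  */
int
catalog_path (char *out, size_t out_size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  return snprintf (out, out_size, "%s/%s/%s/%s.mo",
                   dir_name, locale, category, domain_name);
}
```

For example, the arguments "/usr/local/share/locale", "de", "LC_MESSAGES" and "hello" yield the path /usr/local/share/locale/de/LC_MESSAGES/hello.mo.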
+
+9.2.4 How to specify the output character set
+
+gettext
uses
+
+gettext
not only looks up a translation in a message catalog. It
+also converts the translation on the fly to the desired output character
+set. This is useful if the user is working in a different character set
+than the translator who created the message catalog, because it avoids
+distributing variants of message catalogs which differ only in the
+character set.
+
+nl_langinfo
+(CODESET)
, which depends on the LC_CTYPE
part of the current
+locale. But programs which store strings in a locale independent way
+(e.g. UTF-8) can request that gettext
and related functions
+return the translations in that encoding, by use of the
+bind_textdomain_codeset
function.
+
+gettext
is not subject to
+character set conversion. Also, when gettext
does not find a
+translation for msgid, it returns msgid unchanged --
+independently of the current output character set. It is therefore
+recommended that all msgids be US-ASCII strings.
+
+
+
bind_textdomain_codeset
function can be used to specify the
+output character set for message catalogs for domain domainname.
+The codeset argument must be a valid codeset name which can be used
+for the iconv_open
function, or a null pointer.
+
+
+If the codeset parameter is the null pointer,
+bind_textdomain_codeset
returns the currently selected codeset
+for the domain with the name domainname. It returns NULL
if
+no codeset has yet been selected.
+
+
+The bind_textdomain_codeset
function can be used several times.
+If used multiple times with the same domainname argument, the
+later call overrides the settings made by the earlier one.
+
+
+The bind_textdomain_codeset
function returns a pointer to a
+string containing the name of the selected codeset. The string is
+allocated internally in the function and must not be changed by the
+user. If the system ran out of memory during the execution of
+bind_textdomain_codeset
, the return value is NULL
and the
+global variable errno is set accordingly.
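
A program that keeps its strings in UTF-8 could request that encoding as sketched below. The function name request_utf8 is invented; the sketch assumes that "UTF-8" is a codeset name acceptable to iconv_open on the system in question.

```c
#include <libintl.h>
#include <stddef.h>

/* Ask for translations of domainname to be returned in UTF-8, then
   read the setting back by passing a null pointer as the codeset.
   The returned string must not be modified.  */
const char *
request_utf8 (const char *domainname)
{
  if (bind_textdomain_codeset (domainname, "UTF-8") == NULL)
    return NULL;                /* errno describes the failure */
  return bind_textdomain_codeset (domainname, NULL);
}
```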
+
+
+
+The functions of the gettext
family described so far (and all the
+catgets
functions as well) have one problem in the real world
+which has been neglected completely in all existing approaches. What
+is meant here is the handling of plural forms.
+
+
+Looking through Unix source code before the time anybody thought about
+internationalization (and, sadly, even afterwards) one can often find
+code similar to the following:
+
+
+   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
+
+
+After the first complaints from people internationalizing the code people
+either completely avoided formulations like this or used strings like
+"file(s)"
. Both look unnatural and should be avoided. The first
+attempts to solve the problem correctly looked like this:
+
+
+   if (n == 1)
+     printf ("%d file deleted", n);
+   else
+     printf ("%d files deleted", n);
+
+
+But this does not solve the problem. It helps languages where the
+plural form of a noun is not simply constructed by adding an `s' but
+that is all. Once again people fell into the trap of believing the
+rules their language is using are universal. But the handling of plural
+forms differs widely between the language families. For example,
+Rafal Maszkowski <rzm@mat.uni.torun.pl>
reports:
+
+
+
+In Polish we use e.g. plik (file) this way:
+
+1 plik
+2,3,4 pliki
+5-21 pliko'w
+22-24 pliki
+25-31 pliko'w
+
+and so on (o' means 8859-2 oacute which should be rather okreska,
+similar to aogonek).
+
+There are two things which can differ between languages (and even inside
+language families);
+
+
+The consequence of this is that application writers should not try to
+solve the problem in their code. This would be localization since it is
+only usable for certain, hardcoded language environments. Instead the
+extended gettext
interface should be used.
+
+
+These extra functions take two strings and a numerical argument instead
+of the single key string. The idea behind this is that using
+the numerical argument and the first string as a key, the implementation
+can select the right plural form using rules specified by the translator.
+The two string arguments are then used to provide a return
+value in case no message catalog is found (similar to the normal
+gettext
behavior). In this case the rules for Germanic languages
+are used and it is assumed that the first string argument is the singular
+form, the second the plural form.
+
+
+This has the consequence that programs without language catalogs can
+display the correct strings only if the program itself is written using
+a Germanic language. This is a limitation but since the GNU C library
+(as well as the GNU gettext
package) is written as part of the
+GNU project and the coding standards for the GNU project require programs
+to be written in English, this solution nevertheless fulfills its
+purpose.
+
+
+
ngettext
function is similar to the gettext
function
+as it finds the message catalogs in the same way. But it takes two
+extra arguments. The msgid1 parameter must contain the singular
+form of the string to be converted. It is also used as the key for the
+search in the catalog. The msgid2 parameter is the plural form.
+The parameter n is used to determine the plural form. If no
+message catalog is found msgid1 is returned if n == 1
,
+otherwise msgid2
.
+
+
+
+An example for the use of this function is:
+
+
+printf (ngettext ("%d file removed", "%d files removed", n), n);
+
+
+Please note that the numeric value n has to be passed to the
+printf
function as well. It is not sufficient to pass it only to
+ngettext
.
+
+
dngettext
is similar to the dgettext
function in the
+way the message catalog is selected. The difference is that it takes
+two extra parameters to provide the correct plural form. These two
+parameters are handled in the same way ngettext
handles them.
++
dcngettext
is similar to the dcgettext
function in the
+way the message catalog is selected. The difference is that it takes
+two extra parameters to provide the correct plural form. These two
+parameters are handled in the same way ngettext
handles them.
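
The domain-specific variant can be used in the same way as ngettext. The sketch below is illustrative only: the domain "libsketch" and the function name format_removed are invented, and the result is written to a buffer rather than printed so that the effect is easy to inspect.

```c
#include <libintl.h>
#include <stdio.h>

#define LIB_DOMAIN "libsketch"   /* hypothetical domain name */

/* Format a message about n removed files into out.  The count is
   passed twice: once to dngettext to select the form, and once to
   snprintf to fill in the number.  */
int
format_removed (char *out, size_t out_size, unsigned long n)
{
  const char *fmt = dngettext (LIB_DOMAIN, "%lu file removed",
                               "%lu files removed", n);
  return snprintf (out, out_size, fmt, n);
}
```

Without a message catalog the Germanic rule applies, so only a count of exactly one selects the singular string.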
+
+Now, how do these functions solve the problem of the plural forms?
+Without the input of linguists (which was not available) it was not
+possible to determine whether there are only a few different forms in
+which plural forms are formed or whether the number can increase with
+every new supported language.
+
+
+Therefore the solution implemented is to allow the translator to specify
+the rules of how to select the plural form. Since the formula varies
+with every language this is the only viable solution except for
+hardcoding the information in the code (which still would require the
+possibility of extensions to not prevent the use of new languages).
+
+
+The information about the plural form selection has to be stored in the
+header entry of the PO file (the one with the empty msgid
string).
+The plural form information looks like this:
+
+
+Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
+
+
+The nplurals
value must be a decimal number which specifies how
+many different plural forms exist for this language. The string
+following plural
is an expression which is using the C language
+syntax. Exceptions are that no negative numbers are allowed, numbers
+must be decimal, and the only variable allowed is n
. This
+expression will be evaluated whenever one of the functions
+ngettext
, dngettext
, or dcngettext
is called. The
+numeric value passed to these functions is then substituted for all uses
+of the variable n
in the expression. The resulting value then
+must be greater than or equal to zero and smaller than the value given as the
+value of nplurals
.
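
Because the plural expression uses C syntax, its effect can be checked by writing it as an ordinary C function. The sketch below does this for a three-form rule of the kind used for Polish; the names NPLURALS_PL and plural_index_pl are invented, and the code only illustrates what the implementation computes when it evaluates such an expression.

```c
/* Number of forms declared by the hypothetical header entry
   "nplurals=3".  */
#define NPLURALS_PL 3

/* Evaluate
     plural=n==1 ? 0 :
            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
   for a given n.  The result is always in the range
   0 .. NPLURALS_PL - 1.  */
unsigned int
plural_index_pl (unsigned long n)
{
  return n == 1 ? 0
         : (n % 10 >= 2 && n % 10 <= 4
            && (n % 100 < 10 || n % 100 >= 20)) ? 1 : 2;
}
```

With this rule, 1 selects form 0, the numbers 2 to 4 and 22 to 24 select form 1, and 5 to 21 select form 2, which matches the Polish counts quoted earlier.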
+
+
+The following rules are known at this point. The languages are listed with
+their families. But this does not necessarily mean the information can be
+generalized for the whole family (as can be easily seen in the table
+below).(5)
+
+
+Plural-Forms: nplurals=1; plural=0;
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=2; plural=n != 1;
+
+(Note: this uses the feature of C expressions that boolean expressions
+have the value zero or one.)
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=2; plural=n>1;
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=3; \
+    plural=n%10==1 && n%100!=11 ? 0 : \
+           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=3; \
+    plural=n%10==1 && n%100!=11 ? 0 : \
+           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=3; \
+    plural=n==1 ? 0 : \
+           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
+
+(Continuation in the next line is possible.)
+
+Languages with this property include:
+
+
+Plural-Forms: nplurals=4; \
+    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
+
+Languages with this property include:
+
gettext
in GUI programs
+One place where the gettext
functions, if used normally, have big
+problems is within programs with graphical user interfaces (GUIs). The
+problem is that many of the strings which have to be translated are very
+short. They have to appear in pull-down menus which restricts the
+length. But strings which do not contain entire sentences or at
+least large fragments of a sentence may appear in more than one
+situation in the program but might have different translations. This is
+especially true for the one-word strings which are frequently used in
+GUI programs.
+
+
+As a consequence many people say that the gettext
approach is
+wrong and instead catgets
should be used which indeed does not
+have this problem. But there is a very simple and powerful method to
+handle this kind of problem with the gettext
functions.
+
+
+As an example consider the following fictional situation. A GUI program
+has a menu bar with the following entries:
+
+
++------------+------------+--------------------------------------+
+| File       | Printer    |                                      |
++------------+------------+--------------------------------------+
+| Open     | | Select   |
+| New      | | Open     |
++----------+ | Connect  |
+             +----------+
+
+
+To have the strings File
, Printer
, Open
,
+New
, Select
, and Connect
translated there has to be
+at some point in the code a call to a function of the gettext
+family. But in two places the string passed into the function would be
+Open
. The translations might not be the same and therefore we
+are in the dilemma described above.
+
+
+One solution to this problem is to artificially lengthen the strings
+to make them unambiguous. But what would the program do if no
+translation is available? The lengthened string is not what should be
+printed. So we should use a slightly modified version of the functions.
+
+
+To lengthen the strings a uniform method should be used. E.g., in the
+example above the strings could be chosen as
+
+
+Menu|File
+Menu|Printer
+Menu|File|Open
+Menu|File|New
+Menu|Printer|Select
+Menu|Printer|Open
+Menu|Printer|Connect
+
+
+Now all the strings are different and if now instead of gettext
+the following little wrapper function is used, everything works just
+fine:
+
+
+  char *
+  sgettext (const char *msgid)
+  {
+    char *msgval = gettext (msgid);
+    /* No translation found: skip the prefix up to the last '|'.  */
+    if (msgval == msgid)
+      msgval = strrchr (msgid, '|') + 1;
+    return msgval;
+  }
+
+
+What this little function does is to recognize the case when no
+translation is available. This can be done very efficiently by a
+pointer comparison since the return value is the input value. If there
+is no translation we know that the input string is in the format we used
+for the Menu entries and therefore contains a |
character. We
+simply search for the last occurrence of this character and return a
+pointer to the character following it. That's it!
+
+
+If one now consistently uses the lengthened string form and replaces
+the gettext
calls with calls to sgettext
(this is normally
+limited to very few places in the GUI implementation) then it is
+possible to produce a program which can be internationalized.
+
+
+The other gettext
functions (dgettext
, dcgettext
+and the ngettext
equivalents) can and should have corresponding
+functions as well which look almost identical, except for the parameters
+and the call to the underlying function.
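
A corresponding wrapper for dgettext might look like the sketch below. The name sdgettext is invented for illustration. Unlike the small wrapper shown earlier, this variant also tolerates strings that contain no | character; that extra check is an addition of this sketch, not something the text above requires.

```c
#include <libintl.h>
#include <string.h>

/* Look up msgid in domain_name.  If no translation exists, return
   the part of msgid after the last '|', or msgid itself when it
   contains no '|' at all.  The function name is hypothetical.  */
const char *
sdgettext (const char *domain_name, const char *msgid)
{
  const char *msgval = dgettext (domain_name, msgid);

  /* An untranslated message is returned as the very same pointer.  */
  if (msgval == msgid)
    {
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```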
+
+
+Now there is of course the question why such functions do not exist in
+the GNU gettext package? There are two parts of the answer to this question.
+
+ +|
which is a quite good choice because it
+resembles a notation frequently used in this context and it also is a
+character not often used in message strings.
+
+But what if the character is used in message strings? Or if the chosen
+character is not available in the character set on the machine one
+compiles (e.g., |
is not required to exist for ISO C; this is
+why the `iso646.h' file exists in ISO C programming environments).
+
+There is only one more comment to make. The wrapper function above
+requires that the translated strings are not lengthened themselves.
+This is only logical. There is no need to disambiguate the strings
+(since they are never used as keys for a search) and one also saves
+quite some memory and disk space by doing this.
+
+ + +
+At this point of the discussion we should talk about an advantage of the
+GNU gettext
implementation. Some readers might have pointed out
+that an internationalized program might have a poor performance if some
+string has to be translated in an inner loop. While this is unavoidable
+when the string varies from one run of the loop to the other it is
+simply a waste of time when the string is always the same. Take the
+following example:
+
+
+{
+  while (...)
+    {
+      puts (gettext ("Hello world"));
+    }
+}
+
+
+When the locale selection does not change between two runs the resulting
+string is always the same. One way to use this is:
+
+
+{
+  str = gettext ("Hello world");
+  while (...)
+    {
+      puts (str);
+    }
+}
+
+
+But this solution is not usable in all situations (e.g. when the locale
+selection changes) nor does it lead to legible code.
+
+
+For this reason, GNU gettext
caches previous translation results.
+When the same translation is requested twice, with no new message
+catalogs being loaded in between, gettext
will, the second time,
+find the result through a single cache lookup.
+
+
+The following discussion is perhaps a little bit colored. As said
+above we implemented GNU gettext
following the Uniforum
+proposal and this surely has its reasons. But it should show how we
+came to this decision.
+
+
+First we take a look at the developing process. When we write an
+application using NLS provided by gettext
we proceed as always.
+Only when we come to a string which might be seen by the users and thus
+has to be translated we use gettext("...")
instead of
+"..."
. At the beginning of each source file (or in a central
+header file) we define
+
+
+#define gettext(String) (String)
+
+
+Even this definition can be avoided when the system supports the
+gettext
function in its C library. When we compile this code the
+result is the same as if no NLS code is used. When you take a look at
+the GNU gettext
code you will see that we use _("...")
+instead of gettext("...")
. This reduces the number of
+additional characters per translatable string to 3 (in words:
+three).
+
+
+When now a production version of the program is needed we simply replace
+the definition
+
+
+#define _(String) (String)
+
+
+by
+
+
+#include <libintl.h>
+#define _(String) gettext (String)
+
+
+Additionally we run the program `xgettext' on all source code files
+which contain translatable strings and that's it: we have a running
+program which does not depend on translations to be available, but which
+can use any that becomes available.
+
+
+The same procedure can be done for the gettext_noop
invocations
+(see section 3.5 Special Cases of Translatable Strings). One usually defines gettext_noop
as a
+no-op macro. So you should consider the following code for your project:
+
+
+#define gettext_noop(String) (String)
+#define N_(String) gettext_noop (String)
+
+
+N_
is a short form similar to _
. The `Makefile' in
+the `po/' directory of GNU gettext
knows by default both of the
+mentioned short forms so you are invited to follow this proposal for
+your own ease.
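
To illustrate how the two short forms work together, here is a small sketch. The table weekday_names and the function weekday_label are invented for illustration: N_ only marks the strings in a static initializer, where a function call is not allowed, and _ performs the actual lookup later at run time.

```c
#include <libintl.h>
#include <stddef.h>

#define _(String) gettext (String)
#define gettext_noop(String) (String)
#define N_(String) gettext_noop (String)

/* Marked for extraction, but not translated at this point.  */
static const char *const weekday_names[] = {
  N_("Monday"), N_("Tuesday"), N_("Wednesday")
};

/* Translate entry i when it is actually needed.  Returns NULL for
   an index outside the (hypothetical) table.  */
const char *
weekday_label (size_t i)
{
  size_t count = sizeof weekday_names / sizeof weekday_names[0];

  if (i >= count)
    return NULL;
  return _(weekday_names[i]);
}
```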
+
+
+Now to catgets
. The main problem is the work for the
+programmer. Every time he comes to a translatable string he has to
+define a number (or a symbolic constant) which also has to be defined in
+the message catalog file. He also has to take care of duplicate
+entries, duplicate message IDs etc. If he wants to have the same
+quality in the message catalog as the GNU gettext
program
+provides he also has to put the descriptive comments for the strings and
+the location in all source code files in the message catalog. This is
+nearly a Mission: Impossible.
+
+
+But there are also some points people might call advantages speaking for
+catgets
. If you have a single word in a string and this string
+is used in different contexts it is likely that in one or the other
+language the word has different translations. Example:
+
+
+printf ("%s: %d", gettext ("number"), number_of_errors);
+
+printf ("you should see %d %s", number_count,
+        number_count == 1 ? gettext ("number") : gettext ("numbers"));
+
+
+Here we have to translate two times the string "number"
. Even
+if you do not speak a language besides English it might be possible to
+recognize that the two words have a different meaning. In German the
+first appearance has to be translated to "Anzahl"
and the second
+to "Zahl"
.
+
+
+Now you can say that this example is really esoteric. And you are
+right! This is exactly how we felt about this problem and decided that
+it does not weigh that much. The solution for the above problem could
+be very easy:
+
+
+printf ("%s %d", gettext ("number:"), number_of_errors);
+
+printf (number_count == 1 ? gettext ("you should see %d number")
+                          : gettext ("you should see %d numbers"),
+        number_count);
+
+
+We believe that we can solve all conflicts with this method. If it is
+difficult one can also consider changing one of the conflicting strings a
+little bit. But it is not impossible to overcome.
+
+
+catgets
allows the same original entry to have different translations,
+but gettext
has another, scalable approach for solving ambiguities
+of this kind: See section 9.2.2 Solving Ambiguities.
+
+
+Starting with version 0.9.4 the library libintl.h
should be
+self-contained. I.e., you can use it in your own programs without
+providing additional functions. The `Makefile' will put the header
+and the library in directories selected using the $(prefix)
.
+
+
+One exception of the above is found on HP-UX 10.01 systems. Here the C
+library does not contain the alloca
function (and the HP compiler
+does not generate it inlined). But it is not intended to rewrite the whole
+library just because of this dumb system. Instead include the
+alloca
function in all packages in which you use libintl.a
.
+
+
gettext
grok
+To fully exploit the functionality of the GNU gettext
library it
+is surely helpful to read the source code. But for those who don't want
+to spend that much time in reading the (sometimes complicated) code here
+is a list of comments:
+
+
gettext
+function. The method which is presented here only works correctly
+with the GNU implementation of the gettext
functions.
+
+In the function dcgettext
at every call the current setting of
+the highest priority environment variable is determined and used.
+Highest priority means here the following list with decreasing
+priority:
+
+
+LANGUAGE
+
+LC_ALL
+
+LC_xxx
, according to selected locale
+
+LANG
+
+LANGUAGE
changes. According
+to the process explained above the new value of this variable is found
+as soon as the dcgettext
function is called. But this also means
+the (perhaps) different message catalog file is loaded. In other
+words: the used language is changed.
+
+But there is one little catch. The code for gcc-2.7.0 and up provides
+some optimization. This optimization normally prevents the calling of
+the dcgettext
function as long as no new catalog is loaded. But
+if dcgettext
is not called the program also cannot notice that the
+LANGUAGE
variable has changed (see section 9.2.7 Optimization of the *gettext functions). A
+solution for this is very easy. Include the following code in the
+language switching function.
+
+
+
+  /* Change language.  */
+  setenv ("LANGUAGE", "fr", 1);
+
+  /* Make change known.  */
+  {
+    extern int _nl_msg_cat_cntr;
+    ++_nl_msg_cat_cntr;
+  }
+
+
+The variable
_nl_msg_cat_cntr
is defined in `loadmsgcat.c'.
+The programmer will find himself in need for a construct like this only
+when developing programs which run for a long time and allow the user to
+select the language at runtime. Non-interactive programs (like all
+these little Unix tools) should never need this.
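
The priority order of the environment variables listed above can be pictured with a small lookup loop. This is a simplified sketch, not the code of the GNU implementation: the function name message_locale_from_env is invented, the category is fixed to LC_MESSAGES, and the fallback to "C" when nothing is set is an assumption of this example.

```c
#include <stdlib.h>

/* Return the value of the highest-priority variable that is set
   and non-empty, checking LANGUAGE, LC_ALL, LC_MESSAGES and LANG
   in that order.  Returns "C" if none of them is set.  */
const char *
message_locale_from_env (void)
{
  static const char *const names[] = {
    "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"
  };
  size_t i;

  for (i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return "C";
}
```

Because the environment is consulted on every call, a change made with setenv is seen by the next lookup, which is the behavior the language-switching technique relies on.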
+
+
+There are two competing methods for language independent messages:
+the X/Open catgets
method, and the Uniforum gettext
+method. The catgets
method indexes messages by integers; the
+gettext
method indexes them by their English translations.
+The catgets
method has been around longer and is supported
+by more vendors. The gettext
method is supported by Sun,
+and it has been heard that the COSE multi-vendor initiative is
+supporting it. Neither method is a POSIX standard; the POSIX.1
+committee had a lot of disagreement in this area.
+
+
+Neither one is in the POSIX standard. There was much disagreement
+in the POSIX.1 committee about using the gettext
routines
+vs. catgets
(XPG). In the end the committee couldn't
+agree on anything, so no messaging system was included as part
+of the standard. I believe the informative annex of the standard
+includes the XPG3 messaging interfaces, "...as an example of
+a messaging system that has been implemented..."
+
+
+They were very careful not to say anywhere that you should use one
+set of interfaces over the other. For more on this topic please
+see the Programming for Internationalization FAQ.
+
+ + +catgets
+There have been a few discussions of late on the use of
+catgets
as a base. I think it important to present both
+sides of the argument and hence am opting to play devil's advocate
+for a little bit.
+
+
+I'll not deny the fact that catgets
could have been designed
+a lot better. It currently has quite a number of limitations and
+these have already been pointed out.
+
+
+However there is a great deal to be said for consistency and
+standardization. A common recurring problem when writing Unix
+software is the myriad portability problems across Unix platforms.
+It seems as if every Unix vendor had a look at the operating system
+and found parts they could improve upon. Undoubtedly, these
+modifications are probably innovative and solve real problems.
+However, software developers have a hard time keeping up with all
+these changes across so many platforms.
+
+
+And this has prompted the Unix vendors to begin to standardize their
+systems. Hence the impetus for Spec1170. Every major Unix vendor
+has committed to supporting this standard and every Unix software
+developer waits with glee the day they can write software to this
+standard and simply recompile (without having to use autoconf)
+across different platforms.
+
+
+As I understand it, Spec1170 is roughly based upon version 4 of the
+X/Open Portability Guidelines (XPG4). Because catgets
and
+friends are defined in XPG4, I'm led to believe that catgets
+is a part of Spec1170 and hence will become a standardized component
+of all Unix systems.
+
+
+Now it seems kind of wasteful to me to have two different systems
+installed for accessing message catalogs. If we do want to remedy
+catgets
deficiencies why don't we try to expand catgets
+(in a compatible manner) rather than implement an entirely new system.
+Otherwise, we'll end up with two message catalog access systems installed
+with an operating system - one set of routines for packages using GNU
+gettext
for their internationalization, and another set of routines
+(catgets) for all other software. Bloated?
+
+
+Supposing another catalog access system is implemented. Which do
+we recommend? At least for Linux, we need to attract as many
+software developers as possible. Hence we need to make it as easy
+for them to port their software as possible. Which means supporting
+catgets
. We will be implementing the libintl
code
+within our libc
, but does this mean we also have to incorporate
+another message catalog access scheme within our libc
as well?
+And what about people who are going to be using the libintl
++ non-catgets
routines. When they port their software to
+other platforms, they're now going to have to include the front-end
+(libintl
) code plus the back-end code (the non-catgets
+access routines) with their software instead of just including the
+libintl
code with their software.
+
+
+Message catalog support is however only the tip of the iceberg.
+What about the data for the other locale categories. They also have
+a number of deficiencies. Are we going to abandon them as well and
+develop another duplicate set of routines (should libintl
+expand beyond message catalog support)?
+
+
+Like many parts of Unix that can be improved upon, we're stuck with balancing
+compatibility with the past with useful improvements and innovations for
+the future.
+
+
+X/Open agreed very late on the standard form so that many
+implementations differ from the final form. Both of my systems (old
+Linux catgets and Ultrix-4) have a strange variation.
+
+
+OK. After incorporating the last changes I have to spend some time on
+making the GNU/Linux libc
gettext
functions. So in future
+Solaris is not the only system having gettext
.
+
+
+
diff --git a/doc/gettext_foot.html b/doc/gettext_foot.html
new file mode 100644
index 000000000..2bebe6dd6
--- /dev/null
+++ b/doc/gettext_foot.html
@@ -0,0 +1,42 @@
+
+ + ++
+
In this manual, all mentions of Emacs
+refer to either GNU Emacs or to XEmacs, which people sometimes call FSF
+Emacs and Lucid Emacs, respectively.
+
This
+limitation is not imposed by GNU gettext
, but is for compatibility
+with the msgfmt
implementation on Solaris.
+
Some
+systems, e.g. Ultrix, don't have LC_MESSAGES
. Here we use a more or
+less arbitrary value for it, namely 1729, the smallest positive integer
+which can be represented in two different ways as the sum of two cubes.
+
When the system does not support setlocale
its behavior
+in setting the locale values is simulated by looking at the environment
+variables.
+
Additions are welcome. Send appropriate information to
+bug-glibc-manual@gnu.org.
+
+This document was generated on 19 April 2001 using the +texi2html +translator version 1.51.
+
diff --git a/doc/gettext_toc.html b/doc/gettext_toc.html
new file mode 100644
index 000000000..c635c1633
--- /dev/null
+++ b/doc/gettext_toc.html
@@ -0,0 +1,146 @@
+
+
msgmerge
Program
+ +