10.2 GNU gettext
The facilities in GNU gettext focus on messages: strings printed by a program, either directly or via formatting with printf or sprintf().(53)
When using GNU gettext, each application has its own text domain. This is a unique name, such as ‘kpilot’ or ‘gawk’, that identifies the application.
A complete application may have multiple components: programs written in C or C++, as well as scripts written in sh or awk.
All of the components use the same text domain.
To make the discussion concrete, assume we’re writing an application named guide. Internationalization consists of the following steps, in this order:
- The programmer goes through the source for all of guide’s components and marks each string that is a candidate for translation. For example, "`-F': option required" is a good candidate for translation. A table with strings of option names is not (e.g., gawk’s ‘--profile’ option should remain the same, no matter what the local language).
- The programmer indicates the application’s text domain ("guide") to the gettext library, by calling the textdomain() function.
- Messages from the application are extracted from the source code and collected into a portable object template file (‘guide.pot’), which lists the strings and their translations. The translations are initially empty. The original (usually English) messages serve as the key for lookup of the translations.
- For each language with a translator, ‘guide.pot’ is copied to a portable object file (.po) and translations are created and shipped with the application. For example, there might be a ‘fr.po’ for a French translation.
- Each language’s ‘.po’ file is converted into a binary message object (‘.mo’) file. A message object file contains the original messages and their translations in a binary format that allows fast lookup of translations at runtime.
- When guide is built and installed, the binary translation files are installed in a standard place.
- For testing and development, it is possible to tell gettext to use ‘.mo’ files in a different directory than the standard one by using the bindtextdomain() function.
- At runtime, guide looks up each string via a call to gettext(). The returned string is the translated string if available, or the original string if not.
- If necessary, it is possible to access messages from a different text domain than the one belonging to the application, without having to switch the application’s default text domain back and forth.
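The runtime side of these steps might look roughly like the following C sketch. It assumes a system that provides libintl.h (for example, glibc). The helper names guide_i18n_init(), guide_lookup(), and guide_lookup_in() are hypothetical and exist only for illustration; the library calls themselves (setlocale(), textdomain(), bindtextdomain(), gettext(), and dgettext()) are the standard ones.

```c
#include <libintl.h>   /* textdomain(), bindtextdomain(), gettext(), dgettext() */
#include <locale.h>    /* setlocale(), LC_ALL */
#include <stddef.h>    /* NULL */

/* The text domain named in the discussion above. */
#define GUIDE_DOMAIN "guide"

/*
 * Hypothetical start-up helper (not part of gettext).
 * localedir may be NULL to keep the standard catalog directory, or the
 * path of a test directory holding '.mo' files.
 * Returns 0 on success, -1 if the library reports a failure.
 */
int guide_i18n_init(const char *localedir)
{
    /* Adopt the user's locale from the environment. If this fails, the
       program stays in the "C" locale and messages remain untranslated. */
    setlocale(LC_ALL, "");

    /* Optional: look for catalogs somewhere other than the standard place. */
    if (localedir != NULL && bindtextdomain(GUIDE_DOMAIN, localedir) == NULL)
        return -1;

    /* Tell the library which text domain belongs to this application. */
    if (textdomain(GUIDE_DOMAIN) == NULL)
        return -1;

    return 0;
}

/* Look up a message in the application's own (default) text domain. */
const char *guide_lookup(const char *msgid)
{
    return gettext(msgid);
}

/* Look up a message in another text domain; the default is left alone. */
const char *guide_lookup_in(const char *domain, const char *msgid)
{
    return dgettext(domain, msgid);
}
```

A program would call guide_i18n_init() once near the start of main() and then use guide_lookup() wherever it prints a message. When no catalog for the current locale can be found, both lookup helpers simply hand back the original string.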
In C (or C++), the string marking and dynamic translation lookup are accomplished by wrapping each string in a call to gettext():

    printf("%s", gettext("Don't Panic!\n"));

The tools that extract messages from source code pull out all strings enclosed in calls to gettext().
The GNU gettext developers, recognizing that typing ‘gettext(…)’ over and over again is both painful and ugly to look at, use the macro ‘_’ (an underscore) to make things easier:

    /* In the standard header file: */
    #define _(str) gettext(str)

    /* In the program text: */
    printf("%s", _("Don't Panic!\n"));

This reduces the typing overhead to just three extra characters per string and is considerably easier to read as well.
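The macro works equally well on format strings passed to the printf family. The sketch below is only an illustration: the helper format_option_error() is hypothetical, and the message is a parameterized variant of the "option required" string mentioned earlier rather than text taken from any real program.

```c
#include <libintl.h>   /* gettext() */
#include <stdio.h>     /* snprintf(), size_t */

/* The shorthand from the text; projects usually keep it in a shared header. */
#define _(str) gettext(str)

/*
 * Hypothetical helper: format a translatable error message into buf.
 * Returns whatever snprintf() returns, i.e., the length the complete
 * message would have, not counting the terminating null byte.
 */
int format_option_error(char *buf, size_t size, const char *option)
{
    /* The entire format string is marked, so a translator sees the
       message in one piece and can move %s to wherever it belongs. */
    return snprintf(buf, size, _("`%s': option required"), option);
}
```

Marking the whole format string, rather than fragments glued together at runtime, is the design choice worth noting here: a fragment such as "option required" on its own would give a translator too little context.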
There are locale categories for different types of locale-related information.
The defined locale categories that gettext knows about are:
- LC_MESSAGES
  Text messages. This is the default category for gettext operations, but it is possible to supply a different one explicitly, if necessary. (It is almost never necessary to supply a different category.)
- LC_COLLATE
  Text-collation information; i.e., how different characters and/or groups of characters sort in a given language.
- LC_CTYPE
  Character-type information (alphabetic, digit, upper- or lowercase, and so on). This information is accessed via the POSIX character classes in regular expressions, such as /[[:alnum:]]/ (see section Regular Expression Operators).
- LC_MONETARY
  Monetary information, such as the currency symbol, and whether the symbol goes before or after a number.
- LC_NUMERIC
  Numeric information, such as which characters to use for the decimal point and the thousands separator.(54)
- LC_RESPONSE
  Response information, such as how “yes” and “no” appear in the local language, and possibly other information as well.
- LC_TIME
  Time- and date-related information, such as 12- or 24-hour clock, month printed before or after the day in a date, local month abbreviations, and so on.
- LC_ALL
  All of the above. (Not too useful in the context of gettext.)
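In C, a category is supplied explicitly through dcgettext(), which takes the text domain, the message, and the category. The following is a minimal sketch; the helper names guide_message() and guide_time_message() are hypothetical, and the choice of LC_TIME for the second helper is only an example of a non-default category.

```c
#include <libintl.h>   /* dcgettext() */
#include <locale.h>    /* LC_MESSAGES, LC_TIME */

/* The default behavior spelled out: look up msgid in the "guide"
   text domain under the LC_MESSAGES category. */
const char *guide_message(const char *msgid)
{
    return dcgettext("guide", msgid, LC_MESSAGES);
}

/* Rarely needed: the same lookup under a different category.
   The library then searches for a catalog stored under that
   category instead of under LC_MESSAGES. */
const char *guide_time_message(const char *msgid)
{
    return dcgettext("guide", msgid, LC_TIME);
}
```

As with gettext(), both helpers return the original string when no matching translation is available.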