2.3.1 Locale Names
------------------

A locale name usually has the form ‘LL_CC’, where

• ‘LL’ is an ISO 639 two-letter language code. For some languages, a
  two-letter code does not exist, and a three-letter code is used
  instead.
• ‘CC’ is an ISO 3166 two-letter code of a country or territory.

For example, for German in Germany, LL is ‘de’, and CC is ‘DE’. You
can find a list of the language codes in appendix *note Language Codes::
and a list of the country codes in appendix *note Country Codes::.

You might think that the country code is redundant. In fact, some
languages have dialects in different countries. For example, ‘de_AT’ is
used for German as spoken in Austria, and ‘pt_BR’ for Portuguese as
spoken in Brazil. The country code serves to distinguish the dialects.

Many locale names have an extended syntax ‘LL_CC.ENCODING’ that also
specifies the character encoding. These names came into use because,
between 2000 and 2005, most users switched to locales in UTF-8
encoding. For example, the German locale on glibc systems is nowadays
‘de_DE.UTF-8’. The older name ‘de_DE’ still refers to the German locale
as of 2000, which stores characters in ISO-8859-1 encoding: a text
encoding that cannot even accommodate the Euro currency sign.
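The Euro-sign limitation of ISO-8859-1 is easy to check. The following
sketch uses Python only for illustration; the helper ‘can_encode’ is
hypothetical. It tries to encode the Euro sign (U+20AC) in ISO-8859-1,
in ISO-8859-15 (a later revision of Latin-1 that does include the Euro
sign), and in UTF-8.

```python
# Check which character encodings can represent the Euro sign (U+20AC).
# `can_encode` is a hypothetical helper written for this illustration.

EURO = "\u20ac"  # the Euro currency sign


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for enc in ("ISO-8859-1", "ISO-8859-15", "UTF-8"):
    print(f"{enc:12} {can_encode(EURO, enc)}")
```

Only ISO-8859-1 reports ‘False’; in UTF-8 the sign takes three bytes,
in ISO-8859-15 a single byte.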

Some locale names use ‘LL_CC@VARIANT’ instead of ‘LL_CC’. The
‘@VARIANT’ can denote any kind of characteristic that is not already
implied by the language LL and the country CC. It can denote a
particular monetary unit. For example, on glibc systems, ‘de_DE@euro’
denotes the locale that uses the Euro currency, in contrast to the older
locale ‘de_DE’, which implies the currency used before 2002. It can
also denote a dialect of the language, or the script used to write text
(for example, ‘sr_RS@latin’ uses the Latin script, whereas ‘sr_RS’ uses
the Cyrillic script to write Serbian), or the orthography rules, or
similar.
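The parts described above combine into names of the general shape
‘LL_CC.ENCODING@VARIANT’, where everything after ‘LL’ is optional. A
minimal Python sketch that splits such a name follows; ‘LocaleName’ and
‘parse_locale_name’ are hypothetical names, not part of gettext or of
the C library, and the sketch only splits the string without checking
that the codes are valid or that the locale is installed.

```python
# Split a locale name of the form LL[_CC][.ENCODING][@VARIANT] into parts.
# Hypothetical helper for illustration, not a gettext or libc API.
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str                   # LL, e.g. "de" (or a three-letter code)
    country: Optional[str] = None   # CC, e.g. "DE"
    encoding: Optional[str] = None  # e.g. "UTF-8"
    variant: Optional[str] = None   # e.g. "euro" or "latin"


def parse_locale_name(name: str) -> LocaleName:
    # The '@VARIANT' suffix comes last, so split it off first.
    rest, _, variant = name.partition("@")
    # Next comes the '.ENCODING' part, which follows the country code.
    rest, _, encoding = rest.partition(".")
    # What remains is either 'LL' or 'LL_CC'.
    language, _, country = rest.partition("_")
    return LocaleName(language, country or None,
                      encoding or None, variant or None)


for n in ("de_DE", "de_DE.UTF-8", "de_DE@euro", "sr_RS@latin", "pt_BR"):
    print(n, "->", parse_locale_name(n))
```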

Other systems use variations of this scheme, such as a plain ‘LL’. You
can get the list of locales that your system supports for your language
by running the command ‘locale -a | grep '^LL'’.
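The same lookup can be done from a program. The Python sketch below
assumes a ‘locale’ command that supports ‘-a’ (as on glibc and most
Unix systems); the helpers ‘installed_locales’ and
‘locales_for_language’ are hypothetical. As a design choice, it is
slightly stricter than ‘grep '^LL'’: a bare prefix match would also
accept a locale whose three-letter language code merely starts with LL,
so the sketch requires LL to be followed by ‘_’, ‘.’, ‘@’ or the end of
the name.

```python
# List installed locales for one language, like `locale -a | grep '^LL'`.
# Assumes a `locale` command supporting `-a`; helper names are hypothetical.
import subprocess
from typing import List


def locales_for_language(names: List[str], ll: str) -> List[str]:
    """Keep the names whose language part is exactly `ll`."""
    return [n for n in names
            if n == ll or (n.startswith(ll) and n[len(ll)] in "_.@")]


def installed_locales() -> List[str]:
    """Return the locale names printed by `locale -a`, one per line."""
    out = subprocess.run(["locale", "-a"], capture_output=True,
                         text=True, check=True).stdout
    return out.split()
```

For example, ‘locales_for_language(installed_locales(), "de")’ returns
whatever German locales are installed. The exact spelling of the
encoding part (for instance ‘UTF-8’ versus ‘utf8’) depends on the
system.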

There are also two special locales:
• The locale called ‘C’.
  When it is used, it disables all localization: in this locale, all
  programs standardized by POSIX use English messages and an
  unspecified character encoding (often US-ASCII, but sometimes also
  ISO-8859-1 or UTF-8, depending on the operating system).
• The locale called ‘C.UTF-8’.
  This locale exists on all modern GNU and Unix systems, but not on
  all operating systems. When it is used, it disables all
  localization as well. It uses UTF-8 as character encoding.
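Because ‘C.UTF-8’ is not available everywhere, a program that wants no
localization but prefers UTF-8 can try ‘C.UTF-8’ first and fall back to
‘C’, which every POSIX system provides. The sketch below uses Python's
‘locale.setlocale’, a thin wrapper around the C ‘setlocale’ function,
which raises ‘locale.Error’ when a locale is not supported; the helper
‘pick_locale’ is hypothetical.

```python
# Try candidate locales in order and fall back to the always-present "C"
# locale. `pick_locale` is a hypothetical helper, not a standard API.
import locale
from typing import Iterable


def pick_locale(candidates: Iterable[str]) -> str:
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            continue  # not installed or not supported on this system
    locale.setlocale(locale.LC_ALL, "C")  # POSIX guarantees this locale
    return "C"


# Prefer UTF-8 without localization; fall back to plain "C" if missing.
print(pick_locale(["C.UTF-8"]))
```

The output depends on the system: ‘C.UTF-8’ where that locale exists,
otherwise ‘C’.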