Functions
PangoScript | pango_script_for_unichar () |
PangoLanguage * | pango_script_get_sample_language () |
PangoScriptIter * | pango_script_iter_new () |
void | pango_script_iter_get_range () |
gboolean | pango_script_iter_next () |
void | pango_script_iter_free () |
PangoLanguage * | pango_language_from_string () |
const char * | pango_language_to_string () |
gboolean | pango_language_matches () |
gboolean | pango_language_includes_script () |
const PangoScript * | pango_language_get_scripts () |
PangoLanguage * | pango_language_get_default () |
const char * | pango_language_get_sample_string () |
Types and Values
enum | PangoScript |
#define | PANGO_TYPE_SCRIPT |
PangoScriptIter | |
PangoLanguage | |
#define | PANGO_TYPE_LANGUAGE |
Description
The functions in this section are used to identify the writing system, or script, of individual characters and of ranges within a larger text string.
Functions
pango_script_for_unichar ()
PangoScript
pango_script_for_unichar (gunichar ch);
Looks up the PangoScript for a particular character (as defined by Unicode Standard Annex #24). No check is made for ch being a valid Unicode character; if you pass in an invalid character, the result is undefined.
As of Pango 1.18, this function simply returns the return value of g_unichar_get_script().
Since: 1.4
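As an illustration of the per-character lookup, the sketch below maps a code point to a script using a few hard-coded ranges. It is not Pango's code: since 1.18 pango_script_for_unichar() defers to g_unichar_get_script(), which covers the whole Unicode Script property. The ToyScript enum and toy_script_for_char() are hypothetical names, and the ranges are a deliberately tiny subset.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for PangoScript. */
typedef enum {
  TOY_SCRIPT_COMMON,    /* shared by many scripts: digits, space, punctuation */
  TOY_SCRIPT_LATIN,
  TOY_SCRIPT_GREEK,
  TOY_SCRIPT_CYRILLIC,
  TOY_SCRIPT_UNKNOWN    /* anything this toy table does not cover */
} ToyScript;

/* Maps a code point to a script using a handful of hard-coded ranges.
 * The real Unicode Script property table is far larger. */
static ToyScript
toy_script_for_char (uint32_t ch)
{
  if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
    return TOY_SCRIPT_LATIN;
  if (ch >= 0x03B1 && ch <= 0x03C9)   /* GREEK SMALL LETTER ALPHA..OMEGA */
    return TOY_SCRIPT_GREEK;
  if (ch >= 0x0410 && ch <= 0x044F)   /* CYRILLIC CAPITAL LETTER A..SMALL LETTER YA */
    return TOY_SCRIPT_CYRILLIC;
  if ((ch >= '0' && ch <= '9') || ch == ' ' || ch == '.' || ch == ',')
    return TOY_SCRIPT_COMMON;
  return TOY_SCRIPT_UNKNOWN;
}
```

Characters such as digits and spaces come back as Common; a script iterator typically merges such characters into the surrounding run rather than starting a new one.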
pango_script_get_sample_language ()
PangoLanguage *
pango_script_get_sample_language (PangoScript script);
Given a script, finds a language tag that is reasonably representative of that script. This will usually be the most widely spoken or used language written in that script: for instance, the sample language for PANGO_SCRIPT_CYRILLIC is ru (Russian), and the sample language for PANGO_SCRIPT_ARABIC is ar.
For some scripts, no sample language will be returned because there is no language that is sufficiently representative. The best example of this is PANGO_SCRIPT_HAN, where the different variants of written Chinese, Japanese, and Korean all use significantly different sets of Han characters and forms of shared characters. No sample language can be provided for many historical scripts either.
As of 1.18, this function checks the environment variables PANGO_LANGUAGE and LANGUAGE (in that order) first. If one of them is set, it is parsed as a list of language tags separated by colons or other separators. This function will return the first language in the parsed list that Pango believes may use script for writing. This last predicate is tested using pango_language_includes_script(). This can be used to control Pango's font selection for non-primary languages. For example, a PANGO_LANGUAGE environment variable set to "en:fa" makes Pango choose fonts suitable for Persian (fa) instead of Arabic (ar) when a segment of Arabic text is found in otherwise non-Arabic text. The same trick can be used to choose a default language for PANGO_SCRIPT_HAN when setting the context language is not feasible.
Returns
a PangoLanguage that is representative of the script, or NULL if no such language exists.
[nullable]
Since: 1.4
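The environment-variable lookup described above can be sketched as follows. This is a hypothetical model, not Pango's implementation: the language-to-script table, the separator set (colon, semicolon, comma, space) and the names pick_sample_language() and includes_script() are assumptions made for illustration. As with pango_language_includes_script(), a tag the table knows nothing about is conservatively treated as possibly using the script.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SCRIPT_LATIN, SCRIPT_CYRILLIC, SCRIPT_ARABIC } Script;

/* Hypothetical language -> script table (one script per language). */
static const struct { const char *tag; Script script; } known[] = {
  { "en", SCRIPT_LATIN },
  { "ru", SCRIPT_CYRILLIC },
  { "ar", SCRIPT_ARABIC },
  { "fa", SCRIPT_ARABIC },
};

/* Conservative predicate: an unknown tag "may" use any script. */
static bool
includes_script (const char *tag, size_t len, Script script)
{
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (strlen (known[i].tag) == len && strncmp (known[i].tag, tag, len) == 0)
      return known[i].script == script;
  return true;
}

/* Returns the first tag in env_list that may use script (copied into buf),
 * or fallback if no tag qualifies or env_list is NULL. */
static const char *
pick_sample_language (const char *env_list, Script script,
                      const char *fallback, char *buf, size_t bufsize)
{
  const char *seps = ":;, ";   /* assumed separator set */
  const char *p = env_list;

  while (p != NULL && *p != '\0')
    {
      size_t len = strcspn (p, seps);
      if (len > 0 && len < bufsize && includes_script (p, len, script))
        {
          memcpy (buf, p, len);
          buf[len] = '\0';
          return buf;
        }
      p += len;
      if (*p != '\0')
        p++;                   /* skip the separator */
    }
  return fallback;
}
```

In a real program the list would come from getenv ("PANGO_LANGUAGE"), falling back to getenv ("LANGUAGE"). With "en:fa" and an Arabic-script request, the sketch skips en and returns fa; with no usable entry it returns the caller's fallback.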
pango_script_iter_new ()
PangoScriptIter *
pango_script_iter_new (const char *text,
                       int length);
Creates a new PangoScriptIter, used to break a string of Unicode text into runs by Unicode script. No copy is made of text, so the caller needs to make sure it remains valid until the iterator is freed with pango_script_iter_free().
Returns
the new script iterator, initialized to point at the first range in the text, which should be freed with pango_script_iter_free(). If the string is empty, it will point at an empty range.
Since: 1.4
pango_script_iter_get_range ()
void
pango_script_iter_get_range (PangoScriptIter *iter,
                             const char **start,
                             const char **end,
                             PangoScript *script);
Gets information about the range to which iter currently points. The range is the set of locations p where *start <= p < *end. (That is, it doesn't include the character stored at *end.)
Parameters
iter | a PangoScriptIter |
start | location to store start position of the range, or NULL | [out][allow-none]
end | location to store end position of the range, or NULL | [out][allow-none]
script | location to store script for range, or NULL | [out][allow-none]
Since: 1.4
pango_script_iter_next ()
gboolean
pango_script_iter_next (PangoScriptIter *iter);
Advances a PangoScriptIter to the next range. If iter is already at the end, it is left unchanged and FALSE is returned.
Since: 1.4
pango_script_iter_free ()
void
pango_script_iter_free (PangoScriptIter *iter);
Frees a PangoScriptIter created with pango_script_iter_new().
Since: 1.4
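The contract of the four iterator functions (the iterator starts on the first range, ranges are half-open, next returns FALSE and leaves the iterator unchanged at the end, and an empty string yields one empty range) can be modeled with a toy iterator. Everything named toy_* is hypothetical. Unlike Pango, it walks an array of code points by index instead of UTF-8 pointers, knows only Latin and Greek, and uses a simplified rule in which Common characters join the run they appear in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { TOY_COMMON, TOY_LATIN, TOY_GREEK } ToyScript;

/* Hypothetical classifier: ASCII letters are Latin, Greek small
 * letters are Greek, everything else is treated as Common. */
static ToyScript
toy_classify (uint32_t ch)
{
  if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
    return TOY_LATIN;
  if (ch >= 0x03B1 && ch <= 0x03C9)
    return TOY_GREEK;
  return TOY_COMMON;
}

typedef struct {
  const uint32_t *text;   /* not copied: the caller keeps it alive */
  size_t length;
  size_t start, end;      /* current run is [start, end) */
  ToyScript script;
} ToyScriptIter;

/* Scans the run beginning at iter->start. Common characters never end
 * a run; the first non-Common character decides the run's script. */
static void
toy_scan_run (ToyScriptIter *iter)
{
  size_t i = iter->start;

  iter->script = TOY_COMMON;
  for (; i < iter->length; i++)
    {
      ToyScript sc = toy_classify (iter->text[i]);
      if (sc == TOY_COMMON || sc == iter->script)
        continue;
      if (iter->script == TOY_COMMON)
        iter->script = sc;    /* first strong character decides */
      else
        break;                /* a different script starts a new run */
    }
  iter->end = i;
}

static ToyScriptIter *
toy_script_iter_new (const uint32_t *text, size_t length)
{
  ToyScriptIter *iter = malloc (sizeof *iter);
  if (iter == NULL)
    return NULL;
  iter->text = text;
  iter->length = length;
  iter->start = 0;
  toy_scan_run (iter);        /* empty text yields the empty range [0, 0) */
  return iter;
}

/* Any out-parameter may be NULL. */
static void
toy_script_iter_get_range (const ToyScriptIter *iter,
                           size_t *start, size_t *end, ToyScript *script)
{
  if (start) *start = iter->start;
  if (end) *end = iter->end;
  if (script) *script = iter->script;
}

static bool
toy_script_iter_next (ToyScriptIter *iter)
{
  if (iter->end >= iter->length)
    return false;             /* at the last range: leave it unchanged */
  iter->start = iter->end;
  toy_scan_run (iter);
  return true;
}

static void
toy_script_iter_free (ToyScriptIter *iter)
{
  free (iter);
}
```

The loop shape carries over to the real API: call pango_script_iter_get_range() inside a do/while loop whose condition is pango_script_iter_next(), then release the iterator with pango_script_iter_free().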
pango_language_from_string ()
PangoLanguage *
pango_language_from_string (const char *language);
Takes an RFC 3066 format language tag as a string and converts it to a PangoLanguage pointer that can be efficiently copied (copy the pointer) and compared with other language tags (compare the pointer).
This function first canonicalizes the string by converting it to lowercase, mapping '_' to '-', and stripping all characters other than letters and '-'.
Use pango_language_get_default() if you want to get the PangoLanguage for the current locale of the process.
Returns
an opaque pointer to a PangoLanguage structure, or NULL if language was NULL. The returned pointer will be valid forever after, and should not be freed.
[transfer none][nullable]
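The canonicalization rule and the "compare the pointer" property can be illustrated with a small interning table. canonicalize_tag() and intern_language() are hypothetical helpers that follow the rule exactly as stated above, with a fixed-size table for brevity; they are not Pango's implementation, and real code should simply call pango_language_from_string().

```c
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 64
#define MAX_TAG_LEN 32

/* Lowercases, maps '_' to '-', and strips every character that is
 * neither an ASCII letter nor '-'. outsize must be at least 1. */
static void
canonicalize_tag (const char *in, char *out, size_t outsize)
{
  size_t n = 0;

  for (; *in != '\0' && n + 1 < outsize; in++)
    {
      char c = *in;
      if (c == '_' || c == '-')
        out[n++] = '-';
      else if (c >= 'A' && c <= 'Z')
        out[n++] = (char) (c - 'A' + 'a');
      else if (c >= 'a' && c <= 'z')
        out[n++] = c;
      /* every other character is stripped */
    }
  out[n] = '\0';
}

static char interned[MAX_TAGS][MAX_TAG_LEN];
static size_t n_interned;

/* Returns one shared pointer per distinct canonical tag, so tags can be
 * compared with ==. Returns NULL for NULL input or when the table is full. */
static const char *
intern_language (const char *tag)
{
  char canon[MAX_TAG_LEN];

  if (tag == NULL)
    return NULL;
  canonicalize_tag (tag, canon, sizeof canon);
  for (size_t i = 0; i < n_interned; i++)
    if (strcmp (interned[i], canon) == 0)
      return interned[i];
  if (n_interned == MAX_TAGS)
    return NULL;              /* a limit of this sketch only */
  strcpy (interned[n_interned], canon);
  return interned[n_interned++];
}
```

Because "en_US" and "EN-us" canonicalize to the same string, the sketch hands back the identical pointer for both, which is what makes pointer comparison a valid equality test.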
pango_language_to_string ()
const char *
pango_language_to_string (PangoLanguage *language);
Gets the RFC-3066 format string representing the given language tag.
pango_language_matches ()
gboolean
pango_language_matches (PangoLanguage *language,
                        const char *range_list);
Checks if a language tag matches one of the elements in a list of language ranges. A language tag is considered to match a range in the list if the range is '*', the range is exactly the tag, or the range is a prefix of the tag and the character after it in the tag is '-'.
Parameters
language | a language tag (see pango_language_from_string()), or NULL | [nullable]
range_list | a list of language ranges, separated by ';', ':', ',', or space characters. Each element must either be '*', or an RFC 3066 language range canonicalized as by pango_language_from_string() |
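A minimal sketch of the matching rule, assuming both the tag and the ranges are already canonical (lowercase, '-' separated). language_matches() is a hypothetical name, and treating a NULL tag as matching only '*' is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if tag matches one element of range_list, where elements
 * are separated by ';', ':', ',' or space. A NULL tag matches only '*'
 * (assumed behavior). */
static bool
language_matches (const char *tag, const char *range_list)
{
  const char *p = range_list;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ";:, ");
      if (len == 1 && p[0] == '*')
        return true;                      /* '*' matches any tag */
      if (tag != NULL && len > 0 && strncmp (tag, p, len) == 0 &&
          (tag[len] == '\0' || tag[len] == '-'))
        return true;                      /* exact match, or prefix followed by '-' */
      p += len;
      if (*p != '\0')
        p++;                              /* skip the separator */
    }
  return false;
}
```

The '-' check is what keeps the range "en" from matching an unrelated tag such as "eng" while still matching "en-us".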
pango_language_includes_script ()
gboolean
pango_language_includes_script (PangoLanguage *language,
                                PangoScript script);
Determines if script is one of the scripts used to write language. The returned value is conservative; if nothing is known about the language tag language, TRUE will be returned, since, as far as Pango knows, script might be used to write language.
This routine is used in Pango's itemization process when determining if a supplied language tag is relevant to a particular section of text. It is probably not useful for applications in most circumstances.
This function uses pango_language_get_scripts() internally.
Returns
TRUE if script is one of the scripts used to write language or if nothing is known about language (including the case that language is NULL), FALSE otherwise.
Since: 1.4
pango_language_get_scripts ()
const PangoScript *
pango_language_get_scripts (PangoLanguage *language,
                            int *num_scripts);
Determines the scripts used to write language. If nothing is known about the language tag language, or if language is NULL, then NULL is returned.
The list of scripts returned starts with the script that the
language uses most and continues to the one it uses least.
The value num_scripts points at will be set to the number of scripts in the returned array (or zero if NULL is returned).
Most languages use only one script for writing, but there are some that use two (Latin and Cyrillic for example), and a few use three (Japanese for example). Applications should not make any assumptions about the maximum number of scripts returned, though, except that it is positive if the return value is not NULL, and it is a small number.
The pango_language_includes_script() function uses this function internally.
Parameters
language | a PangoLanguage, or NULL | [allow-none]
num_scripts | location to return number of scripts, or NULL | [out caller-allocates][allow-none]
Returns
An array of PangoScript values, with the number of entries in the array stored in num_scripts, or NULL if Pango does not have any information about this particular language tag (also the case if language is NULL). The returned array is owned by Pango and should not be modified or freed.
[array length=num_scripts][nullable]
Since: 1.22
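The documented contract of pango_language_get_scripts(), and how a conservative pango_language_includes_script() can be layered on top of it, is sketched below with a hypothetical three-entry table. The Script enum, script_table, get_scripts() and includes_script() are illustrative names only; Pango's real script data and ordering may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { SCRIPT_LATIN, SCRIPT_CYRILLIC, SCRIPT_HAN,
               SCRIPT_HIRAGANA, SCRIPT_KATAKANA } Script;

/* Hypothetical table: scripts per language, most used first. */
static const struct {
  const char *tag;
  int n;
  Script scripts[3];
} script_table[] = {
  { "en", 1, { SCRIPT_LATIN } },
  { "ru", 1, { SCRIPT_CYRILLIC } },
  { "ja", 3, { SCRIPT_HAN, SCRIPT_HIRAGANA, SCRIPT_KATAKANA } },
};

/* Returns the table-owned array (never freed by the caller), or NULL with
 * *num_scripts = 0 for an unknown or NULL tag. num_scripts may be NULL. */
static const Script *
get_scripts (const char *tag, int *num_scripts)
{
  if (tag != NULL)
    for (size_t i = 0; i < sizeof script_table / sizeof script_table[0]; i++)
      if (strcmp (script_table[i].tag, tag) == 0)
        {
          if (num_scripts)
            *num_scripts = script_table[i].n;
          return script_table[i].scripts;
        }
  if (num_scripts)
    *num_scripts = 0;
  return NULL;
}

/* Conservative check built on get_scripts(): "unknown" means "maybe". */
static bool
includes_script (const char *tag, Script script)
{
  int n;
  const Script *scripts = get_scripts (tag, &n);

  if (scripts == NULL)
    return true;
  for (int i = 0; i < n; i++)
    if (scripts[i] == script)
      return true;
  return false;
}
```

Only a known language that lacks the script yields false; an unknown or NULL tag always yields true, matching the conservative behavior described for pango_language_includes_script().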
pango_language_get_default ()
PangoLanguage *
pango_language_get_default (void);
Returns the PangoLanguage for the current locale of the process. Note that this can change over the life of an application.
On Unix systems, the return value is derived from setlocale(LC_CTYPE, NULL), and the user can affect this through the environment variables LC_ALL, LC_CTYPE or LANG (checked in that order). The locale string typically is in the form lang_COUNTRY, where lang is an ISO-639 language code, and COUNTRY is an ISO-3166 country code. For instance, sv_FI for Swedish as written in Finland or pt_BR for Portuguese as written in Brazil.
On Windows, the C library does not use any such environment variables, and setting them won't affect the behavior of functions like ctime(). The user sets the locale through the Regional Options in the Control Panel. The C library (in the setlocale() function) does not use country and language codes, but country and language names spelled out in English.
However, this function does check the above environment
variables, and does return a Unix-style locale string based on
either said environment variables or the thread's current locale.
Your application should call setlocale(LC_ALL, ""); for the user settings to take effect. Gtk+ does this in its initialization functions automatically (by calling gtk_set_locale()). See man setlocale for more details.
Since: 1.16
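A minimal sketch of the application side: call setlocale (LC_ALL, "") once at startup so the user's settings take effect, then query the LC_CTYPE locale name that, on Unix, pango_language_get_default() derives its result from. The helpers locale_language_code() and current_ctype_locale() are hypothetical, and the first only understands Unix-style lang_COUNTRY.codeset@modifier names; Windows-style names spelled out in English are not handled.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Copies the "lang" part of a Unix-style locale name into buf,
 * e.g. "pt_BR.UTF-8" -> "pt". bufsize must be at least 1. */
static const char *
locale_language_code (const char *locale, char *buf, size_t bufsize)
{
  size_t len = strcspn (locale, "_.@");

  if (len >= bufsize)
    len = bufsize - 1;
  memcpy (buf, locale, len);
  buf[len] = '\0';
  return buf;
}

/* Applies the user's locale settings, then returns the current LC_CTYPE
 * locale name. Querying with NULL never changes the locale. */
static const char *
current_ctype_locale (void)
{
  setlocale (LC_ALL, "");   /* may fail if the requested locale is not installed */
  return setlocale (LC_CTYPE, NULL);
}
```

If setlocale (LC_ALL, "") fails, the process simply stays in its previous locale (typically "C"), so the query still returns a valid name.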
pango_language_get_sample_string ()
const char *
pango_language_get_sample_string (PangoLanguage *language);
Get a string that is representative of the characters needed to render a particular language.
The sample text may be a pangram, but is not necessarily. It is chosen to be demonstrative of normal text in the language, as well as exposing font feature requirements unique to the language. It is suitable for use as sample text in a font selection dialog.
If language is NULL, the default language as found by pango_language_get_default() is used.
If Pango does not have a sample string for language, the classic "The quick brown fox..." is returned. This can be detected by comparing the returned pointer value to that returned for the (non-existent) language code "xx". That is, compare to:
pango_language_get_sample_string (pango_language_from_string ("xx"))
Types and Values
enum PangoScript
The PangoScript enumeration identifies different writing systems. The values correspond to the names as defined in the Unicode standard. Note that new types may be added in the future. Applications should be ready to handle unknown values. This enumeration is interchangeable with GUnicodeScript. See Unicode Standard Annex 24: Script names.
Members
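PANGO_SCRIPT_INVALID_CODE | a value never returned from pango_script_for_unichar() |
PANGO_SCRIPT_COMMON | a character used by multiple different scripts |
PANGO_SCRIPT_INHERITED | a mark glyph that takes its script from the base glyph to which it is attached |
PANGO_SCRIPT_ARABIC | Arabic |
PANGO_SCRIPT_ARMENIAN | Armenian |
PANGO_SCRIPT_BENGALI | Bengali |
PANGO_SCRIPT_BOPOMOFO | Bopomofo |
PANGO_SCRIPT_CHEROKEE | Cherokee |
PANGO_SCRIPT_COPTIC | Coptic |
PANGO_SCRIPT_CYRILLIC | Cyrillic |
PANGO_SCRIPT_DESERET | Deseret |
PANGO_SCRIPT_DEVANAGARI | Devanagari |
PANGO_SCRIPT_ETHIOPIC | Ethiopic |
PANGO_SCRIPT_GEORGIAN | Georgian |
PANGO_SCRIPT_GOTHIC | Gothic |
PANGO_SCRIPT_GREEK | Greek |
PANGO_SCRIPT_GUJARATI | Gujarati |
PANGO_SCRIPT_GURMUKHI | Gurmukhi |
PANGO_SCRIPT_HAN | Han |
PANGO_SCRIPT_HANGUL | Hangul |
PANGO_SCRIPT_HEBREW | Hebrew |
PANGO_SCRIPT_HIRAGANA | Hiragana |
PANGO_SCRIPT_KANNADA | Kannada |
PANGO_SCRIPT_KATAKANA | Katakana |
PANGO_SCRIPT_KHMER | Khmer |
PANGO_SCRIPT_LAO | Lao |
PANGO_SCRIPT_LATIN | Latin |
PANGO_SCRIPT_MALAYALAM | Malayalam |
PANGO_SCRIPT_MONGOLIAN | Mongolian |
PANGO_SCRIPT_MYANMAR | Myanmar |
PANGO_SCRIPT_OGHAM | Ogham |
PANGO_SCRIPT_OLD_ITALIC | Old Italic |
PANGO_SCRIPT_ORIYA | Oriya |
PANGO_SCRIPT_RUNIC | Runic |
PANGO_SCRIPT_SINHALA | Sinhala |
PANGO_SCRIPT_SYRIAC | Syriac |
PANGO_SCRIPT_TAMIL | Tamil |
PANGO_SCRIPT_TELUGU | Telugu |
PANGO_SCRIPT_THAANA | Thaana |
PANGO_SCRIPT_THAI | Thai |
PANGO_SCRIPT_TIBETAN | Tibetan |
PANGO_SCRIPT_CANADIAN_ABORIGINAL | Canadian Aboriginal |
PANGO_SCRIPT_YI | Yi |
PANGO_SCRIPT_TAGALOG | Tagalog |
PANGO_SCRIPT_HANUNOO | Hanunoo |
PANGO_SCRIPT_BUHID | Buhid |
PANGO_SCRIPT_TAGBANWA | Tagbanwa |
PANGO_SCRIPT_BRAILLE | Braille |
PANGO_SCRIPT_CYPRIOT | Cypriot |
PANGO_SCRIPT_LIMBU | Limbu |
PANGO_SCRIPT_OSMANYA | Osmanya |
PANGO_SCRIPT_SHAVIAN | Shavian |
PANGO_SCRIPT_LINEAR_B | Linear B |
PANGO_SCRIPT_TAI_LE | Tai Le |
PANGO_SCRIPT_UGARITIC | Ugaritic |
PANGO_SCRIPT_NEW_TAI_LUE | New Tai Lue. Since: 1.10 |
PANGO_SCRIPT_BUGINESE | Buginese. Since: 1.10 |
PANGO_SCRIPT_GLAGOLITIC | Glagolitic. Since: 1.10 |
PANGO_SCRIPT_TIFINAGH | Tifinagh. Since: 1.10 |
PANGO_SCRIPT_SYLOTI_NAGRI | Syloti Nagri. Since: 1.10 |
PANGO_SCRIPT_OLD_PERSIAN | Old Persian. Since: 1.10 |
PANGO_SCRIPT_KHAROSHTHI | Kharoshthi. Since: 1.10 |
PANGO_SCRIPT_UNKNOWN | an unassigned code point. Since: 1.14 |
PANGO_SCRIPT_BALINESE | Balinese. Since: 1.14 |
PANGO_SCRIPT_CUNEIFORM | Cuneiform. Since: 1.14 |
PANGO_SCRIPT_PHOENICIAN | Phoenician. Since: 1.14 |
PANGO_SCRIPT_PHAGS_PA | Phags-pa. Since: 1.14 |
PANGO_SCRIPT_NKO | N'Ko. Since: 1.14 |
PANGO_SCRIPT_KAYAH_LI | Kayah Li. Since: 1.20.1 |
PANGO_SCRIPT_LEPCHA | Lepcha. Since: 1.20.1 |
PANGO_SCRIPT_REJANG | Rejang. Since: 1.20.1 |
PANGO_SCRIPT_SUNDANESE | Sundanese. Since: 1.20.1 |
PANGO_SCRIPT_SAURASHTRA | Saurashtra. Since: 1.20.1 |
PANGO_SCRIPT_CHAM | Cham. Since: 1.20.1 |
PANGO_SCRIPT_OL_CHIKI | Ol Chiki. Since: 1.20.1 |
PANGO_SCRIPT_VAI | Vai. Since: 1.20.1 |
PANGO_SCRIPT_CARIAN | Carian. Since: 1.20.1 |
PANGO_SCRIPT_LYCIAN | Lycian. Since: 1.20.1 |
PANGO_SCRIPT_LYDIAN | Lydian. Since: 1.20.1 |
PANGO_SCRIPT_BATAK | Batak. Since: 1.32 |
PANGO_SCRIPT_BRAHMI | Brahmi. Since: 1.32 |
PANGO_SCRIPT_MANDAIC | Mandaic. Since: 1.32 |
PANGO_SCRIPT_CHAKMA | Chakma. Since: 1.32 |
PANGO_SCRIPT_MEROITIC_CURSIVE | Meroitic Cursive. Since: 1.32 |
PANGO_SCRIPT_MEROITIC_HIEROGLYPHS | Meroitic Hieroglyphs. Since: 1.32 |
PANGO_SCRIPT_MIAO | Miao. Since: 1.32 |
PANGO_SCRIPT_SHARADA | Sharada. Since: 1.32 |
PANGO_SCRIPT_SORA_SOMPENG | Sora Sompeng. Since: 1.32 |
PANGO_SCRIPT_TAKRI | Takri. Since: 1.32 |
PANGO_SCRIPT_BASSA_VAH | Bassa. Since: 1.40 |
PANGO_SCRIPT_CAUCASIAN_ALBANIAN | Caucasian Albanian. Since: 1.40 |
PANGO_SCRIPT_DUPLOYAN | Duployan. Since: 1.40 |
PANGO_SCRIPT_ELBASAN | Elbasan. Since: 1.40 |
PANGO_SCRIPT_GRANTHA | Grantha. Since: 1.40 |
PANGO_SCRIPT_KHOJKI | Khojki. Since: 1.40 |
PANGO_SCRIPT_KHUDAWADI | Khudawadi, Sindhi. Since: 1.40 |
PANGO_SCRIPT_LINEAR_A | Linear A. Since: 1.40 |
PANGO_SCRIPT_MAHAJANI | Mahajani. Since: 1.40 |
PANGO_SCRIPT_MANICHAEAN | Manichaean. Since: 1.40 |
PANGO_SCRIPT_MENDE_KIKAKUI | Mende Kikakui. Since: 1.40 |
PANGO_SCRIPT_MODI | Modi. Since: 1.40 |
PANGO_SCRIPT_MRO | Mro. Since: 1.40 |
PANGO_SCRIPT_NABATAEAN | Nabataean. Since: 1.40 |
PANGO_SCRIPT_OLD_NORTH_ARABIAN | Old North Arabian. Since: 1.40 |
PANGO_SCRIPT_OLD_PERMIC | Old Permic. Since: 1.40 |
PANGO_SCRIPT_PAHAWH_HMONG | Pahawh Hmong. Since: 1.40 |
PANGO_SCRIPT_PALMYRENE | Palmyrene. Since: 1.40 |
PANGO_SCRIPT_PAU_CIN_HAU | Pau Cin Hau. Since: 1.40 |
PANGO_SCRIPT_PSALTER_PAHLAVI | Psalter Pahlavi. Since: 1.40 |
PANGO_SCRIPT_SIDDHAM | Siddham. Since: 1.40 |
PANGO_SCRIPT_TIRHUTA | Tirhuta. Since: 1.40 |
PANGO_SCRIPT_WARANG_CITI | Warang Citi. Since: 1.40 |
PANGO_SCRIPT_AHOM | Ahom. Since: 1.40 |
PANGO_SCRIPT_ANATOLIAN_HIEROGLYPHS | Anatolian Hieroglyphs. Since: 1.40 |
PANGO_SCRIPT_HATRAN | Hatran. Since: 1.40 |
PANGO_SCRIPT_MULTANI | Multani. Since: 1.40 |
PANGO_SCRIPT_OLD_HUNGARIAN | Old Hungarian. Since: 1.40 |
PANGO_SCRIPT_SIGNWRITING | Signwriting. Since: 1.40 |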
PangoScriptIter
typedef struct _PangoScriptIter PangoScriptIter;
A PangoScriptIter is used to iterate through a string and identify ranges in different scripts.
PangoLanguage
typedef struct _PangoLanguage PangoLanguage;
The PangoLanguage structure is used to represent a language.
PangoLanguage pointers can be efficiently copied and compared with each other.
PANGO_TYPE_LANGUAGE
#define PANGO_TYPE_LANGUAGE (pango_language_get_type ())
The GObject type for PangoLanguage.