File: groff.info, Node: Using Symbols, Next: Character Classes, Prev: Font Positions, Up: Using Fonts

5.19.4 Using Symbols
--------------------

A "glyph" is a graphical representation of a "character". While a
character is an abstraction of semantic information, a glyph is
something that can be seen on screen or paper. A character has many
possible representation forms (for example, the character 'A' can be
written in an upright or slanted typeface, producing distinct glyphs).
Sometimes, a sequence of characters maps to a single glyph: this is a
"ligature"--the most common is 'fi'.

Space characters never become glyphs in GNU 'troff'. If not
discarded (as when trailing on text lines), they are represented by
horizontal motions in the output.

A "symbol" is simply a named glyph. Within 'gtroff', all glyph names
of a particular font are defined in its font file. If the user requests
a glyph not available in this font, 'gtroff' looks up an ordered list of
"special fonts". By default, the PostScript output device supports the
two special fonts 'SS' (slanted symbols) and 'S' (symbols) (the former
is looked up before the latter). Other output devices use different
names for special fonts. Fonts mounted with the 'fonts' keyword in the
'DESC' file are globally available. To install additional special fonts
locally (i.e., for a particular font), use the 'fspecial' request.

Here are the exact rules by which 'gtroff' searches for a given symbol:

* If the symbol has been defined with the 'char' request, use it.
This hides a symbol with the same name in the current font.

* Check the current font.

* If the symbol has been defined with the 'fchar' request, use it.

* Check whether the current font has a font-specific list of special
fonts; test all fonts in the order of appearance in the last
'fspecial' call if appropriate.

* If the symbol has been defined with the 'fschar' request for the
current font, use it.

* Check all fonts in the order of appearance in the last 'special'
call.

* If the symbol has been defined with the 'schar' request, use it.

* As a last resort, consult all fonts loaded up to now for special
fonts and check them, starting with the lowest font number. This
can sometimes lead to surprising results since the 'fonts' line in
the 'DESC' file often contains empty positions, which are filled
later on. For example, consider the following:

fonts 3 0 0 FOO

This mounts font 'FOO' at font position 3. We assume that 'FOO' is
a special font, containing glyph 'foo', and that no font has been
loaded yet. The line

.fspecial BAR BAZ

makes font 'BAZ' special only if font 'BAR' is active. We further
assume that 'BAZ' is really a special font, i.e., the font
description file contains the 'special' keyword, and that it also
contains glyph 'foo' with a special shape fitting to font 'BAR'.
After executing 'fspecial', font 'BAR' is loaded at font
position 1, and 'BAZ' at position 2.

We now switch to a new font 'XXX', trying to access glyph 'foo'
that is assumed to be missing. There are neither font-specific
special fonts for 'XXX' nor any other fonts made special with the
'special' request, so 'gtroff' starts the search for special fonts
in the list of already mounted fonts, with increasing font
positions. Consequently, it finds 'BAZ' before 'FOO' even for
'XXX', which is not the intended behaviour.
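
The search order above can be sketched as a small Python model. This is
only an illustration under stated assumptions, not 'gtroff' code: the
'FontState' class, its field names, and 'find_symbol' are hypothetical,
and mounting 'XXX' at position 4 is assumed for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class FontState:
    """Toy stand-in for the state the search consults (hypothetical names)."""
    glyphs: dict        # font name -> set of glyph names in that font
    special_flag: set   # fonts whose description has the 'special' keyword
    mounted: dict       # font position -> font name
    char: set = field(default_factory=set)         # defined with 'char'
    fchar: set = field(default_factory=set)        # defined with 'fchar'
    fschar: dict = field(default_factory=dict)     # font -> 'fschar' names
    schar: set = field(default_factory=set)        # defined with 'schar'
    fspecial: dict = field(default_factory=dict)   # font -> last 'fspecial' list
    special: list = field(default_factory=list)    # last 'special' list


def find_symbol(name, current, st):
    """Return a (source, font) pair telling where NAME was found, or None."""
    if name in st.char:
        return ("char", None)
    if name in st.glyphs.get(current, ()):
        return ("font", current)
    if name in st.fchar:
        return ("fchar", None)
    for font in st.fspecial.get(current, []):
        if name in st.glyphs.get(font, ()):
            return ("fspecial", font)
    if name in st.fschar.get(current, ()):
        return ("fschar", current)
    for font in st.special:
        if name in st.glyphs.get(font, ()):
            return ("special", font)
    if name in st.schar:
        return ("schar", None)
    # Last resort: special fonts among the mounted ones, lowest position first.
    for position in sorted(st.mounted):
        font = st.mounted[position]
        if font in st.special_flag and name in st.glyphs.get(font, ()):
            return ("mounted", font)
    return None


# The scenario from the text: FOO at 3, BAR at 1, BAZ at 2 (XXX at 4 assumed).
state = FontState(
    glyphs={"FOO": {"foo"}, "BAR": set(), "BAZ": {"foo"}, "XXX": set()},
    special_flag={"FOO", "BAZ"},
    mounted={1: "BAR", 2: "BAZ", 3: "FOO", 4: "XXX"},
    fspecial={"BAR": ["BAZ"]},
)
print(find_symbol("foo", "XXX", state))
```

In this model the last-resort step reaches 'BAZ' at position 2 before
'FOO' at position 3, which reproduces the surprising result described
above.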

*Note Device and Font Description Files::, and *note Special Fonts::,
for more details.

The 'groff_char(7)' man page houses a complete list of predefined
special character names, but the availability of any as a glyph is
device- and font-dependent. For example, say

man -Tdvi groff_char > groff_char.dvi

to obtain those available with the DVI device and default font
configuration.(1) (*note Using Symbols-Footnote-1::) If you want to use
an additional macro package to change the fonts used, 'groff' (or
'gtroff') must be run directly.

groff -Tdvi -mec -man groff_char.7 > groff_char.dvi

Special character names not listed in 'groff_char(7)' are derived
algorithmically, using a simplified version of the Adobe Glyph List
(AGL) algorithm, which is described in the Adobe Glyph List
Specification. The (frozen) set of
names that can't be derived algorithmically is called the "'groff' glyph
list (GGL)".

* A glyph for Unicode character U+XXXX[X[X]] that is not a
composite character is named 'uXXXX[X[X]]'. X must be an uppercase
hexadecimal digit. Examples: 'u1234', 'u008E', 'u12DB8'. The
largest Unicode value is 0x10FFFF. There must be at least four 'X'
digits; if necessary, add leading zeroes (after the 'u'). No zero
padding is allowed for character codes greater than 0xFFFF.
Surrogates (i.e., Unicode values greater than 0xFFFF represented
with character codes from the surrogate area U+D800-U+DFFF) are not
allowed either.

* A glyph representing more than a single input character is named

'u' COMPONENT1 '_' COMPONENT2 '_' COMPONENT3 ...

Example: 'u0045_0302_0301'.

For simplicity, all Unicode characters that are composites must be
maximally decomposed to NFD;(2) (*note Using Symbols-Footnote-2::)
for example, 'u00CA_0301' is not a valid glyph name since U+00CA
(LATIN CAPITAL LETTER E WITH CIRCUMFLEX) can be further decomposed
into U+0045 (LATIN CAPITAL LETTER E) and U+0302 (COMBINING
CIRCUMFLEX ACCENT). 'u0045_0302_0301' is thus the glyph name for
U+1EBE, LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE.

* 'groff' itself maintains a table to decompose all algorithmically
derived glyph names that are composites. For example, 'u0100'
(LATIN CAPITAL LETTER A WITH MACRON) is automatically decomposed into
'u0041_0304'. Additionally, a glyph name of the GGL is preferred
to an algorithmically derived glyph name; 'groff' also
automatically does the mapping. Example: The glyph 'u0045_0302' is
mapped to '^E'.

* Glyph names of the GGL can't be used in composite glyph names; for
example, '^E_u0301' is invalid.
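
The syntactic rules for 'uXXXX' names can be checked mechanically. The
following Python sketch is an illustration only: the function names are
hypothetical, and it assumes that Python's 'unicodedata' NFD
normalization agrees with the decomposition table 'groff' uses, which
may not hold for every character. The NFD test is applied only to names
with more than one component, since single composites such as 'u0100'
are decomposed automatically.

```python
import re
import unicodedata

_COMPONENT = re.compile(r"[0-9A-F]{4,6}")


def valid_component(digits):
    """Check one XXXX[X[X]] component against the rules listed above."""
    if not _COMPONENT.fullmatch(digits):
        return False              # wrong length, lowercase, or non-hex digit
    if len(digits) > 4 and digits[0] == "0":
        return False              # no zero padding beyond four digits
    value = int(digits, 16)
    if value > 0x10FFFF:
        return False              # larger than the largest Unicode value
    if 0xD800 <= value <= 0xDFFF:
        return False              # surrogate area
    return True


def valid_glyph_name(name):
    """Return True if NAME looks like a well-formed 'u...' glyph name."""
    if not name.startswith("u"):
        return False
    parts = name[1:].split("_")
    if not all(valid_component(part) for part in parts):
        return False
    if len(parts) > 1:
        # A composite name must already be maximally decomposed.
        text = "".join(chr(int(part, 16)) for part in parts)
        return unicodedata.normalize("NFD", text) == text
    return True


for candidate in ("u1234", "u012DB8", "u0045_0302_0301", "u00CA_0301"):
    print(candidate, valid_glyph_name(candidate))
```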

-- Escape sequence: \(nm
-- Escape sequence: \[name]
-- Escape sequence: \[base-glyph combining-component ...]
Typeset a special character NAME (two-character name NM) or a
composite glyph consisting of BASE-GLYPH overlaid with one or more
COMBINING-COMPONENTs. For example, '\[A ho]' is a capital letter
"A" with a "hook accent" (ogonek).

There is no special syntax for one-character names--the analogous
form '\N' would collide with other escape sequences. However, the
four escape sequences '\'', '\-', '\_', and '\`', are translated on
input to the special character escape sequences '\[aa]', '\[-]',
'\[ul]', and '\[ga]', respectively.

A special character name of length one is not the same thing as an
ordinary character: that is, the character 'a' is not the same as
'\[a]'.

If NAME is undefined, a warning in category 'char' is produced and
the escape is ignored. *Note Warnings::, for information about the
enablement and suppression of warnings.

GNU 'troff' resolves '\[...]' with more than a single component as
follows:

* Any component that is found in the GGL is converted to the
'uXXXX' form.

* Any component 'uXXXX' that is found in the list of
decomposable glyphs is decomposed.

* The resulting elements are then concatenated with '_' in
between, dropping the leading 'u' in all elements but the
first.

No check for the existence of any component (similar to the 'tr'
request) is done.

Examples:

'\[A ho]'
'A' maps to 'u0041', 'ho' maps to 'u02DB', thus the final
glyph name would be 'u0041_02DB'. This is not the expected
result: the ogonek glyph 'ho' is a spacing ogonek, but for a
proper composite a non-spacing ogonek (U+0328) is necessary.
Looking into the file 'composite.tmac', one can find
'.composite ho u0328', which changes the mapping of 'ho' while
a composite glyph name is constructed, causing the final glyph
name to be 'u0041_0328'.

'\[^E u0301]'
'\[^E aa]'
'\[E a^ aa]'
'\[E ^ ']'
'^E' maps to 'u0045_0302', thus the final glyph name is
'u0045_0302_0301' in all forms (assuming proper calls of the
'composite' request).

It is not possible to define glyphs with names like 'A ho' within a
'groff' font file. This is not really a limitation; instead, you
have to define 'u0041_0328'.
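
The resolution steps and examples above can be modeled with a short
Python sketch. It is a simplification under stated assumptions, not the
GNU 'troff' implementation: the three tables are tiny hypothetical
excerpts, the 'composite' mappings other than 'ho' are assumed to be in
effect ("proper calls of the 'composite' request"), and applying those
mappings by name to every component except the base glyph is a modeling
choice.

```python
# Hypothetical excerpts; the real tables are far larger.
GGL = {"A": "u0041", "E": "u0045", "^E": "u0045_0302", "ho": "u02DB"}
DECOMPOSE = {"u00CA": "u0045_0302", "u0100": "u0041_0304"}
COMPOSITE = {"ho": "u0328", "aa": "u0301", "'": "u0301",
             "a^": "u0302", "^": "u0302"}


def composite_glyph_name(components):
    """Build the glyph name for the components of a multi-part \\[...]."""
    elements = []
    for index, name in enumerate(components):
        if index > 0:
            # 'composite' rewrites accent names (assumed: never the base).
            name = COMPOSITE.get(name, name)
        name = GGL.get(name, name)           # convert GGL names to 'uXXXX'
        name = DECOMPOSE.get(name, name)     # decompose known composites
        if not name.startswith("u"):
            raise ValueError(f"cannot convert {name!r} to 'uXXXX' form")
        elements.append(name[1:])            # drop the leading 'u'
    # Join with '_' and restore a single leading 'u'.
    return "u" + "_".join(elements)


print(composite_glyph_name(["A", "ho"]))
print(composite_glyph_name(["E", "a^", "aa"]))
```

As in the text, no check is made that a glyph with the resulting name
exists in any font.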

-- Escape sequence: \C'xxx'
Typeset the glyph of the special character XXX. Normally, it is
more convenient to use '\[XXX]', but '\C' has some advantages: it
is compatible with AT&T device-independent 'troff' (and therefore
available in compatibility mode(3) (*note Using
Symbols-Footnote-3::)) and can interpolate special characters with
']' in their names. The delimiter need not be a neutral
apostrophe; see *note Delimiters::.

-- Request: .composite id1 id2
Map special character name ID1 to ID2 if ID1 is used in '\[...]'
with more than one component. See above for examples. This is a
strict rewriting of the special character name; no check is
performed for the existence of a glyph for either. A set of
default mappings for many accents can be found in the file
'composite.tmac', loaded by the default 'troffrc' at startup.

-- Escape sequence: \N'n'
Typeset the glyph with code N in the current font (N is _not_ the
input character code). The number N can be any non-negative
decimal integer. Most devices only have glyphs with codes between
0 and 255; the Unicode output device uses codes in the range
0-65535. If the current font does not contain a glyph with that
code, special fonts are _not_ searched. The '\N' escape sequence
can be conveniently used in conjunction with the 'char' request:

.char \[phone] \f[ZD]\N'37'

The code of each glyph is given in the fourth column in the font
description file after the 'charset' command. It is possible to
include unnamed glyphs in the font description file by using a name
of '---'; the '\N' escape sequence is the only way to use these.

No kerning is applied to glyphs accessed with '\N'. The delimiter
need not be a neutral apostrophe; see *note Delimiters::.
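
To find which codes are reachable only through '\N', one can scan the
'charset' section of a font description file for entries named '---'.
The Python sketch below is a rough illustration: the sample data and
its metrics are invented, the function name is hypothetical, and it
assumes whitespace-separated fields, decimal codes, and a 'kernpairs'
line ending the section.

```python
# Invented sample; real font description files contain many more entries.
SAMPLE = """\
name ZD
charset
---\t974,706\t2\t37
a1\t974,692\t2\t33
"""


def charset_codes(description):
    """Map glyph code -> glyph name for entries after the 'charset' line."""
    codes = {}
    in_charset = False
    for line in description.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "charset":
            in_charset = True
            continue
        if fields[0] == "kernpairs":
            in_charset = False
            continue
        if in_charset and len(fields) >= 4:
            # The code is the fourth column, as stated above.
            codes[int(fields[3])] = fields[0]
    return codes


codes = charset_codes(SAMPLE)
unnamed = sorted(code for code, name in codes.items() if name == "---")
print(unnamed)
```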

A few escape sequences are also special characters.

-- Escape sequence: \'
An escaped neutral apostrophe is a synonym for '\[aa]' (acute
accent).

-- Escape sequence: \`
An escaped grave accent is a synonym for '\[ga]' (grave accent).

-- Escape sequence: \-
An escaped hyphen-minus is a synonym for '\[-]' (minus sign).

-- Escape sequence: \_
An escaped underscore ("low line") is a synonym for '\[ul]'
(underrule). On typesetting devices, the underrule is
font-invariant and drawn lower than the underscore '_'.

-- Request: .cflags n c1 c2 ...
Assign properties encoded by the number N to characters C1, C2, and
so on.

Input characters, including special characters introduced by an
escape, have certain properties associated with them.(4) (*note
Using Symbols-Footnote-4::) These properties can be modified with
this request. The first argument is the sum of the desired flags
and the remaining arguments are the characters to be assigned those
properties. Spaces between the CN arguments are optional. Any
argument CN can be a character class defined with the 'class'
request rather than an individual character. *Note Character
Classes::.

The non-negative integer N is the sum of any of the following.
Some combinations are nonsensical, such as '33' (1 + 32).

'1'
Recognize the character as ending a sentence if followed by a
newline or two spaces. Initially, characters '.?!' have this
property.

'2'
Enable breaks before the character. A line is not broken at a
character with this property unless the characters on each
side both have non-zero hyphenation codes. This exception can
be overridden by adding 64. Initially, no characters have
this property.

'4'
Enable breaks after the character. A line is not broken at a
character with this property unless the characters on each
side both have non-zero hyphenation codes. This exception can
be overridden by adding 64. Initially, characters
'\-\[hy]\[em]' have this property.

'8'
Mark the glyph associated with this character as overlapping
other instances of itself horizontally. Initially, characters
'\[ul]\[rn]\[ru]\[radicalex]\[sqrtex]' have this property.

'16'
Mark the glyph associated with this character as overlapping
other instances of itself vertically. Initially, the
character '\[br]' has this property.

'32'
Mark the character as transparent for the purpose of
end-of-sentence recognition. In other words, an
end-of-sentence character followed by any number of characters
with this property is treated as the end of a sentence if
followed by a newline or two spaces. This is the same as
having a zero space factor in TeX. Initially, characters
'"')]*\[dg]\[dd]\[rq]\[cq]' have this property.

'64'
Ignore hyphenation codes of the surrounding characters. Use
this in combination with values 2 and 4 (initially, no
characters have this property).

For example, if you need an automatic break point after the
en-dash in numeric ranges like "3000-5000", insert

.cflags 68 \[en]

into your document. However, this practice can lead to bad
layout if done thoughtlessly; in most situations, a better
solution instead of changing the 'cflags' value is to insert
'\:' right after the hyphen at the places that really need a
break point.
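
Because N is a sum of flag values, it can be taken apart bit by bit.
The Python sketch below decodes a value, models the hyphenation-code
exception for flag 4, and models end-of-sentence recognition at the end
of an input line with flags 1 and 32. It is an illustrative model built
only from the descriptions above; all function and table names are
hypothetical.

```python
FLAGS = {
    1: "ends sentence",
    2: "break before",
    4: "break after",
    8: "overlaps horizontally",
    16: "overlaps vertically",
    32: "transparent for end-of-sentence recognition",
    64: "ignore surrounding hyphenation codes",
}


def decode_cflags(n):
    """Return the individual flag values whose sum is N."""
    if n < 0:
        raise ValueError("N must be non-negative")
    return [bit for bit in FLAGS if n & bit]


def may_break_after(flags, left_hcode, right_hcode):
    """Model the rule for flag 4 and its override by flag 64."""
    if not flags & 4:
        return False
    if flags & 64:
        return True
    return left_hcode != 0 and right_hcode != 0


def line_ends_sentence(line, flags_of):
    """True if LINE (given without its newline) ends a sentence."""
    i = len(line)
    # Skip trailing characters that are transparent (flag 32).
    while i > 0 and flags_of.get(line[i - 1], 0) & 32:
        i -= 1
    return i > 0 and bool(flags_of.get(line[i - 1], 0) & 1)


# Subset of the initial properties listed above (ordinary characters only).
INITIAL = {**{c: 1 for c in ".?!"}, **{c: 32 for c in "\"')]*"}}

print(decode_cflags(68))
print(may_break_after(68, 0, 0))
print(line_ends_sentence('He said "Stop!"', INITIAL))
```

With plain flag 4, the model refuses a break between characters whose
hyphenation codes are zero (passed here as 0 for the neighboring
digits); adding 64, as in '.cflags 68 \[en]', lifts that restriction.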

The remaining values were implemented for East Asian language
support; those who use alphabetic scripts exclusively can disregard
them.

'128'
Prohibit a line break before the character, but allow a line
break after the character. This works only in combination
with flags 256 and 512 and has no effect otherwise.
Initially, no characters have this property.

'256'
Prohibit a line break after the character, but allow a line
break before the character. This works only in combination
with flags 128 and 512 and has no effect otherwise.
Initially, no characters have this property.

'512'
Allow a line break before or after the character. This works
only in combination with flags 128 and 256 and has no effect
otherwise. Initially, no characters have this property.

In contrast to values 2 and 4, the values 128, 256, and 512 work
pairwise. If, for example, the left character has value 512, and
the right character 128, no break will be automatically inserted
between them. If we use value 6 instead for the left character, a
break after the character can't be suppressed since the neighboring
character on the right doesn't get examined.

-- Request: .char c [contents]
-- Request: .fchar c [contents]
-- Request: .fschar f c [contents]
-- Request: .schar c [contents]
Define a new character or glyph C to be CONTENTS, which can be
empty. More precisely, 'char' defines a 'groff' object (or
redefines an existing one) that is accessed with the name C on
input, and produces CONTENTS on output. Every time glyph C needs
to be printed, CONTENTS is processed in a temporary environment and
the result is wrapped up into a single object. Compatibility mode
is turned off and the escape character is set to '\' while CONTENTS
is processed. Any emboldening, constant spacing, or track kerning
is applied to this object rather than to individual glyphs in
CONTENTS.

An object defined by these requests can be used just like a normal
glyph provided by the output device. In particular, other
characters can be translated to it with the 'tr' or 'trin'
requests; it can be made the leader character with the 'lc'
request; repeated patterns can be drawn with it using the '\l' and
'\L' escape sequences; and words containing C can be hyphenated
correctly if the 'hcode' request is used to give the object a
hyphenation code.

There is a special anti-recursion feature: use of the object within
its own definition is handled like a normal character (not defined
with 'char').

The 'tr' and 'trin' requests take precedence if 'char' accesses the
same symbol.

.tr XY
X
=> Y
.char X Z
X
=> Y
.tr XX
X
=> Z
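
One way to read this example is that translation happens first and the
'char' definition is consulted only for the character that results.
The Python sketch below encodes that reading; it is an assumption-laden
model of the three cases shown, with hypothetical names, and makes no
claim about how 'gtroff' implements the precedence.

```python
def output_for(ch, tr, char_defs):
    """Apply the 'tr' mapping first, then any 'char' definition."""
    translated = tr.get(ch, ch)
    return char_defs.get(translated, translated)


tr = {}
char_defs = {}

tr["X"] = "Y"                      # .tr XY
first = output_for("X", tr, char_defs)

char_defs["X"] = "Z"               # .char X Z
second = output_for("X", tr, char_defs)

tr["X"] = "X"                      # .tr XX
third = output_for("X", tr, char_defs)

print(first, second, third)
```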

The 'fchar' request defines a fallback glyph: 'gtroff' only checks
for glyphs defined with 'fchar' if it cannot find the glyph in the
current font. 'gtroff' carries out this test before checking
special fonts.

'fschar' defines a fallback glyph for font F: 'gtroff' checks for
glyphs defined with 'fschar' after the list of fonts declared as
font-specific special fonts with the 'fspecial' request, but before
the list of fonts declared as global special fonts with the
'special' request.

Finally, the 'schar' request defines a global fallback glyph:
'gtroff' checks for glyphs defined with 'schar' after the list of
fonts declared as global special fonts with the 'special' request,
but before the already mounted special fonts.

*Note Character Classes::.

-- Request: .rchar c ...
-- Request: .rfschar f c ...
Remove definition of each ordinary or special character C, undoing
the effect of a 'char', 'fchar', or 'schar' request. Those
supplied by font description files cannot be removed. Spaces and
tabs may separate C arguments.

The request 'rfschar' removes glyph definitions defined with
'fschar' for font F.