File: groff.info, Node: Gtroff Internals, Next: Debugging, Prev: Miscellaneous, Up: GNU troff Reference

5.36 'gtroff' Internals
=======================

'gtroff' processes input in three steps. One or more input characters
are converted to an "input token".(1) (*note Gtroff
Internals-Footnote-1::) Then, one or more input tokens are converted to
an "output node". Finally, output nodes are converted to the
intermediate output language understood by all output devices.

Actually, before step one happens, 'gtroff' converts certain escape
sequences into reserved input characters (not accessible by the user);
such reserved characters are also used for other internal processing.
This is the very reason why not all characters are valid input. *Note
Identifiers::, for more on this topic.

For example, the input string 'fi\[:u]' is converted into a character
token 'f', a character token 'i', and a special token ':u' (representing
u umlaut). Later on, the character tokens 'f' and 'i' are merged into a
single output node representing the ligature glyph 'fi' (provided the
current font has a glyph for this ligature); the special token ':u' is
likewise converted to an output glyph node. All output glyph nodes are
'processed', which means that they are invariably associated with a
given font, font size, advance width, etc. During the formatting
process, 'gtroff' itself adds various nodes to control the data flow.
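
As a small illustration of the merging step, the following sketch sets
the same two characters three ways. It assumes that the current font
provides an 'fi' ligature glyph; whether a visible difference appears
depends on the font and the output device.

```groff
.\" Ligatures on (the default): 'f' and 'i' may merge into one glyph node.
.lg 1
fi
.br
.\" The zero-width '\&' escape between the two tokens prevents the merge.
f\&i
.br
.\" Ligatures off: 'f' and 'i' remain separate glyph nodes.
.lg 0
fi
.br
.\" Restore the default ligature mode.
.lg 1
```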

Macros, diversions, and strings collect elements in two chained
lists: a list of input tokens that have been passed unprocessed, and a
list of output nodes. Consider the following diversion.

.di xxx
a
\!b
c
.br
.di

It contains these elements.

node list               token list   element number

line start node         --           1
glyph node 'a'          --           2
word space node         --           3
--                      'b'          4
--                      '\n'         5
glyph node 'c'          --           6
vertical size node      --           7
vertical size node      --           8
--                      '\n'         9

Elements 1, 7, and 8 are inserted by 'gtroff'; the last two (which are
always present) specify the vertical extent of the last line, possibly
modified by '\x'. The 'br' request finishes the pending output line,
inserting a newline input token, which is subsequently converted to a
space when the diversion is reread. Note that the word space node has a
fixed width that is no longer adjustable. To convert horizontal space
nodes back to input tokens, use the 'unformat' request.
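
A minimal sketch of 'unformat' in use follows. The diversion name 'xxx'
and both line lengths are arbitrary choices made only for illustration.
After 'unformat', the spaces between words should be adjustable again,
so the text can be refilled at the narrower line length when the
diversion is reread.

```groff
.ll 40n
.di xxx
one two three four five six
.br
.di
.\" Convert the fixed-width word space nodes back to input tokens.
.unformat xxx
.\" Reread the diversion at a narrower line length.
.ll 12n
.xxx
```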

Macros only contain elements in the token list (and the node list is
empty); diversions and strings can contain elements in both lists.

The 'chop' request removes the last element of a macro, string, or
diversion, reducing the number of elements by one. "Compatibility
save" and "compatibility ignore" input tokens are the exception: 'chop'
skips over them. The 'substring' request ignores those input tokens as
well.
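
The effect of 'chop' on a plain string can be checked with the 'length'
request. In this sketch the string name 'greeting' and the register
names 'before' and 'after' are hypothetical, chosen only for the
example.

```groff
.ds greeting hello!
.\" Store the number of elements before chopping.
.length before \*[greeting]
.\" Remove the last element, here the character '!'.
.chop greeting
.length after \*[greeting]
.\" Write the result to the standard error stream.
.tm before=\n[before] after=\n[after] text=\*[greeting]
```

Run through 'gtroff', this should print 'before=6 after=5 text=hello' on
the standard error stream.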

Some requests like 'tr' or 'cflags' work on glyph identifiers only;
this means that the associated glyph can be changed without destroying
this association. This can be very helpful for substituting glyphs. In
the following example, we assume that glyph 'foo' isn't available by
default, so we provide a substitution using the 'fchar' request and map
it to input character 'x'.

.fchar \[foo] foo
.tr x \[foo]

Now let us assume that we install an additional special font 'bar' that
has glyph 'foo'.

.special bar
.rchar \[foo]

Since glyphs defined with 'fchar' are searched before glyphs in special
fonts, we must call 'rchar' to remove the definition of the fallback
glyph. The translation itself remains active; 'x' now maps to the real
glyph 'foo'.

Macro and request arguments preserve compatibility mode enablement.

.cp 1 \" switch to compatibility mode
.de xx
\\$1
..
.cp 0 \" switch compatibility mode off
.xx caf\['e]
=> café

Since compatibility mode is enabled while 'de' is invoked, the macro
'xx' enables compatibility mode when it is called. Argument '$1' can
still be handled properly because it inherits the compatibility mode
enablement status that was active at the point where 'xx' was called.

After interpolation of the parameters, the compatibility save and
restore tokens are removed.