File: groff.info,  Node: Input Encodings,  Next: Input Conventions,  Prev: Macro Packages,  Up: Text

5.1.9 Input Encodings
---------------------

The 'groff' command's '-k' option calls the 'preconv' preprocessor to
perform input character encoding conversions.  Input to the GNU 'troff'
formatter itself, on the other hand, must be in one of two encodings it
can recognize.

'cp1047'
     The code page 1047 input encoding works only on EBCDIC platforms
     (and conversely, the other input encodings don't work with EBCDIC);
     the file 'cp1047.tmac' is loaded at startup.

'latin1'
     ISO Latin-1, an encoding for Western European languages, is the
     default input encoding on non-EBCDIC platforms; the file
     'latin1.tmac' is loaded at startup.
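
   The conversion that '-k' delegates to 'preconv' can be pictured with
a short Python sketch.  It assumes that the converted text consists of
ASCII characters plus special character escape sequences of the form
'\[uXXXX]'; the function names are hypothetical, and the sketch leaves
out anything else the real preprocessor does, such as detecting the
input encoding.

```python
def to_troff_input(text: str) -> str:
    # Assumption: code points of 0x80 and above are written as special
    # character escapes with at least four uppercase hexadecimal digits;
    # ASCII characters pass through unchanged.
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            pieces.append(ch)
        else:
            pieces.append("\\[u%04X]" % cp)
    return "".join(pieces)


def convert_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    # Decode the document from its declared encoding, then rewrite it.
    # The caller names the encoding here; this sketch does no detection.
    return to_troff_input(raw.decode(encoding))


if __name__ == "__main__":
    print(convert_bytes("na\u00efve caf\u00e9".encode("utf-8")))
```

   Because the result is pure ASCII, it is valid input under either of
the two encodings listed above.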

Any document that is encoded in ISO 646:1991 (a descendant of USAS
X3.4-1968 or "US-ASCII") or that, equivalently, uses only code points
from the "C0 Controls" and "Basic Latin" parts of the Unicode character
set is also a valid ISO Latin-1 document; the standards are identical
in their first 128 code points.(1)  (*note Input Encodings-Footnote-1::)
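
   The overlap can be checked with Python's standard codecs, as in the
minimal sketch below.  The helper name 'is_ascii_document' is
hypothetical and is not part of 'groff'.

```python
# Every byte value below 128 decodes to the same character under
# US-ASCII and ISO Latin-1, and that character's Unicode code point
# equals the byte value.
ascii_bytes = bytes(range(128))
as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("latin-1")

same_text = as_ascii == as_latin1
same_code_points = all(ord(ch) == b for ch, b in zip(as_latin1, ascii_bytes))


def is_ascii_document(raw: bytes) -> bool:
    # A document made up only of bytes below 0x80 is valid in both
    # encodings, and both read it the same way.
    return all(b < 0x80 for b in raw)


if __name__ == "__main__":
    print(same_text, same_code_points)
    print(is_ascii_document(b".TH FOO 1\n"))
    print(is_ascii_document("caf\u00e9".encode("latin-1")))
```

   The converse does not hold: a Latin-1 document that uses bytes of
0x80 and above is not an ASCII document.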

   Other encodings are supported by means of macro packages.

'latin2'
     To use ISO Latin-2, an encoding for Central and Eastern European
     languages, invoke '.mso latin2.tmac' at the beginning of your
     document or supply '-mlatin2' as a command-line argument to
     'groff'.

'latin5'
     To use ISO Latin-5, an encoding for the Turkish language, invoke
     '.mso latin5.tmac' at the beginning of your document or supply
     '-mlatin5' as a command-line argument to 'groff'.

'latin9'
     ISO Latin-9 succeeds Latin-1; it includes a Euro sign and better
     glyph coverage for French.  To use this encoding, invoke
     '.mso latin9.tmac' at the beginning of your document or supply
     '-mlatin9' as a command-line argument to 'groff'.
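
   To see why the choice of encoding matters, the following Python
sketch lists the byte values that one encoding interprets differently
from Latin-1.  It relies on Python's own codec tables, not on the
'groff' macro files, and 'differing_bytes' is a hypothetical helper
name.

```python
def differing_bytes(enc_a: str, enc_b: str) -> dict:
    # Map each byte value in the upper half (0xA0-0xFF) to the pair of
    # characters it denotes, whenever the two encodings disagree.
    diffs = {}
    for b in range(0xA0, 0x100):
        raw = bytes([b])
        ch_a = raw.decode(enc_a)
        ch_b = raw.decode(enc_b)
        if ch_a != ch_b:
            diffs[b] = (ch_a, ch_b)
    return diffs


# ISO 8859-15 is Latin-9, ISO 8859-9 is Latin-5, ISO 8859-1 is Latin-1.
latin9_changes = differing_bytes("iso8859-1", "iso8859-15")
latin5_changes = differing_bytes("iso8859-1", "iso8859-9")

if __name__ == "__main__":
    for b, (old, new) in sorted(latin9_changes.items()):
        print("0x%02X: U+%04X -> U+%04X" % (b, ord(old), ord(new)))
```

   In Latin-9, byte 0xA4 denotes the Euro sign where Latin-1 has the
generic currency sign, so a file read under the wrong encoding still
formats but produces the wrong glyph.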

   Some characters from an input encoding may not be available with a
particular output driver, or their glyphs may not have representation in
the font used.  For terminal devices, fallbacks are defined, like 'EUR'
for the Euro sign and '(C)' for the copyright sign.  For typesetter
devices, you may need to "mount" fonts that support glyphs required by
the document.  *Note Font Positions::.
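
   The fallback idea for terminal devices can be illustrated as
follows.  This Python sketch is only an analogy: the table holds just
the two fallbacks named above, the names 'FALLBACKS' and
'render_for_device' are hypothetical, and 'groff' itself defines its
fallbacks in macro files, not in this way.

```python
# The two fallbacks mentioned in the text; a real device has many more.
FALLBACKS = {
    "\u20ac": "EUR",  # Euro sign
    "\u00a9": "(C)",  # copyright sign
}


def render_for_device(text: str, device_encoding: str) -> str:
    # Keep each character the device encoding can represent; otherwise
    # substitute its fallback.  Assumption: a character with no fallback
    # is replaced by "?" purely for illustration.
    out = []
    for ch in text:
        try:
            ch.encode(device_encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(FALLBACKS.get(ch, "?"))
    return "".join(out)


if __name__ == "__main__":
    sample = "\u00a9 2024, price 5 \u20ac"
    for enc in ("ascii", "latin-1", "utf-8"):
        print(enc, "->", render_for_device(sample, enc))
```

   A Latin-1 device keeps the copyright sign but still needs the 'EUR'
fallback, because Latin-1 has no Euro sign.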

   Because a Euro glyph was not historically defined in PostScript
fonts, 'groff' comes with a font called 'freeeuro.pfa' that provides the
Euro in several styles.  Standard PostScript fonts contain the glyphs
from Latin-5 and Latin-9 that Latin-1 lacks, so these encodings are
supported for the 'ps' and 'pdf' output devices as 'groff' ships, while
Latin-2 is not.

   Unicode supports characters from all other input encodings; the
'utf8' output driver for terminals therefore does as well.  The DVI
output driver supports the Latin-2 and Latin-9 encodings if the
command-line option '-mec' is also given.(2)  (*note Input
Encodings-Footnote-2::)
