5.10 Manipulating Hyphenation
=============================

When filling, GNU 'troff' hyphenates words as needed at user-specified
and automatically determined hyphenation points.  The machine-driven
determination of hyphenation points in words requires algorithms and
data, and is susceptible to conventions and preferences.  Before
tackling such "automatic hyphenation", let us consider how hyphenation
points can be set explicitly.

   Explicitly hyphenated words such as "mother-in-law" are eligible for
breaking after each of their hyphens.  Relatively few words in a
language offer such obvious break points, however, and automatic
detection of syllabic (or phonetic) boundaries for hyphenation is not
perfect,(1) (*note Manipulating Hyphenation-Footnote-1::) particularly
for unusual words found in technical literature.  We can instruct GNU
'troff' how to hyphenate specific words if the need arises.

 -- Request: .hw word ...
     Define each "hyphenation exception" WORD with each hyphen '-' in
     the word indicating a hyphenation point.  For example, the request

          .hw in-sa-lub-rious alpha

     marks potential hyphenation points in "insalubrious", and prevents
     "alpha" from being hyphenated at all.

     Besides the space character, any character whose hyphenation code
     is zero can be used to separate the arguments of 'hw' (see the
     'hcode' request below).  In addition, this request can be used more
     than once.

     Hyphenation points specified with 'hw' are not subject to the
     within-word placement restrictions imposed by the 'hy' request (see
     below).

     Hyphenation exceptions specified with the 'hw' request are
     associated with the hyphenation language (see the 'hla' request
     below) and environment (*note Environments::); invoking the 'hw'
     request in the absence of a hyphenation language is an error.

     The request is ignored if given no arguments.

   These are known as hyphenation exceptions in the expectation that
most users will avail themselves of automatic hyphenation; an exception
defined with 'hw' overrides any rules that would otherwise apply to the
matching word.
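
   The following Python sketch models the bookkeeping just described;
it is an illustration, not GNU 'troff''s implementation.  It assumes
that only spaces separate arguments and uses Python's 'str.lower' as a
stand-in for hyphenation codes; the names 'parse_hw_args' and
'break_points' are hypothetical.

```python
# Hypothetical model of '.hw' (not GNU troff's code): only spaces separate
# arguments here, and case-folding stands in for hyphenation codes.
from typing import Callable


def parse_hw_args(args: str) -> dict[str, list[int]]:
    """Map each exception word (hyphens removed) to its break positions.

    A position k means "a break is allowed after the first k letters".
    """
    exceptions: dict[str, list[int]] = {}
    for arg in args.split():
        positions: list[int] = []
        letters = 0
        for ch in arg:
            if ch == "-":
                positions.append(letters)  # hyphen marks a break point here
            else:
                letters += 1
        exceptions[arg.replace("-", "").lower()] = positions
    return exceptions


def break_points(word: str, exceptions: dict[str, list[int]],
                 automatic: Callable[[str], list[int]]) -> list[int]:
    """An exception, if present, overrides the automatic rules entirely."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return automatic(word)
```

   For instance, 'parse_hw_args("in-sa-lub-rious alpha")' yields break
positions 2, 4, and 7 for "insalubrious" and an empty list for "alpha",
so 'break_points' never hyphenates "alpha", whatever the automatic rules
propose.  Repeated use of the request corresponds to merging further
results into the same dictionary.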

   Situations also arise when only a specific occurrence of a word needs
its hyphenation altered or suppressed, or when a URL or similar string
needs to be breakable in sensible places without hyphenation.

 -- Escape sequence: \%
 -- Escape sequence: \:
     To tell GNU 'troff' how to hyphenate words as they occur in input,
     use the '\%' escape sequence; it is the default "hyphenation
     character".  Each instance within a word indicates to GNU 'troff'
     that the word may be hyphenated at that point, while prefixing a
     word with this escape sequence prevents it from being otherwise
     hyphenated.  This mechanism affects only that occurrence of the
     word; to change the hyphenation of a word for the remainder of
     input processing, use the 'hw' request.

     GNU 'troff' regards the escape sequences '\X' and '\Y' as starting
     a word; that is, the '\%' escape sequence in, say,
     '\X'...'\%foobar' or '\Y'...'\%foobar' no longer prevents
     hyphenation of 'foobar' but inserts a hyphenation point just prior
     to it; most likely this isn't what you want.  *Note Postprocessor
     Access::.

     '\:' inserts a non-printing break point; that is, a word can break
     there, but the soft hyphen glyph (see below) is not written to the
     output if it does.  This escape sequence is an input word boundary,
     so the remainder of the word is subject to hyphenation as normal.

     You can combine '\:' and '\%' to control breaking of a file name or
     URL, or to permit hyphenation only after certain explicit hyphens
     within a word.

          The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
          was, in retrospect, inevitable once the contents of
          \%/var/log/\:\%httpd/\:\%access_log on the family web
          server came to light, revealing visitors from Hogwarts.
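
     As an illustration only, the following Python sketch models how one
     input token is divided by '\:' into input words and how the
     hyphenation character marks break points.  The names 'InputWord'
     and 'parse_token' are hypothetical, the 'hc' parameter stands in
     for the 'hc' request described below, and the rule that any
     hyphenation character in a word disables its automatic hyphenation
     is an assumption of the sketch.

```python
# Hypothetical model of how '\%' (or a '.hc' replacement) and '\:' carve up
# one input token; GNU troff's real parser works on nodes, not strings.
from dataclasses import dataclass, field


@dataclass
class InputWord:
    text: str                                        # characters output
    points: list[int] = field(default_factory=list)  # manual break positions
    auto: bool = True                                # automatic hyphenation?


def parse_token(token: str, hc: str = "\\%") -> list[InputWord]:
    """Split TOKEN at each '\\:' and record hyphenation-character positions.

    Assumption of this sketch: any occurrence of the hyphenation character
    in an input word, leading or interior, disables automatic hyphenation
    of that word; interior occurrences also mark break points.
    """
    words = []
    for piece in token.split("\\:"):  # '\:' is an input word boundary
        letters: list[str] = []
        points: list[int] = []
        saw_hc = False
        i = 0
        while i < len(piece):
            if piece.startswith(hc, i):
                saw_hc = True
                if letters:  # interior: a hyphenated break may go here
                    points.append(len(letters))
                i += len(hc)
            else:
                letters.append(piece[i])
                i += 1
        words.append(InputWord("".join(letters), points, auto=not saw_hc))
    return words
```

     Applied to the file name in the example above, the sketch yields
     three input words, '/var/log/', 'httpd/', and 'access_log', none of
     them eligible for automatic hyphenation, with a non-printing break
     point between each pair.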

 -- Request: .hc [char]
     Change the hyphenation character to CHAR.  This character then
     works as the '\%' escape sequence normally does, and thus no longer
     appears in the output.(2)  (*note Manipulating
     Hyphenation-Footnote-2::) Without an argument, 'hc' resets the
     hyphenation character to '\%' (the default).  The hyphenation
     character is associated with the environment (*note
     Environments::).

 -- Request: .shc [c]
     Set the "soft hyphen character", inserted when a word is hyphenated
     automatically or at a hyphenation character, to the ordinary or
     special character C.(3)  (*note Manipulating
     Hyphenation-Footnote-3::) If the argument is omitted, the soft
     hyphen character is set to the default, '\[hy]'.  If no glyph for C
     exists in the font in use at a potential hyphenation point, then
     the line is not broken there.  Neither character definitions
     (specified with the 'char' and similar requests) nor translations
     (specified with the 'tr' request) are applied to C.

   Several requests influence automatic hyphenation.  Because
conventions vary, a variety of hyphenation modes is available to the
'hy' request; these determine whether hyphenation will apply to a word
prior to breaking a line at the end of a page (more or less; see below
for details), and at which positions within that word automatically
determined hyphenation points are permissible.  The places within a word
that are eligible for hyphenation are determined by language-specific
data and lettercase relationships.  Furthermore, hyphenation of a word
might be suppressed due to a limit on consecutive hyphenated lines
('hlm'), a minimum line length threshold ('hym'), or because the line
can instead be adjusted with additional inter-word space ('hys').

 -- Request: .hy [mode]
 -- Register: \n[.hy]
     Set automatic hyphenation mode to MODE, an integer encoding
     conditions for hyphenation; if omitted, '1' is implied.  The
     hyphenation mode is available in the read-only register '.hy'; it
     is associated with the environment (*note Environments::).  The
     default hyphenation mode depends on the localization file loaded
     when GNU 'troff' starts up; see the 'hpf' request below.

     Typesetting practice generally does not avail itself of every
     opportunity for hyphenation, but the details differ by language and
     site mandates.  The hyphenation modes of AT&T 'troff' were
     implemented with English-language publishing practices of the 1970s
     in mind, not a scrupulous enumeration of conceivable parameters.
     GNU 'troff' extends those modes such that finer-grained control is
     possible, favoring compatibility with older implementations over a
     more intuitive arrangement.  The means of hyphenation mode control
     is a set of numbers that can be added up to encode the behavior
     sought.(4)  (*note Manipulating Hyphenation-Footnote-4::) The
     entries in the following table are termed "values"; the sum of the
     desired values is the "mode".

     '0'
          disables hyphenation.

     '1'
          enables hyphenation except after the first and before the last
          character of a word.

     The remaining values "imply" 1; that is, they enable hyphenation
     under the same conditions as '.hy 1', and then apply or lift
     restrictions relative to that basis.

     '2'
          disables hyphenation of the last word on a page,(5) (*note
          Manipulating Hyphenation-Footnote-5::) even for explicitly
          hyphenated words.

     '4'
          disables hyphenation before the last two characters of a word.

     '8'
          disables hyphenation after the first two characters of a word.

     '16'
          enables hyphenation before the last character of a word.

     '32'
          enables hyphenation after the first character of a word.

     Apart from value 2, restrictions imposed by the hyphenation mode
     are _not_ respected for words whose hyphenations have been
     specified with the hyphenation character ('\%' by default) or the
     'hw' request.

     Nonzero values in the previous table are additive.  For example,
     mode 12 causes GNU 'troff' to hyphenate neither the last two nor
     the first two characters of a word.  Some values cannot be used
     together because they contradict; for instance, values 4 and 16,
     and values 8 and 32.  As noted, it is superfluous to add 1 to any
     non-zero even mode.
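
     The table can be read as a pair of minimum character counts plus a
     flag for the last word on a page.  The hypothetical Python function
     below decodes a mode that way; rejecting contradictory combinations
     and values above 63 is this sketch's choice, not documented GNU
     'troff' behavior.

```python
# Hypothetical decoder for the '.hy' value table (a model, not GNU troff's
# code).  "left_min" is the fewest characters allowed before a hyphen,
# "right_min" the fewest allowed after it.
from typing import NamedTuple, Optional


class HyMode(NamedTuple):
    left_min: int
    right_min: int
    last_word_ok: bool  # may the last word on a page be hyphenated?


def decode_hy_mode(mode: int) -> Optional[HyMode]:
    if mode < 0 or mode > 63:
        raise ValueError(f"mode {mode} is outside the tabulated values")
    if mode == 0:
        return None  # hyphenation disabled
    if (mode & 4 and mode & 16) or (mode & 8 and mode & 32):
        raise ValueError(f"mode {mode} combines contradictory values")
    left, right = 2, 2  # value 1, implied by every nonzero mode
    if mode & 4:
        right = 3       # not before the last two characters
    if mode & 16:
        right = 1       # also before the last character
    if mode & 8:
        left = 3        # not after the first two characters
    if mode & 32:
        left = 1        # also after the first character
    return HyMode(left, right, last_word_ok=not (mode & 2))
```

     For example, 'decode_hy_mode(12)' reports at least three characters
     on each side of a hyphen, and 'decode_hy_mode(5)' equals
     'decode_hy_mode(4)', since adding 1 to a nonzero even mode changes
     nothing.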

     The automatic placement of hyphens in words is determined by
     "pattern files", which are derived from TeX and available for
     several languages.  The number of characters at the beginning of a
     word after which the first hyphenation point should be inserted is
     determined by the patterns themselves; it can't be reduced further
     without introducing additional, invalid hyphenation points
     (unfortunately, this information is not part of a pattern file--you
     have to know it in advance).  The same is true for the number of
     characters at the end of a word before the last hyphenation point
     should be inserted.  For example, you can supply the following
     input to 'echo $(nroff)'.

          .ll 1
          .hy 48
          splitting

     You will get

          s- plit- t- in- g

     instead of the correct 'split- ting'.  English patterns as
     distributed with GNU 'troff' need two characters at the beginning
     and three characters at the end; this means that value 4 of 'hy' is
     mandatory.  Value 8 is possible as an additional restriction, but
     values 16 and 32 should be avoided, as should mode 1.  Modes 4
     and 6 are typical.
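
     To make the interplay of patterns and minimum counts concrete, here
     is a small Python sketch of Liang's algorithm, the technique TeX
     pattern files are written for: each digit in a pattern weights the
     gap between two letters, the highest weight from any matching
     pattern wins, and odd weights permit a break.  The nine patterns
     are a hypothetical, illustrative set modelled on the well-known
     "hyphenation" example from Knuth's 'The TeXbook'; they are not
     taken from GNU 'troff''s pattern files.

```python
# Sketch of Liang's pattern algorithm followed by the left/right minimum
# filter that the '.hy' mode supplies.  PATTERNS is a tiny illustrative
# set, not the contents of any file shipped with GNU troff.

PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at",
            "1tio", "2io", "o2n"]


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split e.g. 'hy3ph' into ('hyph', [0, 0, 3, 0, 0])."""
    letters, values = [], [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)  # weight of the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphen_points(word: str, patterns: list[str],
                  left_min: int, right_min: int) -> list[int]:
    w = "." + word.lower() + "."   # '.' marks the word boundaries
    score = [0] * (len(w) + 1)     # score[i]: weight of the gap before w[i]
    for letters, values in map(parse_pattern, patterns):
        for start in range(len(w) - len(letters) + 1):
            if w.startswith(letters, start):
                for j, v in enumerate(values):
                    score[start + j] = max(score[start + j], v)
    n = len(word)
    # An odd score permits a break after k letters (the gap before w[k + 1]).
    return [k for k in range(1, n)
            if score[k + 1] % 2 == 1 and k >= left_min and n - k >= right_min]


def hyphenate(word: str, left_min: int, right_min: int) -> str:
    parts, prev = [], 0
    for k in hyphen_points(word, PATTERNS, left_min, right_min):
        parts.append(word[prev:k])
        prev = k
    parts.append(word[prev:])
    return "-".join(parts)
```

     With minimums of 2 and 3 (the English values in the table below)
     the sketch yields 'hy-phen-ation'; raising the left minimum to 3,
     as adding value 8 to the mode would, leaves only 'hyphen-ation'.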

     A table of left and right minimum character counts for hyphenation
     as needed by the patterns distributed with GNU 'troff' follows; see
     the 'groff_tmac(5)' man page for more information on GNU 'troff''s
     language macro files.

     language             pattern name   left min   right min
     -----------------------------------------------------------
     Czech                cs             2          2
     English              en             2          3
     French               fr             2          3
     German traditional   det            2          2
     German reformed      den            2          2
     Italian              it             2          2
     Swedish              sv             1          2

     Hyphenation exceptions within pattern files (i.e., the words within
     a TeX '\hyphenation' group) obey the hyphenation restrictions given
     by 'hy'.

 -- Request: .nh
     Disable automatic hyphenation; i.e., set the hyphenation mode to 0
     (see above).  The hyphenation mode of the last call to 'hy' is not
     remembered.

 -- Request: .hpf pattern-file
 -- Request: .hpfa pattern-file
 -- Request: .hpfcode a b [c d] ...
     Read hyphenation patterns from PATTERN-FILE, which is sought in the
     same way that macro files are with the 'mso' request or the
     '-mNAME' command-line option to 'groff'.  The PATTERN-FILE should
     have the same format as (simple) TeX pattern files.  More
     specifically, the following scanning rules are implemented.

        * A percent sign starts a comment (up to the end of the line)
          even if preceded by a backslash.

        * "Digraphs" like '\$' are not supported.

        * '^^XX' (where each X is 0-9 or a-f) and '^^C' (character C in
          the code point range 0-127 decimal) are recognized; other uses
          of '^' cause an error.

        * No macro expansion is performed.

        * 'hpf' checks for the expression '\patterns{...}' (possibly
          with whitespace before or after the braces).  Everything
          between the braces is taken as hyphenation patterns.
          Consequently, '{' and '}' are not allowed in patterns.

        * Similarly, '\hyphenation{...}' gives a list of hyphenation
          exceptions.

        * '\endinput' is recognized also.

        * For backward compatibility, if '\patterns' is missing, the
          whole file is treated as a list of hyphenation patterns
          (except that the '%' character is recognized as the start of a
          comment).
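
     A much-simplified Python model of these scanning rules follows.  It
     is a hypothetical sketch, not GNU 'troff''s parser: it skips the
     '^^' notation, reads only the first '\patterns' and '\hyphenation'
     groups, and reports no errors.

```python
# Simplified, hypothetical model of the pattern-file scanning rules.
import re


def scan_pattern_file(text: str) -> tuple[list[str], list[str]]:
    """Return (patterns, exceptions) found in TeX-style pattern-file TEXT."""
    # '%' starts a comment to end of line, even after a backslash.
    body = "\n".join(line.split("%", 1)[0] for line in text.splitlines())
    # Nothing after '\endinput' is read.
    body = body.split("\\endinput", 1)[0]
    # Braces may not occur inside a group, so '[^{}]*' captures its body.
    pat = re.search(r"\\patterns\s*\{([^{}]*)\}", body)
    exc = re.search(r"\\hyphenation\s*\{([^{}]*)\}", body)
    # Backward compatibility: without '\patterns', the whole file is patterns.
    patterns = (pat.group(1) if pat else body).split()
    exceptions = exc.group(1).split() if exc else []
    return patterns, exceptions
```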

     The 'hpfa' request appends a file of patterns to the current list.

     The 'hpfcode' request defines mapping values for character codes in
     pattern files.  It is an older mechanism no longer used by GNU
     'troff''s own macro files; for its successor, see 'hcode' below.
     'hpf' or 'hpfa' apply the mapping after reading the patterns but
     before replacing or appending to the active list of patterns.  Its
     arguments are pairs of character codes--integers from 0 to 255.
     The request maps character code A to code B, code C to code D, and
     so on.  Character codes that would otherwise be invalid in GNU
     'troff' can be used.  By default, every code maps to itself except
     those for letters 'A' to 'Z', which map to those for 'a' to 'z'.
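
     A minimal Python sketch of such a mapping table follows, assuming a
     byte-oriented pattern file.  The function names are hypothetical,
     and leaving the digit weights of a pattern unmapped is an
     assumption of the sketch.

```python
# Hypothetical model of an 'hpfcode' mapping table.

def build_hpfcode_table(*pairs: int) -> list[int]:
    if len(pairs) % 2:
        raise ValueError("hpfcode arguments come in pairs")
    table = list(range(256))                 # every code maps to itself...
    for code in range(ord("A"), ord("Z") + 1):
        table[code] = code + 32              # ...except A-Z, mapped to a-z
    for a, b in zip(pairs[0::2], pairs[1::2]):
        if not (0 <= a <= 255 and 0 <= b <= 255):
            raise ValueError("character codes must lie in 0..255")
        table[a] = b                         # map code A to code B
    return table


def apply_hpfcode(pattern: bytes, table: list[int]) -> bytes:
    """Remap the letters of PATTERN, leaving digit weights untouched."""
    return bytes(c if ord("0") <= c <= ord("9") else table[c] for c in pattern)
```

     For example, 'build_hpfcode_table(0xC4, 0xE4)' maps Latin-1 'Ä' to
     'ä' on top of the default upper-to-lowercase mapping.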

     The set of hyphenation patterns is associated with the language set
     by the 'hla' request (see below).  The 'hpf' request is usually
     invoked by a localization file loaded by the 'troffrc' file.(6)
     (*note Manipulating Hyphenation-Footnote-6::)

     A second call to 'hpf' (for the same language) replaces the
     hyphenation patterns with the new ones.  Invoking 'hpf' or 'hpfa'
     causes an error if there is no hyphenation language.  If no 'hpf'
     request is specified (either in the document, in a file loaded at
     startup, or in a macro package), GNU 'troff' won't automatically
     hyphenate at all.
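
     The replace-versus-append behavior and the dependence on a
     hyphenation language can be summarized by the hypothetical Python
     model below.  It ignores environments, and where GNU 'troff' would
     report an error and continue, it raises an exception.

```python
# Hypothetical model of how 'hpf', 'hpfa', and 'hla' interact: pattern
# lists are keyed by hyphenation language.

class PatternStore:
    def __init__(self) -> None:
        self.language = ""                        # no hyphenation language yet
        self._patterns: dict[str, list[str]] = {}

    def hla(self, lang: str) -> None:
        self.language = lang

    def _need_language(self, request: str) -> None:
        if not self.language:
            raise RuntimeError(f"{request}: no hyphenation language")

    def hpf(self, patterns: list[str]) -> None:
        self._need_language("hpf")
        self._patterns[self.language] = list(patterns)  # replace

    def hpfa(self, patterns: list[str]) -> None:
        self._need_language("hpfa")
        self._patterns.setdefault(self.language, []).extend(patterns)  # append

    def active(self) -> list[str]:
        # An empty list means no automatic hyphenation at all.
        return self._patterns.get(self.language, [])
```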

 -- Request: .hcode c1 code1 [c2 code2] ...
     Set the hyphenation code of character C1 to CODE1, that of C2 to
     CODE2, and so on.  A hyphenation code must be an ordinary character
     (not a special character escape sequence) other than a digit or a
     space.  The request is ignored if given no arguments.

     For hyphenation to work, hyphenation codes must be set up.  At
     startup, GNU 'troff' assigns hyphenation codes to the letters
     'a'-'z' (mapped to themselves), to the letters 'A'-'Z' (mapped to
     'a'-'z'), and zero to all other characters.  Normally, hyphenation
     patterns contain only lowercase letters, which should be applied
     regardless of case.  In other words, they assume that the words
     'FOO' and 'Foo' should be hyphenated exactly as 'foo' is.  The
     'hcode' request extends this principle to letters outside the
     Unicode basic Latin alphabet; without it, words containing such
     letters won't be hyphenated properly even if the corresponding
     hyphenation patterns contain them.

     For example, the following 'hcode' requests are necessary to assign
     hyphenation codes to the letters 'ÄäÖöÜüß', needed for German.

          .hcode ä ä  Ä ä
          .hcode ö ö  Ö ö
          .hcode ü ü  Ü ü
          .hcode ß ß

     Without these assignments, GNU 'troff' treats the German word
     'Kindergärten' (the plural form of 'kindergarten') as two words
     'kinderg' and 'rten' because the hyphenation code of 'ä' is zero
     by default, just like that of a space.  There is a German
     hyphenation pattern that covers 'kinder', so GNU 'troff' finds the
     hyphenation 'kin-der'.  The other two hyphenation points
     ('kin-der-gär-ten') are missed.
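
     The following Python sketch (hypothetical names, not GNU 'troff''s
     code) models hyphenation codes as a dictionary in which a missing
     entry plays the role of code zero, and reproduces the
     'Kindergärten' example.

```python
# Hypothetical model of hyphenation codes: characters whose code is zero
# (here: absent from CODES) split a word, as a space would.
import string


def default_hcodes() -> dict[str, str]:
    codes = {c: c for c in string.ascii_lowercase}                # a-z -> a-z
    codes.update({c: c.lower() for c in string.ascii_uppercase})  # A-Z -> a-z
    return codes


def hcode(codes: dict[str, str], *pairs: str) -> None:
    """Model of '.hcode c1 code1 [c2 code2] ...'."""
    if len(pairs) % 2:
        raise ValueError("hcode arguments come in pairs")
    for ch, code in zip(pairs[0::2], pairs[1::2]):
        if len(code) != 1 or code.isdigit() or code.isspace():
            raise ValueError(f"invalid hyphenation code {code!r}")
        codes[ch] = code


def hyphenation_words(text: str, codes: dict[str, str]) -> list[str]:
    """Return the code sequences the hyphenator would see."""
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        code = codes.get(ch)
        if code:
            current.append(code)
        elif current:          # code zero ends the current word
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

     With the default codes, 'hyphenation_words("Kindergärten", codes)'
     returns the two fragments 'kinderg' and 'rten'; after
     'hcode(codes, "ä", "ä", "Ä", "ä")' it returns the single word
     'kindergärten'.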

 -- Request: .hla lang
 -- Register: \n[.hla]
     Set the hyphenation language to LANG.  Hyphenation exceptions
     specified with the 'hw' request and hyphenation patterns and
     exceptions specified with the 'hpf' and 'hpfa' requests are
     associated with the hyphenation language.  The 'hla' request is
     usually invoked by a localization file, which is in turn loaded by the
     'troffrc' or 'troffrc-end' file; see the 'hpf' request above.

     The hyphenation language is available in the read-only
     string-valued register '.hla'; it is associated with the
     environment (*note Environments::).

 -- Request: .hlm [n]
 -- Register: \n[.hlm]
 -- Register: \n[.hlc]
     Set the maximum quantity of consecutive hyphenated lines to N.  If
     N is negative, there is no maximum.  If omitted, N is -1.  This
     value is associated with the environment (*note Environments::).
     Only lines output from a given environment count toward the maximum
     associated with that environment.  Hyphens resulting from '\%' are
     counted; explicit hyphens are not.

     The '.hlm' read-only register stores this maximum.  The count of
     immediately preceding consecutive hyphenated lines is available in
     the read-only register '.hlc'.
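
     A hypothetical Python model of the limit and counter for a single
     environment follows.  Its 'ending' argument distinguishes automatic
     hyphens and those from '\%' (both counted) from every other kind of
     line ending; treating an explicit-hyphen break as resetting the
     count, like an unhyphenated line, is an assumption of the sketch.

```python
# Hypothetical model of the '.hlm' limit and the '.hlc' count.

class HyphenLineLimit:
    def __init__(self, hlm: int = -1) -> None:
        self.hlm = hlm  # negative: no maximum (the default here is -1)
        self.hlc = 0    # consecutive hyphenated lines so far

    def may_hyphenate(self) -> bool:
        return self.hlm < 0 or self.hlc < self.hlm

    def line_output(self, ending: str) -> None:
        """ENDING: 'auto' or 'manual' (from '\\%') count; others reset."""
        if ending in ("auto", "manual"):
            self.hlc += 1
        else:
            self.hlc = 0
```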

 -- Request: .hym [length]
 -- Register: \n[.hym]
     Set the (right) hyphenation margin to LENGTH.  If the adjustment
     mode is not 'b' or 'n', the line is not hyphenated if it is shorter
     than LENGTH.  Without an argument, the hyphenation margin is reset
     to its default value, 0.  The default scaling unit is 'm'.  The
     hyphenation margin is associated with the environment (*note
     Environments::).

     A negative argument resets the hyphenation margin to zero, emitting
     a warning in category 'range'.

     The hyphenation margin is available in the '.hym' read-only
     register.

 -- Request: .hys [hyphenation-space]
 -- Register: \n[.hys]
     Suppress hyphenation of the line in adjustment modes 'b' or 'n' if
     it can be justified by adding no more than HYPHENATION-SPACE extra
     space to each inter-word space.  Without an argument, the
     hyphenation space adjustment threshold is set to its default value,
     0.  The default scaling unit is 'm'.  The hyphenation space
     adjustment threshold is associated with the environment (*note
     Environments::).

     A negative argument resets the hyphenation space adjustment
     threshold to zero, emitting a warning in category 'range'.

     The hyphenation space adjustment threshold is available in the
     '.hys' read-only register.
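
     The threshold test can be sketched in Python as follows, assuming
     all quantities are integers in basic units; the function name and
     the notion of a "shortfall" (how far the unhyphenated line falls
     short of the line length) are this sketch's own.

```python
# Hypothetical model of the '.hys' test in basic units.

def hys_suppresses_hyphenation(adjust_mode: str, shortfall: int,
                               spaces: int, hys: int) -> bool:
    """True if the line should be justified by stretching spaces instead.

    SHORTFALL is how much the unhyphenated line falls short of the line
    length; SPACES is the number of inter-word spaces that can stretch.
    """
    if adjust_mode not in ("b", "n"):
        return False                  # 'hys' applies only in modes b and n
    return shortfall <= hys * spaces  # at most HYS extra per space
```

     For example, a line 30 units short with three inter-word spaces is
     left unhyphenated when the threshold is 10 units, but not when it
     is 9.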
