File: groff.info, Node: Manipulating Hyphenation, Next: Manipulating Spacing, Prev: Manipulating Filling and Adjustment, Up: GNU troff Reference

5.10 Manipulating Hyphenation
=============================

When filling, GNU 'troff' hyphenates words as needed at user-specified and automatically determined hyphenation points. The machine-driven determination of hyphenation points in words requires algorithms and data, and is susceptible to conventions and preferences. Before tackling such "automatic hyphenation", let us consider how hyphenation points can be set explicitly.

   Explicitly hyphenated words such as "mother-in-law" are eligible for breaking after each of their hyphens. Relatively few words in a language offer such obvious break points, however, and automatic detection of syllabic (or phonetic) boundaries for hyphenation is not perfect,(1) (*note Manipulating Hyphenation-Footnote-1::) particularly for unusual words found in technical literature. We can instruct GNU 'troff' how to hyphenate specific words if the need arises.

 -- Request: .hw word ...
     Define each "hyphenation exception" WORD with each hyphen '-' in the word indicating a hyphenation point. For example, the request

          .hw in-sa-lub-rious alpha

     marks potential hyphenation points in "insalubrious", and prevents "alpha" from being hyphenated at all.

     Besides the space character, any character whose hyphenation code is zero can be used to separate the arguments of 'hw' (see the 'hcode' request below). In addition, this request can be used more than once.

     Hyphenation points specified with 'hw' are not subject to the within-word placement restrictions imposed by the 'hy' request (see below).

     Hyphenation exceptions specified with the 'hw' request are associated with the hyphenation language (see the 'hla' request below) and environment (*note Environments::); invoking the 'hw' request in the absence of a hyphenation language is an error.

     The request is ignored if there are no parameters.
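The effect of 'hw' arguments can be sketched in Python as a lookup table. This is a simplified model, not GNU 'troff''s implementation: the hypothetical helpers 'parse_exceptions' and 'show' split arguments on whitespace only (ignoring the zero-hcode separators mentioned above), and they ignore the association with the hyphenation language and environment.

```python
def parse_exceptions(args: str) -> dict[str, list[int]]:
    """Model '.hw' arguments: map each word, with its hyphens removed, to
    the offsets (counted in characters) after which it may be broken.
    An empty list means the word is never hyphenated."""
    table: dict[str, list[int]] = {}
    for arg in args.split():          # assumption: whitespace separators only
        offsets: list[int] = []
        letters = 0
        for ch in arg:
            if ch == "-":
                offsets.append(letters)   # break point after `letters` chars
            else:
                letters += 1
        # A later definition of the same word replaces the earlier one.
        table[arg.replace("-", "")] = offsets
    return table


def show(word: str, table: dict[str, list[int]]) -> str:
    """Render the permitted break points of an exception word with '-'."""
    pieces, start = [], 0
    for k in table.get(word, []):
        pieces.append(word[start:k])
        start = k
    pieces.append(word[start:])
    return "-".join(pieces)


table = parse_exceptions("in-sa-lub-rious alpha")
print(table)                          # {'insalubrious': [2, 4, 7], 'alpha': []}
print(show("insalubrious", table))    # in-sa-lub-rious
print(show("alpha", table))           # alpha
```

The empty offset list is what makes "alpha" unbreakable. Per the text above, such entries also override the patterns and are exempt from the 'hy' placement restrictions; the sketch does not model either interaction.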
     These are known as hyphenation exceptions in the expectation that most users will avail themselves of automatic hyphenation; these exceptions override any rules that would normally apply to a word matching a hyphenation exception defined with 'hw'.

   Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar string needs to be breakable in sensible places without hyphenation.

 -- Escape sequence: \%
 -- Escape sequence: \:
     To tell GNU 'troff' how to hyphenate words as they occur in input, use the '\%' escape sequence; it is the default "hyphenation character". Each instance within a word indicates to GNU 'troff' that the word may be hyphenated at that point, while prefixing a word with this escape sequence prevents it from being otherwise hyphenated. This mechanism affects only that occurrence of the word; to change the hyphenation of a word for the remainder of input processing, use the 'hw' request.

     GNU 'troff' regards the escape sequences '\X' and '\Y' as starting a word; that is, the '\%' escape sequence in, say, '\X'...'\%foobar' or '\Y'...'\%foobar' no longer prevents hyphenation of 'foobar' but inserts a hyphenation point just prior to it; most likely this isn't what you want. *Note Postprocessor Access::.

     '\:' inserts a non-printing break point; that is, a word can break there, but the soft hyphen glyph (see below) is not written to the output if it does. This escape sequence is an input word boundary, so the remainder of the word is subject to hyphenation as normal.

     You can combine '\:' and '\%' to control breaking of a file name or URL, or to permit hyphenation only after certain explicit hyphens within a word.

          The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
          was, in retrospect, inevitable once the contents of
          \%/var/log/\:\%httpd/\:\%access_log on the family web
          server came to light, revealing visitors from Hogwarts.
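The bookkeeping behind the '\%' and '\:' example can be modeled in Python. The hypothetical function 'analyze' covers only what is described here: '\:' ends an input word and adds a break point without a soft hyphen, a '\%' inside a word adds a break point with one, and a leading '\%' marks the input word as exempt from automatic hyphenation. Explicit hyphens and other escape sequences are not handled; this is a sketch, not GNU 'troff''s tokenizer.

```python
def analyze(text: str) -> tuple[str, dict[int, bool], list[tuple[str, bool]]]:
    """Return (plain, breaks, words) for one input token.

    plain:  the text with the escape sequences removed
    breaks: offset in `plain` -> True if a soft hyphen is printed there
    words:  (input word, inhibited) pairs; `inhibited` is True when the
            word starts with \\% and is thus exempt from automatic hyphenation
    """
    plain = ""
    breaks: dict[int, bool] = {}
    words: list[tuple[str, bool]] = []
    for n, word in enumerate(text.split("\\:")):
        if n > 0:
            breaks[len(plain)] = False     # \: break point: no soft hyphen
        inhibited = word.startswith("\\%")
        if inhibited:
            word = word[2:]                # drop the leading \%
        for i, part in enumerate(word.split("\\%")):
            if i > 0:
                breaks[len(plain)] = True  # \% inside a word: soft hyphen
            plain += part
        words.append((word.replace("\\%", ""), inhibited))
    return plain, breaks, words


plain, breaks, words = analyze(r"\%/var/log/\:\%httpd/\:\%access_log")
print(plain)   # /var/log/httpd/access_log
print(breaks)  # {9: False, 15: False}
print(words)   # [('/var/log/', True), ('httpd/', True), ('access_log', True)]
```

The path can therefore break only after "/var/log/" and "httpd/", and never with a hyphen, because each segment is prefixed with '\%'.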
 -- Request: .hc [char]
     Change the hyphenation character to CHAR. This character then works as the '\%' escape sequence normally does, and thus no longer appears in the output.(2) (*note Manipulating Hyphenation-Footnote-2::) Without an argument, 'hc' resets the hyphenation character to '\%' (the default). The hyphenation character is associated with the environment (*note Environments::).

 -- Request: .shc [c]
     Set the "soft hyphen character", inserted when a word is hyphenated automatically or at a hyphenation character, to the ordinary or special character C.(3) (*note Manipulating Hyphenation-Footnote-3::) If the argument is omitted, the soft hyphen character is set to the default, '\[hy]'. If no glyph for C exists in the font in use at a potential hyphenation point, then the line is not broken there. Neither character definitions (specified with the 'char' and similar requests) nor translations (specified with the 'tr' request) are applied to C.

   Several requests influence automatic hyphenation. Because conventions vary, a variety of hyphenation modes is available to the 'hy' request; these determine whether hyphenation will apply to a word prior to breaking a line at the end of a page (more or less; see below for details), and at which positions within that word automatically determined hyphenation points are permissible. The places within a word that are eligible for hyphenation are determined by language-specific data and lettercase relationships. Furthermore, hyphenation of a word might be suppressed due to a limit on consecutive hyphenated lines ('hlm'), a minimum line length threshold ('hym'), or because the line can instead be adjusted with additional inter-word space ('hys').

 -- Request: .hy [mode]
 -- Register: \n[.hy]
     Set automatic hyphenation mode to MODE, an integer encoding conditions for hyphenation; if omitted, '1' is implied.
     The hyphenation mode is available in the read-only register '.hy'; it is associated with the environment (*note Environments::). The default hyphenation mode depends on the localization file loaded when GNU 'troff' starts up; see the 'hpf' request below.

     Typesetting practice generally does not avail itself of every opportunity for hyphenation, but the details differ by language and site mandates. The hyphenation modes of AT&T 'troff' were implemented with English-language publishing practices of the 1970s in mind, not a scrupulous enumeration of conceivable parameters. GNU 'troff' extends those modes such that finer-grained control is possible, favoring compatibility with older implementations over a more intuitive arrangement. The means of hyphenation mode control is a set of numbers that can be added up to encode the behavior sought.(4) (*note Manipulating Hyphenation-Footnote-4::) The entries in the following table are termed "values"; the sum of the desired values is the "mode".

     '0'
          disables hyphenation.

     '1'
          enables hyphenation except after the first and before the last character of a word.

     The remaining values "imply" 1; that is, they enable hyphenation under the same conditions as '.hy 1', and then apply or lift restrictions relative to that basis.

     '2'
          disables hyphenation of the last word on a page,(5) (*note Manipulating Hyphenation-Footnote-5::) even for explicitly hyphenated words.

     '4'
          disables hyphenation before the last two characters of a word.

     '8'
          disables hyphenation after the first two characters of a word.

     '16'
          enables hyphenation before the last character of a word.

     '32'
          enables hyphenation after the first character of a word.

     Apart from value 2, restrictions imposed by the hyphenation mode are _not_ respected for words whose hyphenations have been specified with the hyphenation character ('\%' by default) or the 'hw' request.

     Nonzero values in the previous table are additive.
     For example, mode 12 causes GNU 'troff' to hyphenate neither the last two nor the first two characters of a word. Some values cannot be used together because they contradict; for instance, values 4 and 16, and values 8 and 32. As noted, it is superfluous to add 1 to any non-zero even mode.

     The automatic placement of hyphens in words is determined by "pattern files", which are derived from TeX and available for several languages. The number of characters at the beginning of a word after which the first hyphenation point should be inserted is determined by the patterns themselves; it can't be reduced further without introducing additional, invalid hyphenation points (unfortunately, this information is not part of a pattern file--you have to know it in advance). The same is true for the number of characters at the end of a word before which the last hyphenation point should be inserted. For example, you can supply the following input to 'echo $(nroff)'.

          .ll 1
          .hy 48
          splitting

     You will get

          s- plit- t- in- g

     instead of the correct 'split- ting'. English patterns as distributed with GNU 'troff' need two characters at the beginning and three characters at the end; this means that value 4 of 'hy' is mandatory. Value 8 is possible as an additional restriction, but values 16 and 32 should be avoided, as should mode 1. Modes 4 and 6 are typical.

     A table of left and right minimum character counts for hyphenation as needed by the patterns distributed with GNU 'troff' follows; see the 'groff_tmac(5)' man page for more information on GNU 'troff''s language macro files.

          language            pattern name    left min    right min
          -----------------------------------------------------------
          Czech               cs              2           2
          English             en              2           3
          French              fr              2           3
          German traditional  det             2           2
          German reformed     den             2           2
          Italian             it              2           2
          Swedish             sv              1           2

     Hyphenation exceptions within pattern files (i.e., the words within a TeX '\hyphenation' group) obey the hyphenation restrictions given by 'hy'.
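The mode arithmetic above can be checked with a small Python model. The hypothetical function 'permitted_positions' reports, for a word of a given length, the offsets k (a break after the first k characters) that a mode allows for automatically determined hyphenation points. It follows the value table literally, treats value 2 as irrelevant to placement within a word, and rejects the contradictory combinations. It says nothing about which of those offsets the patterns actually mark, and it does not apply to '\%' or 'hw' hyphenations, which ignore these restrictions.

```python
# Hypothetical names for the values in the table above.
NOT_LAST_TWO = 4    # no hyphenation before the last two characters
NOT_FIRST_TWO = 8   # no hyphenation after the first two characters
LAST_CHAR = 16      # allow hyphenation before the last character
FIRST_CHAR = 32     # allow hyphenation after the first character


def permitted_positions(length: int, mode: int) -> list[int]:
    """Offsets k (break after k characters) that `mode` permits."""
    if mode == 0:
        return []                       # value 0: hyphenation disabled
    if mode & NOT_LAST_TWO and mode & LAST_CHAR:
        raise ValueError("values 4 and 16 contradict")
    if mode & NOT_FIRST_TWO and mode & FIRST_CHAR:
        raise ValueError("values 8 and 32 contradict")
    # Basis implied by every nonzero mode (value 1): not after the first
    # character and not before the last one.
    first, last = 2, length - 2
    if mode & FIRST_CHAR:
        first = 1
    if mode & NOT_FIRST_TWO:
        first = 3
    if mode & LAST_CHAR:
        last = length - 1
    if mode & NOT_LAST_TWO:
        last = length - 3
    return list(range(first, last + 1))


for mode in (1, 4, 12, 48):
    print(mode, permitted_positions(len("splitting"), mode))
# 1 [2, 3, 4, 5, 6, 7]
# 4 [2, 3, 4, 5, 6]
# 12 [3, 4, 5, 6]
# 48 [1, 2, 3, 4, 5, 6, 7, 8]
```

For the nine-letter "splitting", mode 4 leaves offsets 2 through 6, so the first fragment has at least two characters and the last at least three, matching the English minima in the table. Mode 48 admits every offset from 1 to 8, which is why the '.hy 48' example above exposes break points that the patterns were never meant to produce.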
 -- Request: .nh
     Disable automatic hyphenation; i.e., set the hyphenation mode to 0 (see above). The hyphenation mode of the last call to 'hy' is not remembered.

 -- Request: .hpf pattern-file
 -- Request: .hpfa pattern-file
 -- Request: .hpfcode a b [c d] ...
     Read hyphenation patterns from PATTERN-FILE, which is sought in the same way that macro files are with the 'mso' request or the '-mNAME' command-line option to 'groff'. The PATTERN-FILE should have the same format as (simple) TeX pattern files. More specifically, the following scanning rules are implemented.

        * A percent sign starts a comment (up to the end of the line) even if preceded by a backslash.

        * "Digraphs" like '\$' are not supported.

        * '^^XX' (where each X is 0-9 or a-f) and '^^C' (character C in the code point range 0-127 decimal) are recognized; other uses of '^' cause an error.

        * No macro expansion is performed.

        * 'hpf' checks for the expression '\patterns{...}' (possibly with whitespace before or after the braces). Everything between the braces is taken as hyphenation patterns. Consequently, '{' and '}' are not allowed in patterns.

        * Similarly, '\hyphenation{...}' gives a list of hyphenation exceptions.

        * '\endinput' is recognized also.

        * For backward compatibility, if '\patterns' is missing, the whole file is treated as a list of hyphenation patterns (except that the '%' character is recognized as the start of a comment).

     The 'hpfa' request appends a file of patterns to the current list.

     The 'hpfcode' request defines mapping values for character codes in pattern files. It is an older mechanism no longer used by GNU 'troff''s own macro files; for its successor, see 'hcode' below. 'hpf' or 'hpfa' apply the mapping after reading the patterns but before replacing or appending to the active list of patterns. Its arguments are pairs of character codes--integers from 0 to 255. The request maps character code A to code B, code C to code D, and so on.
     Character codes that would otherwise be invalid in GNU 'troff' can be used. By default, every code maps to itself except those for letters 'A' to 'Z', which map to those for 'a' to 'z'.

     The set of hyphenation patterns is associated with the language set by the 'hla' request (see below). The 'hpf' request is usually invoked by a localization file loaded by the 'troffrc' file.(6) (*note Manipulating Hyphenation-Footnote-6::) A second call to 'hpf' (for the same language) replaces the hyphenation patterns with the new ones. Invoking 'hpf' or 'hpfa' causes an error if there is no hyphenation language.

     If no 'hpf' request is specified (either in the document, in a file loaded at startup, or in a macro package), GNU 'troff' won't automatically hyphenate at all.

 -- Request: .hcode c1 code1 [c2 code2] ...
     Set the hyphenation code of character C1 to CODE1, that of C2 to CODE2, and so on. A hyphenation code must be an ordinary character (not a special character escape sequence) other than a digit or a space. The request is ignored if given no arguments.

     For hyphenation to work, hyphenation codes must be set up. At startup, GNU 'troff' assigns hyphenation codes to the letters 'a'-'z' (mapped to themselves), to the letters 'A'-'Z' (mapped to 'a'-'z'), and zero to all other characters. Normally, hyphenation patterns contain only lowercase letters, which should be applied regardless of case. In other words, they assume that the words 'FOO' and 'Foo' should be hyphenated exactly as 'foo' is. The 'hcode' request extends this principle to letters outside the Unicode basic Latin alphabet; without it, words containing such letters won't be hyphenated properly even if the corresponding hyphenation patterns contain them.

     For example, the following 'hcode' requests are necessary to assign hyphenation codes to the letters 'ÄäÖöÜüß', needed for German.

          .hcode ä ä Ä ä
          .hcode ö ö Ö ö
          .hcode ü ü Ü ü
          .hcode ß ß

     Without these assignments, GNU 'troff' treats the German word 'Kindergärten' (the plural form of 'kindergarten') as two words 'kinderg' and 'rten' because the hyphenation code of the umlaut a is zero by default, just like a space. There is a German hyphenation pattern that covers 'kinder', so GNU 'troff' finds the hyphenation 'kin-der'. The other two hyphenation points ('kin-der-gär-ten') are missed.

 -- Request: .hla lang
 -- Register: \n[.hla]
     Set the hyphenation language to LANG. Hyphenation exceptions specified with the 'hw' request and hyphenation patterns and exceptions specified with the 'hpf' and 'hpfa' requests are associated with the hyphenation language. The 'hla' request is usually invoked by a localization file, which is in turn loaded by the 'troffrc' or 'troffrc-end' file; see the 'hpf' request above.

     The hyphenation language is available in the read-only string-valued register '.hla'; it is associated with the environment (*note Environments::).

 -- Request: .hlm [n]
 -- Register: \n[.hlm]
 -- Register: \n[.hlc]
     Set the maximum quantity of consecutive hyphenated lines to N. If N is negative, there is no maximum. If omitted, N is -1. This value is associated with the environment (*note Environments::). Only lines output from a given environment count toward the maximum associated with that environment. Hyphens resulting from '\%' are counted; explicit hyphens are not.

     The '.hlm' read-only register stores this maximum. The count of immediately preceding consecutive hyphenated lines is available in the read-only register '.hlc'.

 -- Request: .hym [length]
 -- Register: \n[.hym]
     Set the (right) hyphenation margin to LENGTH. If the adjustment mode is not 'b' or 'n', the line is not hyphenated if it is shorter than LENGTH. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is 'm'.
     The hyphenation margin is associated with the environment (*note Environments::). A negative argument resets the hyphenation margin to zero, emitting a warning in category 'range'.

     The hyphenation margin is available in the '.hym' read-only register.

 -- Request: .hys [hyphenation-space]
 -- Register: \n[.hys]
     Suppress hyphenation of the line in adjustment modes 'b' or 'n' if it can be justified by adding no more than HYPHENATION-SPACE extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is 'm'. The hyphenation space adjustment threshold is associated with the environment (*note Environments::). A negative argument resets the hyphenation space adjustment threshold to zero, emitting a warning in category 'range'.

     The hyphenation space adjustment threshold is available in the '.hys' read-only register.
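The three limits named earlier ('hlm', 'hym', and 'hys') can be summarized as a single Python predicate. This is a sketch under stated assumptions, not GNU 'troff''s line-breaking code. The hypothetical function 'suppress_hyphenation' assumes all lengths are in one unit, reads the 'hym' condition as applying to the space left between the unhyphenated text and the right margin, and models the 'hys' condition as the extra space each inter-word gap would have to absorb.

```python
def suppress_hyphenation(adjust: str, line_length: float, text_width: float,
                         gaps: int, hlm: int = -1, hlc: int = 0,
                         hym: float = 0.0, hys: float = 0.0) -> bool:
    """Return True if a line should be broken without hyphenating.

    adjust:      adjustment mode, e.g. "b", "n", "l", "r", "c"
    line_length: target length of the output line
    text_width:  width of the words that fit without hyphenation
    gaps:        number of inter-word spaces among those words
    hlm, hlc:    maximum and current count of consecutive hyphenated lines
    """
    # 'hlm': a negative maximum means there is no limit.
    if hlm >= 0 and hlc >= hlm:
        return True
    shortfall = line_length - text_width
    if adjust in ("b", "n"):
        # 'hys': justify instead if no gap must grow by more than hys.
        return gaps > 0 and shortfall / gaps <= hys
    # 'hym' (assumed reading): skip hyphenation if the space left at the
    # right margin does not exceed the hyphenation margin.
    return shortfall <= hym


print(suppress_hyphenation("b", 30, 29, 5, hys=0.25))      # True: 0.2 per gap
print(suppress_hyphenation("b", 30, 28, 5, hys=0.25))      # False: 0.4 per gap
print(suppress_hyphenation("l", 30, 28.5, 5, hym=2))       # True: 1.5 left over
print(suppress_hyphenation("l", 30, 20, 5, hlm=2, hlc=2))  # True: limit reached
```

Because the defaults for 'hym' and 'hys' are 0, this model suppresses hyphenation by default only when the unhyphenated text already fills the line exactly; the hyphenation mode and the patterns are deliberately left out.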