8.4 Changing the lexical structure of words
The macro changeword and all associated functionality is experimental. It is only available if the ‘--enable-changeword’ option was given to configure, at GNU m4 installation time. The functionality will go away in the future, to be replaced by other new features that are more efficient at providing the same capabilities. Do not rely on it. Please direct your comments about it the same way you would do for bugs.
A file being processed by m4 is split into quoted strings, words (potential macro names) and simple tokens (any other single character).
Initially a word is defined by the following regular expression:
[_a-zA-Z][_a-zA-Z0-9]*
Using changeword, you can change this regular expression:
- Optional builtin: changeword (regex)
Changes the regular expression for recognizing macro names to be regex. If regex is empty, use ‘[_a-zA-Z][_a-zA-Z0-9]*’. regex must obey the constraint that every prefix of the desired final pattern is also accepted by the regular expression. If regex contains grouping parentheses, the macro invoked is the portion that matched the first group, rather than the entire matching string.
The expansion of changeword is void. The macro changeword is recognized only with parameters.
Relaxing the lexical rules of m4 might be useful (for example) if you wanted to apply translations to a file of numbers:
ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
changeword(`[_a-zA-Z0-9]+')
⇒
define(`1', `0')1
⇒0
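To see why ‘1’ becomes a potential macro name in the example above, the following Python sketch splits a line into words and simple tokens under two word patterns. It is only a rough model, not m4's implementation: it ignores quoted strings and comments, and the names split_tokens, DEFAULT_WORD and RELAXED_WORD are made up for illustration.

```python
import re

# Hypothetical names for illustration; the patterns come from the text above.
DEFAULT_WORD = r"[_a-zA-Z][_a-zA-Z0-9]*"
RELAXED_WORD = r"[_a-zA-Z0-9]+"


def split_tokens(word_pattern, text):
    """Split text into ("word", s) and ("char", c) pairs.

    Simplified model: quoted strings and comments are not handled.
    """
    regex = re.compile(word_pattern)
    tokens = []
    pos = 0
    while pos < len(text):
        match = regex.match(text, pos)
        if match and match.end() > pos:
            # A word is a potential macro name.
            tokens.append(("word", match.group(0)))
            pos = match.end()
        else:
            # Anything else is a simple one-character token.
            tokens.append(("char", text[pos]))
            pos += 1
    return tokens


print(split_tokens(DEFAULT_WORD, "x1 1"))
# [('word', 'x1'), ('char', ' '), ('char', '1')]
print(split_tokens(RELAXED_WORD, "x1 1"))
# [('word', 'x1'), ('char', ' '), ('word', '1')]
```

Under the default pattern the lone ‘1’ is a simple token and can never name a macro. Under the relaxed pattern it is a word, so a definition for it can be looked up.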
Tightening the lexical rules is less useful, because it will generally make some of the builtins unavailable. You could use it to prevent accidental call of builtins, for example:
ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
define(`_indir', defn(`indir'))
⇒
changeword(`_[_a-zA-Z0-9]*')
⇒
esyscmd(`foo')
⇒esyscmd(foo)
_indir(`esyscmd', `echo hi')
⇒hi
⇒
Because m4 constructs its words a character at a time, there is a restriction on the regular expressions that may be passed to changeword: if your regular expression accepts ‘foo’, it must also accept ‘f’ and ‘fo’.
ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
define(`foo
', `bar
')
⇒
dnl This example wants to recognize changeword, dnl, and `foo\n'.
dnl First, we check that our regexp will match.
regexp(`changeword', `[cd][a-z]*\|foo[
]')
⇒0
regexp(`foo
', `[cd][a-z]*\|foo[
]')
⇒0
regexp(`f', `[cd][a-z]*\|foo[
]')
⇒-1
foo
⇒foo
changeword(`[cd][a-z]*\|foo[
]')
⇒
dnl Even though `foo\n' matches, we forgot to allow `f'.
foo
⇒foo
changeword(`[cd][a-z]*\|fo*[
]?')
⇒
dnl Now we can call `foo\n'.
foo
⇒bar
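The prefix restriction can be illustrated with a small Python model of building a word one character at a time. This is a sketch under assumptions, not the m4 source: the function read_word is hypothetical, and the two patterns are hand translations of the ones above into Python syntax (‘\|’ becomes ‘|’, and the literal newline is written ‘\n’).

```python
import re


def read_word(pattern, text):
    """Grow a word one character at a time (simplified model).

    A character is kept only while everything collected so far is still
    accepted by the pattern as a whole; the first failure ends the word.
    """
    regex = re.compile(pattern)
    word = ""
    for ch in text:
        if regex.fullmatch(word + ch) is None:
            break
        word += ch
    return word


# Hand-translated patterns (assumption: equivalent to the m4 examples).
NO_PREFIXES = r"[cd][a-z]*|foo[\n]"     # accepts 'foo\n' but not 'f' or 'fo'
WITH_PREFIXES = r"[cd][a-z]*|fo*[\n]?"  # accepts 'f', 'fo', 'foo', 'foo\n'

print(repr(read_word(NO_PREFIXES, "foo\n")))         # ''
print(repr(read_word(WITH_PREFIXES, "foo\n")))       # 'foo\n'
print(repr(read_word(NO_PREFIXES, "changeword(x)"))) # 'changeword'
```

With the first pattern the word is abandoned at ‘f’, so the full string ‘foo’ plus newline is never tried, even though the pattern would accept it.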
changeword has another function. If the regular expression supplied contains any grouped subexpressions, then text outside the first of these is discarded before symbol lookup. So:
ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
ifdef(`__unix__', , `errprint(` skipping: syscmd does not have unix semantics
')m4exit(`77')')dnl
changecom(`/*', `*/')dnl
define(`foo', `bar')dnl
changeword(`#\([_a-zA-Z0-9]*\)')
⇒
#esyscmd(`echo foo \#foo')
⇒foo bar
⇒
m4 now requires a ‘#’ mark at the beginning of every macro invocation, so one can use m4 to preprocess plain text without losing various words like ‘divert’.
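The lookup rule for grouped subexpressions can be modeled in a few lines of Python. This is an illustrative sketch only: macro_name is a hypothetical helper, and the pattern is a hand translation of the one above (‘\(’ and ‘\)’ become plain parentheses in Python syntax).

```python
import re


def macro_name(pattern, word):
    """Return the name used for symbol lookup, or None if word is not a word.

    Simplified model: if the pattern has a group, only the text matched by
    the first group is looked up; otherwise the whole word is used.
    """
    match = re.fullmatch(pattern, word)
    if match is None:
        return None
    if match.re.groups >= 1:
        return match.group(1)
    return match.group(0)


HASH_WORD = r"#([_a-zA-Z0-9]*)"          # translated from `#\([_a-zA-Z0-9]*\)'
DEFAULT_WORD = r"[_a-zA-Z][_a-zA-Z0-9]*"

print(macro_name(HASH_WORD, "#foo"))     # foo
print(macro_name(HASH_WORD, "divert"))   # None
print(macro_name(DEFAULT_WORD, "divert"))  # divert
```

In this model ‘#foo’ is looked up as ‘foo’, while a bare ‘divert’ is not a word at all and so passes through as plain text.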
In m4, macro substitution is based on text, while in TeX, it is based on tokens. changeword can throw this difference into relief. For example, here is the same idea represented in TeX and m4. First, the TeX version:
\def\a{\message{Hello}}
\catcode`\@=0
\catcode`\\=12
@a
@bye
⇒Hello
Then, the m4 version:
ifdef(`changeword', `', `errprint(` skipping: no changeword support
')m4exit(`77')')dnl
define(`a', `errprint(`Hello')')dnl
changeword(`@\([_a-zA-Z0-9]*\)')
⇒
@a
⇒errprint(Hello)
In the TeX example, the first line defines a macro a to print the message ‘Hello’. The second line defines <@> to be usable instead of <\> as an escape character. The third line defines <\> to be a normal printing character, not an escape. The fourth line invokes the macro a. So, when TeX is run on this file, it displays the message ‘Hello’.
When the m4 example is passed through m4, it outputs ‘errprint(Hello)’. The reason for this is that TeX does lexical analysis of a macro definition when the macro is defined. m4 just stores the text, postponing the lexical analysis until the macro is used.
You should note that using changeword will slow m4 down by a factor of about seven, once it is changed to something other than the default regular expression. You can invoke changeword with the empty string to restore the default word definition, and regain the parsing speed.
This document was generated on September 29, 2013 using texi2html 5.0.