20 Incompatibilities with Lex and Posix
flex is a rewrite of the AT&T Unix lex tool (the two implementations do
not share any code, though), with some extensions and incompatibilities,
both of which are of concern to those who wish to write scanners
acceptable to both implementations. flex is fully compliant with the
POSIX lex specification, except that when using %pointer (the default),
a call to unput() destroys the contents of yytext, which is counter to
the POSIX specification. In this section we discuss all of the known
areas of incompatibility between flex, AT&T lex, and the POSIX
specification. flex’s ‘-l’ option turns on maximum compatibility with
the original AT&T lex implementation, at the cost of a major loss in the
generated scanner’s performance. We note below which incompatibilities
can be overcome using the ‘-l’ option.

flex is fully compatible with lex with the following exceptions:
- The undocumented lex scanner internal variable yylineno is not supported unless ‘-l’ or %option yylineno is used.
- yylineno should be maintained on a per-buffer basis, rather than a per-scanner (single global variable) basis.
- yylineno is not part of the POSIX specification.
- The input() routine is not redefinable, though it may be called to read characters following whatever has been matched by a rule. If input() encounters an end-of-file, the normal yywrap() processing is done. A “real” end-of-file is returned by input() as EOF.
- Input is instead controlled by defining the YY_INPUT() macro.
- The flex restriction that input() cannot be redefined is in accordance with the POSIX specification, which simply does not specify any way of controlling the scanner’s input other than by making an initial assignment to ‘yyin’.
- The unput() routine is not redefinable. This restriction is in accordance with POSIX.
- flex scanners are not as reentrant as lex scanners. In particular, if you have an interactive scanner and an interrupt handler which long-jumps out of the scanner, and the scanner is subsequently called again, you may get the following message:

      fatal flex scanner internal error--end of buffer missed

  To reenter the scanner, first use:

      yyrestart( yyin );

  Note that this call will throw away any buffered input; usually this isn’t a problem with an interactive scanner. See section Reentrant C Scanners, for flex’s reentrant API.
- Also note that flex C++ scanner classes are reentrant, so if using C++ is an option for you, you should use them instead. See section Generating C++ Scanners, and Reentrant C Scanners for details.
- output() is not supported. Output from the ECHO macro is done to the file-pointer yyout (default ‘stdout’).
- output() is not part of the POSIX specification.
- lex does not support exclusive start conditions (%x), though they are in the POSIX specification.
- When definitions are expanded, flex encloses them in parentheses. With lex, the following:

      NAME    [A-Z][A-Z0-9]*
      %%
      foo{NAME}?      printf( "Found it\n" );
      %%

  will not match the string ‘foo’ because when the macro is expanded the rule is equivalent to ‘foo[A-Z][A-Z0-9]*?’ and the precedence is such that the ‘?’ is associated with ‘[A-Z0-9]*’. With flex, the rule will be expanded to ‘foo([A-Z][A-Z0-9]*)?’ and so the string ‘foo’ will match.
- Note that if the definition begins with ‘^’ or ends with ‘$’ then it is not expanded with parentheses, to allow these operators to appear in definitions without losing their special meanings. But the ‘<s>’, ‘/’, and <<EOF>> operators cannot be used in a flex definition.
- Using ‘-l’ results in the lex behavior of no parentheses around the definition.
- The POSIX specification is that the definition be enclosed in parentheses.
- Some implementations of lex allow a rule’s action to begin on a separate line, if the rule’s pattern has trailing whitespace:

      %%
      foo|bar<space here>
        { foobar_action();}

  flex does not support this feature.
- The lex %r (generate a Ratfor scanner) option is not supported. It is not part of the POSIX specification.
- After a call to unput(), yytext is undefined until the next token is matched, unless the scanner was built using %array. This is not the case with lex or the POSIX specification. The ‘-l’ option does away with this incompatibility.
- The precedence of the ‘{,}’ (numeric range) operator is different. The AT&T and POSIX specifications of lex interpret ‘abc{1,3}’ as “match one, two, or three occurrences of ‘abc’”, whereas flex interprets it as “match ‘ab’ followed by one, two, or three occurrences of ‘c’”. The ‘-l’ and ‘--posix’ options do away with this incompatibility.
- The precedence of the ‘^’ operator is different. lex interprets ‘^foo|bar’ as “match either ‘foo’ at the beginning of a line, or ‘bar’ anywhere”, whereas flex interprets it as “match either ‘foo’ or ‘bar’ if they come at the beginning of a line”. The latter is in agreement with the POSIX specification.
- The special table-size declarations such as %a supported by lex are not required by flex scanners. flex ignores them.
- The name FLEX_SCANNER is #define’d so scanners may be written for use with either flex or lex. Scanners also include YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION and YY_FLEX_SUBMINOR_VERSION indicating which version of flex generated the scanner. For example, for the 2.5.22 release, these defines would be 2, 5 and 22 respectively. If the version of flex being used is a beta version, then the symbol FLEX_BETA is defined.
- The symbols ‘[[’ and ‘]]’ in the code sections of the input may conflict with the m4 delimiters. See section M4 Dependency.
The following flex features are not included in lex or the POSIX
specification:
- C++ scanners
- %option
- start condition scopes
- start condition stacks
- interactive/non-interactive scanners
- yy_scan_string() and friends
- yyterminate()
- yy_set_interactive()
- yy_set_bol()
- YY_AT_BOL()
- <<EOF>>
- <*>
- YY_DECL
- YY_START
- YY_USER_ACTION
- YY_USER_INIT
- #line directives
- %{}’s around actions
- reentrant C API
- multiple actions on a line
- almost all of the flex command-line options
The feature “multiple actions on a line” refers to the fact that with
flex you can put multiple actions on the same line, separated with
semicolons, while with lex, the following:

    foo    handle_foo(); ++num_foos_seen;

is (rather surprisingly) truncated to

    foo    handle_foo();

flex does not truncate the action. Actions that are not enclosed in
braces are simply terminated at the end of the line.
This document was generated on November 4, 2011 using texi2html 5.0.