6.1.2 Using Regular Expression Constants
When used on the right-hand side of the ‘~’ or ‘!~’ operators, a regexp constant merely stands for the regexp that is to be matched. However, regexp constants (such as /foo/) may be used like simple expressions. When a regexp constant appears by itself, it has the same meaning as if it appeared in a pattern, i.e., ‘($0 ~ /foo/)’ (d.c.). See section Expressions as Patterns.
This means that the following two code segments:
    if ($0 ~ /barfly/ || $0 ~ /camelot/)
        print "found"
and:
    if (/barfly/ || /camelot/)
        print "found"
are exactly equivalent. One rather bizarre consequence of this rule is that the following Boolean expression is valid, but does not do what the user probably intended:
    # Note that /foo/ is on the left of the ~
    if (/foo/ ~ $1) print "found foo"
This code is “obviously” testing $1 for a match against the regexp /foo/. But in fact, the expression ‘/foo/ ~ $1’ really means ‘($0 ~ /foo/) ~ $1’. In other words, first match the input record against the regexp /foo/. The result is one if the match succeeds and zero if it fails. That result is then matched against the first field in the record, which is treated as the regexp. Because it is unlikely that you would ever really want to make this kind of test, gawk issues a warning when it sees this construct in a program.
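The evaluation order described above can be checked with a small experiment. The following is only a sketch: the three input records are made up for illustration, and it assumes an awk is available as `awk` on the PATH. Standard error is discarded so that gawk's warning about this construct does not clutter the output.

```shell
# Three hypothetical records; only the first has "foo" in $1.
# /foo/ ~ $1 is evaluated as ($0 ~ /foo/) ~ $1, so the 0-or-1 match
# result is compared against $1 used as a regexp.
out=$(printf 'foo bar\n1 foo\n0 bar\n' |
      awk '{ if (/foo/ ~ $1) print "found foo: " $0 }' 2>/dev/null)
printf '%s\n' "$out"
```

The record ‘foo bar’ is not reported, even though $1 is ‘foo’, because the match result 1 does not match the regexp ‘foo’. The record ‘1 foo’ is reported because the result 1 matches the regexp ‘1’, and ‘0 bar’ is reported because the result 0 matches the regexp ‘0’, although that record does not contain ‘foo’ at all.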
Another consequence of this rule is that the assignment statement:
    matches = /foo/
assigns either zero or one to the variable matches, depending upon the contents of the current input record.
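As a quick illustration of this assignment, the sketch below prints the value stored in matches for each record. The sample input is invented for the example, and `awk` is again assumed to be on the PATH.

```shell
# For each record, store the result of ($0 ~ /foo/) in matches and show it.
out=$(printf 'foo\nbar\nfood\n' |
      awk '{ matches = /foo/; print $0 ": " matches }')
printf '%s\n' "$out"
```

Records containing ‘foo’ (here ‘foo’ and ‘food’) yield one, and the record ‘bar’ yields zero.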
Constant regular expressions are also used as the first argument for the gensub(), sub(), and gsub() functions, as the second argument of the match() function, and as the third argument of the patsplit() function (see section String-Manipulation Functions). Modern implementations of awk, including gawk, allow the third argument of split() to be a regexp constant, but some older implementations do not (d.c.). Because these built-in functions accept regexp constants as arguments, it is easy to assume that user-defined functions do too, which can lead to confusion (see section User-Defined Functions).
For example:
    function mysub(pat, repl, str, global)
    {
        if (global)
            gsub(pat, repl, str)
        else
            sub(pat, repl, str)
        return str
    }

    {
        …
        text = "hi! hi yourself!"
        mysub(/hi/, "howdy", text, 1)
        …
    }
In this example, the programmer wants to pass a regexp constant to the user-defined function mysub(), which in turn passes it on to either sub() or gsub(). However, what really happens is that the pat parameter is either one or zero, depending upon whether or not $0 matches /hi/. gawk issues a warning when it sees a regexp constant used as a parameter to a user-defined function, since passing a truth value in this way is probably not what was intended.
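One common way around this problem, which this section does not itself describe, is to pass the regexp as a string. awk then treats the string as a dynamic regexp when it reaches sub() or gsub(). The sketch below reuses mysub() unchanged. The BEGIN rule and the captured return value are additions for illustration, and `awk` is assumed to be on the PATH.

```shell
out=$(awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

BEGIN {
    text = "hi! hi yourself!"
    # "hi" is a string, so pat holds the regexp text, not a 0-or-1 match result.
    print mysub("hi", "howdy", text, 1)   # global replacement
    print mysub("hi", "howdy", text, 0)   # first occurrence only
}')
printf '%s\n' "$out"
```

Because scalar arguments are passed by value, text itself is left unchanged by each call. The modified string is available only through the return value of mysub().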