File: gawk.info,  Node: Variable Typing,  Next: Comparison Operators,  Up: Typing and Comparison

6.3.2.1 String Type versus Numeric Type
.......................................

Scalar objects in 'awk' (variables, array elements, and fields) are
_dynamically_ typed.  This means their type can change as the program
runs, from "untyped" before any use,(1) to string or number, and then
from string to number or number to string, as the program progresses.
('gawk' also provides regexp-typed scalars, but let's ignore that for
now; *note Strong Regexp Constants::.)

   You can't do much with untyped variables, other than tell that they
are untyped.  The following program tests 'a' against '""' and '0'; the
test succeeds when 'a' has never been assigned a value.  It also uses
the built-in 'typeof()' function (not presented yet; *note Type
Functions::) to show 'a''s type:

     $ gawk 'BEGIN { print (a == "" && a == 0 ?
     > "a is untyped" : "a has a type!") ; print typeof(a) }'
     -| a is untyped
     -| unassigned

   A scalar has numeric type when assigned a numeric value, such as from
a numeric constant, or from another scalar with numeric type:

     $ gawk 'BEGIN { a = 42 ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| number
     -| number

   Similarly, a scalar has string type when assigned a string value,
such as from a string constant, or from another scalar with string type:

     $ gawk 'BEGIN { a = "forty two" ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| string
     -| string

   So far, this is all simple and straightforward.  What happens,
though, when 'awk' has to process data from a user?  Let's start with
field data.  What should the following command produce as output?

     echo hello | awk '{ printf("%s %s < 42\n", $1,
                                ($1 < 42 ? "is" : "is not")) }'

Since 'hello' is alphabetic data, 'awk' can only do a string comparison.
Internally, it converts '42' into '"42"' and compares the two string
values '"hello"' and '"42"'.  Here's the result:

     $ echo hello | awk '{ printf("%s %s < 42\n", $1,
     >                            ($1 < 42 ? "is" : "is not")) }'
     -| hello is not < 42

   However, what happens when data from a user _looks like_ a number?
On the one hand, in reality, the input data consists of characters, not
binary numeric values.  But, on the other hand, the data looks numeric,
and 'awk' really ought to treat it as such.  And indeed, it does:

     $ echo 37 | awk '{ printf("%s %s < 42\n", $1,
     >                         ($1 < 42 ? "is" : "is not")) }'
     -| 37 is < 42

   Here are the rules for when 'awk' treats data as a number, and for
when it treats data as a string.

   The POSIX standard uses the term "numeric string" for input data that
looks numeric.  The '37' in the previous example is a numeric string.
So what is the type of a numeric string?  Answer: numeric.
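
   As a minimal sketch of what "numeric" means for such input, the
following compares two fields that differ as text but are equal as
numbers.  The input values '010' and '10' and the variable names are
invented for illustration, and a POSIX-conforming 'awk' is assumed:

```shell
# Two fields that look numeric: equal as numbers, different as strings.
out=$(echo '010 10' | awk '{
    as_numbers = ($1 == $2)              # both are numeric strings: numeric comparison
    as_strings = (($1 "") == ($2 ""))    # concatenation yields pure strings: string comparison
    print as_numbers, as_strings
}')
echo "$out"    # prints: 1 0
```

   Concatenating the empty string is used here only as a way to force
a string comparison.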

   The type of a variable is important because the types of two
variables determine how they are compared.  Variable typing follows
these definitions and rules:

   * A numeric constant or the result of a numeric operation has the
     "numeric" attribute.

   * A string constant or the result of a string operation has the
     "string" attribute.

   * Fields, 'getline' input, 'FILENAME', 'ARGV' elements, 'ENVIRON'
     elements, and the elements of an array created by 'match()',
     'split()', and 'patsplit()' that are numeric strings have the
     "strnum" attribute.(2)  Otherwise, they have the "string"
     attribute.  Uninitialized variables also have the "strnum"
     attribute.

   * Attributes propagate across assignments but are not changed by any
     use.

   The last rule is particularly important.  In the following program,
'a' has numeric type, even though it is later used in a string
operation:

     BEGIN {
          a = 12.345
          b = a " is a cute number"
          print b
     }
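
   One way to observe this, sketched under the assumption of a
POSIX-conforming 'awk', is to compare 'a' after the string operation.
The comparisons against '5' and the variable names 'num' and 'str' are
additions for illustration only:

```shell
# Using a in a string operation does not change its numeric attribute.
out=$(awk 'BEGIN {
    a = 12.345
    b = a " is a cute number"    # string use of a; a itself stays numeric
    num = (a < 5)                # numeric comparison: 12.345 < 5 is false
    str = ((a "") < "5")         # string comparison: "12.345" sorts before "5"
    print num, str
}')
echo "$out"    # prints: 0 1
```

   If the concatenation had turned 'a' into a string, the first
comparison would give the same result as the second.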

   When two operands are compared, either string comparison or numeric
comparison may be used.  This depends upon the attributes of the
operands, according to the following symmetric matrix:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------
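
   The matrix can be exercised with a short sketch such as the
following.  The input '10 9' and the variable names are hypothetical,
and a POSIX-conforming 'awk' is assumed; each result is '1' when the
comparison is true and '0' otherwise:

```shell
# One comparison per interesting cell of the matrix.
out=$(echo '10 9' | awk '{
    strnum_strnum = ($1 < $2)     # STRNUM vs STRNUM: numeric, 10 < 9 is false
    strnum_number = ($1 < 9)      # STRNUM vs NUMERIC: numeric, 10 < 9 is false
    strnum_string = ($1 < "9")    # STRNUM vs STRING: string, "10" < "9" is true
    number_string = (10 < "9")    # NUMERIC vs STRING: string, "10" < "9" is true
    print strnum_strnum, strnum_number, strnum_string, number_string
}')
echo "$out"    # prints: 0 0 1 1
```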

   The basic idea is that user input that looks numeric--and _only_ user
input--should be treated as numeric, even though it is actually made of
characters and is therefore also a string.  Thus, for example, the
string constant '" +3.14"', when it appears in program source code, is a
string--even though it looks numeric--and is _never_ treated as a number
for comparison purposes.
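
   A small sketch of this rule, assuming a POSIX-conforming 'awk' (the
variable names are invented for illustration): the constant compares as
a string unless it is first converted by an arithmetic operation.

```shell
# A numeric-looking string constant is still compared as a string.
out=$(awk 'BEGIN {
    as_constant = (" +3.14" == 3.14)          # string comparison: " +3.14" vs "3.14", false
    forced      = ((" +3.14" + 0) == 3.14)    # adding 0 yields a number: numeric comparison, true
    print as_constant, forced
}')
echo "$out"    # prints: 0 1
```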

   In short, when one operand is a "pure" string, such as a string
constant, then a string comparison is performed.  Otherwise, a numeric
comparison is performed.  (The primary difference between a number and a
strnum is that for strnums 'gawk' preserves the original string value
that the scalar had when it came in.)
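
   The parenthetical remark can be illustrated with the following
sketch, which assumes a POSIX-conforming 'awk'; the input '1.50' and
the variable names are hypothetical:

```shell
# A strnum prints with its original text; an arithmetic result does not.
out=$(echo '1.50' | awk '{
    original  = $1        # strnum: the assignment carries over the original string value
    as_number = $1 + 0    # result of a numeric operation: a plain number
    print original, as_number
}')
echo "$out"    # prints: 1.50 1.5
```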

   This point bears additional emphasis: Input that looks numeric _is_
numeric.  All other input is treated as strings.

   Thus, the six-character input string ' +3.14' receives the strnum
attribute.  In contrast, the eight characters '" +3.14"' appearing in
program text comprise a string constant.  The following examples print
'1' when the comparison between the input and the constant in the
program text is true, and '0' otherwise:

     $ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    True
     -| 1
     $ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == "3.14") }'      False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == 3.14) }'        True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == "3.14") }'      False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == 3.14) }'        True
     -| 1

   You can see the type of an input field (or other user input) using
'typeof()':

     $ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
     -| string strnum

   ---------- Footnotes ----------

   (1) 'gawk' calls this "unassigned", as the 'typeof()' example in the
main text shows.

   (2) Thus, a POSIX numeric string and 'gawk''s strnum are the same
thing.
