File: gawk.info,  Node: Variable Typing,  Next: Comparison Operators,  Up: Typing and Comparison

6.3.2.1 String Type versus Numeric Type
.......................................

Scalar objects in 'awk' (variables, array elements, and fields) are
_dynamically_ typed.  This means their type can change as the program
runs, from "untyped" before any use,(1) to string or number, and then
from string to number or number to string, as the program progresses.
('gawk' also provides regexp-typed scalars, but let's ignore that for
now; *note Strong Regexp Constants::.)

   You can't do much with untyped variables, other than tell that they
are untyped.  The following program tests 'a' against '""' and '0'; the
test succeeds when 'a' has never been assigned a value.  It also uses
the built-in 'typeof()' function (not presented yet; *note Type
Functions::) to show 'a''s type:

     $ gawk 'BEGIN { print (a == "" && a == 0 ?
     > "a is untyped" : "a has a type!") ; print typeof(a) }'
     -| a is untyped
     -| unassigned

   A scalar has numeric type when assigned a numeric value, such as from
a numeric constant, or from another scalar with numeric type:

     $ gawk 'BEGIN { a = 42 ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| number
     -| number

   Similarly, a scalar has string type when assigned a string value,
such as from a string constant, or from another scalar with string type:

     $ gawk 'BEGIN { a = "forty two" ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| string
     -| string

   So far, this is all simple and straightforward.  What happens,
though, when 'awk' has to process data from a user?  Let's start with
field data.  What should the following command produce as output?

     echo hello | awk '{ printf("%s %s < 42\n", $1,
                                ($1 < 42 ? "is" : "is not")) }'

Since 'hello' is alphabetic data, 'awk' can only do a string comparison.
Internally, it converts '42' into '"42"' and compares the two string
values '"hello"' and '"42"'.  Here's the result:

     $ echo hello | awk '{ printf("%s %s < 42\n", $1,
     >                           ($1 < 42 ? "is" : "is not")) }'
     -| hello is not < 42

   However, what happens when data from a user _looks like_ a number?
On the one hand, in reality, the input data consists of characters, not
binary numeric values.  But, on the other hand, the data looks numeric,
and 'awk' really ought to treat it as such.  And indeed, it does:

     $ echo 37 | awk '{ printf("%s %s < 42\n", $1,
     >                        ($1 < 42 ? "is" : "is not")) }'
     -| 37 is < 42

   Here are the rules for when 'awk' treats data as a number, and for
when it treats data as a string.

   The POSIX standard uses the term "numeric string" for input data that
looks numeric.  The '37' in the previous example is a numeric string.
So what is the type of a numeric string?  Answer: numeric.

   The type of a variable is important because the types of two
variables determine how they are compared.  Variable typing follows
these definitions and rules:

   * A numeric constant or the result of a numeric operation has the
     "numeric" attribute.

   * A string constant or the result of a string operation has the
     "string" attribute.

   * Fields, 'getline' input, 'FILENAME', 'ARGV' elements, 'ENVIRON'
     elements, and the elements of an array created by 'match()',
     'split()', and 'patsplit()' that are numeric strings have the
     "strnum" attribute.(2)  Otherwise, they have the "string"
     attribute.  Uninitialized variables also have the "strnum"
     attribute.

   * Attributes propagate across assignments but are not changed by any
     use.

   The last rule is particularly important.  In the following program,
'a' has numeric type, even though it is later used in a string
operation:

     BEGIN {
          a = 12.345
          b = a " is a cute number"
          print b
     }

   When two operands are compared, either string comparison or numeric
comparison may be used.
This depends upon the attributes of the operands, according to the
following symmetric matrix:

             +----------------------------------------------
             |       STRING          NUMERIC         STRNUM
     --------+----------------------------------------------
             |
     STRING  |       string          string          string
             |
     NUMERIC |       string          numeric         numeric
             |
     STRNUM  |       string          numeric         numeric
     --------+----------------------------------------------

   The basic idea is that user input that looks numeric--and _only_ user
input--should be treated as numeric, even though it is actually made of
characters and is therefore also a string.  Thus, for example, the
string constant '" +3.14"', when it appears in program source code, is a
string--even though it looks numeric--and is _never_ treated as a number
for comparison purposes.

   In short, when one operand is a "pure" string, such as a string
constant, then a string comparison is performed.  Otherwise, a numeric
comparison is performed.  (The primary difference between a number and a
strnum is that for strnums 'gawk' preserves the original string value
that the scalar had when it came in.)

   This point bears additional emphasis: Input that looks numeric _is_
numeric.  All other input is treated as strings.

   Thus, the six-character input string ' +3.14' receives the strnum
attribute.  In contrast, the eight characters '" +3.14"' appearing in
program text comprise a string constant.
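
   The comparison matrix can also be exercised with a small sketch.  The
shell function name 'compare_fields' and the 'awk' variable names are
made up for this illustration.  The sketch uses only POSIX 'awk'
features, on the assumption that any conforming 'awk' follows the same
rules:

```shell
# Hypothetical helper: compare two input fields under several attribute
# combinations from the comparison matrix.
compare_fields() {
    echo "$1 $2" | awk '{
        strnum_strnum  = ($1 < $2)             # both strnum: numeric comparison
        string_string  = (($1 "") < ($2 ""))   # concatenation yields strings: string comparison
        numeric_string = (($1 + 0) < ($2 ""))  # numeric vs. string: string comparison
        copy = $1                              # assignment propagates the strnum attribute
        copy_strnum    = (copy < $2)           # so this is still a numeric comparison
        print strnum_strnum, string_string, numeric_string, copy_strnum
    }'
}

compare_fields 10 9    # prints: 0 1 1 0
```

With the input '10 9', the two strnum comparisons are numeric, and 10 is
not less than 9.  The comparisons that involve a string operand compare
'"10"' with '"9"' character by character, and '1' sorts before '9'.
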
The following examples print '1' when the comparison between the two
different constants is true, and '0' otherwise:

     $ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    True
     -| 1
     $ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == "3.14") }'      False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == 3.14) }'        True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == "3.14") }'      False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == 3.14) }'        True
     -| 1

   You can see the type of an input field (or other user input) using
'typeof()':

     $ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
     -| string strnum

   ---------- Footnotes ----------

   (1) 'gawk' calls this "unassigned", as the following example shows.

   (2) Thus, a POSIX numeric string and 'gawk''s strnum are the same
thing.