6.1.4 Conversion of Strings and Numbers
Strings are converted to numbers and numbers are converted to strings, if the context of the awk program demands it. For example, if the value of either foo or bar in the expression ‘foo + bar’ happens to be a string, it is converted to a number before the addition is performed. If numeric values appear in string concatenation, they are converted to strings. Consider the following:
    two = 2; three = 3
    print (two three) + 4
This prints the (numeric) value 27. The numeric values of the variables two and three are converted to strings and concatenated together. The resulting string is converted back to the number 23, to which 4 is then added.
If, for some reason, you need to force a number to be converted to a string, concatenate that number with the empty string, "". To force a string to be converted to a number, add zero to that string.
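As a rough illustration of both idioms, the following sketch compares the same two values first as strings and then as numbers. It assumes a POSIX-compatible awk is available as awk; the names forced, n, s, as_strings, and as_numbers are made up for this example.

```shell
forced=$(awk 'BEGIN {
    n = 10; s = "9"
    as_strings = ((n "") < s)     # n "" is the string "10"; "10" sorts before "9"
    as_numbers = (n < (s + 0))    # s + 0 is the number 9; 10 < 9 does not hold
    print as_strings, as_numbers
}')
echo "$forced"
```

The first comparison yields 1 and the second yields 0, so the output is ‘1 0’.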
A string is converted to a number by interpreting any numeric prefix of the string as numerals: "2.5" converts to 2.5, "1e3" converts to 1000, and "25fix" has a numeric value of 25. Strings that can’t be interpreted as valid numbers convert to zero.
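The conversions just listed can be checked with a one-line program. This is only a sketch, assuming a POSIX-compatible awk is available as awk; the shell variable prefix_values is a name chosen for this example.

```shell
# Adding zero forces each string to be read as a number.
prefix_values=$(awk 'BEGIN { print "2.5" + 0, "1e3" + 0, "25fix" + 0, "fix" + 0 }')
echo "$prefix_values"
```

This prints ‘2.5 1000 25 0’; the last value shows a string with no numeric prefix converting to zero.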
The exact manner in which numbers are converted into strings is controlled by the awk built-in variable CONVFMT (see section Built-in Variables). Numbers are converted using the sprintf() function with CONVFMT as the format specifier (see section String-Manipulation Functions).
CONVFMT’s default value is "%.6g", which prints a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly.(29)
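A small sketch of the effect of raising the precision, assuming a POSIX-compatible awk is available as awk. The format "%.12g" and the names precision, x, a, and b are arbitrary choices for illustration, not recommended settings.

```shell
precision=$(awk 'BEGIN {
    x = 3.14159265358979
    a = x ""              # converted with the default CONVFMT, "%.6g"
    CONVFMT = "%.12g"
    b = x ""              # converted again, now with twelve significant digits
    print a
    print b
}')
echo "$precision"
```

The first line of output is ‘3.14159’ and the second is ‘3.14159265359’.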
Strange results can occur if you set CONVFMT to a string that doesn’t tell sprintf() how to format floating-point numbers in a useful way. For example, if you forget the ‘%’ in the format, awk converts all numbers to the same constant string.
As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be. Given the following code fragment:
    CONVFMT = "%2.2f"
    a = 12
    b = a ""
b has the value "12", not "12.00". (d.c.)
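To contrast the integer special case with an ordinary value, the fragment can be extended as in the following sketch. It assumes a POSIX-compatible awk is available as awk; the variables c and d and the shell variable int_case are additions made for this example.

```shell
int_case=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""    # integral value: CONVFMT is not applied
    c = 12.5; d = c ""    # non-integral value: CONVFMT is applied
    print b
    print d
}')
echo "$int_case"
```

Here b is "12" while d is "12.50".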
Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. However, these semantics for OFMT are something to keep in mind if you must port your new-style program to older implementations of awk. We recommend that, instead of changing your programs, you just port gawk itself. See section The print Statement, for more information on the print statement.
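The separation between the two variables can be seen by giving them different values, as in this sketch. It assumes a POSIX-compatible awk (one that implements CONVFMT) is available as awk; the formats and the names formats and x are arbitrary choices for illustration.

```shell
formats=$(awk 'BEGIN {
    OFMT = "%.2f"; CONVFMT = "%.4f"
    x = 3.14159265
    print x        # a number handed to print is formatted with OFMT
    print x ""     # concatenation converts with CONVFMT; print then outputs that string
}')
echo "$formats"
```

The first line of output is ‘3.14’ and the second is ‘3.1416’. An implementation that predates CONVFMT would use OFMT for both.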
And, once again, where you are can matter when it comes to converting between numbers and strings. In Where You Are Makes A Difference, we mentioned that the local character set and language (the locale) can affect how gawk matches characters. The locale also affects numeric formats. In particular, for awk programs, it affects the decimal point character. The "C" locale, and most English-language locales, use the period character (‘.’) as the decimal point. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character.
The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see section Other Command-Line Arguments). However, when interpreting input data, for print and printf output, and for number to string conversion, the local decimal point character is used. Here are some examples indicating the difference in behavior, on a GNU/Linux system:
Here are some examples indicating the difference in behavior,
on a GNU/Linux system:
    $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3.14159
    $ LC_ALL=en_DK gawk 'BEGIN { printf "%g\n", 3.1415927 }'
    -| 3,14159
    $ echo 4,321 | gawk '{ print $1 + 1 }'
    -| 5
    $ echo 4,321 | LC_ALL=en_DK gawk '{ print $1 + 1 }'
    -| 5,321
The ‘en_DK’ locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as ‘4’, while in the Danish locale, it’s treated as the full number, 4.321.
Some earlier versions of gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, since their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the ‘--use-lc-numeric’ option (see section Command-Line Options) to force gawk to use the locale’s decimal point character. (gawk also uses the locale’s decimal point character when in POSIX mode, either via ‘--posix’, or the POSIXLY_CORRECT environment variable.)
Table 6.1 describes the cases in which the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.
Feature | Default | ‘--posix’ or ‘--use-lc-numeric’ |
---|---|---|
%'g | Use locale | Use locale |
%g | Use period | Use locale |
Input | Use period | Use locale |
strtonum() | Use period | Use locale |
Table 6.1: Locale Decimal Point versus A Period
Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way gawk converts some special string values to numbers. The details are presented in Standards Versus Existing Practice.