4.5.5 Field-Splitting Summary
It is important to remember that when you assign a string constant as the value of FS, it undergoes normal awk string processing, so escape sequences are interpreted before FS is ever used as a regexp. For example, with Unix awk and gawk, the assignment ‘FS = "\.."’ assigns the character string ".." to FS (the backslash is stripped). This creates a regexp meaning “fields are separated by occurrences of any two characters.” If instead you want fields to be separated by a literal period followed by any single character, use ‘FS = "\\.."’.
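A minimal sketch of the difference, assuming a POSIX-compatible awk is installed as awk (the input text is made up for illustration). The string constant "\\.." reaches FS as the three characters \.. , so the regexp matches a literal period followed by any one character. By contrast, ".." (which is what "\.." becomes in Unix awk and gawk) matches any two characters.

```shell
# "\\.." is stored in FS as \.. : a literal period, then any one character.
# The separators here are ".x" and ".y", leaving the fields a, b and c.
echo 'a.xb.yc' | awk 'BEGIN { FS = "\\.." } { print NF, $2 }'   # 3 b

# ".." matches any two characters. Every match is a separator, so the
# leading "a." yields an empty first field and only "c" is left at the end.
echo 'a.xb.yc' | awk 'BEGIN { FS = ".." } { print NF, $4 }'     # 4 c
```

The second command shows why the stripped backslash matters: almost the whole record is consumed as separators.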
The following table summarizes how fields are split, based on the value of FS (‘==’ means “is equal to”):

- FS == " "
  Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.
- FS == any other single character
  Fields are separated by each occurrence of the character. Multiple successive occurrences delimit empty fields, as do leading and trailing occurrences. The character can even be a regexp metacharacter; it does not need to be escaped.
- FS == regexp
  Fields are separated by occurrences of characters that match regexp. Leading and trailing matches of regexp delimit empty fields.
- FS == ""
  Each individual character in the record becomes a separate field. (This is a gawk extension; it is not specified by the POSIX standard.)
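The following sketch exercises the first three rows of the table. It assumes any POSIX awk is installed as awk, and the sample records are invented for illustration. The last row (FS == "") is left out because it is an extension rather than POSIX behavior.

```shell
# FS == " " (the default): runs of blanks separate fields, and the
# leading and trailing blanks are ignored.
printf '  a   b  \n' | awk '{ print NF }'                        # 2

# FS == a single character: every occurrence is a separator, so the
# leading, doubled and trailing colons each produce an empty field.
printf ':a::b:\n' | awk 'BEGIN { FS = ":" } { print NF }'        # 5

# A single metacharacter such as "|" is taken literally, without escaping.
printf 'x|y|z\n' | awk 'BEGIN { FS = "|" } { print $2 }'         # y

# FS == regexp: any run of commas and semicolons is one separator.
printf 'a,;b;;c\n' | awk 'BEGIN { FS = "[,;]+" } { print NF }'   # 3
```

With gawk, setting FS = "" instead would place each character of the record in its own field.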
Advanced Notes: Changing FS Does Not Affect the Fields
According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of FS after a record is read, the value of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.

However, many older implementations of awk do not work this way. Instead, they defer splitting the fields until a field is actually referenced, and the fields are then split using whatever value FS has at that moment, not the value it had when the record was read! (d.c.) This behavior can be difficult to diagnose. The following example illustrates the difference between the two methods. (The sed (21) command prints just the first line of ‘/etc/passwd’.)
sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'

which usually prints:

root

on an incorrect implementation of awk, while gawk prints the full first line of the file, something like:

root:nSijPlPhZZwgE:0:0:Root:/:
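Two portable ways to get the colon-separated splitting that the example above was aiming for, sketched under the assumption of a POSIX-conforming awk (the input lines are made up). The first sets FS in a BEGIN rule, before any record is read. The second, for when FS really must change inside a rule, assigns $0 to itself so that the current record is split again with the new FS. The last command shows that an FS assignment made while one record is being processed governs the next record.

```shell
# Set FS before the first record is read.
printf 'root:x:0:0\n' | awk 'BEGIN { FS = ":" } { print $1 }'    # root

# Change FS inside a rule, then force a re-split of the current record.
printf 'root:x:0:0\n' | awk '{ FS = ":"; $0 = $0; print $1 }'    # root

# The new FS takes effect for the next record: "a:b c" is split on blanks
# (the old FS), while "d:e f" is split on colons.
printf 'a:b c\nd:e f\n' | awk '{ print $1; FS = ":" }'           # a:b, then d
```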
Advanced Notes: FS and IGNORECASE
The IGNORECASE variable (see section Built-in Variables That Control awk) affects field splitting only when the value of FS is a regexp. It has no effect when FS is a single character, even if that character is a letter. Thus, in the following code:
FS = "c"
IGNORECASE = 1
$0 = "aCa"
print $1
The output is ‘aCa’, because the single-character FS "c" is matched case-sensitively and the record contains no lowercase ‘c’. If you really want to split fields on an alphabetic character while ignoring case, use a regexp instead, e.g., ‘FS = "[c]"’. In this case, IGNORECASE will take effect.
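A sketch of two ways to split case-insensitively on a letter. It assumes a POSIX awk installed as awk and, for the second command only, gawk installed as gawk (the command -v guard skips that command otherwise). Any awk can do this by listing both cases in a bracket expression, which is a plain alternative to IGNORECASE. In gawk, a regexp FS such as "[c]" also honors IGNORECASE.

```shell
# Portable alternative: a bracket expression that lists both cases of the letter.
echo 'aCa' | awk 'BEGIN { FS = "[cC]" } { print $1 }'     # a

# gawk only: FS is a regexp, so IGNORECASE makes [c] match "C" as well.
if command -v gawk >/dev/null 2>&1; then
    echo 'aCa' | gawk 'BEGIN { IGNORECASE = 1; FS = "[c]" } { print $1 }'   # a
fi
```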