File: gawk.info, Node: Field Splitting Summary, Prev: Full Line Fields, Up: Field Separators

4.5.6 Field-Splitting Summary
-----------------------------

It is important to remember that when you assign a string constant as
the value of 'FS', it undergoes normal 'awk' string processing. For
example, with Unix 'awk' and 'gawk', the assignment 'FS = "\.."' assigns
the character string '".."' to 'FS' (the backslash is stripped). This
creates a regexp meaning "fields are separated by occurrences of any two
characters." If instead you want fields to be separated by a literal
period followed by any single character, use 'FS = "\\.."'.
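The difference between the two assignments can be checked from the shell. The following is only a sketch: the sample record 'a.bXc.dY' and the shell variable names are made up for illustration, and the first command assigns '".."' directly, which is the value the text above says 'FS = "\.."' leaves behind once the backslash is stripped.

```shell
# The regexp ".." (any two characters) consumes the whole 8-character
# record as four separators, leaving five empty fields.
any_two=$(printf 'a.bXc.dY\n' | awk 'BEGIN { FS = ".." } { print NF }')

# "\\.." reaches the regexp matcher as \.. (a literal period, then any
# single character), so only ".b" and ".d" act as separators.
dot_any=$(printf 'a.bXc.dY\n' | awk 'BEGIN { FS = "\\.." } { print $1 "-" $2 "-" $3 }')

echo "$any_two"   # 5
echo "$dot_any"   # a-Xc-Y
```

The 'awk' programs are single-quoted so that the shell passes the backslashes through unchanged.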

The following list summarizes how fields are split, based on the
value of 'FS' ('==' means "is equal to"):

'FS == " "'
     Fields are separated by runs of whitespace. Leading and trailing
     whitespace are ignored. This is the default.

'FS == ANY OTHER SINGLE CHARACTER'
     Fields are separated by each occurrence of the character. Multiple
     successive occurrences delimit empty fields, as do leading and
     trailing occurrences. The character can even be a regexp
     metacharacter; it does not need to be escaped.

'FS == REGEXP'
     Fields are separated by occurrences of characters that match
     REGEXP. Leading and trailing matches of REGEXP delimit empty
     fields.

'FS == ""'
     Each individual character in the record becomes a separate field.
     (This is a common extension; it is not specified by the POSIX
     standard.)
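The four cases in the list can be tried one at a time from the shell. This is a minimal sketch rather than an exhaustive test: the sample records and shell variable names are invented for illustration, and the last command relies on the empty-'FS' extension, so it assumes 'gawk' or another 'awk' that supports it.

```shell
# Default FS (a single space): runs of whitespace separate fields, and
# leading and trailing whitespace is ignored.
default_nf=$(printf '  a  b  \n' | awk '{ print NF }')

# Single-character FS: every "|" counts, with no escaping needed, so the
# leading, doubled, and trailing separators all delimit empty fields.
single_nf=$(printf '|a||b|\n' | awk 'BEGIN { FS = "|" } { print NF }')

# Regexp FS: the leading match of [0-9]+ delimits an empty first field.
regexp_out=$(printf '12x3y\n' | awk 'BEGIN { FS = "[0-9]+" } { print NF ":" $2 }')

# Empty FS (an extension, not POSIX): one field per character.
empty_nf=$(printf 'abc\n' | awk 'BEGIN { FS = "" } { print NF }')

echo "$default_nf $single_nf $regexp_out $empty_nf"   # 2 5 3:x 3
```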

'FS' and 'IGNORECASE'

The 'IGNORECASE' variable (*note User-modified::) affects field
splitting _only_ when the value of 'FS' is a regexp. It has no effect
when 'FS' is a single character, even if that character is a letter.
Thus, in the following code:

     FS = "c"
     IGNORECASE = 1
     $0 = "aCa"
     print $1

The output is 'aCa'. If you really want to split fields on an
alphabetic character while ignoring case, write the separator as a
regexp, such as the bracket expression in 'FS = "[c]"'. 'FS' is then
a regexp, so 'IGNORECASE' takes effect and '$1' is 'a'.
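Both forms can be compared side by side. The sketch below assumes a 'gawk' binary on the search path, since 'IGNORECASE' is specific to 'gawk', and a version that behaves as described in this section; the shell variable names are illustrative only.

```shell
# Single-character FS: IGNORECASE is not consulted, so "C" does not
# split the record and $1 is the whole of it.
plain=$(gawk 'BEGIN { FS = "c"; IGNORECASE = 1; $0 = "aCa"; print $1 }')

# Bracket expression: FS is now a regexp, so IGNORECASE applies and "C"
# acts as a separator.
bracket=$(gawk 'BEGIN { FS = "[c]"; IGNORECASE = 1; $0 = "aCa"; print $1 }')

echo "$plain $bracket"   # aCa a
```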