File: gawk.info,  Node: Field Separators,  Next: Constant Size,  Prev: Changing Fields,  Up: Reading Files

4.5 Specifying How Fields Are Separated
=======================================

* Menu:

* Default Field Splitting::      How fields are normally separated.
* Regexp Field Splitting::       Using regexps as the field separator.
* Single Character Fields::      Making each character a separate field.
* Command Line Field Separator:: Setting 'FS' from the command line.
* Full Line Fields::             Making the full line be a single field.
* Field Splitting Summary::      Some final points and a summary table.

The "field separator", which is either a single character or a regular
expression, controls the way 'awk' splits an input record into fields.
'awk' scans the input record for character sequences that match the
separator; the fields themselves are the text between the matches.

   In the examples that follow, we use the bullet symbol, shown here as
'*', to represent spaces in the output.  If the field separator is 'oo',
then the following line:

     moo goo gai pan

is split into three fields: 'm', '*g', and '*gai*pan'.  Note the leading
spaces in the values of the second and third fields.
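   A quick way to check this split is to run the sample line through
'awk' and print each field inside brackets, so that the leading spaces
are visible without the '*' convention.  This is a minimal sketch that
assumes a POSIX-compatible 'awk' on the 'PATH'; the numbered, bracketed
output format is only an illustration.

```shell
# Split the sample line on "oo" and print each field in brackets
# so that any leading or trailing spaces are easy to see.
echo 'moo goo gai pan' |
    awk 'BEGIN { FS = "oo" }
         { for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
```

Run as-is, it should print three lines: '1: [m]', '2: [ g]', and
'3: [ gai pan]'.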

   The field separator is represented by the predefined variable 'FS'.
Shell programmers take note: 'awk' does _not_ use the name 'IFS' that is
used by the POSIX-compliant shells (such as the Unix Bourne shell, 'sh',
or Bash).

   The value of 'FS' can be changed in the 'awk' program with the
assignment operator, '=' (*note Assignment Ops::).  Often, the right
time to do this is at the beginning of execution before any input has
been processed, so that the very first record is read with the proper
separator.  To do this, use the special 'BEGIN' pattern (*note
BEGIN/END::).  For example, here we set the value of 'FS' to the string
'","':

     awk 'BEGIN { FS = "," } ; { print $2 }'

Given the input line:

     John Q. Smith, 29 Oak St., Walamazoo, MI 42139

this 'awk' program extracts and prints the string '*29*Oak*St.'.
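   The 'BEGIN' example can be tried directly on the sample line.  The
sketch below assumes a POSIX-compatible 'awk'; the brackets around '$2'
are added only to make its leading space visible, and are not part of
the original program.

```shell
# FS is set in BEGIN, before any input is read, so even the first
# record is split on commas.  Brackets expose the leading space.
echo 'John Q. Smith, 29 Oak St., Walamazoo, MI 42139' |
    awk 'BEGIN { FS = "," } { print "[" $2 "]" }'
```

It should print '[ 29 Oak St.]'.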

   Sometimes the input data contains separator characters that don't
separate fields the way you thought they would.  For instance, the
person's name in the example we just used might have a title or suffix
attached, such as:

     John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139

The same program would extract '*LXIX' instead of '*29*Oak*St.'.  If you
were expecting the program to print the address, you would be surprised.
The moral is to choose your data layout and separator characters
carefully to prevent such problems.  (If the data is not in a form that
is easy to process, perhaps you can massage it first with a separate
'awk' program.)
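   The surprise is easy to reproduce by feeding both versions of the
record to the same program.  As one possible workaround, not something
this manual prescribes, the sketch below also counts fields back from
the end of the record with '$(NF-2)'.  This relies on a hypothetical
assumption about the data: that every record ends with exactly three
fields (street, city, and state plus ZIP code).

```shell
# Feed both sample records to one program; the second record has an
# extra "LXIX" suffix field after the name.
printf '%s\n' \
    'John Q. Smith, 29 Oak St., Walamazoo, MI 42139' \
    'John Q. Smith, LXIX, 29 Oak St., Walamazoo, MI 42139' |
awk 'BEGIN { FS = "," }
     {
         # $2 shifts when an extra field precedes the address ...
         printf "$2 = [%s]   ", $2
         # ... while counting back from the last field does not,
         # assuming every record ends with street, city, "STATE ZIP".
         printf "$(NF-2) = [%s]\n", $(NF-2)
     }'
```

The expected output is:

     $2 = [ 29 Oak St.]   $(NF-2) = [ 29 Oak St.]
     $2 = [ LXIX]   $(NF-2) = [ 29 Oak St.]

If the assumption about trailing fields does not hold for your data,
this workaround fails in the same way, which is why the layout of the
data matters in the first place.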
