4.6 Reading Fixed-Width Data
NOTE: This section discusses an advanced feature of gawk. If you are a
novice awk user, you might want to skip it on the first reading.
gawk provides a facility for dealing with fixed-width fields with no
distinctive field separator. For example, data of this nature arises in
the input for old Fortran programs where numbers are run together, or in
the output of programs that did not anticipate the use of their output as
input for other programs.
An example of the latter is a table where all the columns are lined up by
the use of a variable number of spaces and empty fields are just spaces.
Clearly, awk’s normal field splitting based on FS does not work well in
this case. Although a portable awk program can use a series of substr()
calls on $0 (see section String-Manipulation Functions), this is awkward
and inefficient for a large number of fields.
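For comparison, the portable approach might look like the following minimal sketch. The three column widths (9, 6, and 10) and the variable names are assumptions made for illustration; they are not built into awk.

```awk
# Portable sketch: carve fixed-width columns out of $0 with substr().
# The start positions and widths are illustrative assumptions.
{
    user  = substr($0,  1,  9)    # columns 1-9
    tty   = substr($0, 10,  6)    # columns 10-15
    login = substr($0, 16, 10)    # columns 16-25
    print user "|" tty "|" login
}
```

Every additional column needs its own substr() call with a start position worked out by hand, which is where the awkwardness comes from.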
The splitting of an input record into fixed-width fields is specified by
assigning a string containing space-separated numbers to the built-in
variable FIELDWIDTHS. Each number specifies the width of the field,
including columns between fields. If you want to ignore the columns
between fields, you can specify the width as a separate field that is
subsequently ignored.
It is a fatal error to supply a field width that is not a positive number.
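As a minimal sketch of the "separate field" technique, assume a made-up layout with three data columns of widths 5, 3, and 4, each separated by one filler column. The filler columns get their own width-1 fields, which the program never uses.

```awk
# Sketch: widths 5, 3, and 4 for the data, plus two width-1 filler fields.
# The layout is a hypothetical illustration, not taken from real data.
BEGIN { FIELDWIDTHS = "5 1 3 1 4" }
{
    # $2 and $4 hold the filler columns and are simply ignored
    print $1 "," $3 "," $5
}
```

Given the input line `alpha 123 wxyz`, this prints `alpha,123,wxyz`.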
The following data is the output of the Unix w utility. It is useful
to illustrate the use of FIELDWIDTHS:

     10:06pm  up 21 days, 14:04,  23 users
    User     tty        login  idle   JCPU   PCPU  what
    hzuo     ttyV0     8:58pm            9      5  vi p24.tex
    hzang    ttyV3     6:37pm    50                -csh
    eklye    ttyV5     9:53pm            7      1  em thes.tex
    dportein ttyV6     8:17pm  1:47                -csh
    gierd    ttyD3    10:00pm     1                elm
    dave     ttyD4     9:47pm            4      4  w
    brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
    dave     ttyq4    26Jun9115days     46     46  wnewmail

The following program takes the above input, converts the idle time to number of seconds, and prints out the first two fields and the calculated idle time:
NOTE: This program uses a number of awk features that haven’t been
introduced yet.

    BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
    NR > 2 {
        idle = $4
        sub(/^ */, "", idle)   # strip leading spaces
        if (idle == "")
            idle = 0
        if (idle ~ /:/) {
            split(idle, t, ":")
            idle = t[1] * 60 + t[2]
        }
        if (idle ~ /days/)
            idle *= 24 * 60 * 60

        print $1, $2, idle
    }

Running the program on the data produces the following results:

    hzuo      ttyV0  0
    hzang     ttyV3  50
    eklye     ttyV5  0
    dportein  ttyV6  107
    gierd     ttyD3  1
    dave      ttyD4  0
    brent     ttyp0  286
    dave      ttyq4  1296000

Another (possibly more practical) example of fixed-width input data
is the input from a deck of balloting cards. In some parts of
the United States, voters mark their choices by punching holes in computer
cards. These cards are then processed to count the votes for any particular
candidate or on any particular issue. Because a voter may choose not to
vote on some issue, any column on the card may be empty. An awk program
for processing such data could use the FIELDWIDTHS feature to simplify
reading the data. (Of course, getting gawk to run on a system with card
readers is another story!)
Assigning a value to FS causes gawk to use FS for field splitting again.
Use ‘FS = FS’ to make this happen, without having to know the current
value of FS.
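A minimal sketch of switching back follows. It assumes a hypothetical input file whose first line is a fixed-width header and whose remaining lines are ordinary whitespace-separated records; the variable name `tag` is also an assumption. The change made in the rule applies to records read afterward, not to the record already being processed.

```awk
# Sketch: fixed-width splitting for the first record only, then back to FS.
# The header layout (two 4-column fields) is a made-up illustration.
BEGIN { FIELDWIDTHS = "4 4" }
NR == 1 {
    tag = $1          # first four columns of the header line
    FS = FS           # reactivate FS-based splitting for later records
    next
}
{ print tag ": " $1 } # $1 is now the first whitespace-delimited field
```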
In order to tell which kind of field splitting is in effect,
use PROCINFO["FS"] (see section Built-in Variables That Convey Information).
The value is "FS" if regular field splitting is being used,
or it is "FIELDWIDTHS" if fixed-width field splitting is being used:

    if (PROCINFO["FS"] == "FS")
        regular field splitting …
    else if (PROCINFO["FS"] == "FIELDWIDTHS")
        fixed-width field splitting …
    else
        content-based field splitting … (see next section)

This information is useful when writing a function that needs to
temporarily change FS or FIELDWIDTHS, read some records, and then restore
the original settings (see section Reading the User Database,
for an example of such a function).
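The save-and-restore pattern might be sketched as follows. The function name, its parameters, and the choice to print only the first field are all hypothetical; the sketch handles only the two splitting modes described in this section and leaves any other mode untouched.

```awk
# Sketch: switch to fixed-width splitting, read a file, then put the
# caller's splitting mode and current record back.  All names here are
# hypothetical; the names after "file, widths" are local variables.
function print_first_column(file, widths,    mode, old_fs, old_fw, old_rec, n)
{
    mode    = PROCINFO["FS"]     # "FS" or "FIELDWIDTHS"
    old_fs  = FS
    old_fw  = FIELDWIDTHS
    old_rec = $0                 # getline below overwrites $0

    FIELDWIDTHS = widths
    n = 0
    while ((getline < file) > 0) {
        print $1
        n++
    }
    close(file)

    # Reassign the variable that was in control to reactivate its mode
    if (mode == "FIELDWIDTHS")
        FIELDWIDTHS = old_fw
    else if (mode == "FS")
        FS = old_fs
    $0 = old_rec                 # re-split the saved record under the restored mode
    return n
}
```

A caller might invoke it as `print_first_column("data.txt", "9 6")`, where the file name and widths are again placeholders.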