12.3.1 Noting Data File Boundaries
The BEGIN and END rules are each executed exactly once, at the beginning and end of your awk program, respectively (see section The BEGIN and END Special Patterns). We (the gawk authors) once had a user who mistakenly thought that the BEGIN rule is executed at the beginning of each data file and the END rule is executed at the end of each data file.
When informed that this was not the case, the user requested that we add new special patterns to gawk, named BEGIN_FILE and END_FILE, that would have the desired behavior. He even supplied us the code to do so.
Adding these special patterns to gawk wasn’t necessary; the job can be done cleanly in awk itself, as illustrated by the following library program. It arranges to call two user-supplied functions, beginfile() and endfile(), at the beginning and end of each data file. Besides solving the problem in only nine(!) lines of code, it does so portably; this works with any implementation of awk:
```awk
# transfile.awk
#
# Give the user a hook for filename transitions
#
# The user must supply functions beginfile() and endfile()
# that each take the name of the file being started or
# finished, respectively.

FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
```
This file must be loaded before the user’s “main” program, so that the rule it supplies is executed first.
This rule relies on awk’s FILENAME variable that automatically changes for each new data file. The current file name is saved in a private variable, _oldfilename. If FILENAME does not equal _oldfilename, then a new data file is being processed and it is necessary to call endfile() for the old file. Because endfile() should only be called if a file has been processed, the program first checks to make sure that _oldfilename is not the null string. The program then assigns the current file name to _oldfilename and calls beginfile() for the file.
Because, like all awk variables, _oldfilename is initialized to the null string, this rule executes correctly even for the first data file.
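As a minimal sketch of how the pieces fit together, the following shell session writes the library rule and a small main program to files in a scratch directory and runs them over two data files. The main program, its output format, and the file names main.awk, a.txt, and b.txt are hypothetical, chosen only for illustration; the sketch assumes a POSIX shell with mktemp and awk available.

```shell
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# The library rule from transfile.awk.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename \
{
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END   { endfile(FILENAME) }
EOF

# A hypothetical main program: count the records in each file.
cat > main.awk <<'EOF'
function beginfile(name) { count = 0 }
function endfile(name)   { print name ": " count }
{ count++ }
EOF

printf 'one\ntwo\n' > a.txt
printf 'three\n'    > b.txt

# Load the library first so its rule runs before the main program's rule.
out=$(awk -f transfile.awk -f main.awk a.txt b.txt)
echo "$out"
```

Here beginfile() resets the counter before the first record of each file is counted, so the run reports two records for a.txt and one for b.txt.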
The program also supplies an END rule to do the final processing for the last file. Because this END rule comes before any END rules supplied in the “main” program, endfile() is called first. Once again, the value of multiple BEGIN and END rules should be clear.
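The ordering of multiple END rules can be checked with a small sketch. The two files lib.awk and main.awk below are hypothetical stand-ins for a library and a main program; each supplies one END rule, and awk runs them in the order in which the files are loaded.

```shell
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical library and main program, each with its own END rule.
echo 'END { print "library END rule" }' > lib.awk
echo 'END { print "main END rule" }'    > main.awk

# With no input records, only the END rules run, in program order.
out=$(awk -f lib.awk -f main.awk < /dev/null)
echo "$out"
```

The library's line is printed before the main program's line, which is why loading the library first guarantees that endfile() runs before any final processing in the main program.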
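If the same data file occurs twice in a row on the command line, then endfile() and beginfile() are not executed at the end of the first pass and at the beginning of the second pass, because FILENAME does not change between the two passes. The following version solves the problem by testing FNR, the record number within the current file, which is reset for every file: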
```awk
# ftrans.awk --- handle data file transitions
#
# user supplies beginfile() and endfile() functions

FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END  { endfile(_filename_) }
```
The section Counting Things shows how this library program can be used and how it simplifies writing the main program.
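To see the difference between the two library versions, the following sketch names the same data file twice in a row and runs each version with the same pair of hooks. The hooks file hooks.awk and the data file data.txt are hypothetical; the two rules are written out again here only so that the session is self-contained.

```shell
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical hooks that just report each transition.
cat > hooks.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
EOF

# The FILENAME-based rule from transfile.awk.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}
END { endfile(FILENAME) }
EOF

# The FNR-based rule from ftrans.awk.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}
END { endfile(_filename_) }
EOF

printf 'x\ny\n' > data.txt

# Name the same file twice in a row with each version.
old=$(awk -f transfile.awk -f hooks.awk data.txt data.txt)
new=$(awk -f ftrans.awk    -f hooks.awk data.txt data.txt)

printf 'transfile.awk:\n%s\n\nftrans.awk:\n%s\n' "$old" "$new"
```

With transfile.awk the hooks fire only once for the whole run (one "begin" line and one "end" line), whereas ftrans.awk reports a begin/end pair for each pass over data.txt.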
Advanced Notes: So Why Does gawk Have BEGINFILE and ENDFILE?
You are probably wondering, if beginfile() and endfile() functions can do the job, why does gawk have BEGINFILE and ENDFILE patterns (see section The BEGINFILE and ENDFILE Special Patterns)?
Good question. Normally, if awk cannot open a file, this causes an immediate fatal error. In this case, there is no way for a user-defined function to deal with the problem, since the mechanism for calling it relies on the file being open and at the first record. Thus, the main reason for BEGINFILE is to give you a “hook” to catch files that cannot be processed. ENDFILE exists for symmetry, and because it provides an easy way to do per-file cleanup processing.
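As a hedged sketch of that “hook”, the following session uses a BEGINFILE rule to skip a file that cannot be opened. It works only with gawk, because BEGINFILE and ERRNO are gawk extensions, and it relies on the documented gawk behavior that ERRNO is non-empty inside BEGINFILE when the open failed and that nextfile then skips the file. The file names good.txt and missing.txt and the message text are hypothetical.

```shell
# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf 'hello\n' > good.txt    # missing.txt is deliberately not created

# gawk only: BEGINFILE and ERRNO are gawk extensions.
out=$(gawk '
BEGINFILE {
    # ERRNO is non-empty here if the file could not be opened.
    if (ERRNO != "") {
        print "skipping " FILENAME
        nextfile
    }
}
{ print FILENAME ": " $0 }
' missing.txt good.txt)
echo "$out"
```

The unreadable file is reported and skipped, and processing continues with good.txt; the FNR-based library rule has no chance to do this, because it runs only after a first record has been read.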