[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
11.5 Profiling Your awk
Programs
You may produce execution
traces of your awk
programs.
This is done with a specially compiled version of gawk
,
called pgawk
(“profiling gawk
”).
pgawk
is identical in every way to gawk
, except that when
it has finished running, it creates a profile of your program in a file
named ‘awkprof.out’.
Because it is profiling, it also executes up to 45% slower than
gawk
normally does.
As shown in the following example,
the ‘--profile’ option can be used to change the name of the file
where pgawk
will write the profile:
pgawk --profile=myprog.prof -f myprog.awk data1 data2 |
In the above example, pgawk
places the profile in
‘myprog.prof’ instead of in ‘awkprof.out’.
Here is a sample
session showing a simple awk
program, its input data, and the
results from running pgawk
. First, the awk
program:
BEGIN { print "First BEGIN rule" } END { print "First END rule" } /foo/ { print "matched /foo/, gosh" for (i = 1; i <= 3; i++) sing() } { if (/foo/) print "if is true" else print "else is true" } BEGIN { print "Second BEGIN rule" } END { print "Second END rule" } function sing( dummy) { print "I gotta be me!" } |
Following is the input data:
foo bar baz foo junk |
Here is the ‘awkprof.out’ that results from running pgawk
on this program and data (this example also illustrates that awk
programmers sometimes have to work late):
# gawk profile, created Sun Aug 13 00:00:15 2000 # BEGIN block(s) BEGIN { 1 print "First BEGIN rule" 1 print "Second BEGIN rule" } # Rule(s) 5 /foo/ { # 2 2 print "matched /foo/, gosh" 6 for (i = 1; i <= 3; i++) { 6 sing() } } 5 { 5 if (/foo/) { # 2 2 print "if is true" 3 } else { 3 print "else is true" } } # END block(s) END { 1 print "First END rule" 1 print "Second END rule" } # Functions, listed alphabetically 6 function sing(dummy) { 6 print "I gotta be me!" } |
This example illustrates many of the basic features of profiling output. They are as follows:
-
The program is printed in the order
BEGIN
rule,BEGINFILE
rule, pattern/action rules,ENDFILE
rule,END
rule and functions, listed alphabetically. MultipleBEGIN
andEND
rules are merged together, as are multipleBEGINFILE
andENDFILE
rules. - Pattern-action rules have two counts. The first count, to the left of the rule, shows how many times the rule’s pattern was tested. The second count, to the right of the rule’s opening left brace in a comment, shows how many times the rule’s action was executed. The difference between the two indicates how many times the rule’s pattern evaluated to false.
-
Similarly,
the count for an
if
-else
statement shows how many times the condition was tested. To the right of the opening left brace for theif
’s body is a count showing how many times the condition was true. The count for theelse
indicates how many times the test failed. -
The count for a loop header (such as
for
orwhile
) shows how many times the loop test was executed. (Because of this, you can’t just look at the count on the first statement in a rule to determine how many times the rule was executed. If the first statement is a loop, the count is misleading.) -
For user-defined functions, the count next to the
function
keyword indicates how many times the function was called. The counts next to the statements in the body show how many times those statements were executed. -
The layout uses “K&R” style with TABs.
Braces are used everywhere, even when
the body of an
if
,else
, or loop is only a single statement. - Parentheses are used only where needed, as indicated by the structure of the program and the precedence rules. For example, ‘(3 + 5) * 4’ means add three plus five, then multiply the total by four. However, ‘3 + 5 * 4’ has no parentheses, and means ‘3 + (5 * 4)’.
-
Parentheses are used around the arguments to
print
andprintf
only when theprint
orprintf
statement is followed by a redirection. Similarly, if the target of a redirection isn’t a scalar, it gets parenthesized. -
pgawk
supplies leading comments in front of theBEGIN
andEND
rules, the pattern/action rules, and the functions.
The profiled version of your program may not look exactly like what you
typed when you wrote it. This is because pgawk
creates the
profiled version by “pretty printing” its internal representation of
the program. The advantage to this is that pgawk
can produce
a standard representation. The disadvantage is that all source-code
comments are lost, as are the distinctions among multiple BEGIN
,
END
, BEGINFILE
, and ENDFILE
rules. Also, things such as:
/foo/ |
come out as:
/foo/ { print $0 } |
which is correct, but possibly surprising.
Besides creating profiles when a program has completed,
pgawk
can produce a profile while it is running.
This is useful if your awk
program goes into an
infinite loop and you want to see what has been executed.
To use this feature, run pgawk
in the background:
$ pgawk -f myprog & [1] 13992 |
The shell prints a job number and process ID number; in this case, 13992.
Use the kill
command to send the USR1
signal
to pgawk
:
$ kill -USR1 13992 |
As usual, the profiled version of the program is written to ‘awkprof.out’, or to a different file if you use the ‘--profile’ option.
Along with the regular profile, as shown earlier, the profile includes a trace of any active functions:
# Function Call Stack: # 3. baz # 2. bar # 1. foo # -- main -- |
You may send pgawk
the USR1
signal as many times as you like.
Each time, the profile and function call trace are appended to the output
profile file.
If you use the HUP
signal instead of the USR1
signal,
pgawk
produces the profile and the function call trace and then exits.
When pgawk
runs on MS-Windows systems, it uses the
INT
and QUIT
signals for producing the profile and, in
the case of the INT
signal, pgawk
exits. This is
because these systems don’t support the kill
command, so the
only signals you can deliver to a program are those generated by the
keyboard. The INT
signal is generated by the
Ctrl-<C> or Ctrl-<BREAK> key, while the
QUIT
signal is generated by the Ctrl-<\> key.
Finally, regular gawk
also accepts the ‘--profile’ option.
When called this way, gawk
“pretty prints” the program into
‘awkprof.out’, without any execution counts.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |