File: gawk.info,  Node: Definition Syntax,  Next: Function Example,  Up: User-defined

9.2.1 Function Definition Syntax
--------------------------------

     It's entirely fair to say that the awk syntax for local variable
     definitions is appallingly awful.
                         -- _Brian Kernighan_

   Definitions of functions can appear anywhere between the rules of an
'awk' program.  Thus, the general form of an 'awk' program is extended
to include sequences of rules _and_ user-defined function definitions.
There is no need to put the definition of a function before all uses of
the function.  This is because 'awk' reads the entire program before
starting to execute any of it.
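
   As a small illustration of that last point, the following sketch
calls a function from a 'BEGIN' rule that appears before the function's
definition in the program text.  The function name 'double' is made up
for this example and is not part of 'awk'.

```shell
out=$(awk '
BEGIN { print double(21) }     # the call appears first ...

# ... and the definition follows it in the program text
function double(x)
{
    return 2 * x
}')
echo "$out"
```

The program prints '42', because the whole program is read before any
rule is run.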

   The definition of a function named NAME looks like this:

     'function' NAME'('[PARAMETER-LIST]')'
     '{'
          BODY-OF-FUNCTION
     '}'

Here, NAME is the name of the function to define.  A valid function name
is like a valid variable name: a sequence of letters, digits, and
underscores that doesn't start with a digit.  As with variable names,
only the 52 upper- and lowercase English letters may be used in a
function name.  Within a single 'awk' program, any particular name can
be used in only one way: as a variable, as an array, or as a function.

   PARAMETER-LIST is an optional list of the function's arguments and
local variable names, separated by commas.  When the function is called,
the argument names are used to hold the argument values given in the
call.

   A function cannot have two parameters with the same name, nor may it
have a parameter with the same name as the function itself.

     CAUTION: According to the POSIX standard, function parameters
     cannot have the same name as one of the special predefined
     variables (*note Built-in Variables::), nor may a function
     parameter have the same name as another function.

     Not all versions of 'awk' enforce these restrictions.  (d.c.)
     'gawk' always enforces the first restriction.  With '--posix'
     (*note Options::), it also enforces the second restriction.

   Local variables act like the empty string if referenced where a
string value is required, and like zero if referenced where a numeric
value is required.  This is the same as the behavior of regular
variables that have never been assigned a value.  (There is more to
understand about local variables; *note Dynamic Typing::.)
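
   The following sketch shows both behaviors with an unassigned local
variable.  The names 'show', 'arg', and 'tmp' are hypothetical and chosen
only for illustration.

```shell
out=$(awk '
# "tmp" is listed after the argument and never assigned,
# so it is an uninitialized local variable.
function show(arg,    tmp)
{
    # String context yields "", numeric context yields 0.
    return "[" tmp "]" " " (tmp + 0)
}
BEGIN { print show("ignored") }')
echo "$out"
```

This prints '[] 0': the empty string between the brackets, then zero.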

   The BODY-OF-FUNCTION consists of 'awk' statements.  It is the most
important part of the definition, because it says what the function
should actually _do_.  The argument names exist to give the body a way
to talk about the arguments; local variables exist to give the body
places to keep temporary values.

   Argument names are not distinguished syntactically from local
variable names.  Instead, the number of arguments supplied when the
function is called determines how many argument variables there are.
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments and the rest are local variables.

   It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others.  Another way
to think of this is that omitted arguments default to the null string.
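
   A minimal sketch of this, using a made-up function named 'describe':
the same name 'b' is an argument in the first call and an empty local
variable in the second.

```shell
out=$(awk '
# "b" is an argument when the caller passes two values,
# and an empty local variable when the caller passes one.
function describe(a, b)
{
    return "a=" a " b=<" b ">"
}
BEGIN {
    print describe("x", "y")
    print describe("x")
}')
echo "$out"
```

The first line of output is 'a=x b=<y>' and the second is 'a=x b=<>'.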

   Usually when you write a function, you know how many names you intend
to use for arguments and how many you intend to use as local variables.
It is conventional to place some extra space between the arguments and
the local variables, in order to document how your function is supposed
to be used.
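
   For example, a function written in this style might look like the
following sketch.  The function 'repeat' and its parameter names are
assumptions made for the example.  Here 'str' and 'n' are intended as
arguments, while 'i' and 'result' are intended as local variables.

```shell
out=$(awk '
# "str" and "n" are arguments; "i" and "result" are locals,
# set apart by extra spaces in the parameter list.
function repeat(str, n,    i, result)
{
    for (i = 1; i <= n; i++)
        result = result str
    return result
}
BEGIN { print repeat("ab", 3) }')
echo "$out"
```

The extra spaces mean nothing to 'awk'; they only tell a reader that
callers are expected to supply two values.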

   During execution of the function body, the arguments and local
variables hide, or "shadow", any variables of the same names used in
the rest of the program.  The shadowed variables are not accessible in
the function definition, because there is no way to name them while
their names are in use for the arguments and local variables.  All
other variables used in the 'awk' program can be referenced or set
normally in the function's body.

   The arguments and local variables last only as long as the function
body is executing.  Once the body finishes, you can once again access
the variables that were shadowed while the function was running.
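
   The sketch below illustrates shadowing under the assumption of a
hypothetical function 'bump' and two global variables, 'n' and 'total'.
The parameter 'n' hides the global 'n' inside the function, while
'total' is not in the parameter list and so remains the global variable.

```shell
out=$(awk '
function bump(n)
{
    n = n + 100        # changes only the parameter "n"
    total = total + 1  # "total" is not shadowed, so this is the global
    return n
}
BEGIN {
    n = 1
    total = 10
    r = bump(5)
    print r, n, total
}')
echo "$out"
```

This prints '105 1 11': the global 'n' is unchanged after the call, but
the global 'total' has been incremented.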

   The function body can contain expressions that call functions.  Such
expressions can even call the function being defined, either directly
or by way of another function.  When this happens, we say the function
is "recursive".  The act of a function calling itself is called
"recursion".
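
   As a simple sketch of direct recursion, the following made-up
function 'factorial' calls itself until its argument reaches one.  It
assumes a small positive integer argument and does no error checking.

```shell
out=$(awk '
function factorial(n)
{
    if (n <= 1)
        return 1                 # base case ends the recursion
    return n * factorial(n - 1)  # the function calls itself
}
BEGIN { print factorial(5) }')
echo "$out"
```

Each call gets its own copy of 'n', so the nested calls do not
interfere with one another.  The program prints '120'.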

   All the built-in functions return a value to their caller.
User-defined functions can do so also, using the 'return' statement,
which is described in detail in *note Return Statement::.  Many of the
subsequent examples in this minor node use the 'return' statement.

   In many 'awk' implementations, including 'gawk', the keyword
'function' may be abbreviated 'func'.  (c.e.)  However, POSIX only
specifies the use of the keyword 'function'.  This actually has some
practical implications.  If 'gawk' is in POSIX-compatibility mode (*note
Options::), then the following statement does _not_ define a function:

     func foo() { a = sqrt($1) ; print a }

Instead, it defines a rule that, for each record, concatenates the value
of the variable 'func' with the return value of the function 'foo'.  If
the resulting string is non-null, the action is executed.  This is
probably not what is desired.  ('awk' accepts this input as
syntactically valid, because functions may be used before they are
defined in 'awk' programs.(1))

   To ensure that your 'awk' programs are portable, always use the
keyword 'function' when defining a function.

   ---------- Footnotes ----------

   (1) This program won't actually run, because 'foo()' is undefined.
