File: gawk.info,  Node: Auto-set,  Next: ARGC and ARGV,  Prev: User-modified,  Up: Built-in Variables

7.5.2 Built-in Variables That Convey Information
------------------------------------------------

The following is an alphabetical list of variables that 'awk' sets
automatically on certain occasions in order to provide information to
your program.

   The variables that are specific to 'gawk' are marked with a pound
sign ('#').  These variables are 'gawk' extensions.  In other 'awk'
implementations or if 'gawk' is in compatibility mode (*note Options::),
they are not special:

'ARGC', 'ARGV'
     The command-line arguments available to 'awk' programs are stored
     in an array called 'ARGV'.  'ARGC' is the number of command-line
     arguments present.  *Note Other Arguments::.  Unlike most 'awk'
     arrays, 'ARGV' is indexed from 0 to 'ARGC' - 1.  In the following
     example:

          $ awk 'BEGIN {
          >         for (i = 0; i < ARGC; i++)
          >             print ARGV[i]
          >      }' inventory-shipped mail-list
          -| awk
          -| inventory-shipped
          -| mail-list

     'ARGV[0]' contains 'awk', 'ARGV[1]' contains 'inventory-shipped',
     and 'ARGV[2]' contains 'mail-list'.  The value of 'ARGC' is three,
     one more than the index of the last element in 'ARGV', because the
     elements are numbered from zero.

     The names 'ARGC' and 'ARGV', as well as the convention of indexing
     the array from 0 to 'ARGC' - 1, are derived from the C language's
     method of accessing command-line arguments.

     The value of 'ARGV[0]' can vary from system to system.  Also, you
     should note that the program text is _not_ included in 'ARGV', nor
     are any of 'awk''s command-line options.  *Note ARGC and ARGV:: for
     information about how 'awk' uses these variables.  (d.c.)

'ARGIND #'
     The index in 'ARGV' of the current file being processed.  Every
     time 'gawk' opens a new data file for processing, it sets 'ARGIND'
     to the index in 'ARGV' of the file name.  When 'gawk' is processing
     the input files, 'FILENAME == ARGV[ARGIND]' is always true.

     This variable is useful in file processing; it allows you to tell
     how far along you are in the list of data files as well as to
     distinguish between successive instances of the same file name on
     the command line.

     While you can change the value of 'ARGIND' within your 'awk'
     program, 'gawk' automatically sets it to a new value when it opens
     the next file.

'ENVIRON'
     An associative array containing the values of the environment.  The
     array indices are the environment variable names; the elements are
     the values of the particular environment variables.  For example,
     'ENVIRON["HOME"]' might be '/home/arnold'.

     For POSIX 'awk', changing this array does not affect the
     environment passed on to any programs that 'awk' may spawn via
     redirection or the 'system()' function.

     However, beginning with version 4.2, if not in POSIX compatibility
     mode, 'gawk' does update its own environment when 'ENVIRON' is
     changed, thus changing the environment seen by programs that it
     creates.  You should therefore be especially careful if you modify
     'ENVIRON["PATH"]', which is the search path for finding executable
     programs.

     This can also affect the running 'gawk' program, since some of the
     built-in functions may pay attention to certain environment
     variables.  The most notable instance of this is 'mktime()' (*note
     Time Functions::), which pays attention to the value of the 'TZ'
     environment variable on many systems.

     Some operating systems may not have environment variables.  On such
     systems, the 'ENVIRON' array is empty (except for
     'ENVIRON["AWKPATH"]' and 'ENVIRON["AWKLIBPATH"]'; *note AWKPATH
     Variable:: and *note AWKLIBPATH Variable::).

'ERRNO #'
     If a system error occurs during a redirection for 'getline', during
     a read for 'getline', or during a 'close()' operation, then 'ERRNO'
     contains a string describing the error.

     In addition, 'gawk' clears 'ERRNO' before opening each command-line
     input file.  This enables checking if the file is readable inside a
     'BEGINFILE' pattern (*note BEGINFILE/ENDFILE::).

     Otherwise, 'ERRNO' works similarly to the C variable 'errno'.
     Except for the case just mentioned, 'gawk' _never_ clears it (sets
     it to zero or '""').  Thus, you should only expect its value to be
     meaningful when an I/O operation returns a failure value, such as
     'getline' returning -1.  You are, of course, free to clear it
     yourself before doing an I/O operation.

     If the value of 'ERRNO' corresponds to a system error in the C
     'errno' variable, then 'PROCINFO["errno"]' will be set to the value
     of 'errno'.  For non-system errors, 'PROCINFO["errno"]' will be
     zero.

'FILENAME'
     The name of the current input file.  When no data files are listed
     on the command line, 'awk' reads from the standard input and
     'FILENAME' is set to '"-"'.  'FILENAME' changes each time a new
     file is read (*note Reading Files::).  Inside a 'BEGIN' rule, the
     value of 'FILENAME' is '""', because there are no input files being
     processed yet.(1)  (d.c.)  Note, though, that using 'getline'
     (*note Getline::) inside a 'BEGIN' rule can give 'FILENAME' a
     value.

'FNR'
     The current record number in the current file.  'awk' increments
     'FNR' each time it reads a new record (*note Records::).  'awk'
     resets 'FNR' to zero each time it starts a new input file.

'NF'
     The number of fields in the current input record.  'NF' is set each
     time a new record is read, when a new field is created, or when
     '$0' changes (*note Fields::).

     Unlike most of the variables described in this node, assigning a
     value to 'NF' has the potential to affect 'awk''s internal
     workings.  In particular, assignments to 'NF' can be used to create
     fields in or remove fields from the current record.  *Note Changing
     Fields::.
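
     A brief sketch of both directions follows.  The sample input and
     the choice of '-' for 'OFS' are illustrative assumptions, chosen
     only so that the rebuilt record is easy to see:

```shell
# Shrinking NF drops trailing fields; growing it adds empty ones.
# In both cases $0 is rebuilt with OFS between the fields.
shrink_out=$(echo 'a b c d' | awk 'BEGIN { OFS = "-" } { NF = 2; print }')
grow_out=$(echo 'a b' | awk 'BEGIN { OFS = "-" } { NF = 4; print }')

echo "$shrink_out"
echo "$grow_out"
```

     The first command prints 'a-b' and the second prints 'a-b--',
     where the two trailing separators mark the new, empty fields.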

'FUNCTAB #'
     An array whose indices and corresponding values are the names of
     all the built-in, user-defined, and extension functions in the
     program.

          NOTE: Attempting to use the 'delete' statement with the
          'FUNCTAB' array causes a fatal error.  Any attempt to assign
          to an element of 'FUNCTAB' also causes a fatal error.

'NR'
     The number of input records 'awk' has processed since the beginning
     of the program's execution (*note Records::).  'awk' increments
     'NR' each time it reads a new record.
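
     The relationship between 'FILENAME', 'FNR', and 'NR' can be seen
     in a small sketch.  The two scratch files, 'a.txt' with two lines
     and 'b.txt' with one, are made up for illustration:

```shell
# Illustrative only: the directory and file names are made up.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n' > "$dir/b.txt"

# FNR restarts for each file, while NR keeps counting across files.
counts_out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' a.txt b.txt)
rm -rf "$dir"

echo "$counts_out"
```

     This prints 'a.txt 1 1', 'a.txt 2 2', and 'b.txt 1 3', one per
     line.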

'PROCINFO #'
     The elements of this array provide access to information about the
     running 'awk' program.  The following elements (listed
     alphabetically) are guaranteed to be available:

     'PROCINFO["argv"]'
          The 'PROCINFO["argv"]' array contains all of the command-line
          arguments (after glob expansion and redirection processing on
          platforms where that must be done manually by the program)
          with subscripts ranging from 0 through 'argc' - 1.  For
          example, 'PROCINFO["argv"][0]' will contain the name by which
          'gawk' was invoked.  Here is an example of how this feature
          may be used:

               gawk '
               BEGIN {
                       for (i = 0; i < length(PROCINFO["argv"]); i++)
                               print i, PROCINFO["argv"][i]
               }'

          Please note that this differs from the standard 'ARGV' array,
          which does not include command-line arguments that have
          already been processed by 'gawk' (*note ARGC and ARGV::).

     'PROCINFO["egid"]'
          The value of the 'getegid()' system call.

     'PROCINFO["errno"]'
          The value of the C 'errno' variable when 'ERRNO' is set to the
          associated error message.

     'PROCINFO["euid"]'
          The value of the 'geteuid()' system call.

     'PROCINFO["FS"]'
          This is '"FS"' if field splitting with 'FS' is in effect,
          '"FIELDWIDTHS"' if field splitting with 'FIELDWIDTHS' is in
          effect, '"FPAT"' if field matching with 'FPAT' is in effect,
          or '"API"' if field splitting is controlled by an API input
          parser.

     'PROCINFO["gid"]'
          The value of the 'getgid()' system call.

     'PROCINFO["identifiers"]'
          A subarray, indexed by the names of all identifiers used in
          the text of the 'awk' program.  An "identifier" is simply the
          name of a variable (be it scalar or array), built-in function,
          user-defined function, or extension function.  For each
          identifier, the value of the element is one of the following:

          '"array"'
               The identifier is an array.

          '"builtin"'
               The identifier is a built-in function.

          '"extension"'
               The identifier is an extension function loaded via
               '@load' or '-l'.

          '"scalar"'
               The identifier is a scalar.

          '"untyped"'
               The identifier is untyped (could be used as a scalar or
               an array; 'gawk' doesn't know yet).

          '"user"'
               The identifier is a user-defined function.

          The values indicate what 'gawk' knows about the identifiers
          after it has finished parsing the program; they are _not_
          updated while the program runs.

     'PROCINFO["platform"]'
          This element gives a string indicating the platform for which
          'gawk' was compiled.  The value will be one of the following:

          '"mingw"'
               Microsoft Windows, using MinGW.

          '"os390"'
               OS/390 (also known as z/OS).

          '"posix"'
               GNU/Linux, Cygwin, macOS, and legacy Unix systems.

          '"vms"'
               OpenVMS.

     'PROCINFO["pgrpid"]'
          The process group ID of the current process.

     'PROCINFO["pid"]'
          The process ID of the current process.

     'PROCINFO["ppid"]'
          The parent process ID of the current process.

     'PROCINFO["strftime"]'
          The default time format string for 'strftime()'.  Assigning a
          new value to this element changes the default.  *Note Time
          Functions::.

     'PROCINFO["uid"]'
          The value of the 'getuid()' system call.

     'PROCINFO["version"]'
          The version of 'gawk'.

     The following additional elements in the array are available to
     provide information about the MPFR and GMP libraries if your
     version of 'gawk' supports arbitrary-precision arithmetic (*note
     Arbitrary Precision Arithmetic::):

     'PROCINFO["gmp_version"]'
          The version of the GNU MP library.

     'PROCINFO["mpfr_version"]'
          The version of the GNU MPFR library.

     'PROCINFO["prec_max"]'
          The maximum precision supported by MPFR.

     'PROCINFO["prec_min"]'
          The minimum precision required by MPFR.

     The following additional elements in the array are available to
     provide information about the version of the extension API, if your
     version of 'gawk' supports dynamic loading of extension functions
     (*note Dynamic Extensions::):

     'PROCINFO["api_major"]'
          The major version of the extension API.

     'PROCINFO["api_minor"]'
          The minor version of the extension API.

     On some systems, there may be elements in the array, '"group1"'
     through '"groupN"' for some N.  N is the number of supplementary
     groups that the process has.  Use the 'in' operator to test for
     these elements (*note Reference to Elements::).

     The following elements allow you to change 'gawk''s behavior:

     'PROCINFO["NONFATAL"]'
          If this element exists, then I/O errors for all redirections
          become nonfatal.  *Note Nonfatal::.

     'PROCINFO["NAME", "NONFATAL"]'
          Make I/O errors for NAME be nonfatal.  *Note Nonfatal::.

     'PROCINFO["COMMAND", "pty"]'
          For two-way communication to COMMAND, use a pseudo-tty instead
          of setting up a two-way pipe.  *Note Two-way I/O:: for more
          information.

     'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]'
          Set a timeout for reading from input redirection INPUT_NAME.
          *Note Read Timeout:: for more information.

     'PROCINFO["INPUT_NAME", "RETRY"]'
          If an I/O error that may be retried occurs when reading data
          from INPUT_NAME, and this array entry exists, then 'getline'
          returns -2 instead of following the default behavior of
          returning -1 and configuring INPUT_NAME to return no further
          data.  An I/O error that may be retried is one where 'errno'
          has the value 'EAGAIN', 'EWOULDBLOCK', 'EINTR', or
          'ETIMEDOUT'.  This may be useful in conjunction with
          'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]' or situations where a
          file descriptor has been configured to behave in a
          non-blocking fashion.  *Note Retrying Input:: for more
          information.

     'PROCINFO["sorted_in"]'
          If this element exists in 'PROCINFO', its value controls the
          order in which array indices will be processed by 'for (INDX
          in ARRAY)' loops.  This is an advanced feature, so we defer
          the full description until later; see *note Controlling
          Scanning::.

'RLENGTH'
     The length of the substring matched by the 'match()' function
     (*note String Functions::).  'RLENGTH' is set by invoking the
     'match()' function.  Its value is the length of the matched string,
     or -1 if no match is found.

'RSTART'
     The start index in characters of the substring that is matched by
     the 'match()' function (*note String Functions::).  'RSTART' is set
     by invoking the 'match()' function.  Its value is the position in
     the string where the matched substring starts, or zero if no match
     was found.
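
     'RSTART' and 'RLENGTH' are typically used together with 'substr()'
     to extract the matched text.  A minimal sketch, with a made-up
     string and patterns:

```shell
match_out=$(awk 'BEGIN {
    s = "foobarbaz"
    if (match(s, /ba[rz]/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)

    match(s, /xyz/)           # no match: RSTART is 0, RLENGTH is -1
    print RSTART, RLENGTH
}')

echo "$match_out"
```

     This prints '4 3 bar' for the successful match (the leftmost one)
     and '0 -1' for the failed one.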

'RT #'
     The input text that matched the text denoted by 'RS', the record
     separator.  It is set every time a record is read.

'SYMTAB #'
     An array whose indices are the names of all defined global
     variables and arrays in the program.  'SYMTAB' makes 'gawk''s
     symbol table visible to the 'awk' programmer.  It is built as
     'gawk' parses the program and is complete before the program starts
     to run.

     The array may be used for indirect access to read or write the
     value of a variable:

          foo = 5
          SYMTAB["foo"] = 4
          print foo    # prints 4

     The 'isarray()' function (*note Type Functions::) may be used to
     test if an element in 'SYMTAB' is an array.  Also, you may not use
     the 'delete' statement with the 'SYMTAB' array.

     Prior to version 5.0 of 'gawk', you could use an index for 'SYMTAB'
     that was not a predefined identifier:

          SYMTAB["xxx"] = 5
          print SYMTAB["xxx"]

     This no longer works, instead producing a fatal error, as it led to
     rampant confusion.

     The 'SYMTAB' array is more interesting than it looks.  Andrew
     Schorr points out that it effectively gives 'awk' data pointers.
     Consider his example:

          # Indirect multiply of any variable by amount, return result

          function multiply(variable, amount)
          {
              return SYMTAB[variable] *= amount
          }

     You would use it like this:

          BEGIN {
              answer = 10.5
              multiply("answer", 4)
              print "The answer is", answer
          }

     When run, this produces:

          $ gawk -f answer.awk
          -| The answer is 42

          NOTE: In order to avoid severe time-travel paradoxes,(2)
          neither 'FUNCTAB' nor 'SYMTAB' is available as an element
          within the 'SYMTAB' array.

                        Changing 'NR' and 'FNR'

   'awk' increments 'NR' and 'FNR' each time it reads a record, instead
of setting them to the absolute value of the number of records read.
This means that a program can change these variables and their new
values are incremented for each record.  (d.c.)  The following example
shows this:

     $ echo '1
     > 2
     > 3
     > 4' | awk 'NR == 2 { NR = 17 }
     > { print NR }'
     -| 1
     -| 17
     -| 18
     -| 19

Before 'FNR' was added to the 'awk' language (*note V7/SVR3.1::), many
'awk' programs used this feature to track the number of records in a
file by resetting 'NR' to zero when 'FILENAME' changed.
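
That older idiom might be sketched as follows.  The scratch files and
the variable 'prev' are made up for illustration.  Because the rule
runs after the first record of the new file has already been read, this
sketch sets 'NR' to one rather than zero:

```shell
# Illustrative only: the directory and file names are made up.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n' > "$dir/b.txt"

reset_out=$(cd "$dir" && awk '
FILENAME != prev { NR = 1; prev = FILENAME }   # first record of a new file
{ print FILENAME, NR }' a.txt b.txt)
rm -rf "$dir"

echo "$reset_out"
```

Here 'NR' behaves as a per-file counter, printing 'a.txt 1', 'a.txt 2',
and 'b.txt 1'.  In current programs, 'FNR' gives the same result without
modifying 'NR'.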

   ---------- Footnotes ----------

   (1) Some early implementations of Unix 'awk' initialized 'FILENAME'
to '"-"', even if there were data files to be processed.  This behavior
was incorrect and should not be relied upon in your programs.

   (2) Not to mention difficult implementation issues.
