C.3.1 A Minimal Introduction to gawk Internals
The truth is that gawk was not designed for simple extensibility. The facilities for adding functions using shared libraries work, but are something of a “bag on the side.” Thus, this tour is brief and simplistic; would-be gawk hackers are encouraged to spend some time reading the source code before trying to write extensions based on the material presented here. Of particular note are the files ‘awk.h’, ‘builtin.c’, and ‘eval.c’. Reading ‘awkgram.y’ in order to see how the parse tree is built would also be of use.
With the disclaimers out of the way, the following types, structure members, functions, and macros are declared in ‘awk.h’ and are of use when writing extensions. The next section shows how they are used:
-
AWKNUM
An AWKNUM is the internal type of awk floating-point numbers. Typically, it is a C double.
-
NODE
Just about everything is done using objects of type NODE. These contain both strings and numbers, as well as variables and arrays.
-
AWKNUM force_number(NODE *n)
This macro forces a value to be numeric. It returns the actual numeric value contained in the node. It may end up calling an internal gawk function.
-
void force_string(NODE *n)
This macro guarantees that a NODE’s string value is current. It may end up calling an internal gawk function. It also guarantees that the string is zero-terminated.
-
void force_wstring(NODE *n)
Similarly, this macro guarantees that a NODE’s wide-string value is current. It may end up calling an internal gawk function. It also guarantees that the wide string is zero-terminated.
-
size_t get_curfunc_arg_count(void)
This function returns the actual number of parameters passed to the current function. Inside the code of an extension, this can be used to determine the maximum index which is safe to use with get_actual_argument(). If this value is greater than nargs, the function was called incorrectly from the awk program.
-
nargs
Inside an extension function, this is the maximum number of expected parameters, as set by the make_builtin() function.
-
n->stptr
-
n->stlen
The data and length of a NODE’s string value, respectively. The string is not guaranteed to be zero-terminated. If you need to pass the string value to a C library function, save the value in n->stptr[n->stlen], assign '\0' to it, call the routine, and then restore the value.
-
n->wstptr
-
n->wstlen
The data and length of a NODE’s wide-string value, respectively. Use force_wstring() to make sure these values are current.
-
n->type
The type of the NODE. This is a C enum. Values should be one of Node_var, Node_var_new, or Node_var_array for function parameters.
-
n->vname
The “variable name” of a node. This is not of much use inside externally written extensions.
-
void assoc_clear(NODE *n)
Clears the associative array pointed to by n. Make sure that ‘n->type == Node_var_array’ first.
-
NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
Finds, and installs if necessary, array elements. symbol is the array, subs is the subscript. This is usually a value created with make_string() (see below). reference should be TRUE if it is an error to use the value before it is created. Typically, FALSE is the correct value to use from extension functions.
-
NODE *make_string(char *s, size_t len)
Take a C string and turn it into a pointer to a NODE that can be stored appropriately. This is permanent storage; understanding of gawk memory management is helpful.
-
NODE *make_number(AWKNUM val)
Take an AWKNUM and turn it into a pointer to a NODE that can be stored appropriately. This is permanent storage; understanding of gawk memory management is helpful.
-
NODE *dupnode(NODE *n)
Duplicate a node. In most cases, this increments an internal reference count instead of actually duplicating the entire NODE; understanding of gawk memory management is helpful.
-
void unref(NODE *n)
This macro releases the memory associated with a NODE allocated with make_string() or make_number(). Understanding of gawk memory management is helpful.
-
void make_builtin(const char *name, NODE *(*func)(int), int count)
Register a C function pointed to by func as new built-in function name. name is a regular C string. count is the maximum number of arguments that the function takes. The function should be written in the following manner:
    /* do_xxx --- do xxx function for gawk */
    NODE *
    do_xxx(int nargs)
    {
        …
    }
-
NODE *get_argument(int i)
This function is called from within a C extension function to get the i-th argument from the function call. The first argument is argument zero.
-
NODE *get_actual_argument(int i, int optional, int wantarray)
This function retrieves a particular argument i. wantarray is TRUE if the argument should be an array, FALSE otherwise. If optional is TRUE, the argument need not have been supplied. If it wasn’t, the return value is NULL. It is a fatal error if optional is FALSE but the argument was not provided.
-
get_scalar_argument(i, opt)
This is a convenience macro that calls get_actual_argument().
-
get_array_argument(i, opt)
This is a convenience macro that calls get_actual_argument().
-
void update_ERRNO(void)
This function is called from within a C extension function to set the value of gawk’s ERRNO variable, based on the current value of the C errno global variable. It is provided as a convenience.
-
void update_ERRNO_saved(int errno_saved)
This function is called from within a C extension function to set the value of gawk’s ERRNO variable, based on the error value provided as the argument. It is provided as a convenience.
-
void register_deferred_variable(const char *name, NODE *(*load_func)(void))
This function is called to register a function to be called when a reference to an undefined variable with the given name is encountered. The callback function will never be called if the variable exists already, so, unless the calling code is running at program startup, it should first check whether a variable of the given name already exists. The argument function must return a pointer to a NODE containing the newly created variable. This function is used to implement the builtin ENVIRON and PROCINFO arrays, so you can refer to them for examples.
-
void register_open_hook(void *(*open_func)(IOBUF *))
This function is called to register a function to be called whenever a new data file is opened, leading to the creation of an IOBUF structure in iop_alloc(). After creating the new IOBUF, iop_alloc() will call (in reverse order of registration, so the last function registered is called first) each open hook until one returns non-NULL. If any hook returns a non-NULL value, that value is assigned to the IOBUF’s opaque field (which will presumably point to a structure containing additional state associated with the input processing), and no further open hooks are called.
The function called will most likely want to set the IOBUF’s get_record method to indicate that future input records should be retrieved by calling that method instead of using the standard gawk input processing.
And the function will also probably want to set the IOBUF’s close_func method to be called when the file is closed to clean up any state associated with the input.
Finally, hook functions should be prepared to receive an IOBUF structure where the fd field is set to INVALID_HANDLE, meaning that gawk was not able to open the file itself. In this case, the hook function must be able to successfully open the file and place a valid file descriptor there.
Currently, for example, the hook function facility is used to implement the XML parser shared library extension. For more info, please look in ‘awk.h’ and in ‘io.c’.
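The calling order described for register_open_hook() (last registered is called first, and the first non-NULL result ends the search) can be modelled in a few lines of standalone C. This is only a sketch: toy_iobuf, toy_register_open_hook(), toy_run_open_hooks(), and the two example hooks are hypothetical stand-ins invented for illustration, not gawk's actual IOBUF or iop_alloc() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gawk's IOBUF; only the fields discussed above. */
typedef struct toy_iobuf {
    int fd;        /* file descriptor, or -1 if the file could not be opened */
    void *opaque;  /* state supplied by whichever hook claimed the file */
} toy_iobuf;

typedef void *(*toy_open_hook)(toy_iobuf *);

#define TOY_MAX_HOOKS 8

static toy_open_hook toy_hooks[TOY_MAX_HOOKS];
static size_t toy_hook_count = 0;

/* Append a hook to the table; returns 0 on success, -1 if the table is full. */
int toy_register_open_hook(toy_open_hook hook)
{
    assert(hook != NULL);
    if (toy_hook_count == TOY_MAX_HOOKS)
        return -1;
    toy_hooks[toy_hook_count++] = hook;
    return 0;
}

/*
 * Call the hooks newest-first and stop at the first non-NULL result,
 * storing that result in the opaque field. Returns 1 if a hook claimed
 * the file, 0 if every hook declined.
 */
int toy_run_open_hooks(toy_iobuf *iop)
{
    size_t i = toy_hook_count;

    while (i-- > 0) {
        void *state = toy_hooks[i](iop);
        if (state != NULL) {
            iop->opaque = state;
            return 1;
        }
    }
    return 0;
}

/* Two example hooks with call counters, so the order can be observed. */
int toy_decline_calls = 0;
int toy_claim_calls = 0;
int toy_claim_state = 0;

void *toy_hook_decline(toy_iobuf *iop)
{
    (void) iop;
    toy_decline_calls++;
    return NULL;                /* not interested in this file */
}

void *toy_hook_claim(toy_iobuf *iop)
{
    (void) iop;
    toy_claim_calls++;
    return &toy_claim_state;    /* non-NULL: this hook takes over */
}
```

If toy_hook_decline is registered first and toy_hook_claim second, running the hooks calls only toy_hook_claim, because it was registered last and returns non-NULL.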
An argument that is supposed to be an array needs to be handled with some extra code, in case the array being passed in is actually from a function parameter.
The following boilerplate code shows how to do this:
    NODE *the_arg;
    /* assume need 3rd arg, 0-based */
    the_arg = get_array_argument(2, FALSE);
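The rules for optional and wantarray given in the description of get_actual_argument() can be written down as a small decision function. The sketch below is a standalone model, not gawk code: the toy_ types and toy_get_actual_argument() are hypothetical, a status code stands in for gawk's fatal errors, and treating a scalar/array mismatch as an error is an assumption that the text above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kinds of function parameters. */
typedef enum { TOY_SCALAR, TOY_ARRAY } toy_kind;

typedef struct toy_arg {
    toy_kind kind;
} toy_arg;

typedef enum {
    TOY_OK,               /* argument supplied and of the wanted kind */
    TOY_ABSENT_OK,        /* argument missing, but it was optional */
    TOY_FATAL_MISSING,    /* argument missing and required */
    TOY_FATAL_WRONG_KIND  /* assumed: scalar/array mismatch is an error */
} toy_status;

/*
 * Model of the lookup: args/passed describe what the caller supplied,
 * i is the zero-based index, optional and wantarray are flags as in the
 * text. On TOY_OK, *out points at the argument; otherwise *out is NULL.
 */
toy_status toy_get_actual_argument(const toy_arg *args, size_t passed,
                                   size_t i, int optional, int wantarray,
                                   const toy_arg **out)
{
    assert(out != NULL);
    *out = NULL;

    if (i >= passed)
        return optional ? TOY_ABSENT_OK : TOY_FATAL_MISSING;

    if ((args[i].kind == TOY_ARRAY) != (wantarray != 0))
        return TOY_FATAL_WRONG_KIND;

    *out = &args[i];
    return TOY_OK;
}
```

In this model, asking for a missing third argument with optional set yields TOY_ABSENT_OK and a NULL pointer, which mirrors the NULL return described above; the same request with optional clear yields TOY_FATAL_MISSING.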
Again, you should spend time studying the gawk internals; don’t just blindly copy this code.
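In the same cautious spirit, the save, terminate, call, and restore idiom described for n->stptr and n->stlen can be sketched in standalone C. The toy_node structure and toy_node_to_long() below are hypothetical stand-ins for illustration, not gawk's NODE. The sketch assumes, as the description implies, that the byte at stptr[stlen] belongs to the buffer and may be written temporarily.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two NODE members discussed above. */
typedef struct toy_node {
    char *stptr;   /* string data; not necessarily zero-terminated */
    size_t stlen;  /* number of bytes that belong to the value */
} toy_node;

/*
 * Parse the node's string value with strtol(), which requires a
 * zero-terminated string. The byte just past the value is saved,
 * overwritten with '\0' for the duration of the call, and restored.
 */
long toy_node_to_long(toy_node *n)
{
    char saved;
    long result;

    assert(n != NULL && n->stptr != NULL);

    saved = n->stptr[n->stlen];           /* save the byte past the value */
    n->stptr[n->stlen] = '\0';            /* terminate temporarily */
    result = strtol(n->stptr, NULL, 10);  /* call the C library routine */
    n->stptr[n->stlen] = saved;           /* put the original byte back */

    return result;
}
```

With a buffer holding "1234567" and stlen set to 3, the function parses only "123" and leaves the buffer unchanged afterwards.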