File: gawk.info,  Node: Feature History,  Next: Common Extensions,  Prev: POSIX/GNU,  Up: Language History

A.6 History of 'gawk' Features
==============================

This minor node describes the features in 'gawk' over and above those in
POSIX 'awk', in the order they were added to 'gawk'.

   Version 2.10 of 'gawk' introduced the following features:

   * The 'AWKPATH' environment variable for specifying a path search for
     the '-f' command-line option (*note Options::).

   * The 'IGNORECASE' variable and its effects (*note
     Case-sensitivity::).

   * The '/dev/stdin', '/dev/stdout', '/dev/stderr' and '/dev/fd/N'
     special file names (*note Special Files::).

   Version 2.13 of 'gawk' introduced the following features:

   * The 'FIELDWIDTHS' variable and its effects (*note Constant Size::).

   * The 'systime()' and 'strftime()' built-in functions for obtaining
     and printing timestamps (*note Time Functions::).

   * Additional command-line options (*note Options::):

        - The '-W lint' option to provide error and portability checking,
          both for the source code and at runtime.

        - The '-W compat' option to turn off the GNU extensions.

        - The '-W posix' option for full POSIX compliance.

   Version 2.14 of 'gawk' introduced the following feature:

   * The 'next file' statement for skipping to the next data file (*note
     Nextfile Statement::).

   Version 2.15 of 'gawk' introduced the following features:

   * New variables (*note Built-in Variables::):

        - 'ARGIND', which tracks the movement of 'FILENAME' through
          'ARGV'.

        - 'ERRNO', which contains the system error message when
          'getline' returns -1 or 'close()' fails.

   * The '/dev/pid', '/dev/ppid', '/dev/pgrpid', and '/dev/user' special
     file names.  These have since been removed.

   * The ability to delete all of an array at once with 'delete ARRAY'
     (*note Delete::).

   * Command-line option changes (*note Options::):

        - The ability to use GNU-style long-named options that start
          with '--'.

        - The '--source' option for mixing command-line and library-file
          source code.

   Version 3.0 of 'gawk' introduced the following features:

   * New or changed variables:

        - 'IGNORECASE' changed, now applying to string comparison as
          well as regexp operations (*note Case-sensitivity::).

        - 'RT', which contains the input text that matched 'RS' (*note
          Records::).

   * Full support for both POSIX and GNU regexps (*note Regexp::).

   * The 'gensub()' function for more powerful text manipulation (*note
     String Functions::).

   * The 'strftime()' function acquired a default time format, allowing
     it to be called with no arguments (*note Time Functions::).

   * The ability for 'FS' and for the third argument to 'split()' to be
     null strings (*note Single Character Fields::).

   * The ability for 'RS' to be a regexp (*note Records::).

   * The 'next file' statement became 'nextfile' (*note Nextfile
     Statement::).

   * The 'fflush()' function from BWK 'awk' (then at Bell Laboratories;
     *note I/O Functions::).

   * New command-line options:

        - The '--lint-old' option to warn about constructs that are not
          available in the original Version 7 Unix version of 'awk'
          (*note V7/SVR3.1::).

        - The '-m' option from BWK 'awk'.  (Brian was still at Bell
          Laboratories at the time.)  This was later removed from both
          his 'awk' and 'gawk'.

        - The '--re-interval' option to provide interval expressions in
          regexps (*note Regexp Operators::).

        - The '--traditional' option was added as a better name for
          '--compat' (*note Options::).

   * The use of GNU Autoconf to control the configuration process
     (*note Quick Installation::).

   * Amiga support.  This has since been removed.

   Version 3.1 of 'gawk' introduced the following features:

   * New variables (*note Built-in Variables::):

        - 'BINMODE', for non-POSIX systems, which allows binary I/O for
          input and/or output files (*note PC Using::).

        - 'LINT', which dynamically controls lint warnings.

        - 'PROCINFO', an array for providing process-related
          information.

        - 'TEXTDOMAIN', for setting an application's
          internationalization text domain (*note
          Internationalization::).

   * The ability to use octal and hexadecimal constants in 'awk' program
     source code (*note Nondecimal-numbers::).

   * The '|&' operator for two-way I/O to a coprocess (*note Two-way
     I/O::).

   * The '/inet' special files for TCP/IP networking using '|&' (*note
     TCP/IP Networking::).

   * The optional second argument to 'close()' that allows closing one
     end of a two-way pipe to a coprocess (*note Two-way I/O::).

   * The optional third argument to the 'match()' function for capturing
     text-matching subexpressions within a regexp (*note String
     Functions::).

   * Positional specifiers in 'printf' formats for making translations
     easier (*note Printf Ordering::).

   * A number of new built-in functions:

        - The 'asort()' and 'asorti()' functions for sorting arrays
          (*note Array Sorting::).

        - The 'bindtextdomain()', 'dcgettext()' and 'dcngettext()'
          functions for internationalization (*note Programmer i18n::).

        - The 'extension()' function and the ability to add new built-in
          functions dynamically.  This has since been removed and
          replaced by the new extension mechanism.  *Note Dynamic
          Extensions::.

        - The 'mktime()' function for creating timestamps (*note Time
          Functions::).

        - The 'and()', 'or()', 'xor()', 'compl()', 'lshift()',
          'rshift()', and 'strtonum()' functions (*note Bitwise
          Functions::).

   * The support for 'next file' as two words was removed completely
     (*note Nextfile Statement::).

   * Additional command-line options (*note Options::):

        - The '--dump-variables' option to print a list of all global
          variables.

        - The '--exec' option, for use in CGI scripts.

        - The '--gen-po' command-line option and the use of a leading
          underscore to mark strings that should be translated (*note
          String Extraction::).

        - The '--non-decimal-data' option to allow non-decimal input
          data (*note Nondecimal Data::).

        - The '--profile' option and 'pgawk', the profiling version of
          'gawk', for producing execution profiles of 'awk' programs
          (*note Profiling::).

        - The '--use-lc-numeric' option to force 'gawk' to use the
          locale's decimal point for parsing input data (*note
          Conversion::).

   * The use of GNU Automake to help in standardizing the configuration
     process (*note Quick Installation::).

   * The use of GNU 'gettext' for 'gawk''s own message output (*note
     Gawk I18N::).

   * BeOS support.  This was later removed.

   * Tandem support.  This was later removed.

   * The Atari port became officially unsupported and was later removed
     entirely.

   * The source code changed to use ISO C standard-style function
     definitions.

   * POSIX compliance for 'sub()' and 'gsub()' (*note Gory Details::).

   * The 'length()' function was extended to accept an array argument
     and return the number of elements in the array (*note String
     Functions::).

   * The 'strftime()' function acquired a third argument to enable
     printing times as UTC (*note Time Functions::).

   Version 4.0 of 'gawk' introduced the following features:

   * Variable additions:

        - 'FPAT', which allows you to specify a regexp that matches the
          fields, instead of matching the field separator (*note
          Splitting By Content::).

        - If 'PROCINFO["sorted_in"]' exists, 'for (iggy in foo)' loops
          sort the indices before looping over them.  The value of this
          element provides control over how the indices are sorted
          before the loop traversal starts (*note Controlling
          Scanning::).

        - 'PROCINFO["strftime"]', which holds the default format for
          'strftime()' (*note Time Functions::).

   * The special files '/dev/pid', '/dev/ppid', '/dev/pgrpid' and
     '/dev/user' were removed.

   * Support for IPv6 was added via the '/inet6' special file.  '/inet4'
     forces IPv4 and '/inet' chooses the system default, which is
     probably IPv4 (*note TCP/IP Networking::).

   * The use of '\s' and '\S' escape sequences in regular expressions
     (*note GNU Regexp Operators::).

   * Interval expressions became part of default regular expressions
     (*note Regexp Operators::).

   * POSIX character classes work even with '--traditional' (*note
     Regexp Operators::).

   * 'break' and 'continue' became invalid outside a loop, even with
     '--traditional' (*note Break Statement::, and also see *note
     Continue Statement::).

   * 'fflush()', 'nextfile', and 'delete ARRAY' are allowed even with
     '--posix' or '--traditional', since they are all now part of POSIX.

   * An optional third argument to 'asort()' and 'asorti()', specifying
     how to sort (*note String Functions::).

   * The behavior of 'fflush()' changed to match BWK 'awk' and POSIX;
     now both 'fflush()' and 'fflush("")' flush all open output
     redirections (*note I/O Functions::).

   * The 'isarray()' function, which tests whether an item is an array,
     making it possible to traverse arrays of arrays (*note Type
     Functions::).

   * The 'patsplit()' function, which gives the same capability as
     'FPAT' for splitting (*note String Functions::).

   * An optional fourth argument to the 'split()' function, which is an
     array to hold the values of the separators (*note String
     Functions::).

   * Arrays of arrays (*note Arrays of Arrays::).

   * The 'BEGINFILE' and 'ENDFILE' special patterns (*note
     BEGINFILE/ENDFILE::).

   * Indirect function calls (*note Indirect Calls::).

   * 'switch' / 'case' are enabled by default (*note Switch
     Statement::).

   * Command-line option changes (*note Options::):

        - The '-b' and '--characters-as-bytes' options, which prevent
          'gawk' from treating input as a multibyte string.

        - The redundant '--compat', '--copyleft', and '--usage' long
          options were removed.

        - The '--gen-po' option was finally renamed to the correct
          '--gen-pot'.

        - The '--sandbox' option, which disables certain features.

        - All long options acquired corresponding short options, for use
          in '#!' scripts.

   * Directories named on the command line now produce a warning, not a
     fatal error, unless '--posix' or '--traditional' are used (*note
     Command-line directories::).

   * The 'gawk' internals were rewritten, bringing the 'dgawk' debugger
     and possibly improved performance (*note Debugger::).

   * Per the GNU Coding Standards, dynamic extensions must now define a
     global symbol indicating that they are GPL-compatible (*note Plugin
     License::).

   * In POSIX mode, string comparisons use 'strcoll()' / 'wcscoll()'
     (*note POSIX String Comparison::).

   * The option for raw sockets was removed, since it was never
     implemented (*note TCP/IP Networking::).

   * Ranges of the form '[d-h]' are treated as if they were in the C
     locale, no matter what kind of regexp is being used, and even with
     '--posix' (*note Ranges and Locales::).

   * Support was removed for the following systems:
        - Atari
        - Amiga
        - BeOS
        - Cray
        - MIPS RiscOS
        - MS-DOS with the Microsoft Compiler
        - MS-Windows with the Microsoft Compiler
        - NeXT
        - SunOS 3.x, Sun 386 (Road Runner)
        - Tandem (non-POSIX)
        - Prestandard VAX C compiler for VAX/VMS

   Version 4.1 of 'gawk' introduced the following features:

   * Three new arrays: 'SYMTAB', 'FUNCTAB', and 'PROCINFO["identifiers"]'
     (*note Auto-set::).

   * The three executables 'gawk', 'pgawk', and 'dgawk' were merged into
     one, named just 'gawk'.  As a result, the command-line options
     changed.

   * Command-line option changes (*note Options::):

        - The '-D' option invokes the debugger.

        - The '-i' and '--include' options load 'awk' library files.

        - The '-l' and '--load' options load compiled dynamic extensions.

        - The '-M' and '--bignum' options enable MPFR.

        - The '-o' option only does pretty-printing.

        - The '-p' option is used for profiling.

        - The '-R' option was removed.

   * Support for high precision arithmetic with MPFR (*note Arbitrary
     Precision Arithmetic::).

   * The 'and()', 'or()' and 'xor()' functions changed to allow any
     number of arguments, with a minimum of two (*note Bitwise
     Functions::).

   * The dynamic extension interface was completely redone (*note
     Dynamic Extensions::).

   * Redirected 'getline' became allowed inside 'BEGINFILE' and
     'ENDFILE' (*note BEGINFILE/ENDFILE::).

   * The 'where' command was added to the debugger (*note Execution
     Stack::).

   * Support for Ultrix was removed.

   Version 4.2 of 'gawk' introduced the following changes:

   * Changes to 'ENVIRON' are reflected in 'gawk''s environment and that
     of programs that it runs.  *Note Auto-set::.

   * 'FIELDWIDTHS' was enhanced to allow skipping characters before
     assigning a value to a field (*note Splitting By Content::).

   * The 'PROCINFO["argv"]' array.  *Note Auto-set::.

   * The maximum number of hexadecimal digits in '\x' escapes is now
     two.  *Note Escape Sequences::.

   * Strongly typed regexp constants of the form '@/.../' (*note Strong
     Regexp Constants::).

   * The bitwise functions changed, making negative arguments into a
     fatal error (*note Bitwise Functions::).

   * The 'mktime()' function now accepts an optional second argument
     (*note Time Functions::).

   * The 'typeof()' function (*note Type Functions::).

   * Optimizations are enabled by default.  Use '-s' / '--no-optimize'
     to disable optimizations.

   * For many years, POSIX specified that default field splitting only
     allowed spaces and tabs to separate fields, and this was how 'gawk'
     behaved with '--posix'.  As of 2013, the standard restored
     historical behavior, and now default field splitting with '--posix'
     also allows newlines to separate fields.

   * Nonfatal output with 'print' and 'printf'.  *Note Nonfatal::.

   * Retryable I/O via 'PROCINFO[INPUT-FILE, "RETRY"]' (*note Retrying
     Input::).

   * Changes to the pretty-printer (*note Profiling::):

        - The '--pretty-print' option no longer runs the 'awk' program
          too.

        - Comments in the source program are preserved and placed into
          the output file.

        - Explicit parentheses for expressions in the input are
          preserved in the generated output.

   * Improvements to the extension API (*note Dynamic Extensions::):

        - The 'get_file()' function to access open redirections.

        - The 'nonfatal()' function for generating nonfatal error
          messages.

        - Support for GMP and MPFR values.

        - Input parsers can now override the default field parsing
          mechanism by specifying explicit locations.

   * Shell startup files are supplied with the distribution and
     installed by 'make install' (*note Shell Startup Files::).

   * The 'igawk' program and its manual page are no longer installed
     when 'gawk' is built.  *Note Igawk Program::.

   * Support for MirBSD was removed.

   * Support for GNU/Linux on Alpha was removed.

   Version 5.0 added the following features:

   * The 'PROCINFO["platform"]' array element, which allows you to write
     code that takes the operating system / platform into account.

   Version 5.1 was created to release 'gawk' with a correct major version
number for the API.  This had unfortunately been overlooked in version
5.0.  It added the following features:

   * The index for this manual was completely reworked.

   * Support was added for MSYS2.

   * 'asort()' and 'asorti()' were changed to allow 'FUNCTAB' and
     'SYMTAB' as the first argument if a second destination array is
     supplied (*note String Functions::).

   * The '-I'/'--trace' options were added to print a trace of the byte
     codes as they execute (*note Options::).

   * '$0' and the fields are now cleared before starting a 'BEGINFILE'
     rule (*note BEGINFILE/ENDFILE::).

   * Several example programs in the manual were updated to their
     modern POSIX equivalents.

   * The "no effect" lint warnings from '--lint' were fixed up and now
     behave more sanely (*note Options::).

   * Handling of Infinity and NaN values was improved.  *Note Math
     Definitions::, and also see *note POSIX Floating Point Problems::.

   Version 5.2 added the following features:

   * The 'mkbool()' built-in function (*note Boolean Functions::).

   * Interval expressions in regular expressions are enabled by default
     (*note Interval Expressions::).

   * Support for using the FNV1-A hash algorithm as 'gawk''s hash
     function (*note Other Environment Variables::).

   * The 'gawkbug' script for reporting bugs (*note Bug address::).

   * Terence Kelly's persistent memory allocator (PMA) was added,
     allowing the use of persistent data on certain systems (*note
     Persistent Memory::).