General Introduction
This file documents awk, a program that you can use to select particular records in a file and perform operations upon them.
Copyright © 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011
Free Software Foundation, Inc.
This is Edition 4 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 4.0.0 (or later) version of the GNU implementation of AWK.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled “GNU Free Documentation License”.
- (a) “A GNU Manual”
- (b) “You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.”
Foreword | Some nice words about this Web page. | |
Preface | What this Web page is about; brief history and acknowledgments. | |
1. Getting Started with awk | A basic introduction to using awk. How to run an awk program. Command-line syntax. | |
2. Running awk and gawk | How to run gawk. | |
3. Regular Expressions | All about matching things using regular expressions. | |
4. Reading Input Files | How to read files and manipulate fields. | |
5. Printing Output | How to print using awk. Describes the print and printf statements. Also describes redirection of output. | |
6. Expressions | Expressions are the basic building blocks of statements. | |
7. Patterns, Actions, and Variables | Overviews of patterns and actions. | |
8. Arrays in awk | The description and use of arrays. Also includes array-oriented control statements. | |
9. Functions | Built-in and user-defined functions. | |
10. Internationalization with gawk | Getting gawk to speak your language. | |
11. Advanced Features of gawk | Stuff for advanced users, specific to gawk. | |
12. A Library of awk Functions | ||
13. Practical awk Programs | Many awk programs with complete explanations. | |
14. dgawk: The awk Debugger | The dgawk debugger. | |
A. The Evolution of the awk Language | The evolution of the awk language. | |
B. Installing gawk | Installing gawk under various operating systems. | |
C. Implementation Notes | Notes about gawk extensions and possible future work. | |
D. Basic Programming Concepts | A very quick introduction to programming concepts. | |
Glossary | An explanation of some unfamiliar terms. | |
GNU General Public License | Your right to copy and distribute gawk. | |
GNU Free Documentation License | The license for this Web page. | |
Index | Concept and Variable Index. | |
History of awk and gawk | The history of gawk and awk. | |
0.1 A Rose by Any Other Name | What name to use to find awk. | |
0.2 Using This Book | Using this Web page. Includes sample input files that you can use. | |
0.3 Typographical Conventions | ||
The GNU Project and This Book | Brief history of the GNU project and this Web page. | |
How to Contribute | Helping to save the world. | |
Acknowledgments | ||
1.1 How to Run awk Programs | How to run gawk programs; includes command-line syntax. | |
1.1.1 One-Shot Throwaway awk Programs | Running a short throwaway awk program. | |
1.1.2 Running awk Without Input Files | Using no input files (input from terminal instead). | |
1.1.3 Running Long Programs | Putting permanent awk programs in files. | |
1.1.4 Executable awk Programs | Making self-contained awk programs. | |
1.1.5 Comments in awk Programs | Adding documentation to gawk programs. | |
1.1.6 Shell-Quoting Issues | More discussion of shell quoting issues. | |
1.1.6.1 Quoting in MS-Windows Batch Files | Quoting in Windows Batch Files. | |
1.2 Data Files for the Examples | Sample data files for use in the awk programs illustrated in this Web page. | |
1.3 Some Simple Examples | A very simple example. | |
1.4 An Example with Two Rules | A less simple one-line example using two rules. | |
1.5 A More Complex Example | A more complex example. | |
1.6 awk Statements Versus Lines | Subdividing or combining statements into lines. | |
1.7 Other Features of awk | ||
1.8 When to Use awk | When to use gawk and when to use other things. | |
2.1 Invoking awk | How to run awk. | |
2.2 Command-Line Options | Command-line options and their meanings. | |
2.3 Other Command-Line Arguments | Input file names and variable assignments. | |
2.4 Naming Standard Input | How to specify standard input with other files. | |
2.5 The Environment Variables gawk Uses | The environment variables gawk uses. | |
2.5.1 The AWKPATH Environment Variable | Searching directories for awk programs. | |
2.5.2 Other Environment Variables | The environment variables. | |
2.6 gawk’s Exit Status | gawk’s exit status. | |
2.7 Including Other Files Into Your Program | Including other files into your program. | |
2.8 Obsolete Options and/or Features | Obsolete Options and/or features. | |
2.9 Undocumented Options and Features | ||
3.1 How to Use Regular Expressions | ||
3.2 Escape Sequences | How to write nonprinting characters. | |
3.3 Regular Expression Operators | ||
3.4 Using Bracket Expressions | What can go between ‘[...]’. | |
3.5 gawk-Specific Regexp Operators | Operators specific to GNU software. | |
3.6 Case Sensitivity in Matching | How to do case-insensitive matching. | |
3.7 How Much Text Matches? | How much text matches. | |
3.8 Using Dynamic Regexps | ||
4.1 How Input Is Split into Records | Controlling how data is split into records. | |
4.2 Examining Fields | An introduction to fields. | |
4.3 Nonconstant Field Numbers | ||
4.4 Changing the Contents of a Field | ||
4.5 Specifying How Fields Are Separated | The field separator and how to change it. | |
4.5.1 Whitespace Normally Separates Fields | How fields are normally separated. | |
4.5.2 Using Regular Expressions to Separate Fields | Using regexps as the field separator. | |
4.5.3 Making Each Character a Separate Field | Making each character a separate field. | |
4.5.4 Setting FS from the Command Line | Setting FS from the command-line. | |
4.5.5 Field-Splitting Summary | Some final points and a summary table. | |
4.6 Reading Fixed-Width Data | Reading constant width data. | |
4.7 Defining Fields By Content | ||
4.8 Multiple-Line Records | Reading multi-line records. | |
4.9 Explicit Input with getline | Reading files under explicit program control using the getline function. | |
4.9.1 Using getline with No Arguments | Using getline with no arguments. | |
4.9.2 Using getline into a Variable | Using getline into a variable. | |
4.9.3 Using getline from a File | Using getline from a file. | |
4.9.4 Using getline into a Variable from a File | Using getline into a variable from a file. | |
4.9.5 Using getline from a Pipe | Using getline from a pipe. | |
4.9.6 Using getline into a Variable from a Pipe | Using getline into a variable from a pipe. | |
4.9.7 Using getline from a Coprocess | Using getline from a coprocess. | |
4.9.8 Using getline into a Variable from a Coprocess | Using getline into a variable from a coprocess. | |
4.9.9 Points to Remember About getline | Important things to know about getline. | |
4.9.10 Summary of getline Variants | ||
4.10 Directories On The Command Line | What happens if you put a directory on the command line. | |
5.1 The print Statement | The print statement. | |
5.2 print Statement Examples | Simple examples of print statements. | |
5.3 Output Separators | The output separators and how to change them. | |
5.4 Controlling Numeric Output with print | Controlling Numeric Output With print. | |
5.5 Using printf Statements for Fancier Printing | The printf statement. | |
5.5.1 Introduction to the printf Statement | Syntax of the printf statement. | |
5.5.2 Format-Control Letters | Format-control letters. | |
5.5.3 Modifiers for printf Formats | Format-specification modifiers. | |
5.5.4 Examples Using printf | Several examples. | |
5.6 Redirecting Output of print and printf | How to redirect output to multiple files and pipes. | |
5.7 Special File Names in gawk | File name interpretation in gawk. gawk allows access to inherited file descriptors. | |
5.7.1 Special Files for Standard Descriptors | Special files for I/O. | |
5.7.2 Special Files for Network Communications | Special files for network communications. | |
5.7.3 Special File Name Caveats | Things to watch out for. | |
5.8 Closing Input and Output Redirections | Closing Input and Output Files and Pipes. | |
6.1 Constants, Variables and Conversions | Constants, Variables, and Regular Expressions. | |
6.1.1 Constant Expressions | String, numeric and regexp constants. | |
6.1.1.1 Numeric and String Constants | Numeric and string constants. | |
6.1.1.2 Octal and Hexadecimal Numbers | What are octal and hex numbers. | |
6.1.1.3 Regular Expression Constants | Regular Expression constants. | |
6.1.2 Using Regular Expression Constants | When and how to use a regexp constant. | |
6.1.3 Variables | Variables give names to values for later use. | |
6.1.3.1 Using Variables in a Program | Using variables in your programs. | |
6.1.3.2 Assigning Variables on the Command Line | Setting variables on the command-line and a summary of command-line syntax. This is an advanced method of input. | |
6.1.4 Conversion of Strings and Numbers | The conversion of strings to numbers and vice versa. | |
6.2 Operators: Doing Something With Values | gawk’s operators. | |
6.2.1 Arithmetic Operators | Arithmetic operations (‘+’, ‘-’, etc.) | |
6.2.2 String Concatenation | Concatenating strings. | |
6.2.3 Assignment Expressions | Changing the value of a variable or a field. | |
6.2.4 Increment and Decrement Operators | Incrementing the numeric value of a variable. | |
6.3 Truth Values and Conditions | Testing for true and false. | |
6.3.1 True and False in awk | What is “true” and what is “false”. | |
6.3.2 Variable Typing and Comparison Expressions | How variables acquire types and how this affects comparison of numbers and strings with ‘<’, etc. | |
6.3.2.1 String Type Versus Numeric Type | String type versus numeric type. | |
6.3.2.2 Comparison Operators | The comparison operators. | |
6.3.2.3 String Comparison With POSIX Rules | String comparison with POSIX rules. | |
6.3.3 Boolean Expressions | Combining comparison expressions using boolean operators ‘||’ (“or”), ‘&&’ (“and”) and ‘!’ (“not”). | |
6.3.4 Conditional Expressions | Conditional expressions select between two subexpressions under control of a third subexpression. | |
6.4 Function Calls | A function call is an expression. | |
6.5 Operator Precedence (How Operators Nest) | How various operators nest. | |
6.6 Where You Are Makes A Difference | How the locale affects things. | |
7.1 Pattern Elements | What goes into a pattern. | |
7.1.1 Regular Expressions as Patterns | Using regexps as patterns. | |
7.1.2 Expressions as Patterns | Any expression can be used as a pattern. | |
7.1.3 Specifying Record Ranges with Patterns | Pairs of patterns specify record ranges. | |
7.1.4 The BEGIN and END Special Patterns | Specifying initialization and cleanup rules. | |
7.1.4.1 Startup and Cleanup Actions | How and why to use BEGIN/END rules. | |
7.1.4.2 Input/Output from BEGIN and END Rules | I/O issues in BEGIN/END rules. | |
7.1.5 The BEGINFILE and ENDFILE Special Patterns | Two special patterns for advanced control. | |
7.1.6 The Empty Pattern | The empty pattern, which matches every record. | |
7.2 Using Shell Variables in Programs | How to use shell variables with awk. | |
7.3 Actions | What goes into an action. | |
7.4 Control Statements in Actions | Describes the various control statements in detail. | |
7.4.1 The if-else Statement | Conditionally execute some awk statements. | |
7.4.2 The while Statement | Loop until some condition is satisfied. | |
7.4.3 The do-while Statement | Do specified action while looping until some condition is satisfied. | |
7.4.4 The for Statement | Another looping statement, that provides initialization and increment clauses. | |
7.4.5 The switch Statement | Switch/case evaluation for conditional execution of statements based on a value. | |
7.4.6 The break Statement | Immediately exit the innermost enclosing loop. | |
7.4.7 The continue Statement | Skip to the end of the innermost enclosing loop. | |
7.4.8 The next Statement | Stop processing the current input record. | |
7.4.9 Using gawk’s nextfile Statement | Stop processing the current file. | |
7.4.10 The exit Statement | Stop execution of awk. | |
7.5 Built-in Variables | Summarizes the built-in variables. | |
7.5.1 Built-in Variables That Control awk | Built-in variables that you change to control awk. | |
7.5.2 Built-in Variables That Convey Information | Built-in variables where awk gives you information. | |
7.5.3 Using ARGC and ARGV | Ways to use ARGC and ARGV. | |
8.1 The Basics of Arrays | The basics of arrays. | |
8.1.1 Introduction to Arrays | ||
8.1.2 Referring to an Array Element | How to examine one element of an array. | |
8.1.3 Assigning Array Elements | How to change an element of an array. | |
8.1.4 Basic Array Example | Basic Example of an Array | |
8.1.5 Scanning All Elements of an Array | A variation of the for statement. It loops through the indices of an array’s existing elements. | |
8.2 The delete Statement | The delete statement removes an element from an array. | |
8.3 Using Numbers to Subscript Arrays | How to use numbers as subscripts in awk. | |
8.4 Using Uninitialized Variables as Subscripts | Using Uninitialized variables as subscripts. | |
8.5 Multidimensional Arrays | Emulating multidimensional arrays in awk. | |
8.5.1 Scanning Multidimensional Arrays | Scanning multidimensional arrays. | |
8.6 Arrays of Arrays | True multidimensional arrays. | |
9.1 Built-in Functions | Summarizes the built-in functions. | |
9.1.1 Calling Built-in Functions | How to call built-in functions. | |
9.1.2 Numeric Functions | Functions that work with numbers, including int(), sin() and rand(). | |
9.1.3 String-Manipulation Functions | Functions for string manipulation, such as split(), match() and sprintf(). | |
9.1.3.1 More About ‘\’ and ‘&’ with sub(), gsub(), and gensub() | More than you want to know about ‘\’ and ‘&’ with sub(), gsub(), and gensub(). | |
9.1.4 Input/Output Functions | Functions for files and shell commands. | |
9.1.5 Time Functions | Functions for dealing with timestamps. | |
9.1.6 Bit-Manipulation Functions | Functions for bitwise operations. | |
9.1.7 Getting Type Information | Functions for type information. | |
9.1.8 String-Translation Functions | Functions for string translation. | |
9.2 User-Defined Functions | Describes User-defined functions in detail. | |
9.2.1 Function Definition Syntax | How to write definitions and what they mean. | |
9.2.2 Function Definition Examples | An example function definition and what it does. | |
9.2.3 Calling User-Defined Functions | Things to watch out for. | |
9.2.3.1 Writing A Function Call | Don’t use spaces. | |
9.2.3.2 Controlling Variable Scope | Controlling variable scope. | |
9.2.3.3 Passing Function Arguments By Value Or By Reference | Passing parameters. | |
9.2.4 The return Statement | Specifying the value a function returns. | |
9.2.5 Functions and Their Effects on Variable Typing | How variable types can change at runtime. | |
9.3 Indirect Function Calls | Choosing the function to call at runtime. | |
10.1 Internationalization and Localization | ||
10.2 GNU gettext | How GNU gettext works. | |
10.3 Internationalizing awk Programs | Features for the programmer. | |
10.4 Translating awk Programs | Features for the translator. | |
10.4.1 Extracting Marked Strings | Extracting marked strings. | |
10.4.2 Rearranging printf Arguments | Rearranging printf arguments. | |
10.4.3 awk Portability Issues | awk-level portability issues. | |
10.5 A Simple Internationalization Example | A simple i18n example. | |
10.6 gawk Can Speak Your Language | gawk is also internationalized. | |
11.1 Allowing Nondecimal Input Data | Allowing nondecimal input data. | |
11.2 Controlling Array Traversal and Array Sorting | Facilities for controlling array traversal and sorting arrays. | |
11.2.1 Controlling Array Traversal | How to use PROCINFO["sorted_in"]. | |
11.2.1.1 Array Scanning Using A User-defined Function | Using a function to control scanning. | |
11.2.1.2 Controlling Array Scanning Order | Controlling the order in which arrays are scanned. | |
11.2.2 Sorting Array Values and Indices with gawk | How to use asort() and asorti(). | |
11.3 Two-Way Communications with Another Process | Two-way communications with another process. | |
11.4 Using gawk for Network Programming | Using gawk for network programming. | |
11.5 Profiling Your awk Programs | Profiling your awk programs. | |
12.1 Naming Library Function Global Variables | How to best name private global variables in library functions. | |
12.2 General Programming | Functions that are of general use. | |
12.2.1 Converting Strings To Numbers | A replacement for the built-in strtonum() function. | |
12.2.2 Assertions | A function for assertions in awk programs. | |
12.2.3 Rounding Numbers | A function for rounding if sprintf() does not do it correctly. | |
12.2.4 The Cliff Random Number Generator | ||
12.2.5 Translating Between Characters and Numbers | Functions for using characters as numbers and vice versa. | |
12.2.6 Merging an Array into a String | A function to join an array into a string. | |
12.2.7 Managing the Time of Day | A function to get formatted times. | |
12.3 Data File Management | Functions for managing command-line data files. | |
12.3.1 Noting Data File Boundaries | A function for handling data file transitions. | |
12.3.2 Rereading the Current File | A function for rereading the current file. | |
12.3.3 Checking for Readable Data Files | Checking that data files are readable. | |
12.3.4 Checking For Zero-length Files | Checking for zero-length files. | |
12.3.5 Treating Assignments as File Names | Treating assignments as file names. | |
12.4 Processing Command-Line Options | A function for processing command-line arguments. | |
12.5 Reading the User Database | Functions for getting user information. | |
12.6 Reading the Group Database | Functions for getting group information. | |
12.7 Traversing Arrays of Arrays | A function to walk arrays of arrays. | |
13.1 Running the Example Programs | How to run these examples. | |
13.2 Reinventing Wheels for Fun and Profit | Clones of common utilities. | |
13.2.1 Cutting out Fields and Columns | The cut utility. | |
13.2.2 Searching for Regular Expressions in Files | The egrep utility. | |
13.2.3 Printing out User Information | The id utility. | |
13.2.4 Splitting a Large File into Pieces | The split utility. | |
13.2.5 Duplicating Output into Multiple Files | The tee utility. | |
13.2.6 Printing Nonduplicated Lines of Text | The uniq utility. | |
13.2.7 Counting Things | The wc utility. | |
13.3 A Grab Bag of awk Programs | Some interesting awk programs. | |
13.3.1 Finding Duplicated Words in a Document | Finding duplicated words in a document. | |
13.3.2 An Alarm Clock Program | An alarm clock. | |
13.3.3 Transliterating Characters | A program similar to the tr utility. | |
13.3.4 Printing Mailing Labels | Printing mailing labels. | |
13.3.5 Generating Word-Usage Counts | A program to produce a word usage count. | |
13.3.6 Removing Duplicates from Unsorted Text | Eliminating duplicate entries from a history file. | |
13.3.7 Extracting Programs from Texinfo Source Files | Pulling out programs from Texinfo source files. | |
13.3.8 A Simple Stream Editor | ||
13.3.9 An Easy Way to Use Library Functions | A wrapper for awk that includes files. | |
13.3.10 Finding Anagrams From A Dictionary | Finding anagrams from a dictionary. | |
13.3.11 And Now For Something Completely Different | People do amazing things with too much time on their hands. | |
14.1 Introduction to dgawk | ||
14.1.1 Debugging In General | ||
14.1.2 Additional Debugging Concepts | ||
14.1.3 Awk Debugging | ||
14.2 Sample dgawk session | ||
14.2.1 dgawk Invocation | ||
14.2.2 Finding The Bug | ||
14.3 Main dgawk Commands | ||
14.3.1 Control Of Breakpoints | Control of breakpoints. | |
14.3.2 Control of Execution | Control of execution. | |
14.3.3 Viewing and Changing Data | Viewing and changing data. | |
14.3.4 Dealing With The Stack | Dealing with the stack. | |
14.3.5 Obtaining Information About The Program and The Debugger State | Obtaining information about the program and the debugger state. | |
14.3.6 Miscellaneous Commands | ||
14.4 Readline Support | ||
14.5 Limitations and Future Plans | Limitations and future plans. | |
A.1 Major Changes Between V7 and SVR3.1 | The major changes between V7 and System V Release 3.1. | |
A.2 Changes Between SVR3.1 and SVR4 | Minor changes between System V Releases 3.1 and 4. | |
A.3 Changes Between SVR4 and POSIX awk | New features from the POSIX standard. | |
A.4 Extensions in Brian Kernighan’s awk | New features from Brian Kernighan’s version of awk. | |
A.5 Extensions in gawk Not in POSIX awk | The extensions in gawk not in POSIX awk. | |
A.6 Common Extensions Summary | ||
A.7 Regexp Ranges and Locales: A Long Sad Story | How locales used to affect regexp ranges. | |
A.8 Major Contributors to gawk | The major contributors to gawk. | |
B.1 The gawk Distribution | What is in the gawk distribution. | |
B.1.1 Getting the gawk Distribution | How to get the distribution. | |
B.1.2 Extracting the Distribution | How to extract the distribution. | |
B.1.3 Contents of the gawk Distribution | What is in the distribution. | |
B.2 Compiling and Installing gawk on Unix-like Systems | Installing gawk under various versions of Unix. | |
B.2.1 Compiling gawk for Unix-like Systems | Compiling gawk under Unix. | |
B.2.2 Additional Configuration Options | Other compile-time options. | |
B.2.3 The Configuration Process | How it’s all supposed to work. | |
B.3 Installation on Other Operating Systems | ||
B.3.1 Installation on PC Operating Systems | Installing and Compiling gawk on MS-DOS and OS/2. | |
B.3.1.1 Installing a Prepared Distribution for PC Systems | Installing a prepared distribution. | |
B.3.1.2 Compiling gawk for PC Operating Systems | Compiling gawk for MS-DOS, Windows32, and OS/2. | |
B.3.1.3 Testing gawk on PC Operating Systems | Testing gawk on PC systems. | |
B.3.1.4 Using gawk on PC Operating Systems | Running gawk on MS-DOS, Windows32 and OS/2. | |
B.3.1.5 Using gawk In The Cygwin Environment | Building and running gawk for Cygwin. | |
B.3.1.6 Using gawk In The MSYS Environment | ||
B.3.2 How to Compile and Install gawk on VMS | Installing gawk on VMS. | |
B.3.2.1 Compiling gawk on VMS | How to compile gawk under VMS. | |
B.3.2.2 Installing gawk on VMS | How to install gawk under VMS. | |
B.3.2.3 Running gawk on VMS | How to run gawk under VMS. | |
B.3.2.4 Some VMS Systems Have An Old Version of gawk | An old version comes with some VMS systems. | |
B.4 Reporting Problems and Bugs | ||
B.5 Other Freely Available awk Implementations | Other freely available awk implementations. | |
C.1 Downward Compatibility and Debugging | How to disable certain gawk extensions. | |
C.2 Making Additions to gawk | Making Additions To gawk. | |
C.2.1 Accessing The gawk Git Repository | Accessing the Git repository. | |
C.2.2 Adding New Features | Adding code to the main body of gawk. | |
C.2.3 Porting gawk to a New Operating System | Porting gawk to a new operating system. | |
C.3 Adding New Built-in Functions to gawk | Adding new built-in functions to gawk. | |
C.3.1 A Minimal Introduction to gawk Internals | A brief look at some gawk internals. | |
C.3.2 Extension Licensing | A note about licensing. | |
C.3.3 Example: Directory and File Operation Built-ins | An example of new functions. | |
C.3.3.1 Using chdir() and stat() | What the new functions will do. | |
C.3.3.2 C Code for chdir() and stat() | The code for internal file operations. | |
C.3.3.3 Integrating the Extensions | How to use an external extension. | |
C.4 Probable Future Extensions | New features that may be implemented one day. | |
D.1 What a Program Does | The high level view. | |
D.2 Data Values in a Computer | A very quick intro to data types. | |
D.3 Floating-Point Number Caveats | Stuff to know about floating-point numbers. | |
D.3.1 The String Value Can Lie | ||
D.3.2 Floating Point Numbers Are Not Abstract Numbers | ||
D.3.3 Standards Versus Existing Practice | ||