
File: gawk.info,  Node: Escape Sequences,  Next: Regexp Operators,  Prev: Regexp Usage,  Up: Regexp

3.2 Escape Sequences
====================

Some characters cannot be included literally in string constants
('"foo"') or regexp constants ('/foo/').  Instead, they should be
represented with "escape sequences", which are character sequences
beginning with a backslash ('\').  One use of an escape sequence is to
include a double-quote character in a string constant.  Because a plain
double quote ends the string, you must use '\"' to represent an actual
double-quote character as a part of the string.  For example:

     $ awk 'BEGIN { print "He said \"hi!\" to her." }'
     -| He said "hi!" to her.

   The backslash character itself is another character that cannot be
included normally; you must write '\\' to put one backslash in the
string or regexp.  Thus, the string whose contents are the two
characters '"' and '\' must be written '"\"\\"'.
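
   As a small check of the rule above, the following sketch builds that
two-character string and prints its length and contents.  The shell
variable name 'quote_backslash' is only illustrative; the 'awk' program
is enclosed in single quotes so that the shell passes the backslashes
through unchanged.

```shell
# Build the string consisting of a double quote followed by a backslash,
# then print its length and its contents.
quote_backslash=$(awk 'BEGIN { s = "\"\\"; print length(s) ": " s }')
printf '%s\n' "$quote_backslash"
# prints: 2: "\
```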

   Other escape sequences represent unprintable characters such as TAB
or newline.  There is nothing to stop you from entering most unprintable
characters directly in a string constant or regexp constant, but they
may look ugly.

   The following list presents all the escape sequences used in 'awk'
and what they represent.  Unless noted otherwise, all these escape
sequences apply to both string constants and regexp constants:

'\\'
     A literal backslash, '\'.

'\a'
     The "alert" character, 'Ctrl-g', ASCII code 7 (BEL). (This often
     makes some sort of audible noise.)

'\b'
     Backspace, 'Ctrl-h', ASCII code 8 (BS).

'\f'
     Formfeed, 'Ctrl-l', ASCII code 12 (FF).

'\n'
     Newline, 'Ctrl-j', ASCII code 10 (LF).

'\r'
     Carriage return, 'Ctrl-m', ASCII code 13 (CR).

'\t'
     Horizontal TAB, 'Ctrl-i', ASCII code 9 (HT).

'\v'
     Vertical TAB, 'Ctrl-k', ASCII code 11 (VT).

'\NNN'
     The octal value NNN, where NNN stands for 1 to 3 digits between '0'
     and '7'.  For example, the code for the ASCII ESC (escape)
     character is '\033'.

'\xHH...'
     The hexadecimal value HH, where HH stands for a sequence of
     hexadecimal digits ('0'-'9', and either 'A'-'F' or 'a'-'f').  A
     maximum of two digits are allowed after the '\x'.  Any further
     hexadecimal digits are treated as simple letters or numbers.
     (c.e.)  (The '\x' escape sequence is not allowed in POSIX awk.)

          CAUTION: In ISO C, the escape sequence continues until the
          first non-hexadecimal digit is seen.  For many years, 'gawk'
          would continue incorporating hexadecimal digits into the value
          until a non-hexadecimal digit or the end of the string was
          encountered.  However, using more than two hexadecimal digits
          produced undefined results.  As of version 4.2, only two
          digits are processed.

'\/'
     A literal slash (should be used for regexp constants only).  This
     sequence is used when you want to write a regexp constant that
     contains a slash (such as '/.*:\/home\/[[:alnum:]]+:.*/'; the
     '[[:alnum:]]' notation is discussed in *note Bracket
     Expressions::).  Because the regexp is delimited by slashes, you
     need to escape any slash that is part of the pattern, in order to
     tell 'awk' to keep processing the rest of the regexp.

'\"'
     A literal double quote (should be used for string constants only).
     This sequence is used when you want to write a string constant that
     contains a double quote (such as '"He said \"hi!\" to her."').
     Because the string is delimited by double quotes, you need to
     escape any quote that is part of the string, in order to tell 'awk'
     to keep processing the rest of the string.
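
   A few of the sequences in the list above can be tried from the shell.
This is only a sketch: the variable names and the sample input are made
up for illustration, and it sticks to the octal, TAB, and slash escapes
because the '\x' escape is not available in every 'awk'.

```shell
# Octal escapes: 101, 102, and 103 are the octal ASCII codes for A, B, C.
octal_text=$(awk 'BEGIN { print "\101\102\103" }')
printf '%s\n' "$octal_text"
# prints: ABC

# \t produces a TAB; a second awk splits on that TAB to show it is there.
tab_field=$(awk 'BEGIN { print "left\tright" }' | awk -F '\t' '{ print $2 }')
printf '%s\n' "$tab_field"
# prints: right

# \/ keeps a slash from ending the regexp constant early.
slash_match=$(printf '%s\n' 'arnold:/home/arnold' |
    awk '/\/home\// { print "matched" }')
printf '%s\n' "$slash_match"
# prints: matched
```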

   In 'gawk', a number of additional two-character sequences that begin
with a backslash have special meaning in regexps.  *Note GNU Regexp
Operators::.

   In a regexp, a backslash before any character that is not in the
previous list and not listed in *note GNU Regexp Operators:: means that
the next character should be taken literally, even if it would normally
be a regexp operator.  For example, '/a\+b/' matches the three
characters 'a+b'.
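
   The '/a\+b/' example can be checked with two sample lines, only one
of which contains a literal plus sign.  The input lines and the variable
name 'literal_plus' are assumptions made for this sketch.

```shell
# With the backslash, + is an ordinary character, so only the line
# containing the three characters a+b is printed; aab is not.
literal_plus=$(printf '%s\n' 'a+b' 'aab' | awk '/a\+b/')
printf '%s\n' "$literal_plus"
# prints: a+b
```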

   For complete portability, use a backslash only before a character
shown in the previous list or before a regexp operator.

                  Backslash Before Regular Characters

   If you place a backslash in a string constant before something that
is not one of the characters previously listed, POSIX 'awk' purposely
leaves what happens as undefined.  There are two choices:

Strip the backslash out
     This is what BWK 'awk' and 'gawk' both do.  For example, '"a\qc"'
     is the same as '"aqc"'.  (Because this is such an easy bug both to
     introduce and to miss, 'gawk' warns you about it.)  Consider 'FS =
     "[ \t]+\|[ \t]+"', intended to use vertical bars surrounded by
     whitespace as the field separator.  There should be two
     backslashes in the string: 'FS = "[ \t]+\\|[ \t]+"'.

Leave the backslash alone
     Some other 'awk' implementations do this.  In such implementations,
     typing '"a\qc"' is the same as typing '"a\\qc"'.
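
   The correctly escaped field separator can be tried on a sample
record.  The input line 'a | b | c' and the variable name
'second_field' are invented for this sketch; the doubled backslash
reaches the regexp matcher as a single backslash, which makes the
vertical bar literal.

```shell
# Split on a vertical bar surrounded by whitespace.  The string constant
# "\\|" becomes the two characters \| once awk has processed the escapes.
second_field=$(printf '%s\n' 'a | b | c' |
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print NF ": " $2 }')
printf '%s\n' "$second_field"
# prints: 3: b
```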

   To summarize:

   * The escape sequences in the preceding list are always processed
     first, for both string constants and regexp constants.  This
     happens very early, as soon as 'awk' reads your program.

   * 'gawk' processes both regexp constants and dynamic regexps (*note
     Computed Regexps::), for the special operators listed in *note GNU
     Regexp Operators::.

   * A backslash before any other character means to treat that
     character literally.
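
   One consequence of this ordering is that a regexp held in a string
needs one more level of backslashes than the same regexp written as a
constant.  The following sketch matches a literal dot both ways; the
sample input and the variable names are assumptions for illustration.

```shell
# Dynamic regexp: the string "a\\.b" is first reduced to a\.b by
# string-escape processing, and only then used as a regexp.
dynamic_dot=$(printf '%s\n' 'a.b' 'axb' | awk '$0 ~ "a\\.b"')
printf '%s\n' "$dynamic_dot"
# prints: a.b

# Regexp constant: a single backslash is enough.
constant_dot=$(printf '%s\n' 'a.b' 'axb' | awk '$0 ~ /a\.b/')
printf '%s\n' "$constant_dot"
# prints: a.b
```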

                  Escape Sequences for Metacharacters

   Suppose you use an octal or hexadecimal escape to represent a regexp
metacharacter.  (See *note Regexp Operators::.)  Does 'awk' treat the
character as a literal character or as a regexp operator?

   Historically, such characters were taken literally.  (d.c.)  However,
the POSIX standard indicates that they should be treated as real
metacharacters, which is what 'gawk' does.  In compatibility mode (*note
Options::), 'gawk' treats the characters represented by octal and
hexadecimal escape sequences literally when used in regexp constants.
Thus, in compatibility mode, '/a\52b/' is equivalent to '/a\*b/'.
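
   Because the answer depends on the implementation and on the mode it
runs in, the following sketch only reports which interpretation the
'awk' at hand uses and does not assume either result.  The variable name
'star_escape' and the labels 'literal' and 'operator' are invented here.

```shell
# \52 is the octal code for *.  If the escape is taken literally, the
# regexp matches the three characters a*b.  If it acts as the *
# operator, the anchored regexp cannot match a string containing a
# star, so the else branch runs.
star_escape=$(awk 'BEGIN {
    if ("a*b" ~ /^a\52b$/) print "literal"; else print "operator"
}')
printf '%s\n' "$star_escape"
```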
