3.9 GNU Extensions for Escapes in Regular Expressions
Until this chapter, we have only encountered escapes of the form ‘\^’, which tell sed not to interpret the circumflex as a special character, but rather to take it literally. For example, ‘\*’ matches a single asterisk rather than zero or more backslashes.
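As a quick illustration of this first kind of escape, the sketch below uses ‘\*’ and ‘\^’ so that sed treats the asterisk and the circumflex as ordinary characters. It should run with any POSIX sed; the variable names are only illustrative.

```shell
# '\*' makes the asterisk literal, so only the '*' itself is replaced.
star=$(printf 'a*b\n' | sed 's/\*/STAR/')
echo "$star"     # aSTARb

# '\^' makes the circumflex literal instead of a start-of-line anchor.
caret=$(printf '2^3\n' | sed 's/\^/**/')
echo "$caret"    # 2**3
```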
This chapter introduces another kind of escape: one that is applied to a character or sequence of characters that is ordinarily taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner.
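For instance, the following sketch writes a TAB and a newline as ‘\t’ and ‘\n’ instead of typing the raw characters into the script. It assumes GNU sed, since these escapes are GNU extensions, and uses od -c only to display the bytes that were produced. The variable names are illustrative.

```shell
# Assumes GNU sed: '\t' and '\n' in the replacement produce a TAB and
# a newline, and '\t' in the regex matches a TAB.
tabbed=$(printf 'a,b,c\n' | sed 's/,/\t/g')
lines=$(printf 'a,b,c\n' | sed 's/,/\n/g')
spaced=$(printf 'a\tb\n' | sed 's/\t/ /')

# od -c shows the otherwise invisible characters as \t and \n.
printf '%s\n' "$tabbed" | od -c
printf '%s\n' "$lines" | od -c
echo "$spaced"      # a b
```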
There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:
\a
Produces or matches a BEL character, that is an “alert” (ASCII 7).
\f
Produces or matches a form feed (ASCII 12).
\n
Produces or matches a newline (ASCII 10).
\r
Produces or matches a carriage return (ASCII 13).
\t
Produces or matches a horizontal tab (ASCII 9).
\v
Produces or matches a so-called “vertical tab” (ASCII 11).
\cx
Produces or matches CONTROL-x, where x is any character. The precise effect of ‘\cx’ is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex 3B, while ‘\c;’ becomes hex 7B.
\dxxx
Produces or matches a character whose decimal ASCII value is xxx.
\oxxx
Produces or matches a character whose octal ASCII value is xxx.
\xxx
Produces or matches a character whose hexadecimal ASCII value is xx (that is, ‘\x’ followed by two hexadecimal digits).
‘\b’ (backspace) was omitted because of the conflict with the existing “word boundary” meaning.
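The sketch below, which assumes GNU sed, checks the arithmetic described above. It writes the letter ‘A’ with each numeric escape, matches an ESC character by its hex value, and reproduces the ‘\cz’ and ‘\c{’ cases from the ‘\cx’ entry. The variable names are illustrative.

```shell
# Assumes GNU sed. 'A' is ASCII 65 = octal 101 = hex 41.
dec=$(echo x | sed 's/x/\d065/')
oct=$(echo x | sed 's/x/\o101/')
hex=$(echo x | sed 's/x/\x41/')
echo "$dec $oct $hex"    # A A A

# Numeric escapes also work on the matching side: hex 1B is ESC.
esc=$(printf 'a\033b\n' | sed 's/\x1b/<ESC>/')
echo "$esc"              # a<ESC>b

# \cz: 'z' is uppercased to 'Z' (hex 5A), then bit 6 (hex 40) is
# inverted, giving hex 1A (CONTROL-Z). od shows the raw byte.
ctrl=$(echo x | sed 's/x/\cz/')
printf '%s' "$ctrl" | od -An -tx1

# \c{: '{' (hex 7B) is not a letter; inverting bit 6 gives hex 3B, ';'.
semi=$(echo x | sed 's/x/\c{/')
echo "$semi"             # ;
```

Because ‘\b’ is not available for backspace, the same numeric escapes can stand in for it: ‘\x08’, ‘\o010’ or ‘\d008’ each denote ASCII 8.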
Other escapes match a particular character class and are valid only in regular expressions:
\w
Matches any “word” character. A “word” character is any letter or digit or the underscore character.
\W
Matches any “non-word” character.
\b
Matches a word boundary; that is, it matches if the character to the left is a “word” character and the character to the right is a “non-word” character, or vice versa.
\B
Matches everywhere but on a word boundary; that is, it matches if the character to the left and the character to the right are either both “word” characters or both “non-word” characters.
\`
Matches only at the start of pattern space. This is different from ‘^’ in multi-line mode.
\'
Matches only at the end of pattern space. This is different from ‘$’ in multi-line mode.
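A sketch of these regex-only escapes, assuming GNU sed. The first commands contrast ‘\w’, ‘\W’, ‘\b’ and ‘\B’. The last four use N to join three lines into one pattern space, so that the M (multi-line) flag of the s command shows how ‘^’ and ‘$’ differ from ‘\`’ and ‘\'’. The variable names are illustrative.

```shell
# Assumes GNU sed, whose BREs also accept \+ ("one or more").
# \w\+ matches each run of word characters (letters, digits, '_').
words=$(echo 'foo-bar baz_1' | sed 's/\w\+/W/g')
echo "$words"       # W-W W
# \W matches each single non-word character ('-' and ' ' here).
nonword=$(echo 'foo-bar baz_1' | sed 's/\W/_/g')
echo "$nonword"     # foo_bar_baz_1

# \b...\b keeps "cat" from matching inside "concat" or "cats".
bounded=$(echo 'cat concat cats' | sed 's/\bcat\b/dog/g')
echo "$bounded"     # dog concat cats
# \B requires a non-boundary, so only the "cat" inside "concat" matches.
inner=$(echo 'cat concat' | sed 's/\Bcat/DOG/g')
echo "$inner"       # cat conDOG

# N;N joins three lines. With M, ^ and $ match at every embedded line,
# while \` and \' match only at the start and end of pattern space.
start_m=$(printf 'a\nb\nc\n' | sed 'N;N;s/^/X/gm')
start_buf=$(printf 'a\nb\nc\n' | sed 'N;N;s/\`/X/gm')
end_m=$(printf 'a\nb\nc\n' | sed 'N;N;s/$/X/gm')
end_buf=$(printf 'a\nb\nc\n' | sed "N;N;s/\\'/X/gm")
printf '%s\n---\n' "$start_m" "$start_buf" "$end_m" "$end_buf"
```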
This document was generated on January 5, 2013 using texi2html 5.0.