File: sed.info, Node: Escapes, Next: Locale Considerations, Prev: Back-references and Subexpressions, Up: sed regular expressions

5.8 Escape Sequences - specifying special characters
====================================================

Until this chapter, we have only encountered escapes of the form ‘\^’,
which tell ‘sed’ not to interpret the circumflex as a special character,
but rather to take it literally.  For example, ‘\*’ matches a single
asterisk rather than zero or more backslashes.

   This chapter introduces another kind of escape(1)--that is, escapes
that are applied to a character or sequence of characters that
ordinarily are taken literally, and that ‘sed’ replaces with a special
character.  This provides a way of encoding non-printable characters in
patterns in a visible manner.  There is no restriction on the appearance
of non-printing characters in a ‘sed’ script, but when a script is being
prepared in the shell or by text editing, it is usually easier to use
one of the following escape sequences than the binary character it
represents.

   The list of these escapes is:

‘\a’
     Produces or matches a BEL character, that is an "alert" (ASCII 7).

‘\f’
     Produces or matches a form feed (ASCII 12).

‘\n’
     Produces or matches a newline (ASCII 10).

‘\r’
     Produces or matches a carriage return (ASCII 13).

‘\t’
     Produces or matches a horizontal tab (ASCII 9).

‘\v’
     Produces or matches a so-called "vertical tab" (ASCII 11).

‘\cX’
     Produces or matches ‘CONTROL-X’, where X is any character.  The
     precise effect of ‘\cX’ is as follows: if X is a lower case letter,
     it is converted to upper case.  Then bit 6 of the character (hex
     40) is inverted.  Thus ‘\cz’ becomes hex 1A, but ‘\c{’ becomes hex
     3B, while ‘\c;’ becomes hex 7B.

‘\dXXX’
     Produces or matches a character whose decimal ASCII value is XXX.

‘\oXXX’
     Produces or matches a character whose octal ASCII value is XXX.

‘\xXX’
     Produces or matches a character whose hexadecimal ASCII value is
     XX.

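
   The escapes listed above can be tried out with a few one-line
commands.  The following is only a minimal sketch, assuming GNU ‘sed’
and a POSIX ‘printf’; the sample inputs and the ‘<TAB>’ and ‘<SUB>’
markers are made up for illustration.  It shows ‘\t’ matching a tab, the
three numeric forms naming the same character (a space is decimal 32,
octal 40, hex 20), and ‘\cz’ matching the control character hex 1A.

```shell
# '\t' in the pattern matches a literal horizontal tab (ASCII 9).
printf 'a\tb\n' | sed 's/\t/<TAB>/'       # prints: a<TAB>b

# A space can be written in decimal, octal, or hexadecimal form.
echo 'a b' | sed 's/\d032/_/'             # prints: a_b
echo 'a b' | sed 's/\o040/_/'             # prints: a_b
echo 'a b' | sed 's/\x20/_/'              # prints: a_b

# '\cz' is upper-cased to 'Z' (hex 5A), then bit 6 (hex 40) is
# inverted, giving hex 1A (octal 032, written here with printf).
printf 'a\032b\n' | sed 's/\cz/<SUB>/'    # prints: a<SUB>b
```

   None of the characters produced in this sketch is special in a
regular expression, so the examples do not depend on how escapes
interact with regular-expression syntax.
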
   ‘\b’ (backspace) was omitted because of the conflict with the
existing "word boundary" meaning.

5.8.1 Escaping Precedence
-------------------------

GNU ‘sed’ processes escape sequences _before_ passing the text on to the
regular-expression matching of the ‘s///’ command and to address
matching.  Thus the following two commands are equivalent (‘0x5e’ is the
hexadecimal ASCII value of the character ‘^’):

     $ echo 'a^c' | sed 's/^/b/'
     ba^c

     $ echo 'a^c' | sed 's/\x5e/b/'
     ba^c

   As are the following (‘0x5b’ and ‘0x5d’ are the hexadecimal ASCII
values of ‘[’ and ‘]’, respectively):

     $ echo abc | sed 's/[a]/x/'
     xbc

     $ echo abc | sed 's/\x5ba\x5d/x/'
     xbc

   However, it is recommended to avoid such special characters because
of unexpected edge cases.  For example, the following are not
equivalent:

     $ echo 'a^c' | sed 's/\^/b/'
     abc

     $ echo 'a^c' | sed 's/\\\x5e/b/'
     a^c

   ---------- Footnotes ----------

   (1) All the escapes introduced here are GNU extensions, with the
exception of ‘\n’.  In basic regular expression mode, setting
‘POSIXLY_CORRECT’ disables them inside bracket expressions.
