File: sed.info,  Node: Character Classes and Bracket Expressions,  Next: regexp extensions,  Prev: ERE syntax,  Up: sed regular expressions

5.5 Character Classes and Bracket Expressions
=============================================

A “bracket expression” is a list of characters enclosed by ‘[’ and ‘]’.
It matches any single character in that list; if the first character of
the list is the caret ‘^’, then it matches any character *not* in the
list.  For example, the following command replaces the strings ‘gray’ or
‘grey’ with ‘blue’:

     sed  's/gr[ae]y/blue/'

   Bracket expressions can be used in both *note basic: BRE syntax. and
*note extended: ERE syntax. regular expressions (that is, with or
without the ‘-E’/‘-r’ options).
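
   As an illustrative sketch (the sample input strings and the variable
names ‘bre’, ‘ere’, and ‘neg’ are made up for this example), the same
bracket expression behaves identically with and without ‘-E’, and a
leading caret inverts the list:

```shell
# The same bracket expression in basic and extended syntax.
bre=$(echo 'gray grey' | sed 's/gr[ae]y/blue/g')
ere=$(echo 'gray grey' | sed -E 's/gr[ae]y/blue/g')
echo "$bre"    # blue blue
echo "$ere"    # blue blue

# A leading caret negates the list: 'gr', then any one character
# other than 'a' or 'e', then 'y'.
neg=$(echo 'gray groy' | sed 's/gr[^ae]y/blue/g')
echo "$neg"    # gray blue
```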

   Within a bracket expression, a “range expression” consists of two
characters separated by a hyphen.  It matches any single character that
sorts between the two characters, inclusive.  In the default C locale,
the sorting sequence is the native character order; for example, ‘[a-d]’
is equivalent to ‘[abcd]’.
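
   The equivalence can be checked with a short sketch.  It forces the C
locale with ‘LC_ALL=C’, since ranges may sort differently in other
locales; the input string and variable names are illustrative only:

```shell
# In the C locale, the range a-d covers exactly a, b, c and d.
ranged=$(echo 'abcdef' | LC_ALL=C sed 's/[a-d]/X/g')
listed=$(echo 'abcdef' | LC_ALL=C sed 's/[abcd]/X/g')
echo "$ranged"    # XXXXef
echo "$listed"    # XXXXef
```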

   Finally, certain named classes of characters are predefined within
bracket expressions; they are listed below.

   These named classes must themselves be used _inside_ a bracket
expression.  Correct usage:
     $ echo 1 | sed 's/[[:digit:]]/X/'
     X

   Omitting the outer brackets is rejected by newer ‘sed’ versions.
Older versions accepted it but treated it as an ordinary bracket
expression (equivalent to ‘[dgit:]’, which matches only the characters
‘d’, ‘g’, ‘i’, ‘t’, and ‘:’):
     # current GNU sed versions - incorrect usage rejected
     $ echo 1 | sed 's/[:digit:]/X/'
     sed: character class syntax is [[:space:]], not [:space:]

     # older GNU sed versions
     $ echo 1 | sed 's/[:digit:]/X/'
     1
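
   To make the difference concrete, the following sketch spells out the
list that older versions effectively matched and compares it with the
properly bracketed class.  The input string and variable names are
illustrative only:

```shell
# What older versions effectively saw: a plain list of the
# characters d, g, i, t and ':'.
old_style=$(echo 'digit: 1' | sed 's/[dgit:]/X/g')
echo "$old_style"    # XXXXXX 1

# The intended class, written with both pairs of brackets.
new_style=$(echo 'digit: 1' | sed 's/[[:digit:]]/X/g')
echo "$new_style"    # digit: X
```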

‘[:alnum:]’
     Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[0-9A-Za-z]’.

‘[:alpha:]’
     Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[A-Za-z]’.

‘[:blank:]’
     Blank characters: space and tab.

‘[:cntrl:]’
     Control characters.  In ASCII, these characters have octal codes
     000 through 037, and 177 (DEL).  In other character sets, these are
     the equivalent characters, if any.

‘[:digit:]’
     Digits: ‘0 1 2 3 4 5 6 7 8 9’.

‘[:graph:]’
     Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
     Lower-case letters; in the ‘C’ locale and ASCII character encoding,
     this is ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’.

‘[:print:]’
     Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’
     Punctuation characters; in the ‘C’ locale and ASCII character
     encoding, this is ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
     ] ^ _ ` { | } ~’.

‘[:space:]’
     Space characters; in the ‘C’ locale, this is tab, newline, vertical
     tab, form feed, carriage return, and space.

‘[:upper:]’
     Upper-case letters; in the ‘C’ locale and ASCII character encoding,
     this is ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’.

‘[:xdigit:]’
     Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’.

   Note that the brackets in these class names are part of the symbolic
names, and must be included in addition to the brackets delimiting the
bracket expression.
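
   Because a class name is just another list item, several classes and
ordinary characters can share one bracket expression, and the whole
list can be negated.  A minimal sketch, assuming the C locale (the
inputs and variable names are illustrative):

```shell
# Two classes plus a literal underscore in a single bracket expression.
ident=$(echo 'foo_1 = bar-2' | LC_ALL=C sed 's/[[:alpha:][:digit:]_]/X/g')
echo "$ident"       # XXXXX = XXX-X

# A negated class: every character that is not a space character.
nonspace=$(echo 'a b' | LC_ALL=C sed 's/[^[:space:]]/X/g')
echo "$nonspace"    # X X

# Delete all punctuation.
nopunct=$(echo 'Hello, world!' | LC_ALL=C sed 's/[[:punct:]]//g')
echo "$nopunct"     # Hello world
```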

   Most meta-characters lose their special meaning inside bracket
expressions.  The following characters remain special, depending on
their position in the list:

‘]’
     ends the bracket expression unless it is the first list item (after
     an optional leading ‘^’).  So, to make the ‘]’ character a list
     item, you must put it first.

‘-’
     denotes a range unless it is the first or last item in the list, or
     the ending point of a range.

‘^’
     negates the list (that is, matches the characters not in the list)
     when it is the first character.  To make the ‘^’ character a list
     item, place it anywhere but first.
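
   The placement rules above can be sketched as follows; the input
strings and variable names are illustrative only:

```shell
# ']' placed first is a literal list item.
rb=$(echo 'a]b[c' | sed 's/[]]/X/g')
echo "$rb"       # aXb[c

# Both brackets in one list: ']' first, then '['.
both=$(echo 'a]b[c' | sed 's/[][]/X/g')
echo "$both"     # aXbXc

# '-' placed last is a literal hyphen, not a range.
dash=$(echo 'a-b+c' | sed 's/[+-]/X/g')
echo "$dash"     # aXbXc

# '^' anywhere but first is a literal caret ('-' is first, so literal too).
caret=$(echo 'a^b-c' | sed 's/[-^]/X/g')
echo "$caret"    # aXbXc
```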

   The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special
within the list of a bracket expression.  For example, ‘[\*]’ matches
either ‘\’ or ‘*’, because the ‘\’ is not special here.  However,
strings like ‘[.ch.]’, ‘[=a=]’, and ‘[:space:]’ are special within the
list and represent collating symbols, equivalence classes, and character
classes, respectively; ‘[’ is therefore special within the list when it
is followed by ‘.’, ‘=’, or ‘:’.  Also, when not in ‘POSIXLY_CORRECT’
mode, special escapes like ‘\n’ and ‘\t’ are recognized within the list.
*Note Escapes::.
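
   A short sketch of these rules follows.  The last command relies on
the GNU escape handling described above, so it assumes GNU ‘sed’ run
without ‘POSIXLY_CORRECT’ set; the inputs and variable names are
illustrative only:

```shell
# Inside a list, '\' and '*' are ordinary characters.
star=$(printf '%s\n' 'a\b*c' | sed 's/[\*]/X/g')
echo "$star"     # aXbXc

# '.' and '$' are ordinary characters too.
plain=$(echo 'a.b$c' | sed 's/[.$]/X/g')
echo "$plain"    # aXbXc

# Assumption: GNU sed, not in POSIXLY_CORRECT mode, where '\t'
# inside the list stands for a tab character.
tab=$(printf 'a\tb\n' | sed 's/[\t]/X/g')
echo "$tab"      # aXb
```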

‘[.’
     opens a collating symbol.

‘.]’
     closes a collating symbol.

‘[=’
     opens an equivalence class.

‘=]’
     closes an equivalence class.

‘[:’
     opens a character class, and should be followed by a valid
     character class name.

‘:]’
     closes a character class.
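
   As a hedged sketch of the first two constructs: in the C locale,
where collating elements and equivalence classes are single characters,
‘[[=a=]]’ should match only ‘a’, and ‘[[.-.]]’ should match a literal
hyphen.  This assumes GNU ‘sed’ built against a regex library that
supports both constructs (such as glibc's); results in other locales or
implementations may differ.  The inputs and variable names are
illustrative only:

```shell
# Equivalence class: in the C locale, [=a=] contains only 'a'.
equiv=$(echo 'abc' | LC_ALL=C sed 's/[[=a=]]/X/')
echo "$equiv"    # Xbc

# Collating symbol: [.-.] names the single character '-'.
coll=$(echo 'a-b' | LC_ALL=C sed 's/[[.-.]]/X/')
echo "$coll"     # aXb
```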