File: sed.info, Node: Character Classes and Bracket Expressions, Next: regexp extensions, Prev: ERE syntax, Up: sed regular expressions 5.5 Character Classes and Bracket Expressions ============================================= A “bracket expression” is a list of characters enclosed by ‘[’ and ‘]’. It matches any single character in that list; if the first character of the list is the caret ‘^’, then it matches any character *not* in the list. For example, the following command replaces the strings ‘gray’ or ‘grey’ with ‘blue’: sed 's/gr[ae]y/blue/' Bracket expressions can be used in both *note basic: BRE syntax. and *note extended: ERE syntax. regular expressions (that is, with or without the ‘-E’/‘-r’ options). Within a bracket expression, a “range expression” consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive. In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’. Finally, certain named classes of characters are predefined within bracket expressions, as follows. These named classes must be used _inside_ brackets themselves. Correct usage: $ echo 1 | sed 's/[[:digit:]]/X/' X Incorrect usage is rejected by newer ‘sed’ versions. Older versions accepted it but treated it as a single bracket expression (which is equivalent to ‘[dgit:]’, that is, only the characters D/G/I/T/:): # current GNU sed versions - incorrect usage rejected $ echo 1 | sed 's/[:digit:]/X/' sed: character class syntax is [[:space:]], not [:space:] # older GNU sed versions $ echo 1 | sed 's/[:digit:]/X/' 1 ‘[:alnum:]’ Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’. ‘[:alpha:]’ Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’ locale and ASCII character encoding, this is the same as ‘[A-Za-z]’. ‘[:blank:]’ Blank characters: space and tab. ‘[:cntrl:]’ Control characters. In ASCII, these characters have octal codes 000 through 037, and 177 (DEL). In other character sets, these are the equivalent characters, if any. ‘[:digit:]’ Digits: ‘0 1 2 3 4 5 6 7 8 9’. ‘[:graph:]’ Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’. ‘[:lower:]’ Lower-case letters; in the ‘C’ locale and ASCII character encoding, this is ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’. ‘[:print:]’ Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space. ‘[:punct:]’ Punctuation characters; in the ‘C’ locale and ASCII character encoding, this is ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~’. ‘[:space:]’ Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space. ‘[:upper:]’ Upper-case letters: in the ‘C’ locale and ASCII character encoding, this is ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’. ‘[:xdigit:]’ Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’. Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression. Most meta-characters lose their special meaning inside bracket expressions: ‘]’ ends the bracket expression if it's not the first list item. So, if you want to make the ‘]’ character a list item, you must put it first. ‘-’ represents the range if it's not first or last in a list or the ending point of a range. ‘^’ represents the characters not in the list. If you want to make the ‘^’ character a list item, place it anywhere but first. TODO: incorporate this paragraph (copied verbatim from BRE section). The characters ‘$’, ‘*’, ‘.’, ‘[’, and ‘\’ are normally not special within LIST. For example, ‘[\*]’ matches either ‘\’ or ‘*’, because the ‘\’ is not special here. However, strings like ‘[.ch.]’, ‘[=a=]’, and ‘[:space:]’ are special within LIST and represent collating symbols, equivalence classes, and character classes, respectively, and ‘[’ is therefore special within LIST when it is followed by ‘.’, ‘=’, or ‘:’. Also, when not in ‘POSIXLY_CORRECT’ mode, special escapes like ‘\n’ and ‘\t’ are recognized within LIST. *Note Escapes::. ‘[.’ represents the open collating symbol. ‘.]’ represents the close collating symbol. ‘[=’ represents the open equivalence class. ‘=]’ represents the close equivalence class. ‘[:’ represents the open character class symbol, and should be followed by a valid character class name. ‘:]’ represents the close character class symbol.
