
File: grep.info,  Node: Matching Non-ASCII,  Prev: Character Encoding,  Up: Regular Expressions

3.9 Matching Non-ASCII and Non-printable Characters
===================================================

In a regular expression, non-ASCII and non-printable characters other
than newline are not special, and represent themselves.  For example, in
a locale using UTF-8 the command ‘grep 'Λ ω'’ (where the white space
between ‘Λ’ and the ‘ω’ is a tab character) searches for ‘Λ’ (Unicode
character U+039B GREEK CAPITAL LETTER LAMBDA), followed by a tab (U+0009
TAB), followed by ‘ω’ (U+03C9 GREEK SMALL LETTER OMEGA).
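
   As a small illustration of the paragraph above, the following sketch
feeds two sample lines to ‘grep’ and counts the matches.  The sample
text and the variable names ‘tab’ and ‘count’ are invented for this
example; the tab is generated with ‘printf’ only so that it stays
visible in the listing, and a UTF-8 locale is assumed, as in the text.

```shell
# Hypothetical sample; assumes this script is stored as UTF-8.
tab=$(printf '\t')                      # a literal tab character

# First input line has a tab between the letters, second has a space.
count=$(printf 'Λ\tω\nΛ ω\n' | grep -c "Λ${tab}ω")
echo "$count"                           # prints 1
```

   Only the line containing the tab is counted, since the tab in the
pattern represents itself and does not match a space.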

   Suppose you want to limit your pattern to only printable characters
(or even only printable ASCII characters) to keep your script readable
or portable, but you also want to match specific non-ASCII characters
or non-printable characters other than null.  If you are using the ‘-P’
(‘--perl-regexp’)
option, PCREs give you several ways to do this.  Otherwise, if you are
using Bash, the GNU project's shell, you can represent these characters
via ANSI-C quoting.  For example, the Bash commands ‘grep $'Λ\tω'’ and
‘grep $'\u039B\t\u03C9'’ both search for the same three-character string
‘Λ ω’ mentioned earlier.  However, because Bash translates ANSI-C
quoting before ‘grep’ sees the pattern, this technique does not protect
characters that are special in regular expressions, so it should not be
used to match printable ASCII characters; for example, ‘grep $'\u005E'’
is equivalent to ‘grep '^'’ and matches every line, not just lines
containing the character ‘^’ (U+005E CIRCUMFLEX ACCENT).
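
   The pitfall is easier to see with the pattern written as ‘grep’
receives it.  The sketch below uses the already-expanded form ‘'^'’ in
place of ‘$'\u005E'’ so that it does not depend on Bash; the sample
input and the variable names are made up for illustration.  Escaping the
character or using ‘-F’ (‘--fixed-strings’) are two ways to match a
literal ‘^’.

```shell
# Hypothetical two-line input; only the first line contains '^'.
input=$(printf 'a^b\nplain\n')

# What grep receives from $'\u005E': an anchor, so every line matches.
anchor_count=$(printf '%s\n' "$input" | grep -c '^')

# Escape the character, or treat the whole pattern as a fixed string.
escaped_count=$(printf '%s\n' "$input" | grep -c '\^')
fixed_count=$(printf '%s\n' "$input" | grep -c -F '^')

echo "$anchor_count $escaped_count $fixed_count"   # prints: 2 1 1
```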

   Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
shell scripts written in ASCII should use other methods to match
specific non-ASCII characters.  For example, in a UTF-8 locale the
command ‘grep "$(printf '\316\233\t\317\211\n')"’ is a portable albeit
hard-to-read alternative to Bash's ‘grep $'Λ\tω'’.  However, none of
these techniques will let you put a null character directly into a
command-line pattern; null characters can appear only in a pattern
specified via the ‘-f’ (‘--file’) option.
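
   The portable form can be checked with a short sketch.  The octal
escapes are the UTF-8 bytes quoted in the text; the sample input and the
names ‘pattern’ and ‘count’ are assumptions made for this example.

```shell
# UTF-8 bytes for U+039B, a tab, and U+03C9, written in printable ASCII.
pattern=$(printf '\316\233\t\317\211')

# Hypothetical input: the first line contains the sequence, the second
# line does not.
count=$(printf 'x\316\233\t\317\211y\nno match\n' | grep -c "$pattern")
echo "$count"                           # prints 1
```

   Command substitution strips trailing newlines, so a final ‘\n’ in the
‘printf’ format, as in the command quoted above, does not end up in the
pattern.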
