
File: gawk.info,  Node: Case-sensitivity,  Next: Regexp Summary,  Prev: GNU Regexp Operators,  Up: Regexp

3.8 Case Sensitivity in Matching
================================

Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters) and inside bracket
expressions.  Thus, a 'w' in a regular expression matches only a
lowercase 'w' and not an uppercase 'W'.

   The simplest way to do a case-independent match is to use a bracket
expression--for example, '[Ww]'.  However, this can be cumbersome if you
need to use it often, and it can make the regular expressions harder to
read.  There are two alternatives that you might prefer.
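
   Before turning to those alternatives, here is a minimal sketch of the
bracket-expression approach.  The word 'foo' is only an illustration;
the rule matches it in any mix of upper- and lowercase:

     # Each letter needs its own bracket expression.
     /[Ff][Oo][Oo]/  { print }

Even for a three-letter word, the regexp is noticeably harder to read
than plain '/foo/'.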

   One way to perform a case-insensitive match at a particular point in
the program is to convert the data to a single case, using the
'tolower()' or 'toupper()' built-in string functions (which we haven't
discussed yet; *note String Functions::).  For example:

     tolower($1) ~ /foo/  { ... }

converts the first field to lowercase before matching against it.  This
works in any POSIX-compliant 'awk'.

   Another method, specific to 'gawk', is to set the variable
'IGNORECASE' to a nonzero value (*note Built-in Variables::).  When
'IGNORECASE' is not zero, _all_ regexp and string operations ignore
case.

   Changing the value of 'IGNORECASE' dynamically controls the case
sensitivity of the program as it runs.  Case is significant by default
because 'IGNORECASE' (like most variables) is initialized to zero:

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail

     IGNORECASE = 1
     if (x ~ /ab/) ...   # now it will succeed
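
   The setting is not limited to the '~' operator.  As a rough sketch
(the sample strings are made up for illustration), the following 'BEGIN'
rule shows a regexp-based function and a string comparison after
'IGNORECASE' has been set:

     BEGIN {
         IGNORECASE = 1

         s = "Foo FOO foo"
         n = gsub(/foo/, "bar", s)   # regexp matching ignores case
         print n, s                  # all three words are replaced

         t = "ABC"
         if (t == "abc")             # string comparison ignores case too
             print "equal"
     }

With 'IGNORECASE' left at zero, 'gsub()' would replace only the last
word and the comparison would fail.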

   In general, you cannot use 'IGNORECASE' to make certain rules case
insensitive and other rules case sensitive, as there is no
straightforward way to set 'IGNORECASE' just for the pattern of a
particular rule.(1)  To get per-rule control, use either bracket
expressions or 'tolower()'.  However, one thing that only 'IGNORECASE'
lets you do is turn case sensitivity on or off dynamically for all the
rules at once.
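
   As a sketch of mixing the two behaviors in one program without
'IGNORECASE', the following pair of rules uses 'tolower()' for one
pattern and a plain regexp for the other.  The words 'warning' and
'ERROR' and the output labels are hypothetical:

     # Case-insensitive: the record is folded to lowercase first.
     tolower($0) ~ /warning/  { print "any case:   " $0 }

     # Case-sensitive: matches only the exact spelling 'ERROR'.
     /ERROR/                  { print "exact case: " $0 }

Because 'tolower()' returns a converted copy, '$0' itself is unchanged
and is printed as it appeared in the input.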

   'IGNORECASE' can be set on the command line or in a 'BEGIN' rule
(*note Other Arguments::; also *note Using BEGIN/END::).  Setting
'IGNORECASE' from the command line is a way to make a program case
insensitive without having to edit it.
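
   For instance, assume a one-rule program stored in a file named
'find.awk' and an input file named 'log.txt' (both names are
hypothetical):

     # find.awk: print every record that mentions 'error'.
     /error/  { print FILENAME ": " $0 }

The same file can then be run either way without being edited:

     gawk -f find.awk log.txt                   # case-sensitive
     gawk -v IGNORECASE=1 -f find.awk log.txt   # case-insensitive
     gawk -f find.awk IGNORECASE=1 log.txt      # also case-insensitive

The '-v' form assigns the variable before the program starts, whereas
the assignment given among the file arguments takes effect when 'gawk'
reaches it in the argument list, that is, before 'log.txt' is read.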

   In multibyte locales, the equivalences between upper- and lowercase
characters are tested based on the wide-character values of the locale's
character set.  Prior to version 5.0, single-byte characters were tested
based on the ISO-8859-1 (ISO Latin-1) character set.  However, as of
version 5.0, single-byte characters are also tested based on the values
of the locale's character set.(2)

   The value of 'IGNORECASE' has no effect if 'gawk' is in compatibility
mode (*note Options::).  Case is always significant in compatibility
mode.
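
   A small sketch of the difference, assuming the program below is saved
in a file and run once normally and once with the '--traditional'
option:

     BEGIN {
         IGNORECASE = 1      # an ordinary variable in compatibility mode
         x = "aB"
         if (x ~ /ab/)
             print "matched"
         else
             print "not matched"
     }

In its default mode 'gawk' should take the first branch; with
'--traditional' the assignment has no special meaning, so the second
branch should be taken.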

   ---------- Footnotes ----------

   (1) Experienced C and C++ programmers will note that it is possible,
using something like 'IGNORECASE = 1 && /foObAr/ { ... }' and
'IGNORECASE = 0 || /foobar/ { ... }'.  However, this is somewhat obscure
and we don't recommend it.

   (2) If you don't understand this, don't worry about it; it just means
that 'gawk' does the right thing.
