
File: gawk.info,  Node: Interval Expressions,  Prev: Regexp Operator Details,  Up: Regexp Operators

3.3.2 Some Notes On Interval Expressions
----------------------------------------

Interval expressions were not traditionally available in 'awk'.  They
were added as part of the POSIX standard to make 'awk' and 'egrep'
consistent with each other.

   Initially, because old programs may use '{' and '}' in regexp
constants, 'gawk' did _not_ match interval expressions in regexps.

   However, beginning with version 4.0, 'gawk' does match interval
expressions by default.  This is because compatibility with POSIX has
become more important to most 'gawk' users than compatibility with old
programs.
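
   Because support depends on the 'awk' version, a script can probe for
it at run time.  The following is only a sketch: the 'AWK' variable and
the 'supports_intervals' function are hypothetical names introduced
here, and the probe assumes that either 'gawk' or 'awk' is on the search
path.

```shell
# Prefer gawk when it is installed; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Succeeds (exit status 0) only if /^a{3}$/ is treated as an interval
# expression, that is, if it matches the three-character input "aaa".
supports_intervals() {
    echo aaa | "$AWK" '/^a{3}$/ { found = 1 } END { exit !found }'
}

if supports_intervals; then
    echo "interval expressions are matched"
else
    echo "braces are taken literally"
fi
```

   An 'awk' that treats the braces literally looks for the five
characters 'a{3}' and so does not match 'aaa'.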

   For programs that use '{' and '}' in regexp constants, it is good
practice to always escape them with a backslash.  Then the regexp
constants are valid and work the way you want them to, using any version
of 'awk'.(1)
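
   As an illustration of the escaping advice and of the footnote about
string constants, the following sketch matches the literal text 'a{2}'
both ways.  The function names and the 'AWK' variable are hypothetical
and chosen only for this example.

```shell
# Prefer gawk when it is installed; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# Regexp constant: one backslash before each brace.
literal_regexp_constant() {
    printf '%s\n' 'a{2}' 'aa' | "$AWK" '/a\{2\}/ { print }'
}

# String constant used with the ~ operator: two backslashes before each
# brace, because string processing removes one level of escaping first.
literal_string_constant() {
    printf '%s\n' 'a{2}' 'aa' | "$AWK" '$0 ~ "a\\{2\\}" { print }'
}

literal_regexp_constant
literal_string_constant
```

   Each function prints only the input line 'a{2}'; the line 'aa' is
not printed because the escaped braces are not a repetition count.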

   When '{' and '}' appear in regexp constants in a way that cannot be
interpreted as an interval expression (such as '/q{a}/'), then they
stand for themselves.
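
   A quick way to observe this is to feed both 'q{a}' and 'qa' to the
pattern.  This is a minimal sketch; 'literal_braces' and 'AWK' are
hypothetical names, and the result described is what the paragraph above
states for 'gawk'.

```shell
# Prefer gawk when it is installed; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# '{a}' cannot be read as a repetition count, so the braces are ordinary
# characters and only the line containing them should be printed.
literal_braces() {
    printf '%s\n' 'q{a}' 'qa' | "$AWK" '/q{a}/ { print }'
}

literal_braces
```

   Relying on this behavior is still less clear to readers than escaping
the braces explicitly.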

   As mentioned, interval expressions were not traditionally available
in 'awk'.  In March of 2019, BWK 'awk' (Brian Kernighan's 'awk') finally acquired them.
Starting with version 5.2, 'gawk''s '--traditional' option no longer
disables interval expressions in regular expressions.

   POSIX says that interval expressions containing repetition counts
greater than 255 produce unspecified results.

   In the manual for GNU 'grep', Paul Eggert notes the following:

     Interval expressions may be implemented internally via repetition.
     For example, '^(a|bc){2,4}$' might be implemented as
     '^(a|bc)(a|bc)((a|bc)(a|bc)?)?$'.  A large repetition count may
     exhaust memory or greatly slow matching.  Even small counts can
     cause problems if cascaded; for example, 'grep -E
     ".*{10,}{10,}{10,}{10,}{10,}"' is likely to overflow a stack.
     Fortunately, regular expressions like these are typically
     artificial, and cascaded repetitions do not conform to POSIX so
     cannot be used in portable programs anyway.

This same caveat applies to 'gawk'.
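
   The equivalence described in the quotation can be checked on a few
sample strings.  The sketch below only compares the two patterns; it
says nothing about how any particular 'awk' implements intervals
internally.  The 'compare_forms' function, the 'AWK' variable, and the
sample inputs are assumptions made for this example.

```shell
# Prefer gawk when it is installed; otherwise fall back to the system awk.
AWK=$(command -v gawk || command -v awk)

# For each sample line, print the line followed by two flags:
# 1 or 0 for the interval form, then 1 or 0 for the expanded form.
compare_forms() {
    printf '%s\n' a aa abc bcbc aaaa bcabca aaaaa bcbcbcbcbc |
    "$AWK" '{
        interval = ($0 ~ /^(a|bc){2,4}$/)
        expanded = ($0 ~ /^(a|bc)(a|bc)((a|bc)(a|bc)?)?$/)
        printf "%s %d %d\n", $0, interval, expanded
    }'
}

compare_forms
```

   With an 'awk' that matches interval expressions, the two flags agree
on every line: inputs made of two to four repetitions of 'a' or 'bc' are
accepted by both patterns, and all others are rejected by both.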

   ---------- Footnotes ----------

   (1) Use two backslashes if you're using a string constant with a
regexp operator or function.
