6.15.1 Regexp Functions
By default, Guile supports POSIX extended regular expressions. That means that the characters ‘(’, ‘)’, ‘+’ and ‘?’ are special, and must be escaped if you wish to match the literal characters.
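As a small illustration of the escaping rule, the sketch below contrasts an unescaped pattern with an escaped one. The input strings are made up for the example; note that each backslash must itself be doubled inside a Scheme string literal.

```scheme
(use-modules (ice-9 regex))

;; Unescaped, + means "one or more" and the parentheses form a group,
;; so this pattern matches the run of a's, not the punctuation.
(define grouped (string-match "(a+)" "(aaa+)"))
(match:substring grouped)            ; ⇒ "aaa"

;; Escaped, each special character matches itself literally.
(define literal (string-match "\\(a\\+\\)" "x(a+)y"))
(match:substring literal)            ; ⇒ "(a+)"
```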
This regular expression interface was modeled after that implemented by SCSH, the Scheme Shell. It is intended to be upwardly compatible with SCSH regular expressions.
Zero bytes (#\nul) cannot be used in regex patterns or input strings, since the underlying C functions treat a zero byte as the end of the string. If a pattern or input string contains a zero byte, an error is thrown.
Internally, patterns and input strings are converted to the current locale’s encoding, and then passed to the C library’s regular expression routines (see Regular Expressions in The GNU C Library Reference Manual). The returned match structures always point to characters in the strings, not to individual bytes, even in the case of multi-byte encodings.
- Scheme Procedure: string-match pattern str [start]
Compile the string pattern into a regular expression and compare it with str. The optional numeric argument start specifies the position of str at which to begin matching.
string-match returns a match structure which describes what, if anything, was matched by the regular expression. See section Match Structures. If str does not match pattern at all, string-match returns #f.
Two examples of a match follow. In the first example, the pattern matches the four digits in the match string. In the second, the pattern matches nothing.
(string-match "[0-9][0-9][0-9][0-9]" "blah2002")
⇒ #("blah2002" (4 . 8))

(string-match "[A-Za-z]" "123456")
⇒ #f
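Because string-match returns #f on failure, callers usually test the result before passing it to the match accessors. The following is a minimal sketch of that pattern; first-number is a hypothetical helper, not part of Guile. The last expression illustrates the optional start argument.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: return the first run of digits in STR as a
;; number, or #f when STR contains no digits.
(define (first-number str)
  (let ((m (string-match "[0-9]+" str)))
    ;; string-match returns #f on failure, so test m before using it.
    (and m (string->number (match:substring m)))))

(first-number "blah2002")    ; ⇒ 2002
(first-number "no digits")   ; ⇒ #f

;; With start = 6 the search skips "a1 b22"; the reported position is
;; still an index into the whole string.
(define later (string-match "[0-9]+" "a1 b22 c333" 6))
(match:start later)          ; ⇒ 8
```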
Each time string-match is called, it must compile its pattern argument into a regular expression structure. This operation is expensive, which makes string-match inefficient if the same regular expression is used several times (for example, in a loop). For better performance, you can compile a regular expression in advance and then match strings against the compiled regexp.
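A minimal sketch of the compile-once approach, using make-regexp and regexp-exec as described below. The names digits-rx and all-digits? are illustrative only.

```scheme
(use-modules (ice-9 regex))

;; Compile the pattern once, outside the loop.
(define digits-rx (make-regexp "^[0-9]+$"))

;; Hypothetical predicate: #t if S consists only of digits.
(define (all-digits? s)
  (if (regexp-exec digits-rx s) #t #f))

;; The compiled regexp is reused for every element.
(define results (map all-digits? '("123" "12a" "007")))
results   ; ⇒ (#t #f #t)
```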
- Scheme Procedure: make-regexp pat flag…
- C Function: scm_make_regexp (pat, flaglst)
Compile the regular expression described by pat, and return the compiled regexp structure. If pat does not describe a legal regular expression, make-regexp throws a regular-expression-syntax error.

The flag arguments change the behavior of the compiled regular expression. The following values may be supplied:
- Variable: regexp/newline
If a newline appears in the target string, then permit the ‘^’ and ‘$’ operators to match immediately after or immediately before the newline, respectively. Also, the ‘.’ and ‘[^...]’ operators will never match a newline character. The intent of this flag is to treat the target string as a buffer containing many lines of text, and the regular expression as a pattern that may match a single one of those lines.
- Variable: regexp/basic
Compile a basic (“obsolete”) regexp instead of the extended (“modern”) regexps that are the default. Basic regexps do not consider ‘|’, ‘+’ or ‘?’ to be special characters, and require the ‘{...}’ and ‘(...)’ metacharacters to be backslash-escaped (see section Backslash Escapes). There are several other differences between basic and extended regular expressions, but these are the most significant.
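The effect of these two flags can be seen in a short sketch. The sample text and variable names are assumptions made for the example.

```scheme
(use-modules (ice-9 regex))

(define text "abc\nbcd\ncde")

;; Without regexp/newline, ^ matches only at the start of the string,
;; and the string does not start with b.
(define plain-result (regexp-exec (make-regexp "^b.*$") text))
plain-result                                       ; ⇒ #f

;; With regexp/newline, ^ and $ also match around each newline and
;; . does not cross a newline, so one line is matched.
(define line-rx (make-regexp "^b.*$" regexp/newline))
(define line-result (match:substring (regexp-exec line-rx text)))
line-result                                        ; ⇒ "bcd"

;; With regexp/basic, + is an ordinary character.
(define basic-rx (make-regexp "a+" regexp/basic))
(define basic-result (match:substring (regexp-exec basic-rx "aaa+")))
basic-result                                       ; ⇒ "a+"

;; The default (extended) syntax treats + as "one or more".
(define extended-result (match:substring (string-match "a+" "aaa+")))
extended-result                                    ; ⇒ "aaa"
```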
- Scheme Procedure: regexp-exec rx str [start [flags]]
- C Function: scm_regexp_exec (rx, str, start, flags)
Match the compiled regular expression rx against str. If the optional integer start argument is provided, begin matching from that position in the string. Return a match structure describing the results of the match, or #f if no match could be found.

The flags argument changes the matching behavior. Use logior (see section Bitwise Operations) to combine flag values.
;; Regexp to match uppercase letters
(define r (make-regexp "[A-Z]*"))

;; Regexp to match letters, ignoring case
(define ri (make-regexp "[A-Z]*" regexp/icase))

;; Search for bob using regexp r
(match:substring (regexp-exec r "bob"))
⇒ ""                  ; empty match, no uppercase letters

;; Search for bob using regexp ri
(match:substring (regexp-exec ri "Bob"))
⇒ "Bob"               ; matched case insensitive
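The optional start argument can be used to resume a search after an earlier match. A minimal sketch, with a made-up pattern and input:

```scheme
(use-modules (ice-9 regex))

(define pair-rx (make-regexp "[a-z]+=[0-9]+"))
(define input "foo=1, bar=2")

(define first-pair (regexp-exec pair-rx input))
(match:substring first-pair)              ; ⇒ "foo=1"

;; Resume the search where the previous match ended.
(define second-pair
  (regexp-exec pair-rx input (match:end first-pair)))
(match:substring second-pair)             ; ⇒ "bar=2"
(match:start second-pair)                 ; ⇒ 7
```

Walking a string this way by hand is rarely needed; list-matches and fold-matches, described below, do the same job.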
- Scheme Procedure: regexp? obj
- C Function: scm_regexp_p (obj)
Return #t if obj is a compiled regular expression, or #f otherwise.
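One possible use of regexp? is a procedure that accepts either a pattern string or a compiled regexp. The helper ensure-regexp below is hypothetical and only sketches the idea.

```scheme
;; Hypothetical helper: accept either a pattern string or a compiled
;; regexp, and always return a compiled regexp.
(define (ensure-regexp x)
  (if (regexp? x) x (make-regexp x)))

(regexp? (make-regexp "[a-z]+"))      ; ⇒ #t
(regexp? "[a-z]+")                    ; ⇒ #f
(regexp? (ensure-regexp "[a-z]+"))    ; ⇒ #t
```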
- Scheme Procedure: list-matches regexp str [flags]
Return a list of match structures which are the non-overlapping matches of regexp in str. regexp can be either a pattern string or a compiled regexp. The flags argument is as per regexp-exec above.

(map match:substring (list-matches "[a-z]+" "abc 42 def 78"))
⇒ ("abc" "def")
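Since each element of the result is a full match structure, any of the match accessors can be mapped over it. A brief sketch on the same input string; the variable names are illustrative.

```scheme
(use-modules (ice-9 regex))

(define matches (list-matches "[0-9]+" "abc 42 def 78"))

;; Convert each matched substring to a number.
(define numbers
  (map (lambda (m) (string->number (match:substring m))) matches))
numbers   ; ⇒ (42 78)

;; Record where each match starts and ends in the input string.
(define spans
  (map (lambda (m) (cons (match:start m) (match:end m))) matches))
spans     ; ⇒ ((4 . 6) (11 . 13))
```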
- Scheme Procedure: fold-matches regexp str init proc [flags]
Apply proc to the non-overlapping matches of regexp in str, to build a result. regexp can be either a pattern string or a compiled regexp. The flags argument is as per regexp-exec above.

proc is called as (proc match prev) where match is a match structure and prev is the previous return from proc. For the first call prev is the given init parameter. fold-matches returns the final value from proc.

For example to count matches,

(fold-matches "[a-z][0-9]" "abc x1 def y2" 0
              (lambda (match count)
                (1+ count)))
⇒ 2
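Two further sketches of fold-matches follow: one accumulates a sum, the other a list. The inputs and variable names are chosen for the example.

```scheme
(use-modules (ice-9 regex))

;; Sum every run of digits in the string.
(define total
  (fold-matches "[0-9]+" "3 apples, 4 pears" 0
                (lambda (m sum)
                  (+ sum (string->number (match:substring m))))))
total      ; ⇒ 7

;; Collecting with cons yields the matches in reverse order, because
;; each new match is added to the front of the previous result.
(define reversed
  (fold-matches "[a-z][0-9]" "abc x1 def y2" '()
                (lambda (m acc)
                  (cons (match:substring m) acc))))
reversed   ; ⇒ ("y2" "x1")
```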
Regular expressions are commonly used to find patterns in one string and replace them with the contents of another string. The following functions are convenient ways to do this.
- Scheme Procedure: regexp-substitute port match item …
Write to port selected parts of the match structure match. Or if port is #f then form a string from those parts and return that.

Each item specifies a part to be written, and may be one of the following,
- A string. String arguments are written out verbatim.
- An integer. The submatch with that number is written (match:substring). Zero is the entire match.
- The symbol ‘pre’. The portion of the matched string preceding the regexp match is written (match:prefix).
- The symbol ‘post’. The portion of the matched string following the regexp match is written (match:suffix).
For example, changing a match and retaining the text before and after,
(regexp-substitute #f (string-match "[0-9]+" "number 25 is good")
                   'pre "37" 'post)
⇒ "number 37 is good"
Or matching a YYYYMMDD format date such as ‘20020828’ and re-ordering and hyphenating the fields.
(define date-regex
   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(regexp-substitute #f (string-match date-regex s)
                   'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
⇒ "Date 04-29-2002 12am. (20020429)"
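The port argument can be sketched as well. The example below produces the same text twice, once with port given as #f and once by writing to a string port obtained from call-with-output-string. The pattern and input are made up for the illustration.

```scheme
(use-modules (ice-9 regex))

(define m (string-match "([a-z]+)=([a-z]+)" "x: foo=bar;"))

;; Swap the two submatches, keeping the surrounding text.
(define swapped (regexp-substitute #f m 'pre 2 "=" 1 'post))
swapped    ; ⇒ "x: bar=foo;"

;; The same items written to a port instead of returned as a string.
(define via-port
  (call-with-output-string
    (lambda (port)
      (regexp-substitute port m 'pre 2 "=" 1 'post))))
via-port   ; ⇒ "x: bar=foo;"
```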
- Scheme Procedure: regexp-substitute/global port regexp target item…
Write to port selected parts of matches of regexp in target. If port is #f then form a string from those parts and return that. regexp can be a string or a compiled regex.

This is similar to regexp-substitute, but allows global substitutions on target. Each item behaves as per regexp-substitute, with the following differences,
- A function. Called as (item match) with the match structure for the regexp match, it should return a string to be written to port.
- The symbol ‘post’. This doesn’t output anything, but instead causes regexp-substitute/global to recurse on the unmatched portion of target. This must be supplied to perform a global search and replace on target; without it regexp-substitute/global returns after a single match and output.
For example, to collapse runs of tabs and spaces to a single hyphen each,
(regexp-substitute/global #f "[ \t]+"  "this is the text"
                          'pre "-" 'post)
⇒ "this-is-the-text"
Or using a function to reverse the letters in each word,
(regexp-substitute/global #f "[a-z]+"  "to do and not-do"
  'pre (lambda (m) (string-reverse (match:substring m))) 'post)
⇒ "ot od dna ton-od"
Without the post symbol, just one regexp match is made. For example the following is the date example from regexp-substitute above, without the need for the separate string-match call.

(define date-regex
   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(regexp-substitute/global #f date-regex s
                          'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
⇒ "Date 04-29-2002 12am. (20020429)"
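The role of the post symbol is easiest to see by running the same substitution with and without it. The following sketch uses a made-up input; the expected results follow from the description of post above.

```scheme
(use-modules (ice-9 regex))

;; With 'post, every run of digits is replaced.
(define all
  (regexp-substitute/global #f "[0-9]+" "a1b22c333"
                            'pre "#" 'post))
all       ; ⇒ "a#b#c#"

;; Without 'post, only the first match is processed, and the text
;; after that match is not written.
(define once
  (regexp-substitute/global #f "[0-9]+" "a1b22c333"
                            'pre "#"))
once      ; ⇒ "a#"

;; A procedure item computes the replacement from each match.
(define shouted
  (regexp-substitute/global #f "[a-z]+" "to do"
    'pre (lambda (m) (string-upcase (match:substring m))) 'post))
shouted   ; ⇒ "TO DO"
```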
This document was generated on April 20, 2013 using texi2html 5.0.