2 Interface
2.1 Syntax
We currently use our own syntax for regular expressions, since the POSIX syntax does not allow for expressing complements or intersections of regular expressions.
E ::= E | E             union
    | E & E             intersection
    | E E               concatenate
    | ¬E                complement
    | ~E                complement
    | E*                zero or more repeats
    | E+                one or more repeats
    | E{i}              repeat
    | «E»               submatch
    | (E)               change precedence
    | [r]               character range
    | [¬r]              complement ranges
    | $                 every character
    | c                 literal character
    | <empty string>

r ::= c                 single character
    | c-c               character range

i ::= <integer>

c ::= <single character>
Rules higher on this list bind more loosely (have lower precedence) than rules lower on the list. For example, the expression ab&cd|ef is equivalent to ((ab)&(cd))|(ef).
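The precedence rule can be checked by matching with and without explicit parentheses. The following is a minimal sketch, assuming the library can be loaded through Quicklisp under the system and package name one-more-re-nightmare; the input string and variable names are made up for illustration. Since no string is both ab and cd, the intersection matches nothing, so either spelling can only match ef.

```lisp
;; Assumption: Quicklisp is installed and can load the system.
(ql:quickload :one-more-re-nightmare)

(defparameter *input* "abcdef")

;; Implicit grouping: concatenation binds tighter than &, which binds
;; tighter than |.
(defparameter *implicit*
  (one-more-re-nightmare:first-match "ab&cd|ef" *input*))

;; The same expression with every grouping written out.
(defparameter *explicit*
  (one-more-re-nightmare:first-match "((ab)&(cd))|(ef)" *input*))

;; Both spellings should produce the same registers.
(print (equalp *implicit* *explicit*))
```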
2.2 Matching
Note that one-more-re-nightmare can avoid a cache lookup (which involves acquiring a lock and searching a hash table) if the regular expression is a literal string, or a constant variable bound to a string.
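The difference can be sketched as follows. This is only an illustration, assuming the library can be loaded through Quicklisp under the system and package name one-more-re-nightmare; digits-literal and digits-runtime are hypothetical names, not part of the library.

```lisp
;; Assumption: Quicklisp is installed and can load the system.
(ql:quickload :one-more-re-nightmare)

;; The regular expression is a literal string, so according to the note
;; above the cache lookup can be avoided.
(defun digits-literal (string)
  (one-more-re-nightmare:first-match "[0-9]+" string))

;; The regular expression is only known at run time, so a call is
;; expected to go through the code cache.
(defun digits-runtime (regular-expression string)
  (one-more-re-nightmare:first-match regular-expression string))
```

Both functions should return the same registers for the same input; only the way the compiled matcher is found differs.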
first-match regular-expression string &key start end [Function]
first-string-match regular-expression string &key start end [Function]
Find the first match for regular-expression in string between start and end.
first-match returns either a simple vector in which each element is a register, or nil if there is no match. The first two registers are always the start and end of the match, and subsequent registers are the start and end of each submatch. A register is either a bounding index of string, or nil when the corresponding submatch did not take part in the match.
first-string-match returns either a simple vector, or nil if there is no match. Every element of the vector is either a fresh string, or nil when the corresponding submatch did not take part in the match.
Examples
(first-match "[0-9]([0-9]| )+" "Phone: 632 3003")
;; => #(7 15)
(first-string-match "[0-9]([0-9]| )+" "Phone: 632 3003")
;; => #("632 3003")
(first-match
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display")
;; => #(7 16 7 11 12 16 NIL NIL)
(first-string-match
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display")
;; => #("1920x1080" "1920" "1080" NIL)
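The relationship between the two result formats can be made concrete with a small helper that turns a register vector into substrings. This is a sketch in plain Common Lisp; registers-to-strings is a hypothetical name and not part of the library, and it assumes the register layout described above (start and end indices in consecutive pairs).

```lisp
(defun registers-to-strings (registers string)
  "Convert a register vector, laid out as first-match returns it, into a
vector of fresh substrings of STRING, with NIL for missing submatches."
  (let ((result (make-array (floor (length registers) 2)
                            :initial-element nil)))
    (loop for i from 0 below (length result)
          for start = (svref registers (* 2 i))
          for end = (svref registers (1+ (* 2 i)))
          ;; A NIL register means this submatch took no part in the match.
          when (and start end)
            do (setf (svref result i) (subseq string start end)))
    result))

(registers-to-strings #(7 16 7 11 12 16 nil nil)
                      "Foobar 1920x1080 17-inch display")
;; => #("1920x1080" "1920" "1080" NIL)
```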
all-matches regular-expression string &key start end [Function]
all-string-matches regular-expression string &key start end [Function]
Find all matches for regular-expression in string between start and end.
Both functions return a list of matches; all-matches represents matches as first-match does, and all-string-matches represents matches as first-string-match does.
Examples
(all-matches
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display or Quux 19-inch 720p display?")
;; => (#(7 16 7 11 12 16 NIL NIL) #(49 53 NIL NIL NIL NIL 49 52))
(all-string-matches
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display or Quux 19-inch 720p display?")
;; => (#("1920x1080" "1920" "1080" NIL) #("720p" NIL NIL "720"))
do-matches ((&rest registers) regular-expression string &key start end) &body body [Macro]
do-matches iterates over all matches for regular-expression in string. The register variables are bound to the registers of each match, as described for first-match.
It is possible to provide fewer variables than registers in the regular expression, but an error will be signalled if there are more variables than registers.
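As an illustration, do-matches can collect structured data from every match. The following is a minimal sketch, assuming the library can be loaded through Quicklisp under the system and package name one-more-re-nightmare; collect-resolutions and its variable names are hypothetical.

```lisp
;; Assumption: Quicklisp is installed and can load the system.
(ql:quickload :one-more-re-nightmare)

(defun collect-resolutions (string)
  "Return a list of (text width height) for each WIDTHxHEIGHT in STRING."
  (let ((resolutions '()))
    ;; Six variables for six registers: the whole match, then one
    ;; start/end pair for each of the two submatches.
    (one-more-re-nightmare:do-matches
        ((start end width-start width-end height-start height-end)
         "«[0-9]+»x«[0-9]+»" string)
      (push (list (subseq string start end)
                  (parse-integer string :start width-start :end width-end)
                  (parse-integer string :start height-start :end height-end))
            resolutions))
    (nreverse resolutions)))

(collect-resolutions "Foobar 1920x1080 or Quux 1280x720?")
```

Both submatches take part in every match of this expression, so none of the registers should be nil inside the body.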
2.3 Compiling
The compiler may be run manually when the regular expression is not known at compile time and the code cache takes too long to search. (The latter can happen when many threads access the code cache and each search is short enough that waiting for the lock dominates, since lookups currently take a global lock.)