2 Interface
2.1 Syntax
We currently use our own syntax for regular expressions, since the POSIX syntax does not allow for expressing complements or intersections of regular expressions.
E ::= E | E               union
    | E & E               intersection
    | E E                 concatenate
    | ¬E                  complement
    | ~E                  complement
    | E*                  zero or more repeats
    | E+                  one or more repeats
    | E?                  zero or one repeats
    | E{j,j}              repeat
    | «E»                 submatch
    | (E)                 change precedence
    | [r]                 character range
    | [¬r]                complement ranges
    | [^r]                complement ranges
    | $                   every character
    | c                   literal character
    | <empty string>
r ::= <empty range>       empty range
    | cr                  single character
    | c-cr                character range
    | [:<name>:]r         character class
j ::= <integer>           bound
    | <empty string>      no bound
c ::= <single character>
Rules higher on this list bind more loosely (have lower precedence) than rules lower on the list. For example, the expression ab&cd|ef is equivalent to ((ab)&(cd))|(ef).
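As a small sanity check of the precedence rule, the following sketch compares the unparenthesized expression with its explicitly parenthesized form. It assumes that the library is already loaded and that first-string-match is exported from a package named one-more-re-nightmare; same-result-p is a hypothetical helper and not part of the library.

```lisp
;; Assumption: the one-more-re-nightmare system is already loaded and
;; exports FIRST-STRING-MATCH from a package of the same name.
(defun same-result-p (expression-1 expression-2 string)
  "Hypothetical helper: true if both expressions give EQUALP results on STRING."
  (equalp (one-more-re-nightmare:first-string-match expression-1 string)
          (one-more-re-nightmare:first-string-match expression-2 string)))

;; If union binds loosest, then intersection, then concatenation,
;; these two expressions should behave identically on any input.
(same-result-p "ab&cd|ef" "((ab)&(cd))|(ef)" "ef")
(same-result-p "ab&cd|ef" "((ab)&(cd))|(ef)" "ab")
```

Both calls should return true if the grouping is as described; a difference would point to a misreading of the precedence table.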
2.2 Semantics
Matches and submatches are performed using the semantics for POSIX regular expressions (IEEE 2018), with additional rules pertaining to intersection and complements.
Submatches on both sides of an intersection operator are matched. For example,
(first-string-match "«a»b&a«b»" "ab") ; => #("ab" "a" "b")
No submatches in a complement can ever be matched, including in the complement of a complement. Thus ¬¬A is not the same as A:
(first-match "¬¬«a»" "a") ; => #(0 1 NIL NIL)
(first-match "«a»" "a") ; => #(0 1 0 1)
2.3 Matching
Note that one-more-re-nightmare can avoid a cache lookup (which involves acquiring a lock and searching a hash table) if the regular expression is a literal string or a constant variable bound to a string.
first-match regular-expression string &key start end Function
first-string-match regular-expression string &key start end Function
Find the first match for regular-expression in string between start and end.
first-match returns a simple vector, where each element is a register, or nil when there is no match. The first two registers are always the start and end of the match, and then subsequent registers are the start and end of each submatch. A register is either a bounding index of string or nil when there is no submatch.
first-string-match either returns a simple vector, every element of which is a fresh string or nil (when there is no submatch), or nil if there is no match.
Examples
(first-match "[0-9]([0-9]| )+" "Phone: 632 3003")
;; => #(7 15)
(first-string-match "[0-9]([0-9]| )+" "Phone: 632 3003")
;; => #("632 3003")
(first-match
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display")
;; => #(7 16 7 11 12 16 NIL NIL)
(first-string-match
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display")
;; => #("1920x1080" "1920" "1080" NIL)
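The register layout described above (start and end of the whole match, followed by a start and end pair for each submatch) can be illustrated with a short sketch that turns a register vector into substrings, in the manner of first-string-match. The function registers-to-strings is a hypothetical helper written in plain Common Lisp; it is not part of the library and does not claim to reflect how the library produces strings internally.

```lisp
(defun registers-to-strings (registers string)
  "Hypothetical helper: convert a register vector, laid out as FIRST-MATCH
returns it, into a vector of fresh substrings (or NIL for unmatched pairs)."
  (let ((result (make-array (floor (length registers) 2))))
    ;; Registers come in (start, end) pairs, so step through two at a time.
    (loop for i from 0 below (length registers) by 2
          for start = (svref registers i)
          for end = (svref registers (1+ i))
          do (setf (svref result (floor i 2))
                   ;; A pair of NILs marks a submatch that did not participate.
                   (and start end (subseq string start end))))
    result))

(registers-to-strings #(7 16 7 11 12 16 nil nil)
                      "Foobar 1920x1080 17-inch display")
;; => #("1920x1080" "1920" "1080" NIL)
```

When only the bounds are needed, working with the registers directly avoids allocating the fresh strings that first-string-match returns.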
all-matches regular-expression string &key start end Function
all-string-matches regular-expression string &key start end Function
Find all matches for regular-expression in string between start and end.
Both functions return a list of matches; all-matches represents matches as first-match does, and all-string-matches represents matches as first-string-match does.
Examples
(all-matches
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display or Quux 19-inch 720p display?")
;; => (#(7 16 7 11 12 16 NIL NIL) #(49 53 NIL NIL NIL NIL 49 52))
(all-string-matches
 "«[0-9]+»x«[0-9]+»|«[0-9]+»p"
 "Foobar 1920x1080 17-inch display or Quux 19-inch 720p display?")
;; => (#("1920x1080" "1920" "1080" NIL) #("720p" NIL NIL "720"))
do-matches ((&rest registers) regular-expression string &key start end) &body body Macro
do-matches iterates over all matches for regular-expression across string. The register variables are bound to the registers produced, as described for first-match.
It is possible to provide fewer variables than registers in the regular expression, but an error will be signalled if there are more variables than registers.
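A minimal sketch of do-matches follows, assuming that the library is already loaded and that the macro is exported from a package named one-more-re-nightmare. The function resolutions is a hypothetical helper. The expression has two submatches, so six register variables are bound: two for the whole match and two for each submatch.

```lisp
;; Assumption: the one-more-re-nightmare system is already loaded and
;; exports DO-MATCHES from a package of the same name.
(defun resolutions (string)
  "Hypothetical helper: collect (start end width height) for each WxH in STRING."
  (let ((results '()))
    (one-more-re-nightmare:do-matches ((start end w-start w-end h-start h-end)
                                       "«[0-9]+»x«[0-9]+»"
                                       string)
      ;; Registers are indices into STRING, so PARSE-INTEGER can read the
      ;; digits in place without allocating substrings.
      (push (list start end
                  (parse-integer string :start w-start :end w-end)
                  (parse-integer string :start h-start :end h-end))
            results))
    (nreverse results)))

(resolutions "1920x1080 and 1280x720")
```

Binding only (start end) would also be accepted, since fewer variables than registers are allowed.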
2.4 Compiling
The compiler may be run manually when the regular expression is not known at compile time and searching the code cache takes too long. (The latter can happen when many threads access the code cache and each match takes little time, because every lookup currently acquires a global lock.)
compiled-regular-expression Class
An object representing a compiled regular expression. An instance of this class can be provided as a regular expression to all the searching functions, instead of a string.