3 Linting and warnings
one-more-re-nightmare can produce warnings at compile-time for various mistakes in regular expressions. Compiling a regular expression to a finite state machine involves traversing every execution path of the machine, so the analysis can be performed with neither false positives nor false negatives.
Linting occurs when regular expressions are provided as literal strings in the source code. All style warnings generated are of the type lint-style-warning.
3.1 Unreachability
The following issues generate style warnings at compile-time, of type not-matchable-style-warning. They do not indicate that something will go wrong at run-time, but their behaviour is rarely desirable.
3.1.1 "This expression is impossible to match."
Explanation
The expression will never match any string; it is equivalent to the empty set.
Examples
a&b: There are no characters that are simultaneously a and b.
¬(a$|$b)&¬(¬(a$)&¬($b)): There are no strings that match ¬(a$|$b) but not ¬(a$)&¬($b). In other words, the linter is used to prove \(\overline{a \lor b} \Rightarrow \overline{a} \land \overline{b}\). While it is a fun idea, we don’t recommend using the linter to check equivalence of Boolean expressions.
3.1.2 "The <n-th> group in this expression is impossible to match."
Explanation
A submatch in the expression will never match any string; it either corresponds to the empty set, or is "shadowed" by an alternative expression.
Examples
a|«a» generates the warning "The first group in this expression is impossible to match." The only string that the expression can match is a, and the left-hand side of the | operator takes precedence under POSIX semantics, so the right-hand side can never match.
a|«b&c» generates the same warning. There are no characters that are simultaneously b and c.
3.2 Matching too much
Some regular expressions may match at every position, which is usually a sign of a mistake, as one usually wants to extract something from a string, and not everything. The following issues generate style warnings at compile-time, of type matching-too-much-style-warning.
3.2.1 "This expression matches the empty string at every position."
Explanation
The expression will match at every position, and most matches will have zero length. Often some * repetition needs to be replaced with some + repetition, to ensure matches contain at least one character.
Examples
The following code will produce too many matches:
(defun numbers (string)
  (one-more-re-nightmare:all-string-matches "[0-9]*" string))
(numbers "Phone: 6323003")
;; => (#("") #("") #("") #("") #("") #("") #("") #("6323003") #(""))
one-more-re-nightmare generates this warning when the numbers function is submitted. One solution is to replace the * repetition with + repetition.
(defun numbers (string)
  (one-more-re-nightmare:all-string-matches "[0-9]+" string))
(numbers "Phone: 6323003")
;; => (#("6323003"))
3.2.2 "This expression will only ever match the empty string at every position."
Explanation
The expression will match at every position, and all matches will have zero length.
Examples
Using the empty string as a regular expression generates this warning. Other regular expressions which are not just the empty string can still generate this warning; |b&c will generate this warning, as the regular expression still can only match the empty string.
3.3 Syntax errors
Syntax errors can also be caught at compile-time, signalling full warnings, as functions with invalid syntax will always fail at run-time.
Examples
( generates a parsing error. The open-parenthesis should be matched with a closing ).
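The difference between a style warning and a full warning can be observed through the standard compile function, which returns a warnings-p and a failure-p value. The following is a minimal sketch using only standard Common Lisp. The helper name compile-and-report is hypothetical and not part of one-more-re-nightmare, and the sketch assumes that linting runs whenever a form containing a literal regular expression is compiled.

```lisp
;; COMPILE-AND-REPORT is a hypothetical helper, not part of the library.
(defun compile-and-report (form)
  "Compile FORM inside a lambda and report what the compiler noticed."
  (multiple-value-bind (function warnings-p failure-p)
      (compile nil `(lambda () ,form))
    (declare (ignore function))
    ;; WARNINGS-P is true if any warning, including a style warning, was
    ;; detected. FAILURE-P is true only for errors and full warnings.
    (list :warnings-p warnings-p :failure-p failure-p)))

;; A form without any problems reports neither.
(compile-and-report '(+ 1 2))
;; => (:WARNINGS-P NIL :FAILURE-P NIL)
```

Under these assumptions, a form such as (one-more-re-nightmare:first-match "(" "a") should report a failure, whereas a form that only triggers one of the style warnings described earlier should set warnings-p without setting failure-p.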
3.4 Type errors
Type errors can be caught at compile-time, signalling full warnings, as functions with type errors will always fail at run-time.
3.4.1 "This regular expression only produces <m> registers, but <n> variables were provided."
Explanation
More register variables were provided to do-matches than the regular expression produces.
Examples
(one-more-re-nightmare:do-matches ((start end s1 e1) "abcde" x) (print (list s1 e1))) generates the warning "This regular expression only produces two registers, but four variables were provided." There are no submatches in abcde, but the do-matches form was provided the variable names s1 and e1 for a submatch.
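One way to resolve the warning is to make the number of variables agree with the number of registers, either by dropping the extra variables or by adding a submatch. The sketch below shows both; the function names match-bounds and submatch-bounds and the regular expression ab«c»de are illustrative assumptions rather than part of the original example.

```lisp
(defun match-bounds (string)
  "Collect (start end) for every match of abcde in STRING."
  (let ((bounds '()))
    ;; No submatches, so only the two whole-match registers exist.
    (one-more-re-nightmare:do-matches ((start end) "abcde" string)
      (push (list start end) bounds))
    (nreverse bounds)))

(defun submatch-bounds (string)
  "Collect (start end s1 e1) for every match of ab«c»de in STRING."
  (let ((bounds '()))
    ;; One submatch «c», so four registers are produced.
    (one-more-re-nightmare:do-matches ((start end s1 e1) "ab«c»de" string)
      (push (list start end s1 e1) bounds))
    (nreverse bounds)))
```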
3.4.2 SBCL reports a type conflict
Explanation
one-more-re-nightmare provides specific types to SBCL for regular expressions provided as string literals. The SBCL compiler can use these types to detect errors in code that uses the results produced by one-more-re-nightmare.
Specifically, one-more-re-nightmare provides the return type (or null (simple-vector 2(n+1))) for a call to first-match with a regular expression with n submatches. one-more-re-nightmare provides the type alexandria:array-index for the first two register variables, and the type (or null alexandria:array-index) for the remaining variables for do-matches.
Examples
(svref (first-match "abc" "abc") 2) generates the warning "Derived type (INTEGER 2 2) is not a suitable index for (SIMPLE-VECTOR 2)."
(do-matches ((s) "ab|ac" "ab") (print (symbol-name s))) generates the warning "Derived type of ... is (VALUES (MOD ...) &OPTIONAL) conflicting with its asserted type SYMBOL." The variable s will always be bound to an index, and never nil, because the first two registers designate the bounds of the entire match.
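Code that respects the types described above might look like the following sketch. The function names and the regular expression «a»|b are illustrative assumptions; the sketch relies only on the stated return type of first-match and the stated types of the do-matches variables.

```lisp
(defun whole-match-bounds (string)
  "Return the start and end of the first match of abc in STRING, or NIL."
  (let ((registers (one-more-re-nightmare:first-match "abc" string)))
    ;; FIRST-MATCH may return NIL, so test before indexing.
    (when registers
      ;; There are no submatches, so only indices 0 and 1 are valid.
      (values (svref registers 0) (svref registers 1)))))

(defun group-matches (string)
  "Collect (start end group) for every match of «a»|b in STRING.
GROUP is (s1 e1) when the submatch took part in the match, else :NO-GROUP."
  (let ((results '()))
    (one-more-re-nightmare:do-matches ((start end s1 e1) "«a»|b" string)
      ;; START and END are always indices; S1 and E1 may be NIL.
      (push (list start end (if s1 (list s1 e1) :no-group)) results))
    (nreverse results)))
```

Testing s1 before use keeps the code consistent with the type (or null alexandria:array-index) assigned to every variable after the first two.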