User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
The condition is a positive lookahead assertion that matches an optional sequence of non-letters followed
by a letter. In other words, it tests for the presence of at least one letter in the subject. If a letter is found,
the subject is matched against the first alternative; otherwise it is matched against the second. This
pattern matches strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are
digits.
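The two-way behavior described above can be sketched in Python. This is only an approximation under stated assumptions: the pattern itself does not appear in this part of the section, so it is reconstructed here from the description (dd-aaa-dd or dd-dd-dd), and because Python's standard re module has no lookahead conditions, the condition is written out explicitly on each alternative. The names PATTERN and matches are hypothetical.

```python
import re

# Assumed reconstruction of the pattern described in the text.
# Python's re module does not support a lookahead as a condition, so the
# test "is there a letter ahead?" is repeated on both branches instead.
PATTERN = re.compile(
    r"""
    (?=[^a-z]*[a-z])   \d{2}-[a-z]{3}-\d{2}   # a letter is present: dd-aaa-dd
    |
    (?![^a-z]*[a-z])   \d{2}-\d{2}-\d{2}      # no letter present:   dd-dd-dd
    """,
    re.VERBOSE,
)


def matches(subject: str) -> bool:
    """Return True if the whole subject has one of the two forms."""
    return PATTERN.fullmatch(subject) is not None
```

With this sketch, matches("12-abc-34") and matches("12-34-56") both succeed, while a subject such as "12-ab-34" contains a letter, is therefore tried only against the first form, and fails.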
Comments
The sequence (?# marks the start of a comment that continues up to the next closing parenthesis. Nested
parentheses are not permitted. The characters that make up a comment play no part in the pattern
matching at all.
If the PCRE_EXTENDED option is set, an unescaped # character outside a character class introduces a
comment that continues up to the next newline character in the pattern.
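Both comment styles can be tried out in Python, whose standard re module accepts the same (?# syntax and whose re.VERBOSE flag is used here as a stand-in for PCRE_EXTENDED. That equivalence is an assumption made for illustration only; the phone-number-like pattern and the names inline and verbose are hypothetical.

```python
import re

# Inline comments: everything from "(?#" up to the next ")" is ignored.
inline = re.compile(r"\d{3}(?#prefix)-\d{4}(?#line number)")

# Extended (verbose) mode: an unescaped "#" outside a character class starts
# a comment that runs to the end of the line. White space is ignored as well.
verbose = re.compile(
    r"""
    \d{3}    # prefix
    -
    \d{4}    # line number
    [#]?     # inside a character class, "#" is an ordinary character
    """,
    re.VERBOSE,
)
```

Both compiled patterns match "555-1234"; only the second also accepts a trailing "#", because the character class keeps it from being read as a comment.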
Recursive Patterns
Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses.
Without the use of recursion, the best that can be done is to use a pattern that matches up to some fixed
depth of nesting. It is not possible to handle an arbitrary nesting depth. Perl provides a facility that allows
regular expressions to recurse (amongst other things). It does this by interpolating Perl code in the
expression at run time, and the code can refer to the expression itself. A Perl pattern to solve the
parentheses problem can be created like this:
$re = qr{\( (?: (?>[^()]+) | (??{$re}) )* \)}x;
The (??{...}) item (written (?p{...}) in older Perl releases) interpolates Perl code at run time, and in this case refers recursively to the pattern in
which it appears. Obviously, PCRE cannot support the interpolation of Perl code. Instead, it supports
some special syntax for recursion of the entire pattern, and also for individual subpattern recursion.
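Before turning to the PCRE recursion syntax, the fixed-depth limitation mentioned at the start of this section can be made concrete. The following Python sketch builds a non-recursive pattern by embedding one copy of itself per nesting level; the helper name nested_parens_pattern and the chosen depth of 3 are hypothetical.

```python
import re


def nested_parens_pattern(max_depth: int) -> str:
    """Build a non-recursive pattern for parentheses nested up to max_depth levels."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # Depth 1: a parenthesized string that contains no further parentheses.
    pattern = r"\([^()]*\)"
    for _ in range(max_depth - 1):
        # Each pass permits one more level by embedding the previous pattern.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern


depth3 = re.compile(nested_parens_pattern(3))
```

The compiled pattern accepts "(a(b(c)))" but rejects "(a(b(c(d))))", which is one level deeper than the pattern was built for. Supporting deeper nesting means building a longer pattern, and no finite choice of max_depth covers every input.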
The special item that consists of (? followed by a number greater than zero and a closing parenthesis is
a recursive call of the subpattern of the given number, provided that it occurs inside that subpattern. (If
not, it is a "subroutine" call, which is described in the next section.) The special item (?R) is a recursive
call of the entire regular expression.
For example, this PCRE pattern solves the nested parentheses problem (assume the PCRE_EXTENDED
option is set so that white space is ignored):
\( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any number of substrings, each of which is either
a sequence of non-parentheses or a recursive match of the pattern itself (that is, a correctly parenthesized
substring). Finally, there is a closing parenthesis.
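Python's standard re module does not support (?R), so the following sketch mirrors the structure of the pattern by hand rather than running it. It is an illustration of the recursive idea, not of how PCRE implements it, and the function name match_parens is hypothetical. Unlike an unanchored pattern, it only tries to match at the position it is given.

```python
def match_parens(subject: str, pos: int = 0) -> int:
    """Return the index just past a balanced group starting at pos, or -1."""
    if pos >= len(subject) or subject[pos] != "(":
        return -1
    pos += 1  # the opening parenthesis: \(
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1  # the closing parenthesis: \)
        if ch == "(":
            # Counterpart of (?R): match a nested group recursively.
            pos = match_parens(subject, pos)
            if pos == -1:
                return -1
        else:
            pos += 1  # one character of a run of non-parentheses
    return -1  # input ended before the closing parenthesis
```

For example, match_parens("(ab(cd)ef)") returns 10, the length of the whole subject, while match_parens("(a(b)") returns -1 because the outer parenthesis is never closed. Very deeply nested input would eventually hit Python's recursion limit.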
If this were part of a larger pattern, you would not want to recurse the entire pattern, so instead you could
use this:
( \( ( (?>[^()]+) | (?1) )* \) )