B-12
User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
Named Subpatterns
match exactly the same set of strings. Because alternative branches are tried from left to right, and
options are not reset until the end of the subpattern is reached, an option setting in one branch does affect
subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
Named Subpatterns
Identifying capturing parentheses by number is simple, but it can be very hard to keep track of the
numbers in complicated regular expressions. Furthermore, if an expression is modified, the numbers may
change. To help with this difficulty, PCRE supports the naming of subpatterns, something that Perl does
not provide. The Python syntax (?P<name>...) is used. Names consist of alphanumeric characters and
underscores, and must be unique within a pattern.
Named capturing parentheses are still allocated numbers as well as names. The PCRE API provides
function calls for extracting the name-to-number translation table from a compiled pattern. There is also
a convenience function for extracting a captured substring by name. For further details see the pcreapi
documentation.
Repetition
Repetition is specified by quantifiers, which can follow any of the following items:
a literal data character
the . metacharacter
the \C escape sequence
the \X escape sequence (in UTF-8 mode with Unicode properties)
an escape such as \d that matches a single character
a character class
a back reference (see next section)
a parenthesized subpattern (unless it is an assertion)
The general repetition quantifier specifies a minimum and maximum number of permitted matches, by
giving the two numbers in curly brackets (braces), separated by a comma. The numbers must be less than
65536, and the first must be less than or equal to the second. For example:
z{2,4}
matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special character. If the second number
is omitted, but the comma is present, there is no upper limit; if the second number and the comma are
both omitted, the quantifier specifies an exact number of required matches. Thus
[aeiou]{3,}
matches at least 3 successive vowels, but may match many more, while
\d{8}
matches exactly 8 digits. An opening curly bracket that appears in a position where a quantifier is not
allowed, or one that does not match the syntax of a quantifier, is taken as a literal character. For example,
{,6} is not a quantifier, but a literal string of four characters.