User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
However, if the decimal number following the backslash is less than 10, it is always taken as a back
reference, and causes an error only if there are not that many capturing left parentheses in the entire
pattern. In other words, the parentheses that are referenced need not be to the left of the reference for
numbers less than 10. See "Non-printing Characters" on page B-3 for further details of the handling of
digits following a backslash.
A back reference matches whatever actually matched the capturing subpattern in the current subject
string, rather than anything matching the subpattern itself (see "Subpatterns as Subroutines" on
page B-21 for a way of doing that). So the pattern
(sens|respons)e and \1ibility
matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If
caseful matching is in force at the time of the back reference, the case of letters is relevant. For example,
((?i)rah)\s+\1
matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original capturing subpattern
is matched caselessly.
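Both examples can be checked with Python's standard `re` module, which treats these back references the same way. This is a minimal sketch under one stated adaptation: Python 3.11 and later reject an inline flag such as `(?i)` that is not at the start of the whole pattern, so the caseless subpattern is written with Python's scoped form `(?i:rah)`. That form limits caseless matching to the group, as `(?i)` inside the parentheses does in PCRE. The variable names are illustrative only.

```python
import re

# \1 must repeat the exact text captured by group 1, not any string that
# the alternation (sens|respons) could match.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Scoped flag (?i:...) (a Python adaptation of PCRE's (?i) inside the group):
# only the captured subpattern is caseless; \1 outside it is matched caseful.
caseful = re.compile(r"((?i:rah))\s+\1")

for text in ["sense and sensibility", "response and responsibility",
             "sense and responsibility"]:
    print(text, "->", bool(backref.fullmatch(text)))

for text in ["rah rah", "RAH RAH", "RAH rah"]:
    print(text, "->", bool(caseful.fullmatch(text)))
```

In each loop the first two subjects match and the third does not, because the back reference must reproduce the captured text exactly, including its case.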
Back references to named subpatterns use the Python syntax (?P=name). The previous example can be
rewritten with a named subpattern as follows:
(?P<p1>(?i)rah)\s+(?P=p1)
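Python's standard `re` module uses the same `(?P<name>...)` and `(?P=name)` syntax, so the named form can be tried directly. This is a minimal sketch that assumes one Python-specific adaptation: the caseless part is written as the scoped flag `(?i:rah)`, because Python 3.11 and later reject `(?i)` anywhere other than the start of the pattern. The variable names are illustrative.

```python
import re

# (?P<p1>...) names the capturing group; (?P=p1) must repeat exactly the
# text that group captured, case included.
named = re.compile(r"(?P<p1>(?i:rah))\s+(?P=p1)")

for text in ["rah rah", "RAH RAH", "RAH rah"]:
    print(text, "->", bool(named.fullmatch(text)))

# The captured text can also be retrieved by name from the match object.
m = named.fullmatch("RAH RAH")
print(m.group("p1"))
```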
There may be more than one back reference to the same subpattern. If a subpattern has not actually been
used in a particular match, any back references to it always fail. For example, the pattern
(a|(bc))\2
always fails if it starts to match "a" rather than "bc".
Because there may be many capturing parentheses in a pattern, all digits following the backslash are
taken as part of a potential back reference number. If the pattern continues with a digit character, some
delimiter must be used to terminate the back reference. If the PCRE_EXTENDED option is set, this can be
whitespace. Otherwise, an empty comment (see "Comments" on page B-20) can be used.
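The following sketch uses Python's standard `re` module to show both points: a back reference to a group that did not take part in the match fails, and a digit after a back reference needs a delimiter. It assumes, for illustration, that Python's `re.VERBOSE` flag stands in for PCRE_EXTENDED (both make unescaped whitespace in the pattern insignificant); Python also accepts the empty comment `(?#)`. The variable names are hypothetical.

```python
import re

# If the first alternative "a" is taken, group 2 never participates in the
# match, so \2 cannot match and that attempt fails.
unset = re.compile(r"(a|(bc))\2")
print(unset.fullmatch("a"))            # None
print(bool(unset.fullmatch("bcbc")))   # True

# Without a delimiter, "\10" is read as a reference to group 10, which does
# not exist in this pattern, so compiling it raises an error.
try:
    re.compile(r"(a)\10")
except re.error as exc:
    print("error:", exc)

# Two ways to end the back reference \1 before a literal digit 0:
spaced = re.compile(r"(a)\1 0", re.VERBOSE)  # whitespace is ignored in verbose mode
commented = re.compile(r"(a)\1(?#)0")        # an empty comment acts as the terminator
print(bool(spaced.fullmatch("aa0")), bool(commented.fullmatch("aa0")))
```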
A back reference that occurs inside the parentheses to which it refers fails when the subpattern is first
used, so, for example, (a\1) never matches. However, such references can be useful inside repeated
subpatterns. For example, the pattern
(a|b\1)+
matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of the subpattern, the back
reference matches the character string corresponding to the previous iteration. In order for this to work,
the pattern must be such that the first iteration does not need to match the back reference. This can be
done using alternation, as in the example above, or by a quantifier with a minimum of zero.
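Python's standard `re` module refuses to compile `(a|b\1)+` (it reports that the pattern cannot refer to an open group), so it cannot demonstrate this PCRE behavior directly. Instead, here is a hand-written recognizer, a sketch that mimics the semantics described above for this one pattern only; the function name `matches_a_or_b_backref` is hypothetical. Each iteration either consumes "a" or consumes "b" followed by the text captured in the previous iteration. On the first iteration nothing has been captured yet, so only the "a" branch can succeed. The function reports whether the whole subject string matches.

```python
from typing import Optional


def matches_a_or_b_backref(subject: str) -> bool:
    r"""Hypothetical helper: does (a|b\1)+ match all of `subject` (PCRE semantics)?"""

    def iterate(pos: int, previous: Optional[str], count: int) -> bool:
        # The + quantifier needs at least one iteration before the match can end.
        if pos == len(subject):
            return count >= 1
        # Alternative 1: the literal "a"; group 1 now holds "a".
        if subject.startswith("a", pos) and iterate(pos + 1, "a", count + 1):
            return True
        # Alternative 2: "b" followed by the previous iteration's capture.
        # On the first iteration there is no capture, so this branch fails.
        if previous is not None:
            piece = "b" + previous
            if subject.startswith(piece, pos) and iterate(pos + len(piece), piece, count + 1):
                return True
        return False

    return iterate(0, None, 0)


for text in ["a", "aaa", "aba", "ababbaa", "ab", "b"]:
    print(text, "->", matches_a_or_b_backref(text))
```

For "ababbaa", the iterations capture "a", then "ba", then "bba", then "a" again, which is how the string from each iteration feeds the back reference in the next.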
Assertions
An assertion is a test on the characters following or preceding the current matching point that does not
actually consume any characters. The simple assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are
described above.
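To see that an assertion tests a position without consuming characters, the sketch below uses Python's standard `re` module. Python supports several of the same simple assertions (`\b`, `\B`, `\A`, `^`, `$`) but not `\G`, and its `\Z` behaves like `\z` here (end of subject only), so this example sticks to `\b`. The variable names are illustrative.

```python
import re

# \b matches at a position between characters, so the match it reports
# has zero length: nothing is consumed.
boundary = re.search(r"\b", "  word")
print(boundary.span())   # (2, 2)

# Because \b consumes nothing, it only restricts where "cat" may start and
# end; the "cat" inside "concat" is not preceded by a word boundary.
words = re.findall(r"\bcat\b", "cat concat cat.")
print(words)             # ['cat', 'cat']
```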