B-14
User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
When a parenthesized subpattern is quantified with a minimum repeat count that is greater than 1 or with
a limited maximum, more memory is required for the compiled pattern, in proportion to the size of the
minimum or maximum.
If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent to Perl's /s) is set, thus
allowing the . to match newlines, the pattern is implicitly anchored, because whatever follows will be
tried against every character position in the subject string, so there is no point in retrying the overall
match at any position after the first. PCRE normally treats such a pattern as though it were preceded by
\A.
In cases where it is known that the subject string contains no newlines, it is worth setting
PCRE_DOTALL in order to obtain this optimization, or alternatively using ^ to indicate anchoring
explicitly.
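The matching behavior that this optimization relies on can be sketched in Python. This is only an illustration: Python's re module is not PCRE, its re.DOTALL flag is assumed here to stand in for PCRE_DOTALL, and the subject string is made up. The sketch does not show the optimizer itself, only that a leading .* can always match from the start of the subject once the dot is allowed to cross newlines.

```python
import re

subject = "line1\nabcdef"

# Without DOTALL, "." cannot cross the newline, so the first successful
# match begins after it; the pattern is not tied to the subject start.
plain = re.search(r".*abc", subject)

# With DOTALL, ".*" can consume everything from position 0, so any match
# that exists at all is found on the first attempt, at the start.
dotall = re.search(r".*abc", subject, re.DOTALL)

print(plain.start(), dotall.start())  # 6 0
```

Because the DOTALL match, when one exists, always begins at offset 0, retrying at later offsets cannot find anything new.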
However, there is one situation where the optimization cannot be used. When .* is inside capturing
parentheses that are the subject of a backreference elsewhere in the pattern, a match at the start may fail,
and a later one succeed. Consider, for example:
(.*)abc\1
If the subject is "xyz123abc123", the match point is the fourth character. For this reason, such a pattern
is not implicitly anchored.
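The backreference case can be checked with a short Python sketch. It assumes that Python's re module treats this pattern the same way PCRE does, and it sets re.DOTALL only to mirror the situation described above.

```python
import re

# The backreference forces the text after "abc" to repeat group 1, so no
# match is possible from offsets 0, 1, or 2 of this subject.
m = re.search(r"(.*)abc\1", "xyz123abc123", re.DOTALL)

print(m.start(), repr(m.group(1)))  # 3 '123'
```

Offset 3 is the fourth character, which is why a matcher that tried only the start of the subject would wrongly report no match.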
When a capturing subpattern is repeated, the value captured is the substring that matched the final
iteration. For example, after
(tweedle[dume]{3}\s*)+
has matched "tweedledum tweedledee", the value of the captured substring is "tweedledee". However, if
there are nested capturing subpatterns, the corresponding captured values may have been set in previous
iterations. For example, after
/(a|(b))+/
matches "aba", the value of the second captured substring is "b".
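Both capture rules can be observed with Python's re module, which is assumed here to follow the same Perl-style behavior as PCRE for these two patterns. The variable names are illustrative only.

```python
import re

# A repeated capturing group reports the text of its final iteration.
m1 = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(m1.group(1))  # tweedledee

# The nested group 2 did not take part in the final iteration (which
# matched "a"), so it keeps the value set in the previous iteration.
m2 = re.match(r"(a|(b))+", "aba")
print(m2.groups())  # ('a', 'b')
```

Not every regex engine keeps the earlier value of a nested group in this way, so the second result should be verified on the engine actually in use.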
Atomic Grouping and Possessive Quantifiers
With both maximizing and minimizing repetition, failure of what follows normally causes the repeated
item to be re-evaluated to see if a different number of repeats allows the rest of the pattern to match.
Sometimes it is useful to prevent this, either to change the nature of the match, or to cause it to fail
earlier than it otherwise might, when the author of the pattern knows there is no point in carrying on.
Consider, for example, the pattern \d+foo when applied to the subject line
123456bar
After matching all 6 digits and then failing to match "foo", the normal action of the matcher is to try
again with only 5 digits matching the \d+ item, and then with 4, and so on, before ultimately failing.
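The sequence of retries described above can be modeled with a small hand-written function. This is a toy model under stated assumptions, not PCRE's implementation: the helper name digit_run_attempts is hypothetical, it handles only a greedy \d+ followed by a literal string, and it considers a single starting position.

```python
DIGITS = "0123456789"

def digit_run_attempts(subject, literal, start=0):
    """List the digit counts tried for \\d+ at `start`, longest first."""
    end = start
    while end < len(subject) and subject[end] in DIGITS:
        end += 1
    tried = []
    # Greedy: take the whole run first, then give back one digit at a
    # time, never going below the minimum of one digit.
    for stop in range(end, start, -1):
        tried.append(stop - start)
        if subject.startswith(literal, stop):
            return tried, True
    return tried, False

tried, matched = digit_run_attempts("123456bar", "foo")
print(tried, matched)  # [6, 5, 4, 3, 2, 1] False
```

Each entry in the list is one re-evaluation of the repeated item, and every one of them is wasted work for this subject.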
"Atomic grouping" (a term taken from Jeffrey Friedl's book) provides the means for specifying that once
a subpattern has matched, it is not to be re-evaluated in this way.
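The effect of an atomic group on the outcome of a match can be sketched in Python. Python's re module has atomic group syntax of its own only from version 3.11, so this sketch swaps in a well-known emulation instead: a lookahead that captures the run, followed by a backreference that consumes it. The pattern \d+6 and the subject "123456" are made-up variants, chosen so that the two forms give different results.

```python
import re

subject = "123456"

# Ordinary greedy repeat: \d+ gives back the final digit so "6" can match.
backtracking = re.search(r"\d+6", subject)

# Emulated atomic group: the lookahead captures the longest digit run and
# the named backreference consumes exactly that run. A lookahead is not
# re-entered once it has succeeded, so no digit is ever given back.
atomic_like = re.search(r"(?=(?P<run>\d+))(?P=run)6", subject)

print(backtracking.group(), atomic_like)  # 123456 None
```

The ordinary pattern succeeds only because the repeat is re-evaluated; once that is forbidden, the same subject no longer matches.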
If we use atomic grouping for the previous example, the matcher would give up immediately on failing
to match "foo" the first time. The notation is a kind of special parenthesis, starting with (?> as in this
example: