B-13
User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
Repetition
In UTF-8 mode, quantifiers apply to UTF-8 characters rather than to individual bytes. Thus, for example,
\x{100}{2} matches two UTF-8 characters, each of which is represented by a two-byte sequence.
Similarly, when Unicode property support is available, \X{3} matches three Unicode extended
sequences, each of which may be several bytes long (and they may be of different lengths).
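As a rough illustration of character-level quantification, the following sketch uses Python's standard re module rather than PCRE. That substitution is an assumption made for illustration only: Python writes the code point as \u0100 instead of \x{100}, and its re module does not support \X, so only the first example is shown.

```python
import re

# U+0100 occupies two bytes in UTF-8, but the quantifier counts characters.
text = "\u0100\u0100"
utf8_length = len(text.encode("utf-8"))   # total size of the subject in bytes

# {2} asks for two characters, not two bytes.
match = re.fullmatch(r"\u0100{2}", text)

print(utf8_length, match is not None)
```

The subject is four bytes long when encoded, yet the pattern with {2} matches it in full, because the repetition applies to whole characters.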
The quantifier {0} is permitted, causing the expression to behave as if the previous item and the
quantifier were not present.
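A minimal sketch of the {0} behavior, again using Python's re module as a stand-in for PCRE (an assumption; the two engines agree on this point, but the code is not taken from the guide):

```python
import re

# With {0}, the item "b" and its quantifier contribute nothing to the match.
with_zero = re.fullmatch(r"ab{0}c", "ac")

# The same subject matched by the pattern with the item removed entirely.
without_item = re.fullmatch(r"ac", "ac")

# A literal "b" in the subject is not accepted, because zero repetitions are required.
rejected = re.fullmatch(r"ab{0}c", "abc")

print(with_zero is not None, without_item is not None, rejected is None)
```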
For convenience (and historical compatibility) the three most common quantifiers have single-character
abbreviations:
* is equivalent to {0,}
+ is equivalent to {1,}
? is equivalent to {0,1}
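The equivalences above can be checked with a short sketch. It uses Python's re module, which accepts the same brace syntax; the sample strings and the helper name matched_text are chosen here for illustration and do not come from the guide.

```python
import re

# Each abbreviation paired with its brace form.
pairs = [(r"a*", r"a{0,}"), (r"a+", r"a{1,}"), (r"a?", r"a{0,1}")]
samples = ["", "a", "aa", "baa", "aaab"]


def matched_text(pattern, text):
    """Return the text matched at the start of the subject, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# For every pair, record whether both forms match the same text on each sample.
results = {
    short: [matched_text(short, s) == matched_text(brace, s) for s in samples]
    for short, brace in pairs
}

print(results)
```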
It is possible to construct infinite loops by following a subpattern that can match no characters with a
quantifier that has no upper limit, for example:
(a?)*
Earlier versions of Perl and PCRE gave a compile-time error for such patterns. However, because there
are cases where this can be useful, such patterns are now accepted; if any repetition of the subpattern
does in fact match no characters, the loop is forcibly broken.
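The following sketch shows that such a pattern terminates in practice. It runs on Python's re module, which is assumed here to break empty iterations in a comparable way; the value captured by the group can differ between engines, so only the overall match is inspected.

```python
import re

# A subpattern that can match nothing, repeated without an upper limit.
pattern = re.compile(r"(a?)*")

# On a run of "a" characters the whole subject is consumed.
full = pattern.fullmatch("aaa")

# On a subject with no "a", the match is empty and the engine still returns.
empty = pattern.match("bbb")

print(repr(full.group(0)), repr(empty.group(0)))
```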
By default, the quantifiers are "greedy", that is, they match as much as possible (up to the maximum
number of permitted times), without causing the rest of the pattern to fail. The classic example of where
this gives problems is in trying to match comments in C programs. These appear between /* and */ and
within the comment, individual * and / characters may appear. An attempt to match C comments by
applying the pattern
/\*.*\*/
to the string
/* first comment */ not comment /* second comment */
fails, because it matches the entire string owing to the greediness of the .* item.
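The greedy result described above can be reproduced with a short sketch, using Python's re module as a stand-in for PCRE (an assumption for illustration; greedy quantifiers behave the same way in both):

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# The greedy .* runs to the end of the line and backtracks only as far
# as the last "*/", so the match spans both comments.
greedy = re.search(r"/\*.*\*/", subject)

print(greedy.group(0) == subject)
```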
However, if a quantifier is followed by a question mark, it ceases to be greedy, and instead matches the
minimum number of times possible, so the pattern
/\*.*?\*/
does the right thing with the C comments. The meaning of the various quantifiers is not otherwise
changed, just the preferred number of matches. Do not confuse this use of question mark with its use as
a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in
\d??\d
which matches one digit by preference, but can match two if that is the only way the rest of the pattern
matches.
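Both uses of the question mark can be seen in the sketch below, written for Python's re module on the assumption that it treats the non-greedy forms as PCRE does:

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# The non-greedy .*? stops at the first "*/", so each comment is found separately.
comments = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers a single digit when nothing else constrains the match...
one_digit = re.match(r"\d??\d", "12").group(0)

# ...but takes two digits when the whole subject has to be matched.
two_digits = re.fullmatch(r"\d??\d", "12").group(0)

print(comments, one_digit, two_digits)
```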
If the PCRE_UNGREEDY option is set (an option that is not available in Perl), the quantifiers are not
greedy by default, but individual ones can be made greedy by following them with a question mark. In
other words, the option inverts the default behaviour.