B-4
User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
Backslash
The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class,
PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there
have been at least that many previous capturing left parentheses in the expression, the entire sequence is
taken as a back reference. A description of how this works is given later, following the discussion of
parenthesized subpatterns.
Inside a character class, or if the decimal number is greater than 9 and there have not been that many
capturing subpatterns, PCRE re-reads up to three octal digits following the backslash, and generates a
single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves. For
example:
\040 is another way of writing a space
\40 is the same, provided there are fewer than 40 previous capturing subpatterns
\7 is always a back reference
\11 might be a back reference, or another way of writing a tab
\011 is always a tab
\0113 is a tab followed by the character "3"
\113 might be a back reference, otherwise the character with octal code 113
\377 might be a back reference, otherwise the byte consisting entirely of 1 bits
\81 is either a back reference, or a binary zero followed by the two characters "8" and "1"
Note that octal values of 100 or greater must not be introduced by a leading zero, because no more than
three octal digits are ever read.
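The rules above can be modelled in a few lines. The following is a minimal sketch, not PCRE's own code. classify_backslash_digits is a hypothetical helper. It takes the characters that follow a backslash and the number of capturing left parentheses seen so far, and it reports whether the sequence is treated as a back reference or as an octal byte. It assumes a non-UTF-8 pattern and ignores every other kind of escape.

```python
# Hypothetical helper (not part of PCRE or MARS): models the backslash-digit
# rules described above. Assumes non-UTF-8 mode; other escapes are ignored.
DECIMAL_DIGITS = "0123456789"
OCTAL_DIGITS = "01234567"


def classify_backslash_digits(digits: str, groups_so_far: int, in_class: bool = False):
    """Classify the text after a backslash that starts with a digit.

    Returns ("backref", number, leftover) or ("octal", byte_value, leftover),
    where leftover is the trailing text that stands for itself.
    """
    if not digits or digits[0] not in DECIMAL_DIGITS:
        raise ValueError("expected a digit after the backslash")

    # Outside a class, a leading 1-9 starts a decimal number.
    if digits[0] != "0" and not in_class:
        end = 0
        while end < len(digits) and digits[end] in DECIMAL_DIGITS:
            end += 1
        number = int(digits[:end])
        # Back reference if below 10, or if enough capturing groups precede it.
        if number < 10 or number <= groups_so_far:
            return ("backref", number, digits[end:])

    # Otherwise re-read at most three octal digits (possibly none at all).
    end = 0
    while end < min(3, len(digits)) and digits[end] in OCTAL_DIGITS:
        end += 1
    value = int(digits[:end], 8) if end else 0
    # Keep only the least significant 8 bits, giving a single byte.
    return ("octal", value & 0xFF, digits[end:])


for text, groups in [("040", 0), ("40", 0), ("7", 0), ("11", 0), ("11", 11),
                     ("0113", 0), ("113", 0), ("81", 0), ("0101", 0)]:
    print(f"\\{text} with {groups} groups ->", classify_backslash_digits(text, groups))
```

The loop reproduces entries from the list above. For example, \81 with no preceding groups yields byte 0 followed by the literal text "81". The "0101" case yields byte 8 followed by a literal "1", which shows why an octal value of 100 or more cannot be written with a leading zero.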
All the sequences that define a single byte value or a single UTF-8 character (in UTF-8 mode) can be
used both inside and outside character classes. In addition, inside a character class, the sequence \b is
interpreted as the backspace character (hex 08), and the sequence \X is interpreted as the character "X".
Outside a character class, these sequences have different meanings (see "Unicode Character Properties" on page B-5).
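Python's re module is not PCRE, but it follows the same convention for \b, so it can illustrate the contrast. The sketch below uses bytes patterns. The \X rule is not shown because current Python versions reject unknown letter escapes such as \X, even inside a class.

```python
import re

# Inside a character class, \b is the backspace character (hex 08).
in_class = re.compile(rb"[\b]")

# Outside a class, \b is a word-boundary assertion and consumes no character.
boundary = re.compile(rb"\bcat\b")

print(in_class.fullmatch(b"\x08") is not None)        # prints True
print(in_class.fullmatch(b"b") is not None)           # prints False
print(boundary.search(b"the cat sat") is not None)    # prints True
print(boundary.search(b"concatenate") is not None)    # prints False
```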
Generic Character Types
The third use of backslash is for specifying generic character types. The following are always
recognized:
\d any decimal digit
\D any character that is not a decimal digit
\s any whitespace character
\S any character that is not a whitespace character
\w any "word" character
\W any "non-word" character
Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given
character matches one, and only one, of each pair.
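The partition property can be checked mechanically. This is a minimal sketch, assuming Python's re with bytes patterns (ASCII semantics) is an acceptable stand-in for PCRE. The exact membership of each class differs between the two engines, but each escape and its uppercase form still split all 256 byte values into two disjoint sets. is_partition is a hypothetical helper name.

```python
import re

# Python's re with bytes patterns stands in for PCRE here (ASCII semantics).
PAIRS = [(rb"\d", rb"\D"), (rb"\s", rb"\S"), (rb"\w", rb"\W")]


def is_partition(positive: bytes, negative: bytes) -> bool:
    """True if every byte value matches exactly one of the two escapes."""
    pos, neg = re.compile(positive), re.compile(negative)
    for value in range(256):
        ch = bytes([value])
        hits = (pos.fullmatch(ch) is not None) + (neg.fullmatch(ch) is not None)
        if hits != 1:
            return False
    return True


for positive, negative in PAIRS:
    print(positive, negative, is_partition(positive, negative))
```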
These character type sequences can appear both inside and outside character classes. They each match
one character of the appropriate type. If the current matching point is at the end of the subject string, all
of them fail, since there is no character to match.
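The sketch below shows both points using Python's re as a stand-in. Each escape consumes exactly one character, and matching fails when the starting position is already at the end of the subject. The escapes can also be combined inside a class.

```python
import re

digit = re.compile(r"\d")

# Each escape consumes exactly one character of the right type.
first = digit.match("a7", 1)
print(first.group())            # prints 7

# At the end of the subject there is nothing left to match, so it fails.
print(digit.match("a7", 2))     # prints None

# Inside a character class the escapes combine into one set of characters.
digit_or_space = re.compile(r"[\d\s]")
print([m.group() for m in digit_or_space.finditer("x1 y")])  # prints ['1', ' ']
```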
For compatibility with Perl, \s does not match the VT character (code 11). This makes it different from the POSIX "space" class. The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32).
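The difference can be expressed as a set comparison. The following is a minimal sketch. PCRE_SPACE is taken from the list above. POSIX_SPACE is the C-locale isspace set, which Python's string.whitespace happens to spell out. pcre_is_space is a hypothetical helper. Be aware that Python's own re module is not PCRE: its bytes-mode \s does match VT.

```python
import re
import string

# \s as described for this PCRE version: HT, LF, FF, CR and space (no VT).
PCRE_SPACE = {9, 10, 12, 13, 32}

# POSIX "space" class in the C locale: the same characters plus VT (code 11).
POSIX_SPACE = {ord(c) for c in string.whitespace}


def pcre_is_space(code: int) -> bool:
    """Hypothetical helper: would this byte match \\s under the rule above?"""
    return code in PCRE_SPACE


print(sorted(POSIX_SPACE - PCRE_SPACE))             # prints [11]

# Caution: Python's re is not PCRE; its bytes-mode \s does match VT.
print(re.fullmatch(rb"\s", b"\x0b") is not None)    # prints True
```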