User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
Backslash
  Pd    Dash punctuation
  Pe    Close punctuation
  Pf    Final punctuation
  Pi    Initial punctuation
  Po    Other punctuation
  Ps    Open punctuation
  S     Symbol
  Sc    Currency symbol
  Sk    Modifier symbol
  Sm    Mathematical symbol
  So    Other symbol
  Z     Separator
  Zl    Line separator
  Zp    Paragraph separator
  Zs    Space separator
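The property codes above are the Unicode general categories, so they can be inspected outside PCRE. The following is a minimal sketch in Python, assuming the standard unicodedata module as a stand-in (Python's built-in re module does not support \p or \P). The names SAMPLES and has_property are hypothetical helpers introduced only for this illustration.

```python
import unicodedata

# One sample character per two-letter code from the table above.
# Non-ASCII characters are written as escapes to keep the source ASCII-only.
SAMPLES = {
    "-": "Pd",       # hyphen-minus
    ")": "Pe",       # right parenthesis
    "\u00bb": "Pf",  # right-pointing double angle quotation mark
    "\u00ab": "Pi",  # left-pointing double angle quotation mark
    "!": "Po",       # exclamation mark
    "(": "Ps",       # left parenthesis
    "$": "Sc",       # dollar sign
    "^": "Sk",       # circumflex accent
    "+": "Sm",       # plus sign
    "\u00a9": "So",  # copyright sign
    "\u2028": "Zl",  # line separator
    "\u2029": "Zp",  # paragraph separator
    " ": "Zs",       # space
}


def has_property(ch, prop):
    """Rough analogue of \\p{prop}: a one-letter code such as "P"
    covers every two-letter code that starts with it."""
    return unicodedata.category(ch).startswith(prop)


for ch, code in SAMPLES.items():
    print(repr(ch), unicodedata.category(ch), has_property(ch, code))
```

A one-letter code acts as the union of its two-letter codes, which is why has_property("$", "S") and has_property("$", "Sc") are both true in this sketch.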
Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE.
Specifying caseless matching does not affect these escape sequences. For example, \p{Lu} always
matches only upper case letters.
The \X escape matches any number of Unicode characters that form an extended Unicode sequence. \X
is equivalent to
(?>\PM\pM*)
That is, it matches a character without the "mark" property, followed by zero or more characters with the
"mark" property, and treats the sequence as an atomic group (see below). Characters with the "mark"
property are typically accents that affect the preceding character.
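The behavior of (?>\PM\pM*) can be sketched by hand. The example below is an illustration in Python, not PCRE itself: it assumes unicodedata general categories beginning with "M" correspond to the "mark" property, and the function name match_extended_sequence is hypothetical.

```python
import unicodedata


def is_mark(ch):
    # Categories Mn, Mc and Me all carry the "mark" property.
    return unicodedata.category(ch).startswith("M")


def match_extended_sequence(text, pos=0):
    """Return the end index of one non-mark character followed by zero or
    more mark characters starting at pos, or None if there is no match."""
    # The first character must exist and must NOT be a mark (the \PM part).
    if pos >= len(text) or is_mark(text[pos]):
        return None
    end = pos + 1
    # Consume every following mark (the \pM* part). Nothing is ever given
    # back, which mirrors the atomic group.
    while end < len(text) and is_mark(text[end]):
        end += 1
    return end


subject = "e\u0301a"  # "e" + combining acute accent, then "a"
print(match_extended_sequence(subject, 0))
print(match_extended_sequence(subject, 2))
```

In this sketch the base letter and its combining accent are consumed as one unit, and an attempt that starts on a mark character fails because the first character must lack the "mark" property.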
Matching characters by Unicode property is not fast, because PCRE has to search a structure that
contains data for over fifteen thousand characters. That is why the traditional escape sequences such as
\d and \w do not use Unicode properties in PCRE.
Simple Assertions
The fourth use of backslash is for certain simple assertions. An assertion specifies a condition that has
to be met at a particular point in a match, without consuming any characters from the subject string. The
use of subpatterns for more complicated assertions is described below. The backslashed assertions are:
  \b     matches at a word boundary
  \B     matches when not at a word boundary
  \A     matches at start of subject
  \Z     matches at end of subject or before newline at end
  \z     matches at end of subject
  \G     matches at first matching position in subject
These assertions may not appear in character classes (but note that \b has a different meaning, namely
the backspace character, inside a character class).
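The two meanings of \b can be seen in a short experiment. This sketch uses Python's re module rather than PCRE, on the assumption that \b and \B behave the same way for these inputs. Python's \Z differs from the PCRE definition above, and Python's re has no \G, so those assertions are not shown here.

```python
import re

text = "cat concat cat's"

# Outside a character class, \b is a zero-width word-boundary assertion.
standalone = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B requires that the position is NOT a word boundary, so only the
# "cat" inside "concat" qualifies.
embedded = [m.start() for m in re.finditer(r"\Bcat", text)]

# Inside a character class, \b is the backspace character (0x08).
backspace_at = re.search(r"[\b]", "a\bc").start()

print(standalone, embedded, backspace_at)
```

The apostrophe in "cat's" is not a word character, so the boundary after the final "cat" is found even though no space follows it.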
A word boundary is a position in the subject string where the current character and the previous character
do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the
string if the first or last character matches \w, respectively.
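The definition above can be checked by implementing it directly. The sketch below is written in Python under stated assumptions: the re.ASCII flag is used so that \w covers only ASCII letters, digits, and underscore, and the helper names is_word_char and word_boundaries are hypothetical.

```python
import re


def is_word_char(ch):
    # True when the single character ch matches \w (ASCII-only here).
    return re.fullmatch(r"\w", ch, re.ASCII) is not None


def word_boundaries(subject):
    """Return every position where exactly one of the previous and current
    characters matches \\w. Positions before the start and after the end
    count as non-word, which covers the start-of-string and end-of-string
    cases in the definition."""
    positions = []
    for i in range(len(subject) + 1):
        prev_is_word = i > 0 and is_word_char(subject[i - 1])
        curr_is_word = i < len(subject) and is_word_char(subject[i])
        if prev_is_word != curr_is_word:
            positions.append(i)
    return positions


def regex_boundaries(subject):
    # The positions the regex engine itself reports for \b.
    return [m.start() for m in re.finditer(r"\b", subject, re.ASCII)]


print(word_boundaries("ab cd"))
print(regex_boundaries("ab cd"))
```

Both functions report the same positions for these inputs, which suggests the hand-written rule matches what the engine does.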
The \A, \Z, and \z assertions differ from the traditional circumflex and dollar (described in the next
section) in that they only ever match at the very start and end of the subject string, whatever options are
set. Thus, they are independent of multiline mode. These three assertions are not affected by the