User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
Backslash
A "word" character is an underscore or any character less than 256 that is a letter or digit. The definition
of letters and digits is controlled by PCRE's low-valued character tables, and may vary if locale-specific
matching is taking place (see "Locale support" in the pcreapi page). For example, in the "fr_FR"
(French) locale, some character codes greater than 128 are used for accented letters, and these are
matched by \w.
In UTF-8 mode, characters with values greater than 128 never match \d, \s, or \w, and always match \D,
\S, and \W. This is true even when Unicode character property support is available.
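The ASCII-only behavior described above can be illustrated by analogy. The following is a minimal sketch that assumes Python's standard re module, which is not PCRE; its re.ASCII flag restricts \d, \s, and \w to ASCII characters, so an accented letter fails \w and matches \W. The names WORD, NON_WORD, and classify are hypothetical and exist only for this illustration.

```python
import re

# Analogy only: Python's re module is not PCRE. The re.ASCII flag limits
# \w to [a-zA-Z0-9_], so non-ASCII characters never match \w and always
# match \W, which resembles the UTF-8 mode rule described in the text.
WORD = re.compile(r"\w", re.ASCII)
NON_WORD = re.compile(r"\W", re.ASCII)


def classify(ch: str) -> str:
    """Return 'word' if the single character ch matches \\w, else 'non-word'."""
    return "word" if WORD.fullmatch(ch) else "non-word"


samples = ["a", "Z", "7", "_", "-", "\u00e9"]  # \u00e9 is e with acute accent
results = {ch: classify(ch) for ch in samples}
print(results)
```

The underscore counts as a word character, while the hyphen and the accented letter do not.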
Unicode Character Properties
When PCRE is built with Unicode character property support, three additional escape sequences to
match generic character types are available when UTF-8 mode is selected. They are:
\p{xx}    a character with the xx property
\P{xx}    a character without the xx property
\X        an extended Unicode sequence
The property names represented by xx above are limited to the Unicode general category properties. Each
character has exactly one such property, specified by a two-letter abbreviation. For compatibility with
Perl, negation can be specified by including a circumflex between the opening brace and the property
name. For example, \p{^Lu} is the same as \P{Lu}.
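The equivalence of \p{^Lu} and \P{Lu} can be checked with a small emulation. This sketch assumes Python's standard unicodedata module as a stand-in for PCRE's property tables; it does not call PCRE, and the helper names matches_p and matches_P are hypothetical.

```python
import unicodedata


def matches_p(ch: str, spec: str) -> bool:
    r"""Emulate \p{spec} for a two-letter code; a leading ^ negates the test."""
    negated = spec.startswith("^")
    code = spec[1:] if negated else spec
    # Each character has exactly one general category, e.g. 'Lu' for 'A'.
    has_property = unicodedata.category(ch) == code
    return has_property != negated


def matches_P(ch: str, spec: str) -> bool:
    r"""Emulate \P{spec}: true when the character lacks the property."""
    return not matches_p(ch, spec)


# \p{^Lu} and \P{Lu} agree on every sample character.
samples = "Aa1 _"
agreement = all(matches_p(ch, "^Lu") == matches_P(ch, "Lu") for ch in samples)
print(agreement)
```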
If only one letter is specified with \p or \P, it includes all the properties that start with that letter. In this
case, in the absence of negation, the curly brackets in the escape sequence are optional; these two
examples have the same effect:
\p{L}
\pL
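The single-letter form can be thought of as a prefix test on the two-letter code. The sketch below again assumes Python's unicodedata module rather than PCRE itself, and in_group is a hypothetical helper: it treats \pL as "any general category whose code starts with L".

```python
import unicodedata


def in_group(ch: str, letter: str) -> bool:
    r"""Emulate \p{letter}: true if the category code of ch starts with letter."""
    return unicodedata.category(ch).startswith(letter)


# 'a' is Ll, '\u00c9' (E with acute) is Lu, '\u01c5' is Lt,
# '9' is Nd and '_' is Pc, so only the first three are letters.
samples = ["a", "\u00c9", "\u01c5", "9", "_"]
letters = [ch for ch in samples if in_group(ch, "L")]
print(letters)
```

The same helper covers the other single-letter groups, for example in_group("9", "N") for any kind of number.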
The following property codes are supported:
C Other
Cc Control
Cf Format
Cn Unassigned
Co Private use
Cs Surrogate
L Letter
Ll Lower case letter
Lm Modifier letter
Lo Other letter
Lt Title case letter
Lu Upper case letter
M Mark
Mc Spacing mark
Me Enclosing mark
Mn Non-spacing mark
N Number
Nd Decimal number
Nl Letter number
No Other number
P Punctuation
Pc Connector punctuation