B-7
User Guide for Cisco Security MARS Local Controller
78-17020-01
Appendix B Regular Expression Reference
PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the circumflex and
dollar metacharacters. However, if the startoffset argument of pcre_exec() is non-zero, indicating that
matching is to start at a point other than the beginning of the subject, \A can never match. The difference
between \Z and \z is that \Z matches before a newline that is the last character of the string as well as at
the end of the string, whereas \z matches only at the end.
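The rules for \A, \Z, and \z can be restated as three small predicates over a position in the subject. This is a hypothetical model written in Python for illustration, not PCRE code; the function names and the pos parameter are assumptions. Python's own re module is not used here because its \Z behaves like PCRE's \z.

```python
def at_A(subject: str, pos: int) -> bool:
    # \A: true only at the very start of the subject. When matching begins
    # at a non-zero start offset, pos is never 0, so \A can never be true.
    # (subject is unused; it is kept so all three predicates share a signature.)
    return pos == 0


def at_Z(subject: str, pos: int) -> bool:
    # \Z: true at the end of the subject, or just before a newline that is
    # the last character of the subject.
    end = len(subject)
    return pos == end or (pos == end - 1 and subject.endswith("\n"))


def at_z(subject: str, pos: int) -> bool:
    # \z: true only at the very end of the subject.
    return pos == len(subject)


subject = "abc\n"
for pos in range(len(subject) + 1):
    print(pos, at_A(subject, pos), at_Z(subject, pos), at_z(subject, pos))
```

For the subject "abc\n", this model reports \Z as true at positions 3 and 4, while \z is true only at position 4.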
The \G assertion is true only when the current matching position is at the start point of the match, as
specified by the startoffset argument of pcre_exec(). It differs from \A when the value of startoffset is
non-zero. By calling pcre_exec() multiple times with appropriate arguments, you can mimic Perl's /g
option, and it is in this kind of implementation that \G can be useful.
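The following sketch shows the kind of loop described above: each search restarts at the offset where the previous match ended, and the match must begin exactly there, which is what \G provides. It uses Python's re module rather than pcre_exec(), so treat it as an analogy: Pattern.match(string, pos) is anchored at pos and stands in for a \G-anchored pattern, and pos stands in for startoffset. The function name contiguous_matches is hypothetical.

```python
import re


def contiguous_matches(pattern: str, subject: str) -> list[str]:
    """Collect matches that each start exactly where the previous one ended."""
    regex = re.compile(pattern)
    results = []
    offset = 0  # plays the role of pcre_exec()'s startoffset
    while offset <= len(subject):
        m = regex.match(subject, offset)  # anchored at offset, as with \G
        if m is None:
            break
        results.append(m.group())
        if m.end() == offset:
            break  # empty match: stop rather than loop forever (a simplification)
        offset = m.end()
    return results


print(contiguous_matches(r"\d", "123a45"))  # ['1', '2', '3']
print(re.findall(r"\d", "123a45"))          # ['1', '2', '3', '4', '5']
```

Because every match must start where the previous one stopped, the loop ends at the first gap ("a"), whereas an unanchored global search continues past it.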
Note, however, that PCRE's interpretation of \G, as the start of the current match, is subtly different from
Perl's, which defines it as the end of the previous match. In Perl, these can be different when the
previously matched string was empty. Because PCRE does just one match at a time, it cannot reproduce
this behaviour.
If all the alternatives of a pattern begin with \G, the expression is anchored to the starting match position,
and the "anchored" flag is set in the compiled regular expression.
Circumflex and Dollar
Outside a character class, in the default matching mode, the circumflex character is an assertion that is
true only if the current matching point is at the start of the subject string. If the startoffset argument of
pcre_exec() is non-zero, circumflex can never match if the PCRE_MULTILINE option is unset. Inside
a character class, circumflex has an entirely different meaning (see Square Brackets and Character
Classes, page B-8, and Posix Character Classes, page B-9).
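Python's re module is not PCRE, but for the cases below its circumflex behaves as described above, and the pos argument of Pattern.search() plays a role similar to the startoffset argument of pcre_exec(). The sketch assumes that correspondence.

```python
import re

anchored = re.compile(r"^abc")

# At the start of the subject, ^ can match.
print(anchored.search("abcdef") is not None)      # True

# Starting the search at index 3 is similar to a non-zero startoffset:
# "abc" is present there, but ^ fails because index 3 is not the start.
print(anchored.search("xyzabc", 3) is not None)   # False

# Inside a character class, ^ means negation instead of an assertion.
print(re.findall(r"[^abc]", "abcxyz"))            # ['x', 'y', 'z']
```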
Circumflex need not be the first character of the pattern if a number of alternatives are involved, but it
should be the first thing in each alternative in which it appears if the pattern is ever to match that branch.
If all possible alternatives start with a circumflex, that is, if the pattern is constrained to match only at
the start of the subject, it is said to be an "anchored" pattern. (There are also other constructs that can
cause a pattern to be anchored.)
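The effect of circumflex in alternatives can be illustrated with Python's re module, which handles these patterns the same way in default (non-multiline) mode. Python does not expose an "anchored" flag, so the sketch shows only which subjects match; the variable names are arbitrary.

```python
import re

# Every alternative begins with ^, so the pattern is anchored.
pets = re.compile(r"^cat|^dog")
print(pets.search("dog food").group())   # dog
print(pets.search("hotdog"))             # None

# ^ in only one branch: "cat" must be at the start, "dog" may appear anywhere.
mixed = re.compile(r"^cat|dog")
print(mixed.search("hotdog").group())    # dog
print(mixed.search("bobcat"))            # None

# ^ that is not first in its branch: in default mode that branch never matches.
never = re.compile(r"x^cat|dog")
print(never.search("xcat"))              # None
```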
A dollar character is an assertion that is true only if the current matching point is at the end of the subject
string, or immediately before a newline character that is the last character in the string (by default).
Dollar need not be the last character of the pattern if a number of alternatives are involved, but it should
be the last item in any branch in which it appears. Dollar has no special meaning in a character class.
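The default behaviour of dollar can be checked with Python's re module, whose non-multiline $ also matches at the end of the subject or before a newline that is the last character. This is an illustration in Python, not a PCRE test.

```python
import re

ends_abc = re.compile(r"abc$")

print(ends_abc.search("xabc") is not None)      # True: at the end of the subject
print(ends_abc.search("xabc\n") is not None)    # True: before a final newline
print(ends_abc.search("xabc\n\n") is not None)  # False: that newline is not the last character
print(ends_abc.search("abc\nx") is not None)    # False: internal newline, not multiline mode

# Inside a character class, $ is an ordinary character.
print(re.findall(r"[$]", "cost: $5"))           # ['$']
```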
The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the
PCRE_DOLLAR_ENDONLY option at compile time. This does not affect the \Z assertion.
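Python's re module has no equivalent of PCRE_DOLLAR_ENDONLY, so the sketch below models the two behaviours of a non-multiline dollar directly; the function dollar_matches and its dollar_endonly parameter are hypothetical. In Python itself, writing \Z instead of $ gives end-only behaviour, because Python's \Z corresponds to PCRE's \z.

```python
import re


def dollar_matches(subject: str, pos: int, dollar_endonly: bool = False) -> bool:
    """Hypothetical model of PCRE's $ when PCRE_MULTILINE is not set."""
    at_end = pos == len(subject)
    if dollar_endonly:
        # PCRE_DOLLAR_ENDONLY: only the very end of the subject counts.
        return at_end
    # Default: the end, or just before a newline that is the last character.
    before_final_newline = pos == len(subject) - 1 and subject.endswith("\n")
    return at_end or before_final_newline


subject = "abc\n"
print(dollar_matches(subject, 3))                       # True
print(dollar_matches(subject, 3, dollar_endonly=True))  # False
print(dollar_matches(subject, 4, dollar_endonly=True))  # True

# The same contrast in Python's own syntax ($ versus Python's \Z).
print(re.search(r"abc$", subject) is not None)          # True
print(re.search(r"abc\Z", subject) is not None)         # False
```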
The meanings of the circumflex and dollar characters are changed if the PCRE_MULTILINE option is
set. When this is the case, they match immediately after and immediately before an internal newline
character, respectively, in addition to matching at the start and end of the subject string. For example,
the pattern /^abc$/ matches the subject string "def\nabc" (where \n represents a newline character) in
multiline mode, but not otherwise. Consequently, patterns that are anchored in single line mode because
all branches start with ^ are not anchored in multiline mode, and a match for circumflex is possible when
the startoffset argument of pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored
if PCRE_MULTILINE is set.
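The multiline example above can be reproduced with Python's re module, where re.MULTILINE plays the role of PCRE_MULTILINE and the pos argument of Pattern.search() is again used as a stand-in for startoffset. This is an analogy in Python, not a call into PCRE.

```python
import re

subject = "def\nabc"

# The example from the text: /^abc$/ matches only in multiline mode.
print(re.search(r"^abc$", subject) is not None)                # False
print(re.search(r"^abc$", subject, re.MULTILINE) is not None)  # True

# In multiline mode, ^ can match at a non-zero start position that
# immediately follows a newline (here, index 4).
line_start = re.compile(r"^abc", re.MULTILINE)
print(line_start.search(subject, 4) is not None)               # True
print(re.compile(r"^abc").search(subject, 4) is not None)      # False
```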
Note that the sequences \A, \Z, and \z can be used to match the start and end of the subject in both modes,
and if all branches of a pattern start with \A it is always anchored, whether PCRE_MULTILINE is set or
not.
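The contrast between ^ and \A in multiline mode can be seen with Python's re module. One caveat applies: Python's \Z matches only at the very end of the subject, like PCRE's \z, so the last line below would behave differently with PCRE's \Z if the subject ended in a newline directly after "def".

```python
import re

subject = "def\nabc\n"

# ^ finds "abc" after the internal newline in multiline mode ...
print(re.search(r"^abc", subject, re.MULTILINE) is not None)   # True
# ... but \A still refers only to the start of the subject.
print(re.search(r"\Aabc", subject, re.MULTILINE) is not None)  # False
print(re.search(r"\Adef", subject, re.MULTILINE) is not None)  # True

# Likewise, $ matches before the internal newline in multiline mode,
# while Python's \Z (PCRE's \z) matches only at the very end.
print(re.search(r"def$", subject, re.MULTILINE) is not None)   # True
print(re.search(r"def\Z", subject, re.MULTILINE) is not None)  # False
```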