9 High Availability
Extending redundancy
Implementing an HA cluster eliminates the SEG itself as a single point of failure in a network. Routers, switches, and Internet connections can remain potential points of failure, and redundancy for these should also be considered.
HA mechanisms
This section describes in depth the mechanisms the SEG uses to implement the HA feature.
Basic principles
The SEG HA feature provides a redundant, state-synchronized hardware configuration. The state of the active unit, which includes the flow table and other vital information, is continuously copied to the inactive unit via one or more sync interfaces. When a cluster failover occurs, the inactive unit already knows which connections are active, so traffic can continue to flow after the failover with negligible disruption.
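The state synchronization described above can be pictured with a small model. The following Python sketch is illustrative only: the class names, the unit names, the flow-key layout, and the direct method call that stands in for a sync-interface message are all hypothetical and do not reflect the SEG's actual sync protocol. It shows the key property: because every flow-table change on the active unit is mirrored to the inactive unit, the new active unit already holds the flow table after a failover.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical flow key: (protocol, src_ip, src_port, dst_ip, dst_port)
FlowKey = Tuple[str, str, int, str, int]


@dataclass
class Unit:
    name: str
    flows: Dict[FlowKey, str] = field(default_factory=dict)


class SyncedPair:
    """Toy HA pair: every flow-table change on the active unit is mirrored."""

    def __init__(self) -> None:
        self.active = Unit("unit_a")
        self.inactive = Unit("unit_b")

    def open_flow(self, key: FlowKey, state: str = "ESTABLISHED") -> None:
        self.active.flows[key] = state
        self._sync(key, state)

    def close_flow(self, key: FlowKey) -> None:
        self.active.flows.pop(key, None)
        self._sync(key, None)

    def _sync(self, key: FlowKey, state: Optional[str]) -> None:
        # Stands in (hypothetically) for an update sent over the sync interface(s).
        if state is None:
            self.inactive.flows.pop(key, None)
        else:
            self.inactive.flows[key] = state

    def failover(self) -> None:
        # The inactive unit takes over with the flow table it already holds.
        self.active, self.inactive = self.inactive, self.active


pair = SyncedPair()
conn = ("tcp", "192.0.2.10", 51515, "203.0.113.20", 443)
pair.open_flow(conn)
pair.failover()
print(pair.active.name, conn in pair.active.flows)  # unit_b True
```

In a real cluster the mirroring is asynchronous and travels over the network, so this model deliberately ignores message loss and ordering; it only illustrates why a failover does not require connections to be re-established.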
The inactive system detects that the active system is not operational when it no longer receives sufficient Cluster Heartbeats. Heartbeats are sent over all interfaces marked as Critical (the default setting for an interface), which always include the sync interfaces.
Heartbeats have the following characteristics:
• Heartbeats are Ethernet frames, not IP packets.
• Heartbeats cannot be forwarded by a router, since they do not contain an IP header.
• The Ethernet source and destination addresses are based on the cluster ID and the roles of the sending and receiving units.
• The Ethernet frame type is set to 0xC14B.
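To make the frame layout concrete, the following Python sketch builds and recognizes a raw Ethernet II frame carrying the 0xC14B frame type with no IP header. The MAC address derivation in `hypothetical_mac()` is an assumption: the text only says the addresses are based on the cluster ID and the unit roles, not how they are computed. The empty payload and the integer role values are also placeholders.

```python
import struct

HEARTBEAT_ETHERTYPE = 0xC14B  # Ethernet frame type used for heartbeats
MIN_FRAME_LEN = 60            # minimum Ethernet frame length, excluding the 4-byte FCS


def hypothetical_mac(cluster_id: int, role: int) -> bytes:
    # HYPOTHETICAL derivation: a locally administered unicast MAC built from
    # the cluster ID and role. The real SEG scheme is not given in this text.
    return bytes([0x02, 0x00, 0x00, 0x00, cluster_id & 0xFF, role & 0xFF])


def build_heartbeat(cluster_id: int, sender_role: int, receiver_role: int,
                    payload: bytes = b"") -> bytes:
    dst = hypothetical_mac(cluster_id, receiver_role)
    src = hypothetical_mac(cluster_id, sender_role)
    # Ethernet II header: destination (6) + source (6) + frame type (2).
    # No IP header follows, so routers cannot forward this frame.
    header = dst + src + struct.pack("!H", HEARTBEAT_ETHERTYPE)
    # Pad with zero bytes up to the minimum Ethernet frame length.
    return (header + payload).ljust(MIN_FRAME_LEN, b"\x00")


def is_heartbeat(frame: bytes) -> bool:
    # Recognize a heartbeat purely by its Ethernet frame type.
    if len(frame) < 14:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return ethertype == HEARTBEAT_ETHERTYPE


frame = build_heartbeat(cluster_id=5, sender_role=0, receiver_role=1)
print(len(frame), frame[12:14].hex(), is_heartbeat(frame))  # 60 c14b True
```

Because the receiver identifies heartbeats by frame type alone, the check never looks for an IP header, which matches the statement that heartbeats are not IP packets.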
Heartbeat frequency
The SEG sends 10 heartbeats per second (one every tenth of a second) on each critical interface. Both peers send heartbeats to each other, and both monitor for missed heartbeats in the following way:
• If either cluster node misses 2 heartbeats (no heartbeats for 0.2 seconds) on any critical interface (which includes the sync interfaces), that node enters a state known as Early Interface Failure Detection. In this state, the node sends ARP queries on the suspect interface for previously resolved entries in its ARP cache. This allows the node to determine whether the failure is associated with its local interface or whether the peer failed to send heartbeats from its interface.
• If no ARP reply has been received after a further 0.6 seconds (0.8 seconds in total), the node considers its local interface to be malfunctioning. If ARP replies are received but no new heartbeats arrive within that period, the peer is considered to have a malfunctioning interface. A failover then occurs, but only if the malfunctioning interface belongs to the active node.
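The timing rules above can be sketched as a small per-interface classifier. In this hypothetical Python model, times are integer milliseconds to avoid floating-point rounding, the ARP probing is reduced to a single `arp_reply_seen` flag, and the function and enum names are illustrative rather than part of any SEG interface. The thresholds (100 ms interval, 2 missed heartbeats, 600 ms of ARP probing) come from the text.

```python
from enum import Enum

HEARTBEAT_INTERVAL_MS = 100     # 10 heartbeats per second
MISSED_BEFORE_EARLY_DETECT = 2  # 2 missed heartbeats
ARP_WAIT_MS = 600               # further wait while ARP probes are outstanding

EARLY_DETECT_MS = HEARTBEAT_INTERVAL_MS * MISSED_BEFORE_EARLY_DETECT  # 200 ms
DECISION_MS = EARLY_DETECT_MS + ARP_WAIT_MS                            # 800 ms


class IfState(Enum):
    OK = "ok"
    EARLY_DETECTION = "early_interface_failure_detection"
    LOCAL_FAILED = "local_interface_malfunction"
    PEER_FAILED = "peer_interface_malfunction"


def classify(ms_since_heartbeat: int, arp_reply_seen: bool) -> IfState:
    """Classify one critical interface from the time since its last heartbeat."""
    if ms_since_heartbeat < EARLY_DETECT_MS:
        return IfState.OK
    if ms_since_heartbeat < DECISION_MS:
        # Early Interface Failure Detection: ARP queries for cached entries
        # are being sent on this interface while the node waits.
        return IfState.EARLY_DETECTION
    # ARP replies prove the local interface works, so blame the peer.
    return IfState.PEER_FAILED if arp_reply_seen else IfState.LOCAL_FAILED


def should_fail_over(local_is_active: bool, state: IfState) -> bool:
    """Fail over only if the malfunctioning interface belongs to the active node."""
    if state is IfState.LOCAL_FAILED:
        return local_is_active
    if state is IfState.PEER_FAILED:
        return not local_is_active
    return False


print(classify(800, True), should_fail_over(False, IfState.PEER_FAILED))
```

For example, an inactive node that still receives ARP replies on an interface but has seen no heartbeat for 800 ms classifies the peer's interface as failed; because that peer is the active node, `should_fail_over(False, IfState.PEER_FAILED)` returns `True`.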