VMware
white paper
Figure 2. High Level Architecture of VMware Fault Tolerance
[Diagram: Client; Primary (Record) and Secondary (Replay) virtual machines, each shown as APP and OS on VMware; FT Logging Traffic and ACKs between them; Shared Storage.]
The communication channel between the primary and the secondary host is established by the hypervisor using a standard
TCP/IP socket connection and the traffic flowing between them is called FT logging traffic. By default, incoming network traffic and
disk reads at the primary virtual machine are captured and sent to the secondary, but it is also possible to make the secondary virtual
machine read disk I/O directly from the disk. See KB article 1011965 for more information about this alternative mode.
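The two disk-read modes described above can be illustrated with a minimal sketch. The `LogEntry` type, its fields, and the `send_data_in_log` flag are hypothetical names chosen for illustration; they are not VMware's actual log format or configuration options.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LogEntry:
    """Hypothetical FT log record for one nondeterministic input."""
    kind: str                  # "net_rx" or "disk_read"
    payload: Optional[bytes]   # None when the secondary must read the disk itself
    block: Optional[int] = None


def record_disk_read(disk: Dict[int, bytes], block: int,
                     send_data_in_log: bool = True) -> LogEntry:
    """Primary side: capture a disk read as a log entry.

    In the default mode the data travels in the log; in the alternative
    mode only the block number is logged.
    """
    data = disk[block]
    return LogEntry("disk_read", data if send_data_in_log else None, block)


def replay(entry: LogEntry, disk: Dict[int, bytes]) -> bytes:
    """Secondary side: obtain the same input the primary saw."""
    if entry.kind == "disk_read" and entry.payload is None:
        # Alternative mode: read directly from the (shared) disk.
        return disk[entry.block]
    # Default mode: use the data carried in the log entry.
    return entry.payload
```

In the default mode the secondary never touches the disk for reads, at the cost of more FT logging traffic; the alternative mode trades that traffic for extra disk I/O on the secondary.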
1.3. VMware vLockstep Interval
The primary virtual machine’s execution is always ahead of the secondary with respect to physical time. However, with respect to
virtual time, both the primary and secondary progress in sync with identical execution state. While the secondary’s execution lags
behind the primary, the vLockstep mechanism ensures that the secondary always has all the information in the log to reach the same
execution point as the primary. The physical time lag between the primary and secondary virtual machine execution is denoted as
the vLockstep interval in the FT summary status page.
Figure 3. vLockstep Interval in the FT Summary Status Page
The vLockstep interval is calculated as a moving average and it assumes that the round-trip network latency between the primary
and secondary hosts is constant. The vLockstep interval will increase if the secondary virtual machine lacks sufficient CPU cycles to
keep up with the primary. In that case, whenever the primary virtual machine becomes idle (for example, while waiting
for an I/O completion), the secondary will catch up and the vLockstep interval will decrease. If the vLockstep interval is consistently
high, the hypervisor may slow the primary virtual machine to let the secondary catch up.
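As a rough illustration of how a moving average of the lag behaves, the sketch below uses a simple fixed-window average. The window size, the threshold, and the `LagAverage` class are assumptions made for this example; the text does not specify which kind of moving average the hypervisor uses or when it decides to slow the primary.

```python
from collections import deque


class LagAverage:
    """Simple moving average over the most recent lag samples (in seconds)."""

    def __init__(self, window: int = 4):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, lag_seconds: float) -> float:
        """Record a new lag sample and return the current average."""
        self.samples.append(lag_seconds)
        return sum(self.samples) / len(self.samples)

    def consistently_high(self, threshold: float) -> bool:
        """True only if the window is full and every sample exceeds threshold.

        Hypothetical stand-in for the condition under which the primary
        might be slowed.
        """
        return (len(self.samples) == self.samples.maxlen
                and all(s > threshold for s in self.samples))
```

A single low sample, such as one taken after the primary has been idle and the secondary has caught up, pulls the average down and clears the "consistently high" condition in this sketch.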
1.4. Transparent Failover
FT ensures that there is no data or state loss in the virtual machine when a failover happens. Also, after a failover, the new
primary will perform no I/O that is inconsistent with anything previously issued by the old primary. This is achieved by ensuring that
the hypervisor at the primary commits to any externally visible action, such as network transmits or disk writes, only after receiving an
acknowledgement from the secondary that it has received all the log events preceding that event.
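The rule above, that an externally visible action is held until the secondary has acknowledged every preceding log event, can be sketched as a small gate on the primary. The class and method names are hypothetical, and the sketch assumes cumulative acknowledgements (an ACK for sequence number n covers all earlier events), which the text does not state.

```python
class PrimaryOutputGate:
    """Holds external outputs until the preceding log events are acknowledged."""

    def __init__(self):
        self.next_seq = 0    # sequence number for the next log event
        self.acked = -1      # highest log sequence number acknowledged so far
        self.pending = []    # (required_ack_seq, output), in issue order
        self.released = []   # outputs that have been allowed to leave the host

    def log_event(self, event) -> int:
        """Assign a sequence number to an event sent to the secondary."""
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def request_output(self, output) -> None:
        """Queue a network transmit or disk write behind all events logged so far."""
        required = self.next_seq - 1
        self.pending.append((required, output))
        self._release()

    def on_ack(self, seq: int) -> None:
        """Handle an acknowledgement from the secondary (assumed cumulative)."""
        self.acked = max(self.acked, seq)
        self._release()

    def _release(self) -> None:
        # Release outputs in order while their required events are acknowledged.
        while self.pending and self.pending[0][0] <= self.acked:
            self.released.append(self.pending.pop(0)[1])
```

For example, after two logged events a transmit stays queued until the second event is acknowledged:

```python
gate = PrimaryOutputGate()
gate.log_event("net_rx")
gate.log_event("disk_read")
gate.request_output("tx reply")
gate.on_ack(0)   # still held: event 1 is not yet acknowledged
gate.on_ack(1)   # now released
```

Because the secondary holds every log event that preceded a released output, it can replay to that point after a failover and will not issue I/O that contradicts what the old primary already made visible.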