Table 4–3 (Cont.) System Operating Modes
Mode        Definition
Duplex
The memories in both zones are identical and both CPUs are running in
lockstep. The I/O subsystems of both zones are available and in use. The
cross-link state in both zones is Duplex. The system can be booted in this
mode, or can transition to this mode as the result of the synchronization
process from either Simplex or Degraded Duplex mode.
4.2.4 Error Types
EHS recognizes 11 error types. Every error is classified as one of the types
described in Table 4–4.
Table 4–4 Error Types
Error Type        Definition
CPU/MEM Faults
All data, ECC codes, and control signals flow over the primary rail. The
mirror rail exists primarily to perform verification checks against the
primary rail. Hardware performs some checks between these two rails to
detect failures within the boundaries of the CPU module. When such a
failure is detected, the hardware generates a CPU/MEM fault, which
results in the following set of hardware actions:
1. A high-level system interrupt occurs to report the error, causing an
entry into the error handler. In some cases, the failure may be severe
enough to prevent instructions from executing.
2. If the operating mode at the time of the failure is Duplex, it
changes to Degraded Duplex mode. In this case, the other zone is also
interrupted with a report that a CPU/MEM fault occurred in the
failing zone.
3. Approximately 145 microseconds after the interrupt, hardware resets the
failing CPU module, resulting in an entry into the system console. This
brief delay allows the error handler to store the contents of the CPU,
JXD, and cross-link registers in the Console Communications Area (CCA).
In non-Duplex modes, only one CPU is in use, so a CPU/MEM fault results
in the termination of the OpenVMS operating system.
CPU/MEM faults can be caused by solid or transient errors. Since
software cannot distinguish between the two, all CPU/MEM faults are
treated as transient. The CPU module requires service only when the
faults exceed the operating system’s threshold, when an end action
timeout occurs, or when diagnostics fail. In all cases, the FRU identified
by software is the CPU module that experienced the failure.
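
The CPU/MEM fault behavior described above can be summarized in a short
Python sketch. It is illustrative only and is not EHS or OpenVMS code. The
names Mode, FaultOutcome, respond_to_cpu_mem_fault, and
cpu_module_needs_service are hypothetical, as is the threshold parameter,
whose actual value is not given here. The sketch assumes only what the text
states: a fault in Duplex mode degrades the system to Degraded Duplex, a
fault in any other mode terminates the operating system, and the CPU
module is flagged for service on a threshold overrun, an end action
timeout, or a diagnostic failure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Operating modes named in Table 4-3 (only those mentioned here)."""
    SIMPLEX = "Simplex"
    DEGRADED_DUPLEX = "Degraded Duplex"
    DUPLEX = "Duplex"


# From the text: approximate delay between the interrupt and the reset.
RESET_DELAY_MICROSECONDS = 145


@dataclass
class FaultOutcome:
    new_mode: Optional[Mode]  # None where the text does not state a mode
    os_running: bool          # does the operating system keep running?


def respond_to_cpu_mem_fault(mode: Mode) -> FaultOutcome:
    """Model the outcome of a CPU/MEM fault for a given operating mode."""
    if mode is Mode.DUPLEX:
        # The other zone is interrupted and continues in Degraded Duplex.
        return FaultOutcome(Mode.DEGRADED_DUPLEX, True)
    # Non-Duplex: only one CPU is in use, so the operating system terminates.
    return FaultOutcome(None, False)


def cpu_module_needs_service(fault_count: int,
                             threshold: int,  # hypothetical parameter
                             end_action_timeout: bool,
                             diagnostics_failed: bool) -> bool:
    """All faults are treated as transient until one condition is met."""
    return bool(fault_count > threshold
                or end_action_timeout
                or diagnostics_failed)
```

For example, respond_to_cpu_mem_fault(Mode.DUPLEX) reports Degraded Duplex
with the operating system still running, while
cpu_module_needs_service(3, 5, False, False) returns False because none of
the three service conditions holds.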