Table 4–4 (Cont.) Error Types
Error Type
Definition
Halt errors
A halt error occurs when the system is operating in Duplex mode and either
of the following happens: the Zone Halt Enable switch on the zone control
panel is pressed and the Break key is pressed on one of the system
consoles, or one zone experiences errors on its halt lines.
The zone attached to that console terminal, or the zone with the halt-line
errors, is halted and enters the system console. In the other zone,
hardware generates an interrupt to the EHS. The system operating mode is
degraded to Simplex, and the OpenVMS operating system continues running
after the halted zone is deconfigured.
The failed zone is identified as the FRU in the error log. This error is
not subject to thresholding. The halted zone must be manually
resynchronized before it can return to service.
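The trigger conditions and outcome of a halt error can be summarized as a small decision model. This is a hypothetical Python sketch, not the EHS itself: the function names, the `HaltOutcome` fields, and the label "zone B" are invented for illustration, and reading the trigger as "Duplex mode plus either path" is an assumption about how the conditions group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaltOutcome:
    halted_zone: str     # zone that stops and enters the system console
    mode_after: str      # system operating mode afterward
    fru: str             # FRU named in the error log
    thresholded: bool    # whether error-rate thresholding applies
    manual_resync: bool  # whether the zone must be resynchronized by hand


def is_halt_error(duplex: bool, halt_enable_pressed: bool,
                  break_pressed: bool, halt_line_errors: bool) -> bool:
    """One reading of the trigger: Duplex mode plus either a console Break
    with Zone Halt Enable pressed, or errors on a zone's halt lines."""
    console_halt = halt_enable_pressed and break_pressed
    return duplex and (console_halt or halt_line_errors)


def handle_halt_error(halted_zone: str) -> HaltOutcome:
    # The other zone receives an interrupt to the EHS, deconfigures the
    # halted zone, and keeps OpenVMS running in Simplex mode.
    return HaltOutcome(halted_zone=halted_zone, mode_after="Simplex",
                       fru=halted_zone, thresholded=False, manual_resync=True)


print(is_halt_error(True, True, True, False))   # True
print(is_halt_error(True, False, True, False))  # False: Zone Halt Enable not pressed
print(handle_halt_error("zone B").mode_after)   # Simplex
```

Note that the outcome is fixed regardless of error history: halt errors bypass thresholding, so the model carries no counter.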
Resynch abort errors
During memory resynchronization, all memory writes are mirrored to both
zones. The data is driven from the master zone across the resynch bus
(also referred to as the cross-link cables) to the slave zone. On the slave
side, the incoming data is protected by ECC; an ECC failure there results
in a CPU/MEM fault on the slave and is handled as that type of error. On
the master side, the data is protected by ECC, a cross-rail ECC
comparison, and a data cross-check.
If any of these master-side checks fails, hardware generates an interrupt
to the EHS reporting a resynch abort error. The hardware terminates
Resynch mode, and system operation continues in Degraded Duplex mode.
Because every resynch abort error indicates a failure on the master side,
the master CPU module is isolated as the FRU, and the error log message
identifies it as such. This error can occur only while the system is in
Resynch mode, when the slave zone has not yet been brought into
synchronization, so removing the master CPU would terminate the OpenVMS
operating system.
The EHS compares the error against its error rate threshold. If the
threshold is exceeded, the EHS disables automatic resynchronization of the
remote zone, and manual intervention is required to correct the situation.
Because Duplex mode cannot be achieved and the master CPU is the source of
the failure, the OpenVMS operating system must be shut down manually
before the CPU module can be repaired.
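The checks and the threshold response described for resynch abort errors can be sketched as follows. This is a hypothetical Python model, not the EHS implementation: all class and function names are invented, checking the master side before the slave side is an assumption, and the source does not say how the error rate threshold is computed, so the sliding time window used here (more than `max_errors` errors within `window_s` seconds) is purely illustrative. The sketch also does not model automatic resynchronization restarting after an abort.

```python
from collections import deque
from enum import Enum


class ResynchResult(Enum):
    OK = "write accepted"
    SLAVE_CPU_MEM_FAULT = "CPU/MEM fault on slave"
    RESYNCH_ABORT = "resynch abort error"


def check_resynch_write(master_ecc_ok: bool, cross_rail_ecc_ok: bool,
                        data_cross_check_ok: bool,
                        slave_ecc_ok: bool) -> ResynchResult:
    """Classify one mirrored write during memory resynchronization."""
    # Master-side checks: ECC, cross-rail ECC comparison, data cross-check.
    if not (master_ecc_ok and cross_rail_ecc_ok and data_cross_check_ok):
        return ResynchResult.RESYNCH_ABORT
    # Slave-side check: ECC on data arriving over the cross-link cables.
    if not slave_ecc_ok:
        return ResynchResult.SLAVE_CPU_MEM_FAULT
    return ResynchResult.OK


class ErrorRateThreshold:
    """Hypothetical threshold: more than max_errors errors within
    window_s seconds counts as exceeding it."""

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self._times = deque()  # timestamps of errors still inside the window

    def record(self, t: float) -> bool:
        """Record an error at time t; return True if the threshold is exceeded."""
        self._times.append(t)
        while t - self._times[0] > self.window_s:
            self._times.popleft()  # forget errors older than the window
        return len(self._times) > self.max_errors


class ResynchAbortPolicy:
    """Hypothetical bookkeeping for resynch abort errors."""

    def __init__(self, threshold: ErrorRateThreshold) -> None:
        self.threshold = threshold
        self.mode = "Resynch"
        self.auto_resync_enabled = True

    def on_abort(self, t: float) -> str:
        # Hardware has ended Resynch mode; operation continues degraded.
        self.mode = "Degraded Duplex"
        if self.threshold.record(t):
            # Stop resynchronizing the remote zone automatically.
            self.auto_resync_enabled = False
        # The master CPU module is always the FRU; it is reported, not
        # removed, because removing it would terminate OpenVMS.
        return "master CPU module"


policy = ResynchAbortPolicy(ErrorRateThreshold(max_errors=2, window_s=3600))
for t in (0, 10, 20):
    fru = policy.on_abort(t)
print(fru, policy.mode, policy.auto_resync_enabled)
# master CPU module Degraded Duplex False
```

Keeping the threshold in its own class separates the (assumed) rate calculation from the documented policy, so a different rate rule could be substituted without touching the abort handling.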
Nonexistent I/O errors
Nonexistent I/O (NXIO) errors occur when a reference to an I/O module
times out. Such a timeout can occur during either a DMA cycle or a CPU
cycle. On a CPU cycle, the hardware automatically retries the operation.
If the retry succeeds, hardware reports the failure as transient;
otherwise, it reports it as a solid failure.
All timeouts during DMA cycles are reported as transient errors. The error
log indicates whether the error was solid or transient, and whether it
occurred on a DMA or CPU cycle.
In every NXIO error case, either an I/O module or an interface module is
identified as the FRU. If the error is solid, the EHS removes that module
from system service.
If the error is transient, the EHS compares it against its error rate
threshold. If the threshold is exceeded and the system is not operating in
Simplex mode, the module is removed from system service.
In a Simplex system, where alternate I/O paths are not normally available,
no I/O module is removed because of transient errors. Additional
transient errors on the module generate further error log entries.
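The NXIO classification and removal rules above can be condensed into two functions. This is a hypothetical Python sketch of the stated policy, not the hardware or EHS logic: the names `NxioReport`, `classify_nxio`, and `should_remove_module`, the mode strings, and the `threshold_exceeded` flag (standing in for the EHS error rate comparison, whose details the text does not give) are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NxioReport:
    cycle: str     # "DMA" or "CPU"
    severity: str  # "transient" or "solid"


def classify_nxio(cycle: str, retry_succeeded: bool = False) -> NxioReport:
    """Classify an I/O reference timeout the way the text says it is reported."""
    if cycle == "DMA":
        # Every timeout during a DMA cycle is reported as transient.
        return NxioReport("DMA", "transient")
    if cycle == "CPU":
        # The operation is retried automatically; success means transient.
        return NxioReport("CPU", "transient" if retry_succeeded else "solid")
    raise ValueError(f"unknown cycle type: {cycle!r}")


def should_remove_module(report: NxioReport, mode: str,
                         threshold_exceeded: bool) -> bool:
    """Decide whether the I/O or interface module (the FRU) is taken out
    of system service."""
    if report.severity == "solid":
        return True
    # Transient: remove only past the threshold, and never in Simplex mode,
    # where no alternate I/O path is normally available.
    return threshold_exceeded and mode != "Simplex"


cpu_fail = classify_nxio("CPU", retry_succeeded=False)
dma = classify_nxio("DMA")
print(cpu_fail.severity, should_remove_module(cpu_fail, "Simplex", False))
print(should_remove_module(dma, "Duplex", True),
      should_remove_module(dma, "Simplex", True))
```

Running it prints `solid True` and then `True False`: a solid error removes the module even on a Simplex system, while the same over-threshold transient error removes the module in Duplex mode but only adds error log entries in Simplex mode.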
Error Handling and Analysis 4–9