SPARC Enterprise T5120 and T5220 Servers Product Notes • April 2010
Note – In an LDoms environment, unrecoverable errors in a non-control LDoms
guest domain are not subject to this CR.
For example, an unrecoverable error in the control domain causes Solaris to panic.
Messages similar to the following are reported to the control domain console:

SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x46c61864.0x318184c6 (0x1dfeda2137e)
PLATFORM: SUNW,SPARC-Enterprise-T5220, CSN: -, HOSTNAME: wgs48-100
SOURCE: SunOS, REV: 5.10 Generic_Patch
DESC: Errors have been detected that require a reboot to ensure system
integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

Alternatively, an unrecoverable error causes the Hypervisor to abort, and messages
similar to the following are reported to the SP console when you are logged in to
the ALOM CMT compatibility CLI console:

Aug 17 22:09:09 ERROR: HV Abort: <Unknown?> (228d74) - PowerDown

After the control domain recovers, a diagnosis is performed. Messages on the
console indicate the cause of the unrecoverable error. For example:

SUNW-MSG-ID: SUN4V-8000-UQ, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: Fri Aug 17 18:00:57 EDT 2007
PLATFORM: SUNW,SPARC-Enterprise-T5220, CSN: -, HOSTNAME: wgs48-100
SOURCE: cpumem-diagnosis, REV: 1.6
EVENT-ID: a8b0eb18-6449-c0a7-cc0f-e230a1d27243
DESC: The number of level 2 cache uncorrectable data errors has exceeded
acceptable levels. Refer to http://sun.com/msg/SUN4V-8000-UQ for more
information.
AUTO-RESPONSE: No automated response.
IMPACT: System performance is likely to be affected.
REC-ACTION: Schedule a repair procedure to replace the affected resource,
the identity of which can be determined using fmdump -v -u <EVENT_ID>.

At this point, CR 6594506 might have been encountered. If so, future PSH events
(for example, new hardware errors, whether correctable or uncorrectable) are not
transported into the domain and cannot be properly diagnosed.
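
The REC-ACTION text in the diagnosis message refers to fmdump -v -u <EVENT_ID>. As a minimal sketch, assuming the message has been captured from the console as plain text, the following Python pulls the labeled header fields out of such a message and builds that command line. The names parse_fma_message and fmdump_command are hypothetical helpers introduced for illustration, and the field list is taken only from the sample messages in this section, so it might not cover every message format.

```python
import re

# Header labels as they appear in the sample console messages in this section.
# This list is an assumption; other messages may carry different labels.
FIELD_RE = re.compile(
    r"\b(SUNW-MSG-ID|TYPE|VER|SEVERITY|EVENT-TIME|PLATFORM|CSN|HOSTNAME|"
    r"SOURCE|REV|EVENT-ID):\s*"
)


def parse_fma_message(text):
    """Return a dict of the labeled header fields found in a console message."""
    fields = {}
    for line in text.splitlines():
        # re.split with one capture group yields [prefix, key, value, key, value, ...]
        parts = FIELD_RE.split(line)
        for key, value in zip(parts[1::2], parts[2::2]):
            fields[key] = value.strip().rstrip(",").strip()
    return fields


def fmdump_command(fields):
    """Build the argument list for the fmdump call named in REC-ACTION."""
    return ["fmdump", "-v", "-u", fields["EVENT-ID"]]


SAMPLE = """\
SUNW-MSG-ID: SUN4V-8000-UQ, TYPE: Fault, VER: 1, SEVERITY: Critical
EVENT-TIME: Fri Aug 17 18:00:57 EDT 2007
PLATFORM: SUNW,SPARC-Enterprise-T5220, CSN: -, HOSTNAME: wgs48-100
SOURCE: cpumem-diagnosis, REV: 1.6
EVENT-ID: a8b0eb18-6449-c0a7-cc0f-e230a1d27243
"""

if __name__ == "__main__":
    parsed = parse_fma_message(SAMPLE)
    print(" ".join(fmdump_command(parsed)))
```

The sketch only builds the command; running it on the affected system is left to the administrator.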