Fault Management Overview
The goal of fault management and monitoring is to increase system availability by moving from
a reactive strategy of fault detection, diagnosis, and repair to a proactive one. The objectives
are:
• To detect problems automatically, as nearly as possible to when they actually occur.
• To diagnose problems automatically, at the time of detection.
• To automatically report in understandable text a description of the problem, the likely cause(s)
  of the problem, the recommended action(s) to resolve the problem, and detailed information
  about the problem.
• To ensure that tools are available to repair or recover from the fault.
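The third objective names four parts of a report: a description, the likely cause(s), the recommended action(s), and detailed information. The following minimal Python sketch shows one way such a report could be held and rendered as text. The class `FaultReport`, its field names, and the sample values are hypothetical illustrations and do not reflect the actual SysFaultMgmt or EMS data formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FaultReport:
    """Hypothetical container for the four report parts named in the objectives."""
    description: str
    likely_causes: List[str]
    recommended_actions: List[str]
    details: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Return the report as plain, human-readable text, one item per line."""
        lines = [f"Problem: {self.description}"]
        lines += [f"Likely cause: {cause}" for cause in self.likely_causes]
        lines += [f"Recommended action: {action}" for action in self.recommended_actions]
        lines += [f"Detail: {key} = {value}" for key, value in self.details.items()]
        return "\n".join(lines)


# Sample values are invented for illustration only.
report = FaultReport(
    description="Fan speed below threshold",
    likely_causes=["Fan failure", "Obstructed airflow"],
    recommended_actions=["Inspect the fan and replace it if necessary"],
    details={"component": "Chassis/Fans/Environment"},
)
print(report.render())
```

Keeping the cause and action fields as lists mirrors the "cause(s)" and "action(s)" wording in the objective, since a single fault may have more than one of each.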
HP-UX Fault Management
Proactive fault prediction and notification is provided on HP-UX by SysFaultMgmt WBEM indication
providers, as well as by the Event Monitoring Service (EMS). EMS and WBEM provide frameworks
for monitoring and reporting events.
SysFaultMgmt WBEM indication providers and the EMS Hardware Monitors allow users to monitor
the operation of a wide variety of hardware products, and alert them immediately if any failure
or other unusual event occurs. By using hardware event monitoring, users can virtually eliminate
undetected hardware failures that could interrupt system operation or cause data loss.
Complete information on installing and using EMS hardware event monitors, as well as a list of
supported hardware, can be found in the EMS Hardware Monitors User's Guide. An electronic
copy of this book is provided on the HP website at:
http://www.hp.com/go/hpux-diagnostics-docs
WBEM indication providers and EMS Hardware Monitors
Hardware monitors are available for the following components. These monitors are distributed
free on the OE media.
• Chassis/Fans/Environment
• CPU monitor
• UPS monitor*
• FC Hub monitor*
• FC Switch monitor*
• Memory monitor
• Core Electronics Components
• Disk drives
• Ha_disk_array
NOTE: No SysFaultMgmt WBEM indication provider is currently available for components
marked with an asterisk (*).
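The component list and the note can be summarized as a small lookup. The Python sketch below is illustrative only: component names are taken from the list above, while the names `EMS_MONITORED`, `NO_WBEM_PROVIDER`, and `frameworks_for` are hypothetical. It also assumes that every component not marked with an asterisk is covered by both an EMS hardware monitor and a SysFaultMgmt WBEM indication provider, which the section heading suggests but does not state outright.

```python
# Components with an EMS hardware monitor, as listed in this section.
EMS_MONITORED = [
    "Chassis/Fans/Environment",
    "CPU",
    "UPS",
    "FC Hub",
    "FC Switch",
    "Memory",
    "Core Electronics Components",
    "Disk drives",
    "Ha_disk_array",
]

# Components marked with an asterisk: no SysFaultMgmt WBEM indication provider.
NO_WBEM_PROVIDER = {"UPS", "FC Hub", "FC Switch"}


def frameworks_for(component):
    """Return the monitoring frameworks assumed to cover a listed component."""
    if component not in EMS_MONITORED:
        raise KeyError(f"{component!r} is not in the list of monitored components")
    if component in NO_WBEM_PROVIDER:
        return ["EMS"]
    # Assumption: unmarked components are covered by both frameworks.
    return ["EMS", "SysFaultMgmt WBEM"]


for name in EMS_MONITORED:
    print(f"{name}: {', '.join(frameworks_for(name))}")
```

A lookup like this makes it easy to see at a glance which components rely on EMS alone when planning event notification.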
EMS HA Monitors
High Availability monitors are also available through EMS to monitor disk, cluster, network, and
system resources. These tools are available from HP at an additional cost.