Chapter 6
Diagnostics
81
If the kernel hangs and the watchdog times out, ALOM reports and logs the event
and performs one of three user-configurable actions:
■ xir: this is the default action. The server syncs the filesystems and restarts. If the sync hangs, ALOM falls back to a hard reset after 15 minutes.
■ Reset: this is a hard reset. The system recovers rapidly, but diagnostic data regarding the hang is not stored, and filesystem damage may result.
■ None: the system is left in the hung state indefinitely after the watchdog timeout has been reported.
For more information, see the sys_autorestart section of the ALOM Online Help, which is contained on the Sun Fire V210 and V240 Server Documentation CD.
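The three actions above can be summarized as a small decision model. This is only an illustrative sketch in Python, not ALOM's implementation: the names AutoRestart and watchdog_timeout_steps are hypothetical, and the step strings simply paraphrase the text.

```python
from enum import Enum


class AutoRestart(Enum):
    """Hypothetical labels for the three actions described in the text."""
    XIR = "xir"
    RESET = "reset"
    NONE = "none"


# From the text: ALOM falls back to a hard reset if the sync hangs this long.
SYNC_FALLBACK_MINUTES = 15


def watchdog_timeout_steps(action: AutoRestart, sync_hangs: bool = False) -> list:
    """List, in order, what happens after a watchdog timeout (illustrative)."""
    # The event is reported and logged regardless of the configured action.
    steps = ["report and log watchdog timeout"]
    if action is AutoRestart.XIR:
        steps.append("sync filesystems")
        if sync_hangs:
            steps.append(f"wait {SYNC_FALLBACK_MINUTES} minutes")
            steps.append("hard reset")
        else:
            steps.append("restart")
    elif action is AutoRestart.RESET:
        # No sync: fast recovery, but no diagnostic data is stored.
        steps.append("hard reset")
    # AutoRestart.NONE: the system is left hung, so nothing further is added.
    return steps
```

Calling watchdog_timeout_steps(AutoRestart.XIR, sync_hangs=True) walks through the fallback path, which ends in a hard reset rather than a normal restart.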
Automatic System Recovery (ASR)
Note – Automatic System Recovery (ASR) is not the same as Automatic Server Restart, which the Sun Fire V210 and V240 servers also support.
Automatic System Recovery (ASR) consists of self-test features and an
auto-configuring capability to detect failed hardware components and unconfigure them.
By doing this, the server is able to resume operating after certain non-fatal hardware
faults or failures have occurred.
If a component is one that is monitored by ASR, and the server is capable of
operating without it, the server will automatically reboot if that component should
develop a fault or fail.
ASR monitors the following components:
■ Memory modules
If a fault is detected during the power-on sequence, the faulty component is
disabled. If the system remains capable of functioning, the boot sequence continues.
If a fault occurs on a running server, and it is possible for the server to run without
the failed component, the server automatically reboots. This prevents a faulty
hardware component from keeping the entire system down or causing the system to
crash repeatedly.
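The two cases above (a fault found at power-on and a fault on a running server) can be written as a single decision function. This is a sketch of the behavior as described, not firmware code: the function name, the argument names, and the outcome strings are hypothetical. Where the text does not say what happens, the function returns None rather than guessing.

```python
from typing import Optional


def asr_outcome(fault_at: str, can_run_without: bool) -> Optional[str]:
    """Return the outcome described in the text, or None where it is silent.

    fault_at is "power-on" or "running" (hypothetical labels).
    can_run_without is True if the server can operate without the component.
    """
    if fault_at == "power-on":
        # The faulty component is always disabled at power-on; the boot
        # sequence continues only if the system can still function.
        if can_run_without:
            return "disable component; continue boot"
        return "disable component; boot cannot continue"
    if fault_at == "running":
        # The text covers only the case where the server can run without
        # the failed component.
        return "automatic reboot" if can_run_without else None
    raise ValueError(f"unknown fault phase: {fault_at!r}")
```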
To support such a degraded boot capability, the OpenBoot firmware uses the IEEE 1275
Client Interface (via the device tree) to mark a device as either failed or disabled, by
creating an appropriate status property in the device tree node. The Solaris
operating environment will not activate a driver for any subsystem so marked.
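The marking mechanism can be pictured with a toy device tree. The sketch below is a simplified Python model, not OpenBoot or Solaris code: the Node class, the node names, and the helper functions are hypothetical. The status values "fail" and "disabled" are assumed here from the IEEE 1275 convention for the status property.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy device tree node: a name plus a dictionary of properties."""
    name: str
    properties: dict = field(default_factory=dict)


def mark(node: Node, status: str) -> None:
    """Firmware side: record the device state in a 'status' property."""
    if status not in ("fail", "disabled"):
        raise ValueError("status must be 'fail' or 'disabled' in this sketch")
    node.properties["status"] = status


def attachable(nodes: list) -> list:
    """OS side: names of nodes for which a driver would be activated."""
    return [
        n.name
        for n in nodes
        if n.properties.get("status") not in ("fail", "disabled")
    ]


# Hypothetical nodes, for illustration only.
tree = [Node("memory-module-0"), Node("memory-module-1"), Node("network")]
mark(tree[1], "fail")  # the self-test found a fault in this module
```

With the second module marked as failed, attachable(tree) returns only the two unmarked nodes, which mirrors how the operating environment skips any subsystem so marked.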