1. Introduction
1.1 Overview
Machine Check Monitoring Service helps identify faulty hardware components by
sending logs of correctable errors that occur on the CPUs and memory of a Linux server
to the firmware in that server.
If the number of correctable errors exceeds a threshold value, Machine Check
Monitoring Service performs Core Offline (taking a CPU offline) or Page Offline (taking a
memory page offline) to prevent the system from going down due to an uncorrectable error.
If the OS supports the Core Online feature and the system has a spare CPU, Machine Check
Monitoring Service automatically adds the spare CPU (Core Online) after Core Offline
completes. The Offline and Online operations are performed in cooperation with the
kernel on the Linux server.
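The threshold behavior described above can be sketched roughly as follows. This is only an illustration of the decision logic, not the service's actual implementation: the names `CorrectableErrorTracker`, `Action`, and `should_core_online` are hypothetical, and the threshold value is an assumption because this guide does not state the real one.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    NONE = "none"
    CORE_OFFLINE = "core_offline"
    PAGE_OFFLINE = "page_offline"


# Hypothetical value; the threshold actually used by the service is not given here.
DEFAULT_THRESHOLD = 3


class CorrectableErrorTracker:
    """Counts correctable errors per CPU or memory page (illustrative only)."""

    def __init__(self, threshold=DEFAULT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, kind, unit):
        """Record one correctable error.

        kind is "cpu" or "memory"; unit identifies the CPU or the page.
        Returns the action to request once the count exceeds the threshold.
        """
        if kind not in ("cpu", "memory"):
            raise ValueError("kind must be 'cpu' or 'memory'")
        key = (kind, unit)
        self.counts[key] += 1
        if self.counts[key] <= self.threshold:
            return Action.NONE
        # Forget the unit so the offline action is requested only once.
        del self.counts[key]
        return Action.CORE_OFFLINE if kind == "cpu" else Action.PAGE_OFFLINE


def should_core_online(action, os_supports_core_online, spare_cpus):
    """Core Online follows a Core Offline only if the OS supports it and a spare CPU exists."""
    return bool(
        action is Action.CORE_OFFLINE and os_supports_core_online and spare_cpus > 0
    )
```

With a threshold of 2, for example, the third correctable error reported for the same CPU yields `Action.CORE_OFFLINE`, and `should_core_online` then decides whether a spare CPU would be added.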
Machine Check Monitoring Service consists of firmware and software that runs on the
Linux server. The software includes mcemonitor (Machine Check Monitoring Service) and
capmonitor (Capacity Monitoring Service).
Note
Refer to the "Capacity Optimization (COPT) User's Guide" for details of the Core
Online feature.
Core Offline, Core Online, and Page Offline are not supported on
Express5800/A1040b.
1.2 Operating Environment
Machine Check Monitoring Service requires the operating environment shown below:
Table 1-1 Operating Environment
Hardware    Express5800/A1040b
            Express5800/A2010b
            Express5800/A2020b
            Express5800/A2040b
OS          Red Hat Enterprise Linux 6.6
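
As a rough illustration, the requirements in Table 1-1 could be checked with a small script like the one below. This is a sketch under stated assumptions, not a tool shipped with the service: `check_environment` and `rhel_version` are hypothetical names, the model string is assumed to be supplied by the caller, and the release text is assumed to have the form found in /etc/redhat-release on Red Hat Enterprise Linux.

```python
import re

# Values taken from Table 1-1 and the Note in section 1.1.
SUPPORTED_MODELS = {
    "Express5800/A1040b",
    "Express5800/A2010b",
    "Express5800/A2020b",
    "Express5800/A2040b",
}
MODELS_WITHOUT_OFFLINE = {"Express5800/A1040b"}
SUPPORTED_RHEL_VERSION = "6.6"


def rhel_version(release_text):
    """Extract 'major.minor' from text such as the contents of /etc/redhat-release."""
    match = re.search(r"Red Hat Enterprise Linux.*release (\d+\.\d+)", release_text)
    return match.group(1) if match else None


def check_environment(model, release_text):
    """Return whether the environment is supported and whether
    Core Offline, Core Online, and Page Offline are available (hypothetical helper)."""
    supported = (
        model in SUPPORTED_MODELS
        and rhel_version(release_text) == SUPPORTED_RHEL_VERSION
    )
    return {
        "supported": supported,
        "offline_features": supported and model not in MODELS_WITHOUT_OFFLINE,
    }
```

On a real server the release text could be read from /etc/redhat-release; how the model name is obtained is outside the scope of this sketch.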