IBM Power Systems 775 for AIX and Linux HPC Solution
5.4.5 A+ definitions
Although product names changed from FIP to A+, the definitions and functions of the
products remain the same, as shown in Table 5-1.
Table 5-1 A+ definitions
The administrator must set up xCAT A+ node groups to work with the A+ environment.
One xCAT node group called “Aplus_defective” must be set up for any A+ nodes or octants
that are found to be defective. A second xCAT node group, “Aplus_available”, must list the
available A+ nodes or octants. Use the xCAT mkdef command to create the node groups, and
then use the chdef command to associate any node (octant) with the proper node group.
The following commands are used to define the A+ groups and add a failed resource:
mkdef -t group -o Aplus_defective
Creates the Aplus_defective group, which must initially be empty.
mkdef -t group -o Aplus_available members="node1,node2,node3"
Creates the Aplus_available group with node1, node2, and node3 as members.
chdef -t group -o Aplus_defective members=[node]
Adds a failed A+ node to the Aplus_defective group.
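The group commands above can be combined into a small helper that moves a failed node from Aplus_available to Aplus_defective. The following is a minimal sketch, not a supported procedure. The function name aplus_mark_defective and the XCAT_RUN switch are hypothetical names introduced for illustration. The sketch also assumes that the chdef -p (append to a list attribute) and -m (remove from a list attribute) options behave as described in the xCAT chdef man page, so that existing group members are preserved; verify this on your xCAT release before use. By default the helper only prints the commands that it would run.

```shell
#!/bin/sh
# Sketch: move a failed A+ node from Aplus_available to Aplus_defective.
# XCAT_RUN is a hypothetical switch: unset or 0 prints the commands only,
# 1 executes them on the xCAT management node.

aplus_mark_defective() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: aplus_mark_defective <node>" >&2
        return 1
    fi
    # Assumption: -m removes a value from a list attribute and -p appends
    # to it, per the xCAT chdef man page.
    for cmd in \
        "chdef -m -t group -o Aplus_available members=$node" \
        "chdef -p -t group -o Aplus_defective members=$node"
    do
        if [ "${XCAT_RUN:-0}" = "1" ]; then
            $cmd || return 1      # stop at the first failing command
        else
            echo "$cmd"           # dry run: show what would be executed
        fi
    done
}
```

Calling aplus_mark_defective node2 without XCAT_RUN set prints the two chdef commands for review. Keeping the dry run as the default makes it harder to change group membership by accident.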
5.4.6 A+ components and recovery procedures
This section describes the tasks that are performed by the administrator or cluster user to
gather problem data or recover from failures.
Definition: A+ / Fail in Place Component
Description: All A+ features, including octants and fiber optic interfaces.

Definition: A+ / Fail in Place Event
Description: A failure event that involves an A+ component or FRU element that is left in the failed state in the system.

Definition: A+ / FIP Refresh Threshold
Description: The minimum number of available A+ components, at which point a hardware replacement is required. The threshold is determined from a table in which the values are set according to the contract policy, expected failure rates, and the amount of time that remains on the maintenance contract. There are individual thresholds for different failure types.

Definition: A+ / FIP Reset Threshold
Description: The minimum number of A+ components to which the system must be restored following the repair of A+ components. The amount of hardware that is replaced is determined from a table in which the values depend on the component and the amount of time that remains on the service contract. There are individual thresholds for different failure types.

Definition: Compute QCM/Octant/Node
Description: A QCM/Octant/Node without I/O adapters assigned to it. It is used solely for running application code, and often runs degraded.

Definition: Non-compute QCM/Octant/Node
Description: A QCM/Octant/Node with I/O adapters assigned to it. It is used for disk or I/O access and often must retain full function.
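The refresh threshold definition can be illustrated with a simple comparison between the number of available A+ nodes and a threshold value. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the threshold is passed in by the caller because the real values come from a contract-specific table that is not reproduced here, and the sketch assumes that a refresh is required once the available count is at or below the threshold.

```shell
#!/bin/sh
# Sketch: compare the number of available A+ nodes with a refresh threshold.

# Count the names in a comma-separated member list (for example, the
# members attribute of the Aplus_available group).
aplus_count_members() {
    n=0
    old_ifs=$IFS
    IFS=,
    for m in $1; do
        if [ -n "$m" ]; then
            n=$((n + 1))
        fi
    done
    IFS=$old_ifs
    echo "$n"
}

# Report whether a hardware refresh is required.
# $1 = comma-separated members, $2 = threshold (hypothetical value
# supplied by the caller; the real value comes from the contract table).
aplus_refresh_needed() {
    count=$(aplus_count_members "$1")
    if [ "$count" -le "$2" ]; then
        echo "refresh required: $count available, threshold $2"
    else
        echo "ok: $count available, threshold $2"
    fi
}
```

For example, aplus_refresh_needed "node1,node2,node3" 2 reports that three nodes are available against a threshold of two. Because there are individual thresholds for different failure types, a real check would be repeated for each type with its own value.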