This soft copy for use by IBM employees only.
For the most part, if an internal port reports an error, the FRU is the card
that contains that port. You may discern patterns that indicate clock
problems, but you will quickly deduce that the FRU is the same because the
clock problem is a broken driver on the switch card.
2. All board patterns
There should be very few errors that occur on the whole board, because
once you get enough errors, you cannot get to the other parts of the board to
see if they are in error.
3. Half board pattern
When you see a half board with a problem, check to see if it matches the
clock tree. The majority of these patterns of error are clock-related.
4. All the nodes are reporting a problem
If all of the nodes are reporting an error, it is usually a -1, -7, -16, or -19.
These quite often point to clock problems, or power sequence problems,
which result from nodes being on the incorrect clock.
A -1 on all of the nodes on a board occurs only on boards other than the
one that holds the primary node. The primary node breaks the pattern on its
own board, because if you have gotten as far as generating an out.top file,
the primary node is operational and reachable.
With a -1 on all the nodes, you will probably see -2, -3, or -4 on the ports that
connect to other switches. What this indicates is that the failing switch board
is not on the same clock as the primary node's switch board. This can
happen because:
a. The clock tree was not set up properly by issuing an Eclock with the
correct Eclock topology file.
b. The clock to this board is broken in this switch assembly, in the cable, or
in the switch that sources this board's clock.
c. If the primary node's board is the only one that is tunable, it may be set
to the incorrect clock.
A -16 can indicate a poor power-on sequence. For example, if the nodes in a
frame are powered on before the switch, all the adapters will be set to their
internal clock (the on-card oscillator). You can discover this by looking in the
dtbx.trace file. Another example is that the switch was powered on before
the nodes, but the clock was not properly set. When the proper clock is
later selected, the clock is momentarily interrupted, which causes the
adapter to have problems.
-7 and -19 both point to daemon time-out problems. The -7 occurs when the
primary node has initialized a node's adapter and is waiting for that node's
daemon to respond to communication that asks what node it thinks it is. The
-19 occurs after the whole switch has been tuned and the route table is being
distributed to the nodes. In this case, the daemon on the receiving node is
not responding. These -7 and -19 errors can occur more easily in large
systems than in small systems, because large networks will naturally eat
more into the time-out period during normal operation than a small network
will.
5. All of the switch-to-switch connections are reporting a problem
This is quite often a clock or power problem. It can be on the board, the
clock card, the clock source cable, the power card, or the power cable.
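The all-node and all-switch-connection patterns above can be sketched as a small lookup. This is only an illustrative sketch, not a tool shipped with the SP software. The function name triage_board, the input shapes (a mapping of node label to error code, plus a list of switch-to-switch port codes), and the use of 0 for "no error" are all assumptions made for the example. It does not parse out.top; the codes would have to be collected by hand or by your own script.

```python
# Hints paraphrased from the guide text; only the codes it names are covered.
ALL_NODE_HINTS = {
    -1: "board is probably not on the same clock as the primary node's switch board",
    -7: "daemon time-out: node daemon did not answer after adapter initialization",
    -16: "possible poor power-on sequence, or clock changed after power-on",
    -19: "daemon time-out: node daemon not responding during route table distribution",
}


def triage_board(node_codes, switch_port_codes=()):
    """Return a list of hint strings for one switch board.

    node_codes: hypothetical dict of node label -> error code (0 assumed to mean no error).
    switch_port_codes: hypothetical iterable of codes on switch-to-switch ports.
    """
    findings = []
    codes = list(node_codes.values())

    # Pattern 4: every node on the board reports an error.
    if codes and all(c != 0 for c in codes):
        for code in sorted(set(codes), reverse=True):
            hint = ALL_NODE_HINTS.get(code)
            if hint:
                findings.append(f"all nodes failing, code {code}: {hint}")
        # A -1 on all nodes plus -2, -3, or -4 on switch ports suggests a clock mismatch.
        if set(codes) == {-1} and any(c in (-2, -3, -4) for c in switch_port_codes):
            findings.append("switch ports show -2/-3/-4: consistent with a clock mismatch")

    # Pattern 5: every switch-to-switch connection reports an error.
    ports = list(switch_port_codes)
    if ports and all(c != 0 for c in ports):
        findings.append(
            "all switch-to-switch connections failing: suspect clock or power "
            "(board, clock card, clock source cable, power card, or power cable)"
        )
    return findings


if __name__ == "__main__":
    for line in triage_board({"N1": -1, "N2": -1}, switch_port_codes=[-2, -4]):
        print(line)
```

The hints are starting points for inspection only; the board, cable, and clock checks described above still have to be done on the hardware.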