This soft copy for use by IBM employees only.
The problem with global synchronous clocks is that if the master fails, the
entire system goes down. SP Switch improves recoverability by putting two
oscillators on each board. However, the switchover will probably remain a
manual process for a while, because switching to a new oscillator brings
down the system. If the hardware and software incorrectly sensed a clock
problem and switched automatically, they could bring down the whole system
for no good reason. Until we can always correctly determine that a clock
failure has occurred, and always correctly determine how to recover from it,
we will leave the switchover to a human operator, who has a better
understanding of the state of the user applications when the system is
brought down.
• They use the same topology files
The topologies in HiPS are extremely scalable, so the SP Switch uses the
same files.
Eannotator must be run against the topology files to update them for the
new wiring. Refer to 4.5, “Switch Commands” on page 104.
• They have the same mechanical form factor
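The manual oscillator switchover policy described above can be sketched as a small model. Suspected clock failures only raise an operator alert, and a switch to the other oscillator happens only on explicit operator confirmation. This is a hypothetical illustration, not SP Switch code. `ClockBoard`, `report_suspected_failure`, and `operator_switchover` are invented names, and the `outages` counter merely stands in for the system outage that an oscillator switch causes.

```python
from dataclasses import dataclass, field


@dataclass
class ClockBoard:
    """Hypothetical model of a switch board carrying two oscillators."""

    active: str = "primary"                     # oscillator currently driving the clock
    alerts: list = field(default_factory=list)  # messages queued for the operator
    outages: int = 0                            # times a switchover took the system down

    def report_suspected_failure(self, oscillator: str) -> None:
        # Detection may be a false positive, so never switch automatically;
        # just record what was seen for the operator.
        self.alerts.append(f"suspected failure on {oscillator} oscillator")

    def operator_switchover(self, confirmed: bool) -> bool:
        # Switching oscillators brings the whole system down, so it happens
        # only after a human has weighed the state of the user applications.
        if not confirmed:
            return False
        self.active = "backup" if self.active == "primary" else "primary"
        self.outages += 1
        return True


board = ClockBoard()
board.report_suspected_failure("primary")
print(board.active, len(board.alerts))  # primary 1  (no automatic switch)
print(board.operator_switchover(confirmed=True), board.active, board.outages)  # True backup 1
```

The design choice the text argues for is visible in the sketch. Detection and recovery are separate steps, and the disruptive step is gated on a human decision.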
4.1.4 Improvements of SP Switch over HiPS
• Faults are not global
The HiPS had global faults. When a fault occurred, a fault code was
propagated throughout the entire switch network until all of the links were
brought down and had to be reinitialized. This is a remnant of the original
design point of the HiPS switch, which was designed in research: the
original machine was intended to run only a single application.
With SP Switch, we had time to redesign the fault scenarios, and we
localized the faults. Only the link that experienced the fault is brought
down, and it can then be brought back online with minimal disturbance to the
rest of the system. You will be glad to know that taking a node offline no
longer brings down the entire switch.
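The difference between HiPS global faults and SP Switch localized faults can be sketched by modeling the switch network as a set of named links that are either up or down. This is a hypothetical illustration only. The link names and the functions `apply_fault`, `bring_up`, and `down` are invented for the example, and the actual fault codes and reinitialization protocol are not modeled.

```python
def apply_fault(links: dict, faulty: str, localized: bool) -> dict:
    """Return new link states (True = up) after a fault on link `faulty`.

    localized=False models the HiPS behaviour: the fault code propagates
    until every link is down and must be reinitialized.
    localized=True models SP Switch: only the faulty link is brought down.
    """
    if faulty not in links:
        raise KeyError(f"unknown link: {faulty}")
    if localized:
        return {name: up and name != faulty for name, up in links.items()}
    return {name: False for name in links}


def bring_up(links: dict, name: str) -> dict:
    """Bring one link back online without touching the others."""
    return {**links, name: True}


def down(links: dict) -> list:
    """List the links that are currently down, in sorted order."""
    return sorted(name for name, up in links.items() if not up)


links = {"node1-sw1": True, "node2-sw1": True, "sw1-sw2": True}  # hypothetical links
print(down(apply_fault(links, "node2-sw1", localized=False)))  # ['node1-sw1', 'node2-sw1', 'sw1-sw2']
after = apply_fault(links, "node2-sw1", localized=True)
print(down(after))                         # ['node2-sw1']
print(down(bring_up(after, "node2-sw1")))  # []
```

With the global model, taking down one node's link costs every link in the network. With the localized model, the other links never leave the up state, and only the faulty link has to be recovered.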
• No more chip shadowing
The HiPS used two physical chips to perform the function and checking of
one logical chip: the second chip checked the operation of the first. This
is a very thorough way to detect errors, but it is also very expensive,
because you need two chips plus extra circuitry to check one against the
other. By adding a little more checking circuitry to a single chip, you can
get checking that is almost as good at much lower cost. The architecture of
the system, with EDC and CRC on the messages, also helps detect message
errors. What is lost is a little fault isolation. In exchange, we get a less
expensive design with better reliability (fewer parts to break).
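The trade-off between chip shadowing and message-level checking can be sketched in a few lines. Shadowing runs two copies of the same logic and compares their outputs. Message checking appends a check code that the receiver verifies. This is a hypothetical illustration. `shadow_check`, `add_crc`, `check_crc`, and `route` are invented names, and Python's `zlib.crc32` stands in for the EDC and CRC mentioned in the text, whose actual widths and polynomials are not given here. The sketch does not model the extra checking circuitry inside a single SP Switch chip.

```python
import zlib


def shadow_check(primary, shadow, data):
    """HiPS-style shadowing: two copies of the logic see the same input and a
    checker compares their outputs. Any mismatch is flagged as a chip error."""
    out_a, out_b = primary(data), shadow(data)
    if out_a != out_b:
        raise RuntimeError("shadow mismatch: chip error detected")
    return out_a


def add_crc(payload: bytes) -> bytes:
    """Append a CRC-32 to a message before it crosses the switch.
    CRC-32 is an assumption here, not the SP Switch's documented code."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(message: bytes) -> bytes:
    """Verify and strip the trailing CRC; raise if the message was corrupted."""
    payload, received = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != received:
        raise ValueError("CRC mismatch: message corrupted in transit")
    return payload


def route(dest: int) -> int:
    """Hypothetical routing function used to exercise shadow_check."""
    return dest % 16


print(shadow_check(route, route, 37))  # 5

msg = add_crc(b"hello, node 7")
print(check_crc(msg))  # b'hello, node 7'
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]  # single-bit error in the payload
try:
    check_crc(flipped)
except ValueError as err:
    print(err)  # CRC mismatch: message corrupted in transit
```

Shadowing doubles the logic to catch any internal mismatch. The CRC adds only a few bytes per message, but a failure at the receiver shows only that the message was corrupted somewhere along its path, not which chip corrupted it. That illustrates the kind of fault isolation the text says is given up.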
• Cables clip on instead of screwing in
An annoying problem with HiPS was that each of the 32 cable/wrap plugs on
a switch had to be screwed in: 2 screws on each of 32 cables is 64 screws.
This slowed down installing and servicing the switch, and it also caused
some sore wrists for people installing large systems. SP Switch addressed
this by putting clips on the connectors instead of screws. Clips are not
perfect, however. They do not seat as securely as screws do, and the pitch
between connectors is small, so getting your fingers in to work the clips is
not always easy.
• Estimated 12.8 times better reliability
SP PD Guide
Printed in U.S.A.  SG24-4778-00