
PRIMEPOWER 650 and 850 Technical White Paper

01/12/31


High Availability Functions

RAS/HA Concept

The PRIMEPOWER 650 and 850 models employ HA (High Availability) functions based on mainframe technology. To implement these HA functions, the RAS (Reliability, Availability, Serviceability) concept must be guaranteed for each function.

High Availability, meaning the elimination of job stoppage time, cannot be achieved simply by increasing the reliability of the hardware components. High Availability must also be achieved for the software, applications, and support services. It is therefore essential to provide “system” RAS functions.

To achieve high reliability, the quality of parts must of course be maximized, and parts must be selected with the product lifespan in mind. However, no part can be guaranteed never to fail, so the possibility of a failure must always be considered. This applies to software as well as to hardware. Bug-free software is naturally highly desirable; however, because some software bugs are triggered by hardware failures, it is extremely difficult to eliminate all bugs completely. Nevertheless, every effort must be made to improve the reliability of both the hardware and the software.

Fujitsu controls and guarantees the reliability of the parts it uses. When adopting new parts, Fujitsu evaluates their lifespan with stress tests such as burn-in tests to determine whether the parts provide the reliability that the product targets.

Availability can be expressed as an index of the proportion of time during which the system is available for job operation. Because errors can never be reduced to zero, mechanisms that ensure high availability must be in place so that system operation can continue when a hardware failure occurs in a part or unit, when an error occurs in basic software such as the OS, or when an error or failure occurs in an application process.
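To make the availability index concrete, a common way to quantify it (an assumption here; the paper gives no formula) is steady-state availability A = MTBF / (MTBF + MTTR), the fraction of time the system is up, where MTBF is the mean time between failures and MTTR the mean time to repair or restart. The Python sketch below computes A and the corresponding yearly downtime; the MTBF and MTTR figures are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes the common definition A = MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(a: float) -> float:
    """Expected downtime in minutes per 365-day year for availability a."""
    return (1.0 - a) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Hypothetical figures: one failure per 10,000 hours, 4 hours to recover.
    a = availability(10_000, 4)
    print(f"availability  = {a:.5f}")
    print(f"downtime/year = {annual_downtime_minutes(a):.0f} min")
```

Because A depends on recovery time as well as failure frequency, shortening recovery (for example through automatic restart and reduced start time) raises availability even when failures themselves cannot be prevented.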

PRIMEPOWER 650 and 850 models incorporate the following basic mechanisms to provide for high availability:

- An expanded range of automatic error checking and correction
- Improved retry functions when an error is detected, and a degradation function that isolates failed components and allows a restart using a valid, if reduced, configuration
- Automatic system restart to reduce downtime
- A panel display function that indicates the fault location at system startup
- Reduced system start time
- Redundant configurations for power supplies and fans, and hot-swappable components
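The degradation function listed above can be pictured as a filter over the diagnosed configuration: failed components are removed, and the system restarts only if what remains is still a valid configuration. The following Python sketch illustrates that idea under stated assumptions; the Component type, the component names, and the MIN_REQUIRED thresholds are hypothetical and do not describe the actual PRIMEPOWER firmware.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str      # hypothetical identifier, e.g. "CPU#1"
    kind: str      # "cpu", "memory", or "io"
    healthy: bool  # result of a (hypothetical) initial diagnosis


# Hypothetical minimum number of usable units needed to restart.
MIN_REQUIRED = {"cpu": 1, "memory": 1, "io": 1}


def degrade(components: list[Component]) -> list[Component]:
    """Isolate failed components and return the reduced configuration.

    Raises RuntimeError if the surviving components do not meet the
    minimum needed to restart the system.
    """
    survivors = [c for c in components if c.healthy]
    for kind, needed in MIN_REQUIRED.items():
        count = sum(1 for c in survivors if c.kind == kind)
        if count < needed:
            raise RuntimeError(f"cannot restart: only {count} usable {kind} unit(s)")
    return survivors


if __name__ == "__main__":
    config = [
        Component("CPU#0", "cpu", True),
        Component("CPU#1", "cpu", False),  # failed, will be isolated
        Component("MEM#0", "memory", True),
        Component("PCI#0", "io", True),
    ]
    print([c.name for c in degrade(config)])
```

The design choice is deliberately conservative: the system restarts with fewer resources rather than staying down, but refuses to restart if a whole class of required units has been lost.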

Serviceability refers to the functions used to recover the system quickly and easily from any problems that occur during system operation. To achieve this, the cause of each error must be identified, and the component or components responsible must be isolated and replaced. In addition, the event and its conditions must be reported to the system administrator and maintenance personnel in an easy-to-understand format.

The PRIMEPOWER 650 and 850 are supplied with machine management software that supports isolating fault locations and replacing components without stopping the system. This software also enables the system administrator and maintenance personnel to clearly identify the operating status of every unit, so that maintenance personnel can perform the appropriate maintenance work.

Redundant Configuration and Hot Swapping

The power supply and fan units of these models have a redundant configuration. Storage can also be installed in a redundant configuration by using mechanisms such as dual RAID controllers and disk mirroring; on these models, this is achieved by combining Fujitsu’s SynfinityDisk with a multipath disk control package. SynfinityDisk can also mirror the system volumes themselves: even if a disk error occurs during booting, the boot disk is switched automatically and the OS is restarted without stopping the system.
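The benefit of redundancy can be estimated with a simple model (an assumption, not taken from this paper): if a group of n identical units keeps working as long as at least k of them are up, and units fail independently with per-unit availability a, the group availability is the sum over i = k..n of C(n, i) a^i (1 - a)^(n - i). A mirrored disk pair is the 1-of-2 case; an N+1 power supply arrangement such as 2-of-3 is another. The per-unit figure in the Python sketch below is hypothetical.

```python
from math import comb


def k_of_n_availability(a: float, k: int, n: int) -> float:
    """Availability of n identical units that work while at least k are up.

    Assumes independent failures and per-unit availability a.
    """
    if not 0.0 <= a <= 1.0 or not 0 < k <= n:
        raise ValueError("need 0 <= a <= 1 and 0 < k <= n")
    # Sum the probabilities of exactly i units being up, for i = k..n.
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    a = 0.999  # hypothetical per-unit availability
    print(f"single unit      : {k_of_n_availability(a, 1, 1):.9f}")
    print(f"1-of-2 (mirrored): {k_of_n_availability(a, 1, 2):.9f}")
    print(f"2-of-3 (N+1)     : {k_of_n_availability(a, 2, 3):.9f}")
```

With a = 0.999, the mirrored pair reaches 1 - 0.001^2 = 0.999999. Hot swapping complements this model: replacing a failed unit while the system keeps running shortens the period during which the group operates without redundancy.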
