
LED Status Indicators

007-5806-004


This type of information can be useful in helping your administrator or service provider identify 
and more quickly correct hardware problems. See the following subsections for IP115 and IP119 
blade-status LED information.

IP115 Compute Blade Status LEDs

Figure 6-3 identifies the locations of the IP115 board status LEDs. 

Figure 6-3    IP115 Compute Blade Status LEDs Example (numbered callouts 1 through 14; the blade coolant access panel is also labeled)

The functions of the nine status LEDs on the lower node board are as follows:

1. UID - Unit identifier (lower node) - this blue LED is used during troubleshooting to find a specific compute node. The LED can be lit via software to aid in locating a compute node.

2. CPU Power Good (lower node) - this green LED is illuminated when the correct power levels are present on the processor(s).

3. IB0 link (lower node) - this green LED is illuminated when a link has been established on the internal InfiniBand 0 port.

4. IB0 active (lower node) - this amber LED flashes when IB0 is active (transmitting data).

5. Eth1 link (lower node) - this green LED is illuminated when a link has been established on the system control Eth port.

Summary of Contents for ICE X

Page 1: ...SGI ICE X System Hardware User Guide, Document Number 007-5806-004 ...

Page 2: ...s commercial computer software subject to the provisions of its applicable license agreement as specified in a 48 CFR 12 212 of the FAR or if acquired for Department of Defense units b 48 CFR 227 7202 of the DoD FAR Supplement or sections succeeding thereto Contractor manufacturer is SGI 900 North McCarthy Blvd Milpitas CA 95035 TRADEMARKS AND ATTRIBUTIONS SGI and the SGI logo are registered trade...

Page 3: ...4 iii Record of Revision

Version   Description
001       March 2012. First release
002       February 2013. Blade and rack design updates
003       June 2014. Blade updates
004       November 2015. cpower command and service reference updates ...


Page 5: ...Reader Comments xx 1 Operation Procedures 1 Precautions 1 ESD Precaution 1 Safety Precautions 2 Console Connections 3 Powering the System On and Off 4 Preparing to Power On 5 Powering On and Off 8 Console Management Power cpower Commands 8 Monitoring Your Server 12 Optional SGI Remote Services SGI RS 13 SGI Remote Services Primary Capabilities 14 SGI Remote Services Benefits 14 SGI Remote Service ...

Page 6: ...ew 25 System Models 26 SGI ICE X System and Blade Architectures 29 IP113 Blade Architecture Overview 29 IP115 Blade Architecture Overview 30 IP119 Blade Architecture Overview 31 IP131 Blade Architecture Overview 32 IP133 Blade Architecture Overview 33 QuickPath Interconnect Features 34 IP113 IP115 and IP119 QPI Bandwidth 34 IP131 and IP133 QPI Bandwidth 34 Blade Memory Features 35 Blade DIMM Memor...

Page 7: ...verview 44 MDS Node 44 OSS Node 45 Reliability Availability and Serviceability RAS 45 System Components 47 D Rack Unit Numbering 51 Rack Numbering 51 Optional System Components 51 Optional SGI Remote Services SGI RS 51 SGI Remote Services Primary Capabilities 52 SGI Remote Services Benefits 52 SGI Remote Service Operations Overview 53 SGI Warranty Levels 54 4 Rack Information 55 Overview 55 SGI IC...

Page 8: ...ic Troubleshooting 81 Troubleshooting Chart 82 LED Status Indicators 83 Blade Enclosure Pair Power Supply LEDs 83 IP113 Compute Blade LEDs 84 IP115 Compute Blade Status LEDs 85 IP119 Compute Blade Status LEDs 87 IP131 Compute Blade Status LEDs 88 IP133 Compute Blade Status LEDs 89 Accessing Online Support Information and Services 90 SGI Customer Portal 90 Technical Assistance 91 Other Resources 91...

Page 9: ...er Specifications 108 D Rack System Environmental Specifications 109 ICE X M Rack Technical Specifications 110 Ethernet Port Specification 112 B Safety Information and Regulatory Specifications 113 Safety Information 113 Regulatory Specifications 115 CMN Number 115 CE Notice and Manufacturer s Declaration of Conformity 115 Electromagnetic Emissions 115 FCC Notice USA Only 116 Industry Canada Notic...


Page 11: ...rs 23 Figure 3 1 SGI ICE X Series System Single Rack Air Cooled Example 26 Figure 3 2 D rack Blade Enclosure and Rack Components Example 28 Figure 3 3 InfiniBand 48 port Premium FDR Switch Numbering in Blade Enclosures37 Figure 3 4 SGI ICE X System and Network Components Overview 39 Figure 3 5 D Rack Administration and RLC Cabling to CMCs Example 41 Figure 3 6 Example Rear View of a 1U Service Nod...

Page 12: ... Example 79 Figure 5 11 SGI UV 20 Service Node Rear Panel and Component Descriptions 79 Figure 5 12 SGI UV 20 Service Node Front Control Panel Description 80 Figure 6 1 Power Supply Status LED Indicator Locations 83 Figure 6 2 IP113 Compute Blade Status LED Locations Example 84 Figure 6 3 IP115 Compute Blade Status LEDs Example 85 Figure 6 4 IP119 Blade Status LEDs Example 87 Figure 6 5 IP131 Comp...

Page 13: ...10G RP5 P 2U Server Control Panel Functions listed top to bottom 78 Table 6 1 Troubleshooting Chart 82 Table 6 2 Power Supply LED States 83 Table 7 1 Customer replaceable Components and Maintenance Procedures 94 Table 7 2 SGI Administrative Server PCIe Support Levels 106 Table A 1 SGI ICE X Series Configuration Ranges 107 Table A 2 ICE X System D Rack Physical Specifications 108 Table A 3 Environm...


Page 15: ...h the assumption that the reader has a good working knowledge of computers and computer systems Important Information Warning To avoid problems that could void your warranty your SGI or other approved service technician should perform all the setup addition or replacement of parts cabling and service of your SGI ICE X series system with the exception of the following items that you can perform you...

Page 16: ...ack leader and other support server nodes An outline of the server functions is also provided Chapter 6 Basic Troubleshooting provides recommended actions if problems occur on your system Chapter 7 Maintenance Procedures covers end user service procedures that do not require special skills or tools to perform Procedures not covered in this chapter should be referred to SGI customer support special...

Page 17: ...node purposes This server is not used as a system RLC or administrative server SGI Rackable C2110G RP5 P System User Guide P N 007 6343 00x This guide covers general operation installation configuration and servicing of the 2U Rackable C2110G RP5 P server node used in the SGI ICE X system The 2U server can be used as a service node for login batch I O gateway MDS or other service node purposes SGI...

Page 18: ...ster the operation of SGI ICE X systems The management center software is also used to administer other non ICE SGI clusters or systems Obtaining SGI Publications You can obtain SGI documentation as follows Use the SGI customer portal and support website at http support sgi com Click on the following Support by Product productname Documentation If you do not find what you are looking for you can s...

Page 19: ...hat a preceding element can be repeated man page x Man page section identifiers appear in parentheses after man page names GUI element This font denotes the names of graphical user interface GUI elements such as windows screens dialog boxes menus toolbars icons buttons boxes fields and lists Product Support SGI provides a comprehensive product support and maintenance program for its products as fo...

Page 20: ...comments Online the document number is located in the front matter of the manual In printed manuals the document number is located at the bottom of each page You can contact SGI in the following ways Send e mail to the following address techpubs sgi com Contact your customer service representative and ask that an incident be filed in the SGI incident tracking system SGI values your comments and wi...

Page 21: ...ions Before operating your system familiarize yourself with the safety information in the following sections ESD Precaution on page 1 Safety Precautions on page 2 ESD Precaution Caution Observe all electro static discharge ESD precautions Failure to do so can result in damage to the equipment Wear an approved ESD wrist strap when you handle any ESD sensitive device to eliminate possible damage to ...

Page 22: ... warning labels Caution Power off the system only after the system software has been shut down in an orderly manner If you power off the system before you halt the operating system data may be corrupted Warning If a lithium battery is installed in your system as a soldered part only qualified SGI service personnel should replace this lithium battery For a battery of another type replace it only wi...

Page 23: ...drawer closed when the console is not in use and prevents it from accidentally sliding open 2 Handle Used to push and pull the module in and out of the rack 3 LCD Display Controls The LCD controls include On Off buttons and buttons to control the position and picture settings of the LCD display 4 Power LED Illuminates blue when the unit is receiving power 1 2 3 4 Figure 1 1 Flat Panel Rackmount Co...

Page 24: ... typically used for service purposes or for system console access in smaller systems or where an external ethernet connection is not used or available Check with your service representative if use of an RS 232 terminal is required for your system The flat panel rackmount or other optional VGA console connects to the administration controller s video and keyboard mouse connectors as shown in Figure...

Page 25: ...ugged into all the blade enclosure power supplies correctly see the example in Figure 1 3 Setting the circuit breakers on the PDUs to the On position will apply power to the blade enclosure supplies and will start each of the chassis managers in each enclosure Note that the chassis managers in each blade enclosure stay powered on as long as there is power coming into the unit Turn off the PDU brea...

Page 26: ...r switches see the examples in Figure 1 4 and Figure 1 5 on page 7 are turned on to provide power when the system is booted up Power distribution unit PDU Power source Figure 1 4 Eight Outlet Single Phase PDU Example Figure 1 5 on page 7 shows an example of the three phase PDUs ...

Page 27: ...Powering the System On and Off 007 5806 004 7 Figure 1 5 Three Phase PDU Examples ...

Page 28: ... off, reset, and show the power status of multiple or single system components or individual racks. The cpower command is as follows: cpower option target_type action target_list. Example cpower command arguments are listed and described in Table 1-1. See Table 1-2 on page 11 for examples of the cpower command strings. Table 1-1: cpower option, action, target type, and target list descriptions. Argument Descr...

Page 29: ...e period specified by the i seconds option see the description in the Option portion of this table on Powers on the target by sending an IPMI power on command Valid target types are switch iru leader node and system If the target type is system leaders and compute nodes are powered on first then the ICE compute nodes are powered on off Powers off the target by sending an IPMI power off command Val...

Page 30: ... reset on the target by sending an IPMI reset command Valid target types are leader and node The wait option is available for this action shutdown Shuts down the target but does not power it off by sending a shutdown h now command via ssh Waits for targets to shut down Valid target types are node leader and system Target_list Performs the listed action on all specified target types such as r1i n w...

Page 31: ...er node halt r i n: Shuts down (halts) all the blade enclosure compute nodes in the system, but not the administrative controller server, rack leader controller, or other service nodes. cpower system on: Boots (or reboots) all rack leaders and nodes in a system. cpower node on r1i0n8: Command tries to specifically boot rack 1, IRU0, node 8. cpower leader status: Determines the power status of all rack leaders. cpo...
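The argument order quoted in the page 28 excerpt (cpower option target_type action target_list) and the example strings above can be sketched as a small command builder. This is a minimal illustration, not part of the SGI tooling: `build_cpower` and `VALID_TARGET_TYPES` are hypothetical names, and the set of target types is assumed from the types named in these excerpts (switch, iru, leader, node, system).

```python
import shlex

# Target types named in the excerpts; assumed here to be the complete set.
VALID_TARGET_TYPES = {"switch", "iru", "leader", "node", "system"}


def build_cpower(target_type, action, targets=(), options=()):
    """Compose a cpower command line as: option target_type action target_list."""
    if target_type not in VALID_TARGET_TYPES:
        raise ValueError(f"unknown target type: {target_type!r}")
    argv = ["cpower", *options, target_type, action, *targets]
    # shlex.join quotes any argument that a shell would otherwise split.
    return shlex.join(argv)


print(build_cpower("system", "on"))            # cpower system on
print(build_cpower("node", "on", ["r1i0n8"]))  # cpower node on r1i0n8
print(build_cpower("leader", "status"))        # cpower leader status
```

The function only formats a string; it does not run anything, so it can be used to review a batch of commands before they are issued on the administrative server.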

Page 32: ...ler server to access and monitor the system via IPMI See the SGI Management Center Administration Guide for Clusters P N 007 6358 00x for more information on console management These console connections enable you to view the status and error messages generated by your SGI ICE X system You can also use these consoles to input commands to manage and monitor your system See the section System Power ...

Page 33: ...guide for detailed information on installing optional I O cards or other components Note that each blade enclosure pair is configured with either two or four InfiniBand switch blades Optional SGI Remote Services SGI RS The optional SGI RS system automatically detects system conditions that indicate potential future problems and then notifies the appropriate personnel This enables you and SGI globa...

Page 34: ...involvement of customer staff during troubleshooting Faster support case resolution Improved productivity Proactive potential problem identification can result in higher system availability Automated Alerts and in some instances Case Opening results in faster problem resolution time and less direct involvement required by Customer Support Teams SGI Remote Services are available for all UV systems ...

Page 35: ...ng requires no changes to customer systems or firewalls as long as the SGI Agent can send HTTPS messages to highly secure Cloud and Global Access Servers It will also have no impact on customer network or system performance All communication between SGI global support and customer systems is kept secure using Secure Socket Layer SSL encryption All communication with SGI is initiated from the custo...


Page 37: ... two CMCs are needed when the enclosure uses dual node blades The first CMC is located directly below the enclosure s switch blade s and the other directly above The chassis manager supports power up and power down of the blade enclosure s compute node blades and environmental monitoring of all units within the enclosure Note that the stand alone service nodes use IPMI to monitor system health Mas...

Page 38: ...Remote workstation monitor Local Area Network LAN Local Area Network LAN Cat 5 Ethernet SGI ICE X system 18 007 5806 004 2 System Management Figure 2 1 SGI ICE X System Network Access Example ...

Page 39: ...ntrol The chassis management control network configuration of your ICE X series machine will depend on the size of the system and the control options selected Typically any system with multiple blade enclosures will be interconnected by the chassis managers in each blade enclosure Note Mass storage option enclosures are not monitored by the blade enclosure s chassis manager Most optional mass stor...

Page 40: ... rack leader and service node servers via Gigabit Ethernet switches See the redundant switch example in Figure 2 2 and the non redundant example in Figure 2 3 on page 21 LAN1 LAN2 BMC LAN3 LAN4 LAN1 LAN2 BMC LAN3 LAN4 48 port GigE switch VLAN VLAN 48 port GigE switch VLAN VLAN Stacking cables CMC 0 CMC 1 CMC 2 CMC 3 CMC 0 CMC 1 CMC 2 CMC 3 Rack 001 and 002 RLC Customer LAN System admin node Servic...

Page 41: ...Service node LAN1 LAN2 BMC Rack 001 Rack 002 VLAN Figure 2 3 Non redundant Chassis Manager Interconnection Diagram Example M rack Chassis Manager Interconnection The interconnection of an M Cell s rack environment is somewhat more complex than a D rack and requires the use of a third VLAN VLAN3 within the Gigabit Ethernet switch network This VLAN3 interface allows the CMCs to monitor and adjust th...

Page 42: ...M Cell cooling rack Cooling controller Cooling distribution unit 22 007 5806 004 2 System Management Figure 2 4 M rack System Chassis Manager Interconnect Example Chassis Management Control CMC Functions The following list summarizes the control and monitoring functions that the CMCs perform Most functions are common across multiple blade enclosures Controls and monitors blade enclosure fan speeds...

Page 43: ...onsole connection used primarily for service troubleshooting RES RESET switch depress this switch to reset the CMC microprocessor HB Heartbeat LED lighted green LED indicates CMC is running PG Power Good LED this LED is illuminated green when power is present Figure 2 5 shows the chassis management controller front panel in the blade enclosure CMC 0 CMC 1 ACC CNSL RES HB PG Figure 2 5 Chassis Mana...

Page 44: ...on how many blade enclosures are being queried for status, powered on, or turned off. cpower system status: This command gives the status of all compute nodes in the system. To power on a specific blade enclosure, enter a command similar to the following: cpower iru on r1i0. In this example the system should respond by powering on the IRU (blade enclosure) 0 nodes in rack 1. Note that this command does not p...
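The target names in these excerpts appear to encode rack, IRU, and node numbers: r1i0 is described as IRU 0 in rack 1, and r1i0n8 (in the page 31 excerpt) as rack 1, IRU0, node 8. A minimal sketch of parsing such names follows. The pattern `r<rack>i<iru>[n<node>]` is an assumption inferred from those two examples only, and `Target` and `parse_target` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern, inferred from "r1i0" and "r1i0n8" only.
_TARGET_RE = re.compile(r"^r(\d+)i(\d+)(?:n(\d+))?$")


class Target(NamedTuple):
    rack: int
    iru: int
    node: Optional[int]  # None when the name refers to a whole IRU


def parse_target(name: str) -> Target:
    """Split a target name such as 'r1i0n8' into rack, IRU, and node numbers."""
    match = _TARGET_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized target name: {name!r}")
    rack, iru, node = match.groups()
    return Target(int(rack), int(iru), None if node is None else int(node))


print(parse_target("r1i0"))    # Target(rack=1, iru=0, node=None)
print(parse_target("r1i0n8"))  # Target(rack=1, iru=0, node=8)
```

Wildcard or range forms of the target list are not handled here, because their syntax is not recoverable from the truncated excerpts.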

Page 45: ...sures Each blade enclosure also has an internal InfiniBand communication backplane The 18 compute blades supported in each enclosure can use one or two node boards with ASICs processors memory components and I O chip sets mounted on them The blades slide directly in and out of the enclosures Every compute node in a blade contains four or eight dual inline memory module DIMM memory units per proces...

Page 46: ...26 007 5806 004 3 System Overview System Models Figure 3 1 shows an example configuration of an air cooled single rack SGI ICE X server Figure 3 1 SGI ICE X Series System Single Rack Air Cooled Example ...

Page 47: ...ptional water chilled D rack cooling is available Note that systems with liquid cooled blades reside in M Cell racks and always require water cooling systems to operate See the section Chapter 4 Rack Information for more information on water cooled ICE X M Cell systems The basic SGI ICE X system requires a minimum of one 42U tall rack with PDUs installed to support each blade enclosure pair and an...

Page 48: ... PG CMC 0 CMC 1 ACC CNSL RES HB PG CMC 0 CMC 1 ACC CNSL RES HB PG Blade enclosure pair 1U Gig E switch Blade enclosure pair 1U Gig E switch Rack leader controller Admin server Service node 28 007 5806 004 3 System Overview Figure 3 2 D rack Blade Enclosure and Rack Components Example ...

Page 49: ...ctions One single port IB HCA One dual port IB HCA Two HCAs each with a single port IB connector The node board in an IP113 blade is configured with two multi core Intel processors a maximum of 16 processor cores per compute blade were supported at the time this document was published A maximum of 16 DDR3 memory DIMMs are supported per compute blade The two processors on the IP113 maintain an inte...

Page 50: ...e required for each enclosure This configuration supports a single plane topology in each of the blade enclosures A maximum of 16 DDR3 memory DIMMs are supported per compute blade 8 on each node board The DIMM slots support up to 1600 MT s DIMMs Each node board in the IP115 blade assembly supports one optional 2 5 inch hard disk drive or solid state drive SSD The two processors on each node board ...

Page 51: ...de supports one Xeon E5 2600 processor assembly and one Intel Xeon Phi co processor A PCIe link connects the base and mezzanine boards Processors and co processors are cooled with liquid Cold Sink technology Four RDDR3 memory DIMM slots per board support up to 1600 MT s DIMMs eight memory DIMM slots total within the blade One 2 5 HDD or 2 5 SSD per blade assembly Board management controller BMC Th...

Page 52: ...ute blade 8 DIMMs per socket The two processors on the IP131 maintain an interactive communication link using the Intel QuickPath Interconnect QPI technology This high speed interconnect technology provides data transfers between the on board processors See the section QuickPath Interconnect Features on page 34 for an overview of the link functionality and bandwidth capability Note that the IP131 ...

Page 53: ...must be present as two switch blades are required for each enclosure This configuration supports a single plane topology in each of the blade enclosures A maximum of 16 DDR4 memory DIMMs are supported per compute blade 8 on each node board The DIMM slots support up to 2133 MT s DIMMs Each node board in the IP133 blade assembly supports one optional 2 5 inch hard disk drive or one optional solid st...

Page 54: ...at each clock period, once on the rising edge of the clock and once on the falling edge (DDR). Of the 20 bits in the channel, 16 bits are data and 4 bits are error correction. 6.4 GHz times 16 bits equals 102.4 Gbits per second. Convert to bytes: 102.4 divided by 8 equals 12.8 GB/s (max single-direction bandwidth). The total aggregate bandwidth of the QPI channel is 25.6 GB/s (12.8 GB/s x 2 channels). IP131 and...
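The QPI bandwidth arithmetic in the excerpt above can be reproduced step by step. The sketch below uses only the figures given there (6.4 GHz, 16 data bits, 8 bits per byte, 2 channels); the function and parameter names are illustrative, not from the guide.

```python
def qpi_bandwidth(rate_ghz=6.4, data_bits=16, channels=2):
    """Return (Gbit/s one way, GB/s one way, GB/s aggregate) for a QPI link."""
    gbit_one_way = rate_ghz * data_bits        # 6.4 x 16 data bits
    gbyte_one_way = gbit_one_way / 8           # convert bits to bytes
    gbyte_aggregate = gbyte_one_way * channels # both directions combined
    return gbit_one_way, gbyte_one_way, gbyte_aggregate


gbit, one_way, aggregate = qpi_bandwidth()
print(f"{gbit:.1f} Gbit/s, {one_way:.1f} GB/s one way, {aggregate:.1f} GB/s aggregate")
# 102.4 Gbit/s, 12.8 GB/s one way, 25.6 GB/s aggregate
```

Only the 16 data bits enter the calculation; the 4 error-correction bits in the 20-bit channel carry no payload and are excluded.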

Page 55: ...um of sixteen DDR4 memory DIMMs are supported on each IP131 blade. The IP133 compute blade uses two separate and independent node boards, and each node supports a maximum of eight DDR4 memory DIMMs. Each E5-2600 v3 processor used on IP131 and IP133 blades has four DDR4 memory channels, for a total of eight DIMMs per processor socket. Each memory channel supports a maximum of two memory DIMMs for a tota...

Page 56: ... a different total DIMM capacity For example one blade may have 16 DIMMs and another may have only eight Note that while this difference in capacity is acceptable functionally it may have an impact on compute load balancing within the system System InfiniBand Switch Blades Two or four fourteen data rate FDR InfiniBand switch blades can be used with each blade enclosure pair configured in the SGI I...

Page 57: ...ingle node blade such as the IP113 or IP131 A blade enclosure pair using dual node blades must use four switch blades to support a single plane topology Check with your SGI sales or service representative for additional information on availability The SGI ICE X FDR switch blade locations example is shown in Figure 3 3 on page 37 Any external switch blade ports not used to support the IB system fab...

Page 58: ...s You can add different types of stand alone module options to a system rack to achieve the desired system configuration You can configure and scale blade enclosures around processing capability memory size or InfiniBand fabric I O capability The air cooled blade enclosure has redundant hot swap fans and redundant hot swap power supplies A water chilled rack option expands an ICE X rack s heat dis...

Page 59: ...All compute blades Admin controller All Rack Leader Controllers All service nodes Runs IPMI software Compute Blades Contains the following Processors Memory Optional PCIe slots and drives Optional MIC GPU cards Each Blade has a BMC Runs Linux OS Service Nodes Login Batch Gateway Optional Lustre Nodes Storage Runs Linux OS Each node has a BMC In Band Software Figure 3 4 SGI ICE X System and Network...

Page 60: ...lone 1U server The rack leader controllers are guided and monitored by the system administration server Each RLC in turn monitors pulls and stores data from all the blade enclosures within the logical rack that it monitors The rack leader then consolidates and forwards data requests received from the blade enclosures to the administration server A rack leader controller also supplies boot and root...

Page 61: ...onfigurations the fabric management function is handled by the rack leader controller RLC node The RLC is an independent server that is not part of the blade enclosure pair See the Rack Leader Controller on page 40 subsection for more detail The fabric management software runs on one or two RLC nodes and monitors the function of and any changes in the InfiniBand fabrics of the system It is also po...

Page 62: ...y combined with the I O gateway server node function in some configurations One or more per system are supported Very large systems with high levels of user logins may use multiple dedicated login server nodes The login node functionality is generally used to create and compile programs and additional login server nodes can be added as the total number of user logins increase The login server is u...

Page 63: ...Users login to a batch server in order to run batch scheduler portable batch system load sharing facility PBS LSF programs Users login or connect to this node to submit these jobs to the system compute nodes I O Gateway Node The I O gateway server function may be combined with login or other service nodes for many configurations If required the I O gateway server function can be hosted on an optio...

Page 64: ... it completes a filename lookup on the MDS node As a result a file is created on behalf of the client or the layout of an existing file is returned to the client For read or write operations the client then interprets the layout in the logical object volume LOV layer which maps the offset and size to one or more objects each residing on a separate OST within the OSS node MDS Node The metadata serv...

Page 65: ... Availability and Serviceability RAS The SGI ICE X server series components have the following features to increase the reliability availability and serviceability RAS of the systems Power and cooling Power supplies within the blade enclosure pair chassis are redundant and can be hot swapped under most circumstances A rack level water chilled cooling option is available for all D rack configuratio...

Page 66: ...cting and correcting 4-bit and 8-bit DRAM failures. Detection of all double-component 4-bit DRAM failures occurs within a pair of DIMMs. 32 bits of error checking code (ECC) are used on each 256 bits of data. Automatic retry of uncorrected errors occurs to eliminate potential soft errors. Power-on and boot: Automatic testing (POST) occurs after you power on the system nodes. Processors and memory are automat...

Page 67: ...ont components The blade enclosure pair used in M Cell configurations employs side mounted power supplies see Figure 3 10 on page 50 Fan blower enclosure D rack systems This sheetmetal enclosure is installed back to back with each blade enclosure pair in a D rack system The fan enclosure consists of two 6 blower enclosures and two dedicated power supplies Figure 7 3 on page 99 shows an example of ...

Page 68: ...MC 1 ACC CNSL RES HB PG Chassis manager CMC 0 CMC 1 ACC CNSL RES HB PG CMC 0 CMC 1 ACC CNSL RES HB PG Switch blades Power supplies 48 007 5806 004 3 System Overview Figure 3 8 SGI ICE X Series D Rack Blade Enclosure Pair Components Example ...

Page 69: ... 6 Blade slot 5 Blade slot 4 Blade slot 3 Blade slot 2 Blade slot 1 Blade slot 0 Blade slot 17 Blade slot 16 Blade slot 15 Blade slot 14 Blade slot 13 Blade slot 12 Blade slot 11 Blade slot 10 Blade slot 9 InfiniBand switch blade slot 1 InfiniBand switch blade slot 0 CMC 1 CMC 0 Chassis management controller 9 Compute blade slots Power shelf 0 Power shelf 1 System Components 007 5806 004 49 Figure...

Page 70: ...RES HB PG CMC 0 CMC 1 ACC CNSL RES HB PG CMC 0 CMC 1 ACC CNSL RES HB PG CMC 0 CMC 1 ACC CNSL RES HB PG Power supplies Rack mount shelf Chassis manager Switch blades Figure 3 10 M Rack Blade Enclosure Pair Components Example ...

Page 71: ...ys be zero 0 These numbers are used to identify components starting with the rack including the individual blade enclosures and their internal compute node blades Note that these single digit ID numbers are incorporated into the host names of the rack leader controller RLC as well as the compute blades that reside in that rack Optional System Components Availability of optional components for the ...

Page 72: ...ntification of issues before they create an outage Increase system stability by monitoring hardware and software version compatibility Reduced time to resolve support cases Greater operational efficiency Less involvement of customer staff during troubleshooting Faster support case resolution Improved productivity Proactive potential problem identification can result in higher system availability A...

Page 73: ...eviews select Event Logs around the clock every five minutes to identify potential failure information If the Cloud intelligence detects a critical Event it notifies SGI support personnel This monitoring requires no changes to customer systems or firewalls as long as the SGI Agent can send HTTPS messages to highly secure Cloud and Global Access Servers It will also have no impact on customer netwo...

Page 74: ...ct. Additional electronic services may become available after publication of this document. To purchase a support contract that allows you to use all available SGI Electronic Support services, contact your SGI sales representative. For more information about the various support contracts, see the following Web pages: http://www.sgi.com/support http://www.sgi.com/services/support ...

Page 75: ...tions on page 61 SGI ICE X M Cell Rack Assemblies on page 62 M Cell Functional Overview on page 63 Overview At the time this document was published only specific SGI ICE X racks were approved for ICE X systems shipped from the SGI factory See Figure 4 1 on page 57 and Figure 4 5 on page 62 for examples Contact your SGI sales or support representative for more information on configuring SGI ICE X s...

Page 76: ...enings are located in the front floor and top of the rack Cables are only attached to the front of the IRUs therefore most cable management occurs in the front and top of the rack Stand alone administrative leader and login server modules are the exception to this rule and have cables that attach at the rear of the rack Rear cable connections will also be required for optional storage modules inst...

Page 77: ...p to 16 power outlets may be needed to power a single blade enclosure pair and supporting servers installed in a single rack Optional single phase PDUs can be used in SGI ICE X racks dedicated to I O functionality Figure 4 1 SGI ICE X Series D Rack Example ...

Page 78: ...58 007 5806 004 4 Rack Information Figure 4 2 Front Lock on Tall 42U D Rack ...

Page 79: ...SGI ICE X Series D Rack 42U 007 5806 004 59 Figure 4 3 Optional Water Chilled Door Panels on Rear of ICE X D Rack ...

Page 80: ...60 007 5806 004 4 Rack Information Figure 4 4 Air Cooled D Rack Rear Door and Lock Example ...

Page 81: ...range Nominal Tolerance range North America International 200 240 VAC 230 VAC 180 264 VAC Frequency Nominal Tolerance range North America International 60 Hz 50 Hz 47 63 Hz Phase required 3 phase optional single phase available in I O rack Power requirements max 34 58 kVA 33 89 kW Hold time 16 ms Power cable 12 ft 3 66 m pluggable cords Important The D rack s optional water cooled door panels only...

Page 82: ...ce in an M Cell rack system is that it does not exhaust heated air into the surrounding environment this means an M Cell does not add to the heat load of the computer room Multiple M Cells can be interconnected and configured to create very large systems Most M Cell configurations also require the use of a separate cooling distribution rack unit CDU rack not shown in Figure 4 5 Figure 4 5 M Cell R...

Page 83: ...or storage components and is used strictly for cooling the M Cell assembly The compute racks used in an M Cell configuration are 33 inches 83 8 cm wide other size and weight differences compared to the D Rack are noted in Table 4 2 In the M rack there are six power shelves per blade enclosure pair Table 4 2 SGI ICE X M Rack Technical Specifications Characteristic Specification Height 93 in 236 2 c...

Page 84: ...64 007 5806 004 4 Rack Information Figure 4 6 SGI ICE X Multi Cell M Cell Rack Array Example ...

Page 85: ...SGI ICE X M Cell Rack Assemblies 007 5806 004 65 Figure 4 7 Half M Cell Rack Assembly Cell Example ...


Page 87: ...escribe the stand alone servers that act as management infrastructure controllers The specialized functions these servers perform within the SGI ICE X system primarily include Administration and management Rack leader controller RLC functions Other servers described in this chapter can be configured to provide additional services such as Fabric management usually used with larger systems Login Bat...

Page 88: ...er server also known as the system admin controller is at the top of the distributed management infrastructure within the SGI ICE X system The overall SGI ICE X series management is hierarchical see the following subsection System Hierarchy and also Figure 5 1 on page 70 with the RLC s communicating with the compute nodes via CMC interconnect System Hierarchy The SGI ICE X system has a four tier h...

Page 89: ...same logical rack The following two components communicate with CMCs only in M Cell rack systems The cooling rack controller CRC assigned to the same logical rack as the CMC The cooling distribution unit CDU assigned to the same logical rack as the CMC Tip A logical rack can be one or two physical racks The number of CMCs in a blade enclosure pair determines the number of physical racks in a logic...

Page 90: ...s Management Controllers (CMC) Rack leader controller (RLC) Rack Cooling Distribution Unit (CDU, M-Cell systems only) Customer LAN 48-port GigE Switch System Control GigE Backbone VLAN1 and VLAN4 Board Management Controllers (BMC) Each CMC talks to a maximum of 18 BMCs. One BMC per compute node. VLAN3 VLAN3 VLAN3 and VLAN4 Service Nodes VLAN1 VLAN1 and VLAN4 System Admin Node VLAN1 ...

Page 91: ...Ie slot Full height half depth x16 PCIe slot Figure 5 2 1U Rack Leader Controller RLC Server Front and Rear Panels The system administrative controller unit acts as the SGI ICE X system s primary interface to the outside world typically a local area network LAN The server is used by administrators to provision and manage cluster functions using SGI s cluster management software Refer to the SGI Ma...

Page 92: ...atch fabric management I O MDS or OSS system At the heart of the system is a dual processor serverboard based on the Intel C602 chipset The serverboard motherboard supports two multi core Intel Xeon E5 2600 series processors Separate QPI link pairs connect the two processors and the I O hub in a network on the board The serverboard has eight DIMM slots four per processor that support DDR3 1600 133...

Page 93: ... GPU or x16 full height PCIe slot 1U Service Nodes 007 5806 004 73 Figure 5 3 SGI Rackable C1104G RP5 1U Service Node Front and Rear Panels 2 1 Figure 5 4 SGI Rackable C1104G RP5 System Control Panel and LEDs From left to right the LED indicators are Overheat fan fail UID LAN1 and LAN2 network indicators Hard drive activity and power good LEDs ...

Page 94: ...ble C2108 RP2 and uses up to 8 hard disk drives Figure 5 5 shows a front view example of the C2108 RP2 service node Figure 5 5 8 HDD Configuration RP2 Service Node Front Panel Example See the SGI Rackable RP2 Standard Depth Servers User Guide P N 007 5837 00x for more detailed information on the RP2 service nodes RP2 Service Node Front Controls The control panel on the C2108 RP2 service node has a...

Page 95: ...th integrated LED B NMI button recessed tool required for use C NIC 1 Activity LED D NIC 3 Activity LED E System Cold Reset button F System Status LED G Power button with integrated LED H Hard Drive Activity LED I NIC 4 Activity LED J NIC 2 Activity LED 2U Service Nodes 007 5806 004 75 ...

Page 96: ...e node Figure 5 7 RP2 Service Node Back Panel Components Example Table 5 2 RP2 Service Node Back Panel Components Label Description A Power Supply Module 1 B Power Supply Module 2 C NIC 1 D NIC 2 E NIC 3 F NIC 4 G Video connector H RJ45 Serial A port I USB ports J RMM4 NIC port K I O module ports connectors optional L Add in adapter slots via Riser Card 1 and Riser Card 2 M Serial B port optional ...

Page 97: ...representative for additional information on GPU configurations See the SGI Rackable C2110G RP5 P System User Guide P N 007 6343 00x for more detailed information on this 2U service node PCI expansion slots 2 1 System reset System LEDs Main power Ten disk drive bays VGA port Ethernet ports USB ports IPMI LAN Figure 5 8 Front and Rear Views of the SGI C2110G RP5 P 2U Service Node The control panel ...

Page 98: ...set button Pressing this button reboots the server Power LED Indicates power is being supplied to the server s power supply units Disk activity LED Indicates drive activity when flashing NIC 1 Activity LED Indicates network activity on LAN 1 when flashing green NIC 2 Activity LED Indicates network activity on LAN 2 when flashing green Power Fail LED The power fail LED lights when a power supply mo...

Page 99: ...E5 4600 Xeon processors and supports up to 48 DIMM memory modules plus multiple I O modules and storage adapters Figure 5 10 shows an example front view of the SGI UV 20 service node and Figure 5 11 shows and describes the unit s rear panel components Figure 5 10 SGI UV 20 Service Node Front Panel Example Figure 5 11 SGI UV 20 Service Node Rear Panel and Component Descriptions ...

Page 100: ...re 5 12 identifies and describes the functions of the SGI UV 20 service node s front control panel Figure 5 12 SGI UV 20 Service Node Front Control Panel Description For more information on the SGI UV 20 server see the SGI UV 20 System User Guide P N 007 5900 00x ...

Page 101: ...page 82 LED Status Indicators on page 83 Blade Enclosure Pair Power Supply LEDs on page 83 IP113 Compute Blade LEDs on page 84 IP115 Compute Blade Status LEDs on page 85 IP119 Compute Blade Status LEDs on page 87 IP131 Compute Blade Status LEDs on page 88 IP133 Compute Blade Status LEDs on page 89 Accessing Online Support Information and Services on page 90 ...

Page 102: ...nd the circuit breaker is on contact your SSE An enclosure pair will not power on Ensure the power cables of the enclosure are plugged in and the PDU is turned on View the CMC output from your system administration controller console If the CMC is not running contact your support provider The system will not boot the operating system Contact your support provider The PWR LED of a populated PCI slo...

Page 103: ...as one green and one amber status LED located at the right edge of the supply Each of the LEDs see Figure 6 1 will either light green or amber yellow stay dark or flash green or yellow to indicate the status of the individual supply See Table 6 2 for a complete list Green LED Amber LED Figure 6 1 Power Supply Status LED Indicator Locations Table 6 2 Power Supply LED States Power supply status Gree...

Page 104: ...are to aid in locating a specific compute blade 2 CPU Power OK this green LED lights when the correct power levels are present on the processor s 3 IB0 link green LED lights when a link is established on the internal InfiniBand 0 port 4 IB0 active this amber LED flashes when IB0 is active transmitting data 5 IB1 link green LED lights when a link is established on the internal InfiniBand 1 port 6 I...

Page 105: ...of the nine LED status lights on the lower node board are as follows 1 UID Unit identifier lower node this blue LED is used during troubleshooting to find a specific compute node The LED can be lit via software to aid in locating a compute node 2 CPU Power Good lower node this green LED is illuminated when the correct power levels are present on the processor s 3 IB0 link lower node this green LED...

Page 106: ...an be lit via software to aid in locating a compute node 9 CPU Power Good upper node this green LED is illuminated when the correct power levels are present on the processor s 10 IB0 link upper node this green LED is illuminated when a link has been established on the internal InfiniBand 0 port 11 IB0 active upper node this amber LED flashes when IB0 is active transmitting data 12 Eth1 link upper ...

Page 107: ...CPU Power Good lower board this green LED is illuminated when the correct power levels are present on the processor s 4 UID Unit identifier upper board this blue LED is used during troubleshooting to find a specific compute node The LED can be lit via software to aid in locating a compute node 5 IB1 active upper board this yellow LED flashes when IB1 is active transmitting data 6 IB1 link upper bo...

Page 108: ... via software to aid in locating a specific compute blade 2 CPU Power OK this green LED lights when the correct power levels are present on the processor s 3 IB0 link green LED lights when a link is established on the internal InfiniBand 0 port 4 IB0 active this amber LED flashes when IB0 is active transmitting data 5 IB1 link green LED lights when a link is established on the internal InfiniBand ...

Page 109: ...oting to find a specific compute node The LED can be lit via software to aid in locating a compute node 2 CPU Power Good lower node this green LED is illuminated when the correct power levels are present on the processor s 3 IB0 link lower node this green LED is illuminated when a link has been established on the internal InfiniBand 0 port 4 IB0 active lower node this amber LED flashes when IB0 is...

Page 110: ...nk has been established on the system control Ethernet port 13 Eth1 active upper node this amber LED flashes when Eth1 is active transmitting data 14 BMC Heartbeat upper node this green LED flashes when the BMC is functioning normally Accessing Online Support Information and Services Multiple levels of service support and troubleshooting information are available through http www sgi com support T...

Page 111: ...ca the Asia Pacific zone Near East and Europe Authorized support partners in areas not directly supported by an SGI Customer Support Center Other Resources Topics covered under Other Resources include Support Services descriptions Product Warranties Customer Replaceable Units CRU Warranty and Support Contract Transfers Service Contracts Software Keys SGI Warranty Levels The complete SGI Electronic...

Page 112: ...Electronic Support services such as SGI RS contact your SGI sales representative Optional SGI Remote Services SGI RS The optional SGI RS system automatically detects system conditions that indicate potential failure see the section Optional SGI Remote Services SGI RS in Chapter 1 for an overview and description ...

Page 113: ... 94 Note These procedures are intended for D rack based ICE X systems Check with your support provider for information on M Cell maintenance Maintenance Precautions and Procedures This section describes how to access the system for specific types of customer approved maintenance and protect the components from damage The following topics are covered Preparing the System for Maintenance or Upgrade ...

Page 114: ... of the band 2 Wrap the exposed adhesive side firmly around your wrist unroll the rest of the band and then peel the liner from the copper foil at the opposite end 3 Attach the copper foil to an exposed electrical ground such as a metal part of the chassis Caution Do not attempt to install or remove components that are not listed in Table 7 1 Components not listed must be installed or removed by a...

Page 115: ...hutting down the enclosure or the complete system In the case of a fully configured loaded enclosure this may not be possible Caution The body of the power supply may be hot allow time for cooling and handle with care Use the following steps to replace a power supply in the blade enclosure box 1 Open the front door of the rack and locate the power supply that needs replacement 2 Disengage the powe...

Page 116: ...opening 6 Slide the power supply into the chassis until the retention latch engages 7 Reconnect the power cord to the supply and engage the retention clip Note When AC power to the rear fan assembly is disconnected prior to the replacement procedure all the fans will come on and run at top speed when power is reapplied The speeds will readjust when normal communication with the blade pair enclosur...

Page 117: ...CMC 0 CMC 1 Figure 7-2 Replacing an Enclosure Power Supply ...

Page 118: ... indicating the rack and enclosure position 001c01 L2 Fan number warning limit reached 0 RPM 2 A line will be added to the L1 system controller s log file indicating the fan warning The chassis management controller CMC monitors the temperature within each enclosure If the temperature increases due to a failed fan the remaining fans will run at a higher RPM to compensate for the missing fan The sy...
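The fan warning above begins with an identifier, 001c01, that gives the rack and enclosure position. As a rough sketch, such an identifier could be split into its two parts as shown below. The layout assumed here (three digits for the rack, the letter c, two digits for the enclosure) is inferred from the single example in the text and is not confirmed by the guide. The names parse_location and EnclosureLocation are hypothetical.

```python
import re
from typing import NamedTuple


class EnclosureLocation(NamedTuple):
    rack: int
    enclosure: int


# Assumed layout: 3-digit rack number, literal 'c', 2-digit enclosure number.
_LOCATION = re.compile(r"^(\d{3})c(\d{2})$")


def parse_location(identifier: str) -> EnclosureLocation:
    """Split an identifier such as '001c01' into rack and enclosure numbers."""
    match = _LOCATION.match(identifier.strip())
    if match is None:
        raise ValueError(f"unrecognized location identifier: {identifier!r}")
    return EnclosureLocation(rack=int(match.group(1)), enclosure=int(match.group(2)))


if __name__ == "__main__":
    print(parse_location("001c01"))
```

If the real identifier format differs, only the regular expression needs to change.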

Page 119: ...Fan 5 Fan 3 Fan 4 Fan 11 Fan 9 Fan 10 Fan 2 Fan 0 Fan 1 Fan 8 Fan 6 Fan 7 Fan power box Replacing ICE X System Components 007 5806 004 99 Figure 7 3 Enclosure Pair Rear Fan Assembly Blowers ...

Page 120: ... 1 Using the 1 Phillips screwdriver undo the captive screw located in the middle of the blower assembly handle The handle has a notch for the screw access see Figure 7 4 2 Grasp the blower assembly handle and pull the assembly straight out A Screw C Loosen screw B Figure 7 4 Removing a Fan From the Rear Assembly ...

Page 121: ...screw to secure the new fan B Tighten screw A Note If you disconnected the AC power to the rear fan assembly prior to the replacement procedure all the fans will come on and run at top speed when power is reapplied The speeds will readjust when normal communication with the blade pair enclosure CMC is fully established Figure 7 5 Replacing an Enclosure Fan ...

Page 122: ...to adjust or move power or other cables to enable the access door to swing outward 3 Move the fan power box outward so that the front of the supply is fully accessible 4 Disconnect the power cord from the supply that is to be replaced If the supply has been active allow several minutes for it to cool down 5 Push the power supply retention tab towards the center of the supply to release it from the...

Page 123: ...Loosen screw A Pull handle B C Press latch to release D Replacing ICE X System Components 007 5806 004 103 Figure 7 6 Removing a Power Supply From the Fan Power Box ...

Page 124: ...Tighten screw A B C D 104 007 5806 004 7 Maintenance Procedures Figure 7 7 Replacing a Power Supply in the Fan Power Box ...

Page 125: ... with PCI PCI X in the following ways Compatible software layers Compatible device driver models Same basic board form factors PCIe controlled devices appear the same as PCI PCI X devices to most software PCI Express technology is different from PCI PCI X in the following ways PCI Express uses a point to point serial interface vs a shared parallel bus interface used in older PCI PCI X technology P...

Page 126: ...rt Levels SGI Admin PCIe Connectors x4 PCIe cards Supported x8 PCIe cards Supported x16 PCIe cards Two supported x32 PCIe cards Not supported If you need more specific information on installing PCIe cards in an administrative leader or other standalone server see the user documentation for that particular unit After installing or removing a new PCIe card do the following 1 Return the server to ser...

Page 127: ... Table A 1 summarizes the SGI ICE X series configuration ranges Table A 1 SGI ICE X Series Configuration Ranges Category Minimum Maximum Blades per enclosure pair 2 bladesa a Compute blades support one or two stuffed sockets each 36 blades Compute nodes per blade 1 compute node two compute nodes Blade enclosure pair 1 per rack 2 per rack Blade slots per rack 36 slots one enclosure pair 72 blade sl...
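The configuration ranges in Table A-1 can be combined to estimate how many compute nodes a rack can hold. The sketch below simply multiplies the quoted maximums (36 blades per enclosure pair, 2 enclosure pairs per rack, 2 compute nodes per blade). It assumes those maximums can all be reached at once, which the excerpt does not state explicitly, and the function name is hypothetical.

```python
# Maximums quoted in Table A-1 of the excerpt.
MAX_BLADES_PER_ENCLOSURE_PAIR = 36
MAX_ENCLOSURE_PAIRS_PER_RACK = 2
MAX_NODES_PER_BLADE = 2


def compute_nodes_per_rack(enclosure_pairs: int,
                           nodes_per_blade: int,
                           blades_per_pair: int = MAX_BLADES_PER_ENCLOSURE_PAIR) -> int:
    """Estimate compute nodes in one rack for a given configuration."""
    if not 1 <= enclosure_pairs <= MAX_ENCLOSURE_PAIRS_PER_RACK:
        raise ValueError("enclosure pairs per rack must be 1 or 2")
    if not 1 <= nodes_per_blade <= MAX_NODES_PER_BLADE:
        raise ValueError("compute nodes per blade must be 1 or 2")
    if not 1 <= blades_per_pair <= MAX_BLADES_PER_ENCLOSURE_PAIR:
        raise ValueError("blades per enclosure pair must be between 1 and 36")
    return enclosure_pairs * blades_per_pair * nodes_per_blade


if __name__ == "__main__":
    # Fully populated rack with two-node blades.
    print(compute_nodes_per_rack(enclosure_pairs=2, nodes_per_blade=2))
```

With every maximum applied, the estimate is 2 x 36 x 2 = 144 compute nodes in a rack. A single enclosure pair of single-node blades gives 36.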

Page 128: ...ximum 2 500 lbs 1 136 kg approximate water cooled Shipping weight maximum 2 970 lbs 1 350 kg approximate maximum Shipping height maximum 88 75 in 225 4 cm Shipping width 44 in 111 8 cm Shipping depth 62 75 in 159 4 cm Voltage range Nominal Tolerance range North America International 200 240 VAC 230 VAC 180 264 VAC 180 254 VAC Frequency Nominal Tolerance range North America International 60 Hz 50 H...

Page 129: ...n Rack cooling requirements: Ambient air or optional water cooling. Heat dissipation to air (air-cooled ICE X rack): approximately 115.63 kBTU/hr maximum (based on 33.89 kW, 100% dissipation to air). Heat dissipation to air (water-cooled ICE X rack): approximately 5.76 kBTU/hr maximum (based on 33.89 kW, 5% dissipation to air). Heat dissipation to water: approximately 109.85 kBTU/hr maximum (based on 33.89 kW, 95% diss...
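The heat-dissipation figures follow from converting the 33.89 kW maximum load to BTU per hour and splitting it between air and water. The sketch below shows that arithmetic. It assumes the common conversion of 1 kW to roughly 3,412.142 BTU/hr, which the guide does not state, and the function name is hypothetical.

```python
from typing import Tuple

BTU_PER_HR_PER_KW = 3412.142  # assumed conversion factor: 1 kW is about 3,412.142 BTU/hr


def heat_split_kbtu_hr(load_kw: float, fraction_to_water: float) -> Tuple[float, float]:
    """Return (heat to air, heat to water) in kBTU/hr for a given electrical load."""
    if not 0.0 <= fraction_to_water <= 1.0:
        raise ValueError("fraction_to_water must be between 0 and 1")
    total = load_kw * BTU_PER_HR_PER_KW / 1000.0
    to_water = total * fraction_to_water
    return total - to_water, to_water


if __name__ == "__main__":
    # Air-cooled rack: all heat goes to air.
    print(heat_split_kbtu_hr(33.89, 0.0))
    # Water-cooled rack: 95% of the heat goes to water.
    print(heat_split_kbtu_hr(33.89, 0.95))
```

With this factor the total comes to about 115.6 kBTU/hr and the 95% water share to about 109.9 kBTU/hr, in line with the quoted values. The remaining 5% works out to roughly 5.78 kBTU/hr, marginally different from the figure printed in the table.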

Page 130: ...ght full 2 426 lbs 1 103 kg approximate Shipping weight max 2 850 lbs 1 295 kg approximate Voltage range Nominal Tolerance range North America International 200 240 VAC 230 VAC 180 264 VAC 180 254 VAC Frequency Nominal Tolerance range North America International 60 Hz 50 Hz 47 63 Hz 47 63 Phase required single phase or optional 3 phase Power requirements max 76 kVA 77 47 kW Hold time 20 ms Power c...

Page 131: ...g: -40 °C (-40 °F) to 60 °C (140 °F). Relative humidity: 10% to 95% operating (no condensation), 10% to 95% non-operating (no condensation). Rack cooling requirements: chilled water cooling. Heat rejection (dissipation) to air: zero BTUs. Heat rejection (dissipation) to water: approximately 246 kBTU/hr maximum (21 tons), based on 100% dissipation to water. Maximum altitude: 10,000 ft (3,049 m) operating, 40,000 ft (12,195 m) non-operating ...
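The 21-ton figure quoted alongside 246 kBTU/hr can be reproduced with the standard definition of a ton of refrigeration, 12,000 BTU/hr. The sketch below assumes the guide rounds up to a whole ton, which is a guess rather than something the excerpt states. The function name is hypothetical.

```python
import math

BTU_PER_HR_PER_TON = 12_000  # one ton of refrigeration


def cooling_tons(kbtu_per_hr: float) -> float:
    """Convert a heat load in kBTU/hr to tons of refrigeration."""
    if kbtu_per_hr < 0:
        raise ValueError("heat load cannot be negative")
    return kbtu_per_hr * 1000 / BTU_PER_HR_PER_TON


if __name__ == "__main__":
    tons = cooling_tons(246)
    # Round up, since cooling plant is normally sized to at least the load.
    print(tons, math.ceil(tons))
```

The exact conversion gives 20.5 tons, which rounds up to the 21 tons shown in the specification.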

Page 132: ...e A 1 Ethernet Port Table A 6 shows the cable pinout assignments for the Ethernet port operating in 10 100 Base T mode and also operating in 1000Base T mode Table A 6 Ethernet Pinouts Ethernet 10 100Base T Pinouts Gigabit Ethernet Pinouts Pins Assignment Pins Assignment 1 Transmit 1 Transmit Receive 0 2 Transmit 2 Transmit Receive 0 3 Receive 3 Transmit Receive 1 4 NU 4 Transmit Receive 2 5 NU 5 T...
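For reference, the conventional RJ-45 pin assignments for the two modes can be written as lookup tables. The sketch below uses the usual IEEE 802.3 MDI assignments, with polarity signs, and is not transcribed from Table A-6, whose rows are truncated in this excerpt. The dictionary and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Conventional 10/100Base-T MDI assignments; None marks a pin that is not used (NU).
FAST_ETHERNET_PINS: Dict[int, Optional[str]] = {
    1: "Transmit+", 2: "Transmit-", 3: "Receive+", 4: None,
    5: None, 6: "Receive-", 7: None, 8: None,
}

# Conventional 1000Base-T assignments: (bidirectional pair index, polarity) per pin.
GIGABIT_PINS: Dict[int, Tuple[int, str]] = {
    1: (0, "+"), 2: (0, "-"), 3: (1, "+"), 4: (2, "+"),
    5: (2, "-"), 6: (1, "-"), 7: (3, "+"), 8: (3, "-"),
}


def unused_pins_fast_ethernet() -> List[int]:
    """Pins that carry no signal in 10/100Base-T mode."""
    return [pin for pin, signal in FAST_ETHERNET_PINS.items() if signal is None]


def pins_for_pair(pair: int) -> List[int]:
    """Pins that form one bidirectional pair in 1000Base-T mode."""
    return sorted(pin for pin, (index, _) in GIGABIT_PINS.items() if index == pair)


if __name__ == "__main__":
    print(unused_pins_fast_ethernet())
    print(pins_for_pair(1))
```

The tables reflect that 10/100Base-T uses only two of the four wire pairs, while 1000Base-T transmits and receives on all four.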

Page 133: ... may fall causing serious damage to the product 5 Slots and openings in the system are provided for ventilation To ensure reliable operation of the product and to protect it from overheating these openings must not be blocked or covered This product should never be placed near or over a radiator or heat register or in a built in installation unless proper ventilation is provided 6 This product sho...

Page 134: ...per adjustment of other controls may result in damage and will often require extensive work by a qualified technician to restore the product to normal condition If the product has been dropped or the cabinet has been damaged If the product exhibits a distinct change in performance indicating a need for service 11 If a lithium battery is a soldered part only qualified SGI service personnel should r...

Page 135: ...European requirements Caution This product has several governmental and third party approvals licenses and permits Do not modify this product in any way that is not expressly approved by SGI If you do you may lose these approvals and your governmental agency authority to operate this device CMN Number The model number or CMN number for the system is on the system label which is mounted inside the ...

Page 136: ...o communications Operation of this equipment in a residential area is likely to cause harmful interference in which case you will be required to correct the interference at your own expense If this equipment does cause harmful interference to radio or television reception which can be determined by turning the equipment off and on you are encouraged to try to correct the interference by using one ...

Page 137: ... numériques de Classe A préscrites dans le Règlement sur les interferences radioélectriques établi par le Ministère des Communications du Canada VCCI Notice Japan Only Figure B 1 VCCI Notice Japan Only Chinese Class A Regulatory Notice Figure B 2 Chinese Class A Regulatory Notice Korean Class A Regulatory Notice Figure B 3 Korean Class A Regulatory Notice ...

Page 138: ...ctrostatic discharge ESD ESD is a source of electromagnetic interference and can cause problems ranging from data errors and lockups to permanent component damage It is important that you keep all the covers and doors including the plastics in place while you are operating the system The shielded cables that came with the unit and its peripherals should be installed correctly with all thumbscrews ...

Page 139: ...n Warning (Advarsel): Laserstråling når deksel åpnes. Stirr ikke inn i strålen. Lithium Battery Statements Warning: If a lithium battery is a soldered part, only qualified SGI service personnel should replace this lithium battery. For other types, replace the battery only with the same type or an equivalent type recommended by the battery manufacturer, or the battery could explode. Discard used batteries acco...

Page 140: ...samma batterityp eller en ekvivalent typ som rekommenderas av apparattillverkaren. Kassera använt batteri enligt fabrikantens instruktion. Warning (Varoitus): Paristo voi räjähtää, jos se on virheellisesti asennettu. Vaihda paristo ainoastaan laitevalmistajan suosittelemaan tyyppiin. Hävitä käytetty paristo valmistajan ohjeiden mukaisesti. Warning (Vorsicht): Explosionsgefahr bei unsachgemäßem Austausch der B...

Page 141: ... 17 CMN number 77 Compute Memory Blade LEDs 64 customer service xvii D documentation available via the World Wide Web xvi conventions xvii E environmental specifications 69 F front panel display L1 controller 17 L laser compliance statements 81 LED Status Indicators 63 LEDs on the front of the IRUs 63 lithium battery warning statements 2 82 M Message Passing Interface 19 monitoring server 11 N num...

Page 142: ...ion 5 product support xvii R RAS features 40 S server monitoring locations 11 system architecture 23 25 system block diagram 29 system components SGI ICE X front 42 list of 41 system features 32 system overview 19 T tall rack features 46 technical specifications system level 67 technical support xvii three phase PDU 21 troubleshooting problems and recommended actions 62 Troubleshooting Chart 62 ...
