Exported on Oct/24/2022 09:50 AM

QM9700/QM9790 1U NDR 400Gb/s InfiniBand Switch Systems User Manual

Summary of Contents for 920-9B210-00FN-0D0

Page 2: ...ation 19 Power Cable and Cable Retainer 19 Port Cables 21 Initial Power On 24 System Bring Up of Managed Systems 25 Configuring Network Attributes 25 Configuring the Switch with ZTP 34 Rerunning the Wizard 34 Starting the Command Line CLI 34 FRU Replacements 35 Power Supply 35 Fans 36 Software Management 38 InfiniBand Subnet Manager 38 Upgrading Software on Managed Systems 38 Updating Firmware on ...

Page 3: ...Unit Identification LED 47 Port LEDs 47 Inventory Pull out Tab 48 Troubleshooting 50 Specifications 51 Appendixes 52 Accessory and Replacement Parts 52 Thermal Threshold Definitions 52 Interface Specifications 53 OSFP Pin Description 53 RJ45 to DB9 Harness Pinout 54 Disassembly and Disposal 55 Disassembly Procedure 55 Disposal 55 Document Revision History 57 ...

Page 4: ...NDR, 32 OSFP ports, unmanaged, P2C airflow (forward). 920-9B210-00RN-0D0 (MQM9790-NS2R): 64 ports NDR, 32 OSFP ports, unmanaged, C2P airflow (reverse). Related Documentation: InfiniBand Architecture Specification Volume 1, Release 1.5: the InfiniBand Trade Association (IBTA) InfiniBand Specification, at https://www.infinibandta.org. MLNX-OS User Manual: This document contains information regarding the...

Page 5: ...need to exchange information in real time. The QM9700 NDR InfiniBand switches extend NVIDIA In-Network Computing technologies and introduce the third generation of NVIDIA SHARP technology (SHARPv3), creating virtually unlimited scalability for large data aggregation through the data center network, participating in the application's runtime, and reducing the amount of data needed to traverse the network...

Page 6: ...790 32 51.2Tb/s. Management Interfaces, PSUs and Fans: The table below lists the various management interfaces and available replacement parts per system model. QM9700: USB: Front, USB 3.0 type A; MGT: Front, 1 port; I2C: NA; Console: Front; Replaceable PSU: Yes; Replaceable Fan: Yes. QM9790: USB: NA; MGT: NA; I2C: Front, USB 3.0 type A; Console: NA; Replaceable PSU: Yes; Replaceable Fan: Yes. Features: For a full feature list, please refer to the system's product brief ...

Page 7: ...rtifications: The list of certifications (such as EMC, Safety and others) per system for different regions of the world is located on the Mellanox website at http://www.mellanox.com/page/environmental_compliance ...

Page 8: ...10 standard for 19-inch racks. Take precautions to guarantee proper ventilation in order to maintain good airflow at ambient temperature. Due to thermal considerations, the switch systems must be installed in a horizontal position; do not install the systems vertically. Unless otherwise specified, NVIDIA products are designed to work in an environmentally controlled data center with low levels of gaseou...

Page 9: ...ll models. Air Flow: NVIDIA systems are offered with two air flow patterns. Power (rear) side inlet to connector side outlet: marked with blue dots that are placed on the power inlet side. Figure: Air Flow Direction Marking, Power Side Inlet to Connector Side Outlet. Connector (front) side inlet to power side outlet: marked with red dots that are placed on the power inlet side. Figure: Air Flow Direction Marking, Connector Sid...

Page 10: ...rts for visible damage that may have occurred during shipping. The QM9700 and QM9790 package content is as follows: 1 system; 1 rail kit; 4 power cables (Type C14 to C15); 1 harness (HAR000631, Harness RS232 2M cable, DB9 to RJ-45, only in QM9700); 2 cable retainers; 32 OSFP thermal caps. 19" System Mounting Options: By default, the systems are shipped with the rail kit described in Tool-Less Rail Kit. Tool-Less Rai...

Page 11: ... in the same direction. Note that the part of the system to which you choose to attach the rails (the front panel direction, as demonstrated in Option 1, or the FRUs direction, as demonstrated in Option 2) will determine the system's adjustable side. The part of the system to which the brackets are attached will be adjacent to the cabinet. The FRUs, as well as high-speed and MNG cables, must be extracted for rep...

Page 12: ...System Rails (A) to the Switch. Secure the assembly by gently pushing the system chassis pins through the slider key holes until locking occurs. The following steps include illustrations that show front-side ports installation, yet all instructions apply to all installation options ...

Page 13: ... Chassis Pins in the Rails' Slots, Locking them in a Fixed Position. Mount both of the rack rails (B) into the rack by angularly inserting the brakes located at the rails' edges into the designated slots in the rack unit, as shown in the following figure ...

Page 14: ... (B) to sit horizontally, in parallel to the rack assembly. By straightening the rails' angular position, their brakes will be caught and locked in the rack's slots. Figure: Aligning the Rack Rails' (B) Angular Position; the Brakes are Caught and Locked in the Rack's Slots ...

Page 15: ...e Rack Assembly. Pull the rack rails' telescopic extensions all the way to the rack's opposite side and insert the latches at the rails' free edges into the rack's slots. A click should be heard as the spring latches are fully inserted and locking occurs ...

Page 16: ...eight, perform the following steps: Slide the rails installed on the system into the channels in the rack rails. Push them forward until the locking mechanism is activated on both sides and a click is heard. Tighten the captive screws on both sides to further secure the system to the rack's posts. Figure: Sliding the System's Rails (A) into the Rack Rails (B). At least two people are required to safely mount the sy...

Page 17: ...the electrical outlet. While your installation partner is supporting the system's weight: Loosen the captive screws attaching the system's rails to the rack's posts. Use two hands to pull the system out until the rails are stopped. Figure: Pulling the System Out. Press the spring latches on both sides of the rack and continue to pull the system out until the rack rails are clear of the system's rails ...

Page 18: ... from the system: Release the metal latches and pull out the rails so that the system's pins are removed from the oval slots. Figure: Removing the Rails from the System. Remove the rails from the rack by pressing the lock button and pulling the rails out of the rack assembly ...

Page 19: ...ecommended to use them in order to secure the power cables in place. When installing retainers for the PSUs of the QM97x0 switch systems, please adhere to the following instructions: Verify the integrity of the retainer assembly, as demonstrated in the table below. The snaps (push pins) must have visible edges with no broken or torn parts. The shoulder pins should be intact and must not be bent inwards ...

Page 20: ... retainer's plastic loop is facing upwards, as demonstrated in the table below (Correct Insertion / Incorrect Insertion). Push the retainer until the shoulder pins (in blue circles below) are open and aligned with the PSU front panel, as shown in the following table. For demonstration purposes, the images in this document show C2P (Connector-to-Power) airflow PSUs with red latches, yet the instructions apply t...

Page 21: ...op over the AC cord, as shown in the following table, and fasten it tightly (Proper Loop Placement / Improper Loop Placement). Port Cables: All cables can be inserted or removed with the unit powered on. Each cable retainer can be used once only. Once the retainer has been fully inserted and the shoulder pins have been adjusted, the retainer cannot be used again and should be discarded if pulled out ...

Page 22: ...ivided into 2 dual-lane ports. It maximizes flexibility by enabling end users to use a combination of dual-lane and quad-lane interfaces according to the specific requirements of their network. In the QM97x0 systems, each connector contains two ports, and all system ports may be split into 2-lane ports. For the systems' splitting options, see QM9700/QM9790 Splitting Options below. Splitting a port changes...

Page 23: ...of 400G, and each port can be split to two. There are no blocking requirements. Port Notation Schematics: The following behavior should be expected when disconnecting a 1:2 splitter cable from cages in both the upper and lower rows. When you disconnect a cable marked as 1, the CLI cage number 1 will always go down and the left LED of the cage will be turned off. When you disconnect the cable marked as 2 ...
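The port arithmetic implied by the splitting description can be checked with a short sketch. The cage count, ports per cage, and 400G port speed come from the manual text above; the assumption that a split port runs at half the full port rate, and that the aggregate throughput figure counts both directions, is ours and is marked as such in the comments.

```python
# Figures from the manual text: 32 OSFP cages, two ports per cage,
# 400 Gb/s per port, each port splittable into two 2-lane ports.
OSFP_CAGES = 32
PORTS_PER_CAGE = 2
PORT_SPEED_GBPS = 400
SPLIT_FACTOR = 2


def port_counts():
    """Return (full ports, split ports, assumed speed of one split port)."""
    full_ports = OSFP_CAGES * PORTS_PER_CAGE
    split_ports = full_ports * SPLIT_FACTOR
    # Assumption: a split port gets half the lanes, hence half the rate.
    split_speed_gbps = PORT_SPEED_GBPS // SPLIT_FACTOR
    return full_ports, split_ports, split_speed_gbps


def aggregate_tbps(bidirectional=True):
    """Aggregate switching throughput in Tb/s.

    Assumption: the headline figure counts traffic in both directions.
    """
    one_way = OSFP_CAGES * PORTS_PER_CAGE * PORT_SPEED_GBPS / 1000
    return one_way * 2 if bidirectional else one_way


if __name__ == "__main__":
    print(port_counts())
    print(aggregate_tbps())
```

Under these assumptions the 64 full ports match the "64 ports NDR" in the ordering information, and the bidirectional aggregate matches the 51.2Tb/s figure in the speed table.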

Page 24: ...able. Wait for the system upload process. Check the frontal System Status LEDs and confirm that all of the LEDs show status lights consistent with normal operation (initially flashing, and then moving to a steady color), as shown below. For more information, refer to LED Notifications. Figure: System Status LEDs 5 Minutes After Power On. The system platform will automatically power on when AC power is applied. There...

Page 25: ...ht is on, make sure that the Fan Status LED shows green. If the Fan Status LED is not green, unplug the power connection and check that all fan modules are inserted properly and that the mating connector of the fan unit is free of any dirt and/or obstacles. If no obstacles were found and the problem persists, call your NVIDIA representative for assistance. Risk of electric shock and energy hazard: The tw...

Page 26: ...ystem. Connect a host PC to the Console (RJ45) port of the system using the supplied harness cable (DB9 to RJ45). Make sure to connect to the Console RJ45 port and not to the Ethernet (MGT) port. Configure a serial terminal program (for example, HyperTerminal, minicom, or Tera Term) on your host PC with the settings described in the table below. Once you have done so, you should get the CLI prompt of the system. S...

Page 27: ...at initialization is ongoing a countdown of the number of remaining modules to be configured is displayed in the following format no of modules Modules are being configured Go through the Switch Management configuration wizard IP Configuration by DHCP Wizard Session Display Example Comments Do you want to use the wizard for initial configuration yes You must perform this configuration the first ti...

Page 28: ... to enable IPv6 on management ports. If you wish to enable IPv6, type yes and press Enter. If you enter no (no IPv6), you will automatically be referred to Step 5. Step 4: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface. Perform this step to enable StateLess address autoconfig on the external management port. If you wish to enable it, type yes and press Enter. If you wish to disable it, enter no. Step 5: Use DH...

Page 29: ...ormation Hostname switch name Use DHCP on mgmt0 interface yes Enable IPv6 yes Enable IPv6 autoconfig SLAAC on mgmt0 interface yes Enable DHCPv6 on mgmt0 interface no Update time current time Enable password hardening yes Admin password Enter to leave unchanged CHANGED To change an answer enter the step number to return to Otherwise hit enter to save changes and exit Choice Enter Configuration chan...

Page 30: ...nable StateLess address autoconfig on the external management port. If you wish to enable it, type yes and press Enter. If you wish to disable it, enter no. Step 5: Use DHCPv6 on mgmt0 interface? [yes]. Perform this step to enable DHCPv6 on the MGMT0 interface. Step 6: Admin password (Press Enter to leave unchanged)? new_password. To avoid illegal access to the machine, please type a password and then press Enter. Step...

Page 31: ...t to use the wizard for initial configuration? y. Step 1: Hostname? [switch-112126]. Step 2: Use DHCP on mgmt0 interface? [yes] n. Step 3: Use zeroconf on mgmt0 interface? [no]. Step 4: Primary IP address? 192.168.10.4. Mask length may not be zero if address is not zero (interface mgmt0). Step 5: Netmask? [0.0.0.0] 255.255.255.0. Step 6: Default gateway? 192.168.10.1. Step 7: Primary DNS server? Step 8: Domain name? Step 9: Enable I...
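Before typing static values into the wizard, it can help to sanity-check them offline. The following is a minimal sketch using Python's standard `ipaddress` module and the example values from the session above; the function name `check_static_config` is hypothetical and is not part of any switch software.

```python
import ipaddress


def check_static_config(address, netmask, gateway):
    """Validate a static mgmt0 configuration before entering it in the wizard.

    Hypothetical helper: mirrors the wizard's rule that the mask length may
    not be zero when the address is non-zero, and additionally checks that
    the default gateway is reachable on the configured subnet.
    """
    if netmask == "0.0.0.0" and address != "0.0.0.0":
        raise ValueError("Mask length may not be zero if address is not zero")
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    return {
        "prefix_length": iface.network.prefixlen,
        "network": str(iface.network),
        "gateway_on_link": gw in iface.network,
    }


if __name__ == "__main__":
    # Values taken from the example wizard session.
    print(check_static_config("192.168.10.4", "255.255.255.0", "192.168.10.1"))
```

The gateway check is an extra convenience and is not something the manual says the wizard performs.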

Page 32: ...nged. You have entered the following information: Hostname: switch-112126. Use DHCP on mgmt0 interface: no. Use zeroconf on mgmt0 interface: yes. Default gateway: 192.168.10.1. Primary DNS server: Domain name: Enable IPv6: yes. Enable IPv6 autoconfig (SLAAC) on mgmt0 interface: yes. Update time: yyyy/mm/dd hh:mm:ss. Enable password hardening: yes. Admin password (Enter to leave unchanged): (unchanged). To change an answer, en...

Page 33: ...ce: yes. 6. Admin password (Enter to leave unchanged): (unchanged). 7. HA Chassis IP address: 10.6.166.200. 8. HA Chassis Management IP netmask: 255.255.255.0. 9. HA Chassis IPv6 address: fdfd:fdfd:7:145::1000:4814. 10. HA Chassis Management IPv6 masklen: 33. To change an answer, enter the step number to return to. Otherwise hit <enter> to save changes and exit. Choice: Configuration changes saved. To return to the wizard fro...

Page 34: ...l deployment cost. For more information, please refer to section Zero-touch Provisioning. Rerunning the Wizard: To rerun the wizard: Enter Config mode. Run: switch > enable, then switch # config terminal. Rerun the wizard. Run: switch (config) # configuration jump-start. Starting the Command Line (CLI): Set up an Ethernet connection between the switch and a local network machine using a standard RJ-45 connector. Start a remot...

Page 35: ...ated on https://docs.nvidia.com/networking/category/mlnxos. FRU Replacements: Power Supply: NVIDIA systems are equipped with two replaceable power supply units that work in a redundant configuration. Either unit may be extracted without bringing down the system. To extract a power supply unit: Remove the power cord from the power supply unit. Grasping the handle with your hand, push the latch release with your t...

Page 36: ...tage. Fans: The system can fully operate if one fan FRU is dysfunctional. Failure of more than one fan is not supported. To extract a fan unit: Extract the fan by pulling the gold handle outwards. As the fan unit unseats, its status LEDs will turn off. Remove the fan unit. Do not attempt to insert a power supply unit with a power cord connected to it. The green power supply unit indicator should light. If it...

Page 37: ...tor of the new unit is free of any dirt and/or obstacles. Insert the fan unit by sliding it into the opening until slight resistance is felt. Continue pressing the fan unit until it seats completely. The green Fan Status LED should light. If not, extract the fan unit and reinsert it. After two unsuccessful attempts to install the fan unit, power off the system before attempting any system debug ...

Page 38: ...tself (system based) or on one of the nodes which is connected to the fabric (host based). The subnet manager (OpenSM) assigns Local IDentifiers (LIDs) to each port connected to the fabric and develops a routing table based on the assigned LIDs. A typical installation using the OFED package will run the OpenSM subnet manager at system start-up, after the drivers are loaded. This automatic OpenSM is resident i...
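To make the idea of "assign LIDs, then build a routing table from them" concrete, here is a toy sketch. It is not OpenSM's algorithm: it assumes one LID per node rather than per port, hands out LIDs sequentially starting at 1, and picks next hops by breadth-first search on a hypothetical four-node fabric. All names in it are illustrative.

```python
from collections import deque


def assign_lids(nodes, first_lid=1):
    """Give each node a unique LID (toy simplification: one LID per node)."""
    return {node: lid for lid, node in enumerate(sorted(nodes), start=first_lid)}


def next_hops(links, source):
    """Breadth-first search from source.

    Returns {destination: first neighbor on a shortest path from source}.
    """
    first = {}
    queue = deque()
    for nbr in sorted(links[source]):
        first[nbr] = nbr
        queue.append(nbr)
    while queue:
        node = queue.popleft()
        for nbr in sorted(links[node]):
            if nbr != source and nbr not in first:
                first[nbr] = first[node]
                queue.append(nbr)
    return first


def forwarding_table(links, lids, switch):
    """Map each destination LID to the neighbor the switch forwards to."""
    return {lids[dst]: hop for dst, hop in next_hops(links, switch).items()}


# Hypothetical fabric: two hosts behind two directly connected switches.
links = {
    "host-a": {"sw1"},
    "host-b": {"sw2"},
    "sw1": {"host-a", "sw2"},
    "sw2": {"host-b", "sw1"},
}
lids = assign_lids(links)
table_sw1 = forwarding_table(links, lids, "sw1")

if __name__ == "__main__":
    print(lids)
    print(table_sw1)
```

A real subnet manager works per port, handles multiple paths and failures, and programs the tables into the switches; the sketch only shows the dependency of the routing table on the LID assignment.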

Page 39: ...its your operating system. In order to obtain information regarding the externally managed system, you must download the NVIDIA Mellanox MFT tools from https://network.nvidia.com/products/adapter-software/firmware-tools. Select and download the release that matches your system. Follow the instructions in the User Manual (https://docs.nvidia.com/networking/category/mft) to get the tools. Updating Firmware: In ...

Page 40: ... network.nvidia.com/support/firmware/firmware-downloads, select the Quantum System page. If the current version is not the latest version, follow the directions in the MFT User Manual to burn the new firmware inband. For further information, please refer to the MFT User Manual at https://docs.nvidia.com/networking/category/mft ...

Page 41: ...rs support. Speed: InfiniBand speed is auto-adjusted by the InfiniBand protocol. NVIDIA systems support QDR/FDR/EDR/HDR/NDR InfiniBand. FDR is an InfiniBand data rate where each lane of a 4X port runs a bit rate of 14.0625Gb/s with 64b/66b encoding, resulting in an effective bandwidth of 56.25Gb/s. EDR is an InfiniBand data rate where each lane of a 4X port runs a bit rate of 25Gb/s with 64b/66b encodin...
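The per-port figures quoted for FDR and EDR follow from multiplying the lane rate by the four lanes of a 4X port. The sketch below reproduces that arithmetic. The second function is an additional illustration rather than a figure from the manual: it assumes the quoted lane rate is the signaling rate and shows the payload share that 64b/66b coding (64 data bits per 66 transmitted bits) would leave.

```python
LANES_4X = 4  # a 4X InfiniBand port aggregates four lanes


def port_rate_gbps(lane_rate_gbps, lanes=LANES_4X):
    """Aggregate rate of a port: lane rate multiplied by lane count."""
    return lane_rate_gbps * lanes


def payload_rate_gbps(lane_rate_gbps, lanes=LANES_4X, data_bits=64, coded_bits=66):
    """Payload share of the aggregate rate under a block line code.

    Assumption: lane_rate_gbps is the signaling rate, so only
    data_bits out of every coded_bits carry payload.
    """
    return port_rate_gbps(lane_rate_gbps, lanes) * data_bits / coded_bits


if __name__ == "__main__":
    print("FDR 4X:", port_rate_gbps(14.0625))  # lane rate from the manual
    print("EDR 4X:", port_rate_gbps(25))       # lane rate from the manual
    print("FDR payload under the 64b/66b assumption:", payload_rate_gbps(14.0625))
```

The first two results correspond to the 56.25Gb/s figure quoted for FDR and to the 100Gb/s commonly associated with EDR.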

Page 42: ...ent. The connector comes in a standard micro USB shape. To view the full matrix of micro USB configuration options, refer to Management Interfaces, PSUs and Fans. I2C: The I2C connector is combined with the USB connector and is located on the front side of the system. It can be used with the I2C DB9 to micro USB splitting harness. This interface is not found in externally managed systems. This interface is...

Page 43: ... hardware event notification and troubleshooting. LEDs Symbols (Symbol, Name, Description, Normal Conditions): System Status LED: Shows the health of the system. Normal condition: Green (flashing green when booting). Power Supply Units LEDs: Shows the health of the power supply units. Normal condition: Green. Only original NVIDIA cables supplied with the switch package can be used to connect a switch system to the server. Connecting any cable other...

Page 44: ... system is up and running normally. Action required: N/A. Flashing Green: The system is booting up (this assignment is valid on managed systems only). Action required: Wait up to five minutes for the end of the booting process. Solid Amber: A major error has occurred, for example corrupted firmware or an overheated system. Action required: If the System Status LED shows amber five minutes after starting the system, unplug the system and call your NVIDIA repre...

Page 45: ...r Fan (LED Behavior, Description, Action Required): Solid Green: A specific fan unit is operating. Action required: N/A. Solid Amber: A specific fan unit is missing or not operating properly. Action required: The fan unit should be replaced. Power Supply Status LEDs: There are two power supply inlets in the system (for redundancy). The system can operate with only one power supply connected. Each power supply unit has a single 2-color LED that in...

Page 46: ...EDs on the rear side of the system are located on the PSUs themselves. Each PSU has a single 2-color LED. Power Supply Unit Status Rear LED Assignments (LED Behavior, Description, Action Required): Solid Green: All PS units are connected and running normally. Action required: N/A. Flashing Green (1Hz): AC present / Only 12VSB on (PSU off) or PSU in Smart-on state. Action required: Call your NVIDIA representative for assistance. Amber: AC cord unplugg...

Page 47: ...Identification LED: The UID LED is a debug feature that the user can use to find a particular system within a cluster by turning on the UID blue LED. To activate the UID LED on a switch system, run: switch (config) # led MGMT uid on. To verify the LED status, run: switch (config) # show leds. Output columns: Module, LED, Status; example row: MGMT, UID, Blue. To deactivate the UID LED on a switch system, run: switch (config) # led MGMT uid off. Port LEDs...

Page 48: ...mber: A problem with the link. Check that the SM is up. In InfiniBand system mode, the LED indicator corresponding to each data port will light orange when the physical connection is established (that is, when the unit is powered on and a cable is plugged into the port with the other end of the connector plugged into a functioning port). When a logical connection is made, the LED will change to green. When ...

Page 49: ...The images provided here are for illustration purposes only. They may not reflect the latest version of the product nor all available models ...

Page 50: ...he power cable. Replace the PSU if needed. The activity LED does not light up (InfiniBand): Make sure that there is an SM running in the fabric. System boot failure: The last software upgrade failed on x86-based systems. Solution: Connect the RS232 connector (CONSOLE) to a laptop. Push the system's reset button. Press the ArrowUp or ArrowDown key during the system boot. The GRUB menu will appear. For example: Default...

Page 51: ... Humidity: Operational: 10%-85% non-condensing; Non-Operational: 10%-90% non-condensing. Altitude: 3050m. Noise level: 78.4dBA at room temperature. Regulatory: Safety: CB, cTUVus, CE, CU. EMC: CE, FCC, VCCI, ICES, RCM. RoHS: RoHS compliant. Power: Input Voltage: 1x/2x 200-240Vac, 10A, 50/60Hz. Global Power Consumption: QM9700: Typical power with passive cables (ATIS): 747W; Max power with active cables: 1,703W. QM9790: Typical power ...

Page 52: ... AC, P2C Airflow, for QM97xx switches, power cord included. HAR000631: Harness RS232 2M cable, DB9 to RJ-45 (for managed switches only). ACC001897: Power cord, black, 250V 15A, 1830MM, C14 to C15, UL. ACC001899: Power cord, black, 250V 10A, 1830MM, C14 to C15, EUR CCC. ACC001850: OSFP thermal cap with openings for airflow. Thermal Threshold Definitions: Three thermal threshold definitions are measured by the Quantum ASICs ...

Page 53: ...rted; GND (7): Ground; TX6P (8): Transmitter Data Non-Inverted; TX6N (9): Transmitter Data Inverted; GND (10): Ground; TX8P (11): Transmitter Data Non-Inverted; TX8N (12): Transmitter Data Inverted; GND (13): Ground; SCL (14): 2-wire Serial interface clock; VCC1 (15): +3.3V Power; VCC1 (16): +3.3V Power; LPWn_PRSn (17): PRSn Low Power Mode / Module Present; GND (18): Ground; RX7N (19): Receiver Data Inverted; RX7P (20): Receiver Data Non-Inverted; GND (21): Gr...

Page 54: ...Tn: Module Interrupt / Module Reset; VCC2 (45): +3.3V Power; VCC2 (46): +3.3V Power; SDA (47): 2-wire Serial interface data; GND (48): Ground; TX7N (49): Transmitter Data Inverted; TX7P (50): Transmitter Data Non-Inverted; GND (51): Ground; TX5N (52): Transmitter Data Inverted; TX5P (53): Transmitter Data Non-Inverted; GND (54): Ground; TX3N (55): Transmitter Data Inverted; TX3P (56): Transmitter Data Non-Inverted; GND (57): Ground; TX1N (58): Transmitter D...

Page 55: ...EC, all waste electrical and electronic equipment (EEE) should be collected separately and not disposed of with regular household waste. Dispose of this product and all of its parts in a responsible and environmentally friendly way. RJ-45 Console and I2C interfaces are integrated in the same connector. Due to that, connecting any cable other than the NVIDIA-supplied console cable may cause an I2C hang. Us...

Page 56: ...Follow the instructions found at http://www.mellanox.com/page/dismantling_procedures for proper disassembly and disposal of the switch, according to the WEEE directive ...

Page 57: ...y (Date, Revision, Description): July 2022, Rev 1.2: Updated OPNs in Ordering Information, Installation, Accessory and Replacement Parts; updated Cable Installation. February 2022, Rev 1.1: Updated Cable Installation. November 2021, Rev 1.0: Initial release ...

Page 58: ...d by customer and perform the necessary testing for the application in order to avoid a default of the application or the product Weaknesses in customer s product designs may affect the quality and reliability of the NVIDIA product and may result in additional or different conditions and or requirements beyond those contained in this document NVIDIA accepts no liability related to any default dama...

Page 59: ...of the respective companies with which they are associated. Copyright 2022 NVIDIA Corporation & affiliates. All Rights Reserved ...