The programming model for the InfiniBand transport assumes that an application accesses at least one Send and one Receive queue to initiate I/O. The transport layer supports four types of data transfers for the Send queue:

• Send/Receive – Typical operation where one node sends a message and another node receives the message

• RDMA Write – Operation where one node writes data directly into a memory buffer of a remote node

• RDMA Read – Operation where one node reads data directly from a memory buffer of a remote node

• RDMA Atomics – Operation that updates a memory location of a remote node atomically from the HCA's perspective

The only operation available for the Receive queue is Post Receive Buffer, which identifies a buffer that a client may send to or receive from using a Send or RDMA Write data transfer.
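
These queue operations are what the OpenFabrics verbs API discussed earlier exposes to applications. The sketch below (C, against libibverbs) shows roughly how a Send-queue RDMA Write and a Receive-queue buffer posting look; it assumes a connected queue pair, a registered memory region, and an out-of-band exchange of the remote address and rkey, none of which are shown, and the function names are illustrative rather than taken from this brief.

/* Minimal sketch of the two queue types described above, written
 * against the OFED libibverbs API. Assumes set-up not shown: qp is a
 * connected reliable-connection queue pair, mr is a registered memory
 * region, and remote_addr/rkey were exchanged out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Send queue: post an RDMA Write that places local data directly
 * into the remote node's memory, without involving its CPU. */
int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    uint64_t remote_addr, uint32_t rkey, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)mr->addr, /* local source buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED; /* ask for a completion */
    wr.wr.rdma.remote_addr = remote_addr;       /* target address */
    wr.wr.rdma.rkey        = rkey;              /* remote access key */

    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Receive queue: the Post Receive Buffer operation simply hands the
 * HCA a buffer that an incoming message can be placed into. */
int post_recv_buffer(struct ibv_qp *qp, struct ibv_mr *mr, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)mr->addr,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_recv_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.sg_list = &sge;
    wr.num_sge = 1;

    return ibv_post_recv(qp, &wr, &bad_wr);
}

Note that only the Send-queue operation names a remote address and key; the receive side merely supplies a landing buffer, which is the practical difference between the Send/Receive and RDMA transfer types listed above.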

Scale-out clusters built on InfiniBand and HP technology 

In the past few years, scale-out cluster computing has become a mainstream architecture for high 
performance computing. As the technology becomes more mature and affordable, scale-out clusters 
are being adopted in a broader market beyond HPC. HP Oracle Database Machine is one example. 
The trend in this industry is toward using space- and power-efficient blade systems as building blocks for scale-out solutions. HP BladeSystem c-Class solutions offer significant savings in power, cooling, and data center floor space without compromising performance.
The c7000 enclosure supports up to 16 half-height or 8 full-height server blades and includes rear mounting bays for management and interconnect components. Each server blade includes mezzanine connectors for I/O options such as the HP 4x QDR IB mezzanine card. HP c-Class server blades are available in two form factors and in several server-node configurations to meet various density goals. To meet extreme density goals, the half-height HP BL2x220c server blade includes two server nodes. Each node can support two quad-core Intel Xeon 5400-series processors and a slot for a mezzanine board, providing a maximum of 32 nodes and 256 cores per c7000 enclosure.
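
The density arithmetic follows directly from these numbers: 16 half-height bays × 2 nodes per BL2x220c blade = 32 nodes, and 32 nodes × 2 processors × 4 cores = 256 cores per enclosure.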

NOTE:

The DDR HCA mezzanine card should be installed in a PCIe x8 connector for maximum InfiniBand performance. The QDR HCA mezzanine card is supported on ProLiant G6 blades with PCIe x8 Gen 2 mezzanine connectors.

Figure 9 shows a full-bandwidth fat-tree configuration of HP BladeSystem c-Class components providing 576 nodes in a cluster. Each c7000 enclosure includes an HP 4x QDR IB Switch, which provides 16 downlinks for server blade connection and 16 QSFP uplinks for fabric connectivity. Spine-level fabric connectivity is provided through sixteen 36-port Voltaire 4036 QDR InfiniBand Switches². The Voltaire 36-port switches provide 40-Gbps performance per port and offer fabric management capabilities.
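
Though not spelled out above, the 576-node figure is consistent with the port counts: sixteen 36-port spine switches supply 16 × 36 = 576 ports, and each c7000 consumes 16 uplinks, so the fabric accommodates 576 ÷ 16 = 36 enclosures of 16 nodes each. With one uplink per downlink, 36 × 16 = 576 nodes see full bisection bandwidth, which is what makes the fat tree full bandwidth.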

² Qualified, marketed, and supported by HP.
