InfiniBand components 

InfiniBand architecture involves four key components:

Host channel adapter
Subnet manager
Target channel adapter
InfiniBand switch

A host node or server requires a host channel adapter (HCA) to connect to an InfiniBand 
infrastructure. An HCA can be a card installed in an expansion slot or integrated onto the host’s 
system board. An HCA can communicate directly with another HCA, with a target channel adapter, 
or with an InfiniBand switch. 

InfiniBand uses subnet manager (SM) software to manage the InfiniBand fabric and to monitor
interconnect performance and health at the fabric level. A fabric can be as simple as a point-to-point
connection or multiple connections through one or more switches. The SM software resides on a node
or switch within the fabric, and provides switching and configuration information to all of the switches
in the fabric. Additional backup SMs may be located within the fabric for failover should the primary
SM fail. All other nodes in the fabric will contain an SM agent that processes management data.
Managers and agents communicate using management datagrams (MADs).
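
The failover behavior described above can be sketched in a few lines of Python. This is a toy model, not SM software: the `SubnetManager` class, the GUID values, and the election rule (highest priority wins, ties go to the lowest GUID) are assumptions made for illustration and are not stated in this brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    guid: int          # hypothetical identifier of the node hosting this SM
    priority: int      # assumed rule: a higher value wins the election
    alive: bool = True


def elect_master(managers):
    """Return the live SM with the highest priority; ties go to the lowest GUID."""
    candidates = [sm for sm in managers if sm.alive]
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


managers = [
    SubnetManager(guid=0x0002C9030001, priority=10),
    SubnetManager(guid=0x0002C9030002, priority=5),
    SubnetManager(guid=0x0002C9030003, priority=5),
]

primary = elect_master(managers)

# Simulate a failure of the primary SM: one of the backups must take over.
survivors = [
    SubnetManager(sm.guid, sm.priority, alive=(sm != primary)) for sm in managers
]
backup = elect_master(survivors)
```

Here the highest-priority SM becomes primary, and once it is marked as failed, the election among the remaining backups picks the replacement.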

A target channel adapter (TCA) is used to connect an external storage unit or I/O interface to an
InfiniBand infrastructure. The TCA includes an I/O controller specific to the device’s protocol (SCSI,
Fibre Channel, Ethernet, etc.) and can communicate with an HCA or an InfiniBand switch.

An InfiniBand switch provides scalability by allowing a number of HCAs, TCAs, and other IB switches
to connect to an InfiniBand infrastructure. The switch handles network traffic by checking the local link
header of each data packet received and forwarding the packet to the proper destination.
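
As a rough illustration of that forwarding step, the sketch below models a switch as a lookup table keyed by the destination local identifier (LID) carried in the packet's local link header. The `Packet` and `Switch` classes, the port numbers, and the choice to return `None` for an unknown destination are hypothetical simplifications, not a description of real switch hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dlid: int        # destination LID, as carried in the local link header
    payload: bytes


@dataclass
class Switch:
    # Maps destination LID -> output port number.
    forwarding_table: dict = field(default_factory=dict)

    def program(self, dlid, port):
        """Install one entry, as the subnet manager would when configuring the switch."""
        self.forwarding_table[dlid] = port

    def forward(self, packet):
        """Return the output port for the packet, or None if the LID is unknown."""
        return self.forwarding_table.get(packet.dlid)


switch = Switch()
switch.program(dlid=4, port=1)
switch.program(dlid=7, port=3)

out_port = switch.forward(Packet(dlid=7, payload=b"hello"))
unknown = switch.forward(Packet(dlid=99, payload=b"?"))
```

The switch itself holds no routing logic in this model. It only consults entries that were programmed into it, which mirrors the division of labor between the subnet manager and the switches described earlier.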

The most basic InfiniBand infrastructure will consist of host nodes or servers equipped with HCAs, an
InfiniBand switch, and subnet manager software. More expansive networks will include multiple
switches.

InfiniBand software architecture 

InfiniBand, like Ethernet, uses a multi-layer processing stack to transfer data between nodes.
InfiniBand architecture, however, provides OS-bypass features, such as offloading communication
processing duties and performing RDMA operations, as core capabilities, and it offers greater
adaptability through a variety of services and protocols.

While the majority of existing InfiniBand clusters operate on the Linux platform, drivers and HCA
stacks are also available for Microsoft® Windows®, HP-UX, Solaris, and other operating systems
from various InfiniBand hardware and software vendors.

The layered software architecture of the HCA lets developers write code without targeting specific
hardware. The functionality of an HCA is defined by its verb set, which is a table of commands used by
the application programming interface (API) of the operating system being run. A number of services and
software protocols are available (Figure 4) and, depending on type, can be implemented from user
space or from the kernel.
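
The idea of a verb set as a table of commands can be sketched as follows. This is a toy model only: the verb names (`open_device`, `post_send`), the two "vendor" drivers, and the `send_message` helper are hypothetical and do not represent the actual OpenFabrics verbs API.

```python
from typing import Callable, Dict

# A verb table maps a verb name to the driver routine that implements it.
VerbTable = Dict[str, Callable[..., str]]


def make_vendor_a_verbs() -> VerbTable:
    """Hypothetical driver for one vendor's HCA."""
    return {
        "open_device": lambda: "vendor-a: device opened",
        "post_send": lambda data: f"vendor-a: sent {len(data)} bytes",
    }


def make_vendor_b_verbs() -> VerbTable:
    """Hypothetical driver for a different vendor's HCA."""
    return {
        "open_device": lambda: "vendor-b: device opened",
        "post_send": lambda data: f"vendor-b: sent {len(data)} bytes",
    }


def send_message(verbs: VerbTable, data: bytes) -> str:
    """Application code: refers to verbs by name only, never to a vendor driver."""
    verbs["open_device"]()
    return verbs["post_send"](data)


result_a = send_message(make_vendor_a_verbs(), b"ping")
result_b = send_message(make_vendor_b_verbs(), b"ping")
```

Because `send_message` depends only on the verb names, the same application code runs unchanged against either driver, which is the hardware independence the layered architecture is meant to provide.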
