MPI 

The message passing interface (MPI) protocol is a library of calls used by applications in a parallel 
computing environment to communicate between nodes. MPI calls are optimized for performance in a 
compute cluster that takes advantage of high-bandwidth and low-latency interconnects. In parallel 
computing environments, code is executed across multiple nodes simultaneously. MPI facilitates the 
communication and synchronization among these jobs across the entire cluster. 
To take advantage of the features of MPI, an application must be written and compiled against the
libraries of the particular MPI implementation used. Several implementations of MPI are on the
market:

• HP-MPI
• Intel MPI
• Publicly available versions such as MVAPICH2 and Open MPI

MPI has become the de facto IB ULP standard. In particular, HP-MPI has been accepted by more
independent software vendors (ISVs) than any other commercial MPI. Because HP-MPI uses shared
libraries, applications built on it can select the interconnect transparently, which significantly reduces
the effort required to support the various popular interconnect technologies. HP-MPI is supported on
HP-UX, Linux, Tru64 UNIX, and Microsoft Windows Compute Cluster Server 2003.
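To make the programming model concrete, the sketch below is a minimal C program that passes a
single message between two ranks. It uses only standard MPI calls, so it should compile unchanged
against HP-MPI, Intel MPI, MVAPICH2, or Open MPI; the build and launch commands in the comment
are illustrative assumptions, not a prescribed procedure.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal two-rank message exchange. Build with the mpicc wrapper of the
       installed MPI implementation and launch with at least two ranks, for
       example: mpirun -np 2 ./mpi_hello (command line is illustrative). */
    int main(int argc, char **argv)
    {
        int rank, size, token = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 of %d received token %d\n", size, token);
        }

        MPI_Finalize();
        return 0;
    }

The MPI library, not the application, decides how the message travels between nodes, which is what
allows the same binary to run over InfiniBand or other interconnects.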

IPoIB 

Internet Protocol over InfiniBand (IPoIB) allows the use of TCP/IP- or UDP/IP-based applications
between nodes connected to an InfiniBand fabric. IPoIB supports IPv4 and IPv6 protocols and
addressing schemes. The operating system configures an InfiniBand HCA as a traditional network
adapter, so standard IP-based applications such as PING, FTP, and TELNET run unchanged. IPoIB
does not support the RDMA features of InfiniBand. Communication between IB nodes using IPoIB and
Ethernet nodes using IP requires a gateway or router interface.
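Because IPoIB presents the HCA as an ordinary network interface, no InfiniBand-specific code is
needed. The sketch below is a plain TCP client in C; the address and port are hypothetical values that
would belong to the IPoIB interface (for example, ib0) of a peer node.

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        /* Ordinary TCP socket; the kernel carries the traffic over the IPoIB
           interface purely because of the destination address assigned to ib0. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in peer;

        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(5001);                          /* hypothetical port */
        inet_pton(AF_INET, "192.168.100.12", &peer.sin_addr);   /* hypothetical IPoIB address */

        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
            const char msg[] = "hello over IPoIB\n";
            write(fd, msg, sizeof(msg) - 1);
        } else {
            perror("connect");
        }
        close(fd);
        return 0;
    }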

RDMA-based protocols 

DAPL – The Direct Access Programming Library (DAPL) allows low-latency RDMA communications
between nodes. uDAPL provides user-level access to RDMA functionality on InfiniBand, while kDAPL
provides the kernel-level API. To use RDMA for data transfers between nodes, applications must be
written against a specific DAPL implementation.
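DAPL defines its own API; rather than reproduce it here, the sketch below uses the lower-level
OpenFabrics verbs interface (on which uDAPL is layered in OFED stacks) to illustrate the
memory-registration step that every RDMA transfer depends on. The device index and buffer size are
assumptions, and queue pair setup and the actual RDMA operations are omitted.

    #include <stdio.h>
    #include <stdlib.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num_devices = 0;
        struct ibv_device **devices = ibv_get_device_list(&num_devices);
        if (!devices || num_devices == 0) {
            fprintf(stderr, "no InfiniBand devices found\n");
            return 1;
        }

        /* Open the first HCA (index 0 is an assumption for the example). */
        struct ibv_context *ctx = ibv_open_device(devices[0]);
        if (!ctx) {
            fprintf(stderr, "could not open device\n");
            return 1;
        }
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        /* Register a buffer so the HCA can read and write it directly,
           without CPU involvement during the transfer itself. */
        char *buf = malloc(4096);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        printf("registered 4 KB buffer, rkey=0x%x\n", mr->rkey);

        /* Queue pair creation, connection setup, and the RDMA reads/writes
           themselves are omitted from this sketch. */
        ibv_dereg_mr(mr);
        free(buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devices);
        return 0;
    }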

SDP – Sockets Direct Protocol (SDP) is an RDMA protocol that operates from the kernel.
Applications must be written to take advantage of the SDP interface. SDP is based on the WinSock 
Direct Protocol used by Microsoft server operating systems and is suited for connecting databases to 
application servers. 
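The usual way to run an existing sockets application over SDP is to preload the OFED libsdp library so
that TCP sockets are converted transparently. For code written explicitly to SDP, the conventional
approach is to substitute the SDP address family in the socket() call, as in the sketch below. The
AF_INET_SDP value of 27 reflects common OFED usage of this period but should be verified against
the installed headers; the address and port are hypothetical.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Address family used by the OFED SDP module; 27 is the commonly used
       value, but confirm it against the headers shipped with your stack. */
    #ifndef AF_INET_SDP
    #define AF_INET_SDP 27
    #endif

    int main(void)
    {
        /* Identical to a TCP client except for the address family, so the
           connection is carried by SDP over RDMA instead of TCP/IP. */
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
        struct sockaddr_in peer;

        memset(&peer, 0, sizeof(peer));
        peer.sin_family = AF_INET;                              /* addressing remains IPv4-style */
        peer.sin_port   = htons(1521);                          /* hypothetical listener port */
        inet_pton(AF_INET, "192.168.100.20", &peer.sin_addr);   /* hypothetical address */

        if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
            const char msg[] = "hello over SDP\n";
            write(fd, msg, sizeof(msg) - 1);
        }
        close(fd);
        return 0;
    }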

SRP – SCSI RDMA Protocol (SRP) is a data movement protocol that encapsulates SCSI commands over
InfiniBand for SAN networking. Operating at the kernel level, SRP uses RDMA to move SCSI commands
and data between systems, enabling low-latency communication with storage systems.

iSER – iSCSI Extensions for RDMA (iSER) is a storage standard originally specified on the iWARP RDMA
technology and now officially supported on InfiniBand. The iSER protocol provides iSCSI 
manageability to RDMA storage operations. 

NFS – The Network File System (NFS) is a storage protocol that has evolved since its inception in the 
1980s, undergoing several generations of development while remaining network-independent. With 
the development of high-performance I/O such as PCIe and the significant advances in memory 
subsystems, NFS over RDMA on InfiniBand offers low-latency performance for transparent file sharing 
across different platforms. 
