1:206
Volume 1, Part 2: Floating-point Applications
6.2.2 Execution Bandwidth
When sufficient ILP exists and can be exploited, performance is limited by the
availability of execution resources, that is, by the execution bandwidth of the machine.
Consider the dense matrix multiply kernel from the BLAS3 library.
      DO 1 i = 1, N
        DO 1 j = 1, P
          DO 1 k = 1, M
1           C[i,j] = C[i,j] + A[i,k]*B[k,j]
The common techniques of loop interchange, loop unrolling, and unroll-and-jam can be
used to increase the available ILP in the inner loop. When this is done, the inner loop
contains an abundance of independent floating-point computations with a relatively
small number of memory operations. The performance constraint is then largely the
floating-point execution bandwidth of the machine (assuming sufficient registers are
available to hold the accumulators, C[i,j], and the intermediate computations).
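As an illustration of the transformation described above, the following C sketch applies
a 2x2 unroll-and-jam to the matrix multiply loop nest. It is not taken from the manual:
the function name matmul_2x2, the row-major layout, the 0-based indexing, and the
requirement that N and P be even are all assumptions made to keep the example short.
Each inner-loop iteration performs four independent multiply-adds against four loads,
where the original loop performs one multiply-add against two loads and serializes
every multiply-add on a single accumulator.

```c
#include <assert.h>
#include <stddef.h>

/* C (n x p) += A (n x m) * B (m x p), all stored row-major, 0-based.
 * The i and j loops are unrolled by 2 and jammed into the k loop.
 * Assumption for brevity: n and p are even, so no cleanup loops are needed. */
void matmul_2x2(size_t n, size_t p, size_t m,
                const double *A, const double *B, double *C)
{
    assert(n % 2 == 0 && p % 2 == 0);

    for (size_t i = 0; i < n; i += 2) {
        for (size_t j = 0; j < p; j += 2) {
            /* Four accumulators, intended to stay in registers
             * for the whole k loop. */
            double c00 = C[i * p + j];
            double c01 = C[i * p + j + 1];
            double c10 = C[(i + 1) * p + j];
            double c11 = C[(i + 1) * p + j + 1];

            for (size_t k = 0; k < m; k++) {
                /* Four loads feed four independent multiply-adds. */
                double a0 = A[i * m + k];
                double a1 = A[(i + 1) * m + k];
                double b0 = B[k * p + j];
                double b1 = B[k * p + j + 1];

                c00 += a0 * b0;
                c01 += a0 * b1;
                c10 += a1 * b0;
                c11 += a1 * b1;
            }

            C[i * p + j]           = c00;
            C[i * p + j + 1]       = c01;
            C[(i + 1) * p + j]     = c10;
            C[(i + 1) * p + j + 1] = c11;
        }
    }
}
```

Larger unroll factors raise the ratio of multiply-adds to loads further, up to the point
where the accumulators no longer fit in the available registers.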
6.2.3 Memory Latency
While cycle time disparity between the processor and memory creates a general
memory latency problem for most codes, there are a few special conditions in
floating-point codes that exacerbate its impact.
One such condition is the use of indirect addressing. Gather/scatter codes in general
and sparse matrix vector multiply code (below) in particular are good examples.
      DO 1 ROW = 1, N
        R[ROW] = 0.0d0
        DO 1 I = ROWEND(ROW-1)+1, ROWEND(ROW)
1         R[ROW] = R[ROW] + A[I] * X[COL[I]]
The memory latency of the access of COL[I] is exposed, since it is used to index into
the vector X. The access of the element of X, the computation of the product, and the
summation of the product on R[ROW] are all dependent on the memory latency of the
access of COL[I].
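The dependence chain described above can be made explicit in a short C sketch of the
same kernel. The function name spmv, the 0-based indexing, and the convention that
rowend has n+1 entries with rowend[0] equal to 0 are assumptions of this example, not
part of the original code. The comments mark the two loads that must complete in
sequence before each product can be formed.

```c
#include <stddef.h>

/* Sparse matrix-vector multiply, r = A * x, with A in compressed row form.
 * Assumed layout (0-based): the nonzeros of row `row` are a[i] for
 * rowend[row] <= i < rowend[row + 1], and col[i] is the column of a[i]. */
void spmv(size_t n, const size_t *rowend, const size_t *col,
          const double *a, const double *x, double *r)
{
    for (size_t row = 0; row < n; row++) {
        double sum = 0.0;
        for (size_t i = rowend[row]; i < rowend[row + 1]; i++) {
            size_t c  = col[i];   /* first load: the column index           */
            double xv = x[c];     /* second load: address needs the first   */
            sum += a[i] * xv;     /* product and sum wait on both loads     */
        }
        r[row] = sum;
    }
}
```

The load of a[i] does not depend on col[i] and can proceed in parallel with it. Only the
load of x[c] has to wait for the index to arrive.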
Another common condition in floating-point codes where memory latency impact is
exacerbated is the presence of ambiguous memory dependencies. Consider the
incomplete Cholesky conjugate gradient excerpt kernel, again from the Livermore
Fortran Kernel suite.
      II = n
      IPNTP = 0
222   IPNT = IPNTP
      IPNTP = IPNTP + II
      II = II/2
      I = IPNTP + 1
cdir$ ivdep
      DO 2 K = IPNT+2, IPNTP, 2
        I = I+1
2       X[I] = X[K] - V[K] * X[K-1] - V[K-1] * X[K+1]
      IF (II .GT. 1) GO TO 222
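The ambiguity in this kernel comes from X being both read (at K-1, K, and K+1) and
written (at I) in the same loop, with I advancing independently of K. Within one pass
the writes land at indices of IPNTP+2 and above while the reads stop at IPNTP+1, so no
iteration depends on another, but a compiler cannot readily prove this. The following C
sketch is one possible rendering, not the manual's code: the function names iccg_pass
and iccg are hypothetical, the arrays keep the 1-based indexing of the original (element
0 is unused), and the C99 restrict qualifier is used here as a rough analogue of the
ivdep directive to assert that the elements read and the elements written do not
overlap.

```c
#include <stddef.h>

/* One pass of the inner loop. x_in and x_out may point into the same
 * array: the elements read through x_in (indices <= ipntp + 1) never
 * coincide with the elements written through x_out (indices >= ipntp + 2).
 * `restrict` states that promise to the compiler. It is also assumed that
 * v is a separate array from x. */
static void iccg_pass(size_t ipnt, size_t ipntp,
                      const double *restrict x_in,
                      const double *restrict v,
                      double *restrict x_out)
{
    size_t i = ipntp + 1;
    for (size_t k = ipnt + 2; k <= ipntp; k += 2) {
        i++;
        x_out[i] = x_in[k] - v[k] * x_in[k - 1] - v[k - 1] * x_in[k + 1];
    }
}

/* Driver corresponding to the GO TO 222 loop; arrays are 1-based. */
void iccg(size_t n, double *x, const double *v)
{
    size_t ii = n;
    size_t ipntp = 0;
    do {
        size_t ipnt = ipntp;
        ipntp += ii;
        ii /= 2;
        iccg_pass(ipnt, ipntp, x, v, x);
    } while (ii > 1);
}
```

Without such an assertion, each load of X in a later iteration has to be ordered after
the store to X[I] in an earlier one, which keeps the memory latency on the critical
path.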