Optimizing for SIMD Floating-point Applications
To use all four computation slots, the vertex data can be reorganized so
that each operation works on the same component of four separate
vertices, that is, on multiple vectors simultaneously. This arrangement
is the structure-of-arrays (SoA) form of vertex data shown in Table 5-1.
With data organized this way, every arithmetic operation yields a unique
result in each computational slot.
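As a minimal sketch of this reorganization, the C fragment below copies
vertices from a one-structure-per-vertex layout into the four component
arrays of Table 5-1. The type and function names (VertexAoS,
VerticesSoA, aos_to_soa) are hypothetical and chosen only for
illustration; the caller is assumed to have allocated each destination
array with room for n floats.

```c
#include <stddef.h>

/* Hypothetical per-vertex layout: one structure per vertex. */
typedef struct {
    float x, y, z, w;
} VertexAoS;

/* Hypothetical SoA layout: one array per component, as in Table 5-1
 * (Vx, Vy, Vz, Vw). Each pointer must reference at least n floats. */
typedef struct {
    float *x;
    float *y;
    float *z;
    float *w;
} VerticesSoA;

/* Copy n vertices from the per-vertex layout into the SoA arrays. */
static void aos_to_soa(const VertexAoS *src, VerticesSoA *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst->x[i] = src[i].x;   /* Vx array: X1 .. Xn */
        dst->y[i] = src[i].y;   /* Vy array: Y1 .. Yn */
        dst->z[i] = src[i].z;   /* Vz array: Z1 .. Zn */
        dst->w[i] = src[i].w;   /* Vw array: W1 .. Wn */
    }
}
```

After the copy, four consecutive elements of any one array hold the same
component of four different vertices, which is the shape a 4-wide
register load expects.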
Vertical computation takes advantage of the inherent parallelism in 3D
geometry processing of vertices. It assigns the computation of four
vertices to the four compute slots of the Pentium III processor, thereby
eliminating the disadvantages of the horizontal approach described
earlier (using SSE alone). The dot product is one operation that can be
computed directly on the SoA representation of vertex data. A schematic
representation of the dot product operation is shown in Figure 5-2.
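A vertical dot product over SoA data might be sketched as follows. This
is an illustration under stated assumptions, not code from the manual:
the function name dot4_soa is hypothetical, the four vertices are
assumed to be dotted against one fixed vector (fx, fy, fz, fw), and
unaligned loads are used so that no alignment is required of the
caller. When SSE is not available at compile time, the sketch falls back
to an equivalent scalar loop.

```c
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SOA_SKETCH_HAVE_SSE 1
#else
#define SOA_SKETCH_HAVE_SSE 0
#endif

/* Compute four dot products at once (hypothetical helper).
 * vx, vy, vz, vw each point at four consecutive floats taken from the
 * Vx, Vy, Vz and Vw arrays; r receives the four results:
 *   r[i] = vx[i]*fx + vy[i]*fy + vz[i]*fz + vw[i]*fw               */
static void dot4_soa(const float *vx, const float *vy,
                     const float *vz, const float *vw,
                     float fx, float fy, float fz, float fw,
                     float *r)
{
#if SOA_SKETCH_HAVE_SSE
    /* Each register holds one component of four different vertices. */
    __m128 x = _mm_loadu_ps(vx);
    __m128 y = _mm_loadu_ps(vy);
    __m128 z = _mm_loadu_ps(vz);
    __m128 w = _mm_loadu_ps(vw);

    /* Every multiply and add produces a useful result in all four
     * slots; no shuffling across slots is needed. */
    __m128 acc = _mm_mul_ps(x, _mm_set1_ps(fx));
    acc = _mm_add_ps(acc, _mm_mul_ps(y, _mm_set1_ps(fy)));
    acc = _mm_add_ps(acc, _mm_mul_ps(z, _mm_set1_ps(fz)));
    acc = _mm_add_ps(acc, _mm_mul_ps(w, _mm_set1_ps(fw)));

    _mm_storeu_ps(r, acc);
#else
    /* Scalar fallback with the same operation order. */
    for (int i = 0; i < 4; ++i)
        r[i] = vx[i] * fx + vy[i] * fy + vz[i] * fz + vw[i] * fw;
#endif
}
```

Stepping the four input pointers forward by four elements per call
processes the whole vertex set; a count that is not a multiple of four
would need separate handling of the remainder, which this sketch omits.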
Table 5-1   SoA Form of Representing Vertices Data

Vx array    X1    X2    X3    X4    .....    Xn
Vy array    Y1    Y2    Y3    Y4    .....    Yn
Vz array    Z1    Z2    Z3    Z4    .....    Zn
Vw array    W1    W2    W3    W4    .....    Wn