Revision 1.0
Performance Tips
for loops
Programming constructs like:
for (i=0; i<n; i++) {}
perform the same operation on every element of a set of
data. This is exactly a “vector” operation.
Conversely:
switch
Programming constructs which separate data
(switch(), if()), performing different tasks in
different data situations, do not vectorize well.
scalar arithmetic
General “bookkeeping” code, which increments a
counter, manipulates a pointer, etc. This kind of code is
usually bad because each such operation is unique. (There
is a formal description of this: essentially there is a
“number of items” below which it does not pay to use
vector operations. This has to do with vectorization
setup and pipeline priming.)
pointer de-reference
Accessing data through pointer de-references is hard for
vector processors and for most vectorizing C compilers.
Constructs like “a[b.x].value” are preferred to pointer
usage like “b->x->value”. This is because computing
structure offsets is a simple addition, rather than
another memory access. (This is not a major point for
the RSP, as we lack a vectorizing C compiler.)
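The contrasts above can be sketched in ordinary C. This is only an illustration for a generic vectorizing compiler, not RSP code; the types Item, Index and Node and all function names are hypothetical and do not come from this manual.

```c
#include <stddef.h>

/* Hypothetical types, for illustration only. */
typedef struct { int value; } Item;
typedef struct { size_t x; } Index;      /* b.x is an index into a[]    */
typedef struct Node { Item *x; } Node;   /* b->x is a pointer to an Item */

/* Tends to vectorize well: the same operation on every element. */
void scale_all(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Tends to vectorize poorly: each element may take a different path. */
void classify_all(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (src[i] & 3) {
        case 0:  dst[i] = src[i] + 1; break;
        case 1:  dst[i] = src[i] * 2; break;
        default: dst[i] = 0;          break;
        }
    }
}

/* Indexed access: the element address is base plus offset arithmetic. */
int read_indexed(const Item *a, Index b)
{
    return a[b.x].value;
}

/* Pointer chain: b->x must be loaded before ->value can be addressed. */
int read_chained(const Node *b)
{
    return b->x->value;
}
```

In scale_all every iteration does identical work, so the elements can be processed in groups. In classify_all the switch splits the data into cases that need different instructions, which is the situation described above as vectorizing poorly.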
There is another important lesson worth mentioning from the body of
previous vectorization work. Most of the recent efforts in compiler design
and high-level software engineering for SIMD systems are designed to be
scalable; as more vector units are added, performance improves. Lots of
recent work has been applied to developing good vectorizing compilers¹. In
those efforts, the focus has been to automatically distribute the data across
the vector units and minimize vectorization start-up costs, letting the
programmer not really worry about the number of vector elements. This is
an important difference from our situation for two reasons: (1) we are
programming at a much lower level. We know how many vector elements
¹ For a good introduction and references to further reading, consult Hennessy, J., Patterson, D., “Computer
Architecture: A Quantitative Approach”, Morgan Kaufmann Publishers, 1990, ISBN 1-55860-069-8.