Refining C/C++ Code
Example 3–15. Float Dot Product Using Intrinsics
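/* Each double holds two packed 32-bit floats, so one 64-bit load
   fetches two elements of the original float array. */
float dotprod2(const double a[restrict], const double b[restrict])
{
    int i;
    float sum0 = 0;
    float sum1 = 0;

    /* 512 floats are processed as 512/2 double words. */
    for (i = 0; i < 512/2; i++)
    {
        /* _hi/_lo extract the upper/lower 32 bits of the double;
           _itof reinterprets those bits as a float. */
        sum0 += _itof(_hi(a[i])) * _itof(_hi(b[i]));
        sum1 += _itof(_lo(a[i])) * _itof(_lo(b[i]));
    }
    return sum0 + sum1;
}

/* Align both arrays on 8-byte boundaries so they can be read as doubles.
   SIZE_A and SIZE_B are defined elsewhere; the loop reads 512 floats
   from each array. */
#pragma DATA_ALIGN(a, 8);
#pragma DATA_ALIGN(b, 8);
float ret_val, a[SIZE_A], b[SIZE_B];

void main()
{
    ret_val = dotprod2((double *)a, (double *)b);
}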
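To make the role of the intrinsics in Example 3–15 easier to follow, the
sketch below imitates them in portable C so that the same arithmetic can be
tried on a host machine. It is only an illustration, not the compiler's
implementation: the names load_pair, emu_hi, emu_lo, emu_itof, and
dotprod2_portable are hypothetical, and the code assumes a 32-bit float and
an even element count n.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-ins for _hi, _lo, and _itof.
   Assumption: float is 32 bits wide. */

/* Copy the 64-bit word that holds floats 2*i and 2*i+1.
   This models one double-word load from the float array. */
static uint64_t load_pair(const float *p, int i)
{
    uint64_t bits;
    memcpy(&bits, p + 2 * i, sizeof bits);
    return bits;
}

/* Upper and lower 32 bits of a 64-bit word. */
static uint32_t emu_hi(uint64_t bits) { return (uint32_t)(bits >> 32); }
static uint32_t emu_lo(uint64_t bits) { return (uint32_t)(bits & 0xFFFFFFFFu); }

/* Reinterpret a 32-bit pattern as a float without converting the value. */
static float emu_itof(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Same structure as dotprod2: two partial sums, one per 32-bit half.
   Assumption: n is even. */
float dotprod2_portable(const float *a, const float *b, int n)
{
    float sum0 = 0.0f;
    float sum1 = 0.0f;
    int i;

    for (i = 0; i < n / 2; i++) {
        uint64_t wa = load_pair(a, i);
        uint64_t wb = load_pair(b, i);
        sum0 += emu_itof(emu_hi(wa)) * emu_itof(emu_hi(wb));
        sum1 += emu_itof(emu_lo(wa)) * emu_itof(emu_lo(wb));
    }
    return sum0 + sum1;
}
```

Because both arrays are split into halves the same way, the result does not
depend on which array element lands in the high half, so the sketch gives
the same dot product on little-endian and big-endian hosts.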
In Example 3–16, the dot product example is unrolled to maximize
performance. The preprocessor is used to define convenient macros, FHI()
and FLO(), for accessing the high and low 32-bit values in a double word.
In this version of the loop, 8 float values are computed every 4 cycles.