Sun Microelectronics
UltraSPARC User’s Manual
Code Example 13-5 Byte-Aligned Block Copy Inner Loop
Note that the loop must be unrolled twice to achieve maximum
performance. All FP registers are double-precision. Eight versions of
this loop are needed to handle all the cases of doubleword
misalignment between the source and destination.
loop:
        faligndata  %f0, %f2, %f34
        faligndata  %f2, %f4, %f36
        faligndata  %f4, %f6, %f38
        faligndata  %f6, %f8, %f40
        faligndata  %f8, %f10, %f42
        faligndata  %f10, %f12, %f44
        faligndata  %f12, %f14, %f46
        addcc       %l0, -1, %l0
        bg,pt       l1
        fmovd       %f14, %f48                  ! delay slot: carry last doubleword
        (end of loop handling)
l1:     ldda        [regaddr] ASI_BLK_P, %f0    ! block load into %f0-%f14
        stda        %f32, [regaddr] ASI_BLK_P   ! block store from %f32-%f46
        faligndata  %f48, %f16, %f32
        faligndata  %f16, %f18, %f34
        faligndata  %f18, %f20, %f36
        faligndata  %f20, %f22, %f38
        faligndata  %f22, %f24, %f40
        faligndata  %f24, %f26, %f42
        faligndata  %f26, %f28, %f44
        faligndata  %f28, %f30, %f46
        addcc       %l0, -1, %l0
        be,pn       done
        fmovd       %f30, %f48                  ! delay slot: carry last doubleword
        ldda        [regaddr] ASI_BLK_P, %f16   ! block load into %f16-%f30
        stda        %f32, [regaddr] ASI_BLK_P   ! block store from %f32-%f46
        ba          loop
        faligndata  %f48, %f0, %f32             ! delay slot: first result of next half
done:
        (end of loop processing)
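The data flow of the listing can be sketched in portable C. This is a simplified model rather than UltraSPARC code: the type dreg and the functions faligndata_model and block_copy_model are hypothetical names introduced here, the parameter off stands in for the byte offset that the alignment instruction would place in the GSR, and the sketch covers only the one doubleword-misalignment case shown above (the carried register %f48 holds the last doubleword of the previous block). It does not model block-transfer timing or the other seven loop versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One double-precision FP register, modeled as 8 bytes in memory order. */
typedef struct { uint8_t b[8]; } dreg;

/* Model of faligndata: concatenate rs1:rs2 (rs1 first) and return the
 * 8 bytes that start at byte offset off (0..7). */
static dreg faligndata_model(dreg rs1, dreg rs2, unsigned off)
{
    uint8_t cat[16];
    dreg rd;

    assert(off < 8);
    memcpy(cat, rs1.b, 8);
    memcpy(cat + 8, rs2.b, 8);
    memcpy(rd.b, cat + off, 8);
    return rd;
}

/* Copy nblocks 64-byte blocks to dst from the byte stream that starts at
 * src_aligned + off. The function reads 8 + 64 * nblocks bytes from
 * src_aligned: one priming doubleword, then one block per iteration. */
static void block_copy_model(uint8_t *dst, const uint8_t *src_aligned,
                             unsigned off, size_t nblocks)
{
    dreg in[9];   /* in[0]: carried doubleword (role of %f48); in[1..8]: loaded block */
    dreg out[8];  /* aligned results (role of %f32-%f46) */
    size_t n, i;

    /* Prime the carry with the doubleword that precedes the first block. */
    memcpy(in[0].b, src_aligned, 8);
    src_aligned += 8;

    for (n = 0; n < nblocks; n++) {
        /* Stand-in for the block load: eight doublewords. */
        for (i = 0; i < 8; i++)
            memcpy(in[i + 1].b, src_aligned + 8 * i, 8);

        /* Eight faligndata steps, each pairing neighbouring doublewords. */
        for (i = 0; i < 8; i++)
            out[i] = faligndata_model(in[i], in[i + 1], off);

        /* Stand-in for the block store. */
        for (i = 0; i < 8; i++)
            memcpy(dst + 8 * i, out[i].b, 8);

        in[0] = in[8];            /* role of fmovd: carry the last doubleword */
        src_aligned += 64;
        dst += 64;
    }
}
```

In this model each output doubleword depends on two neighbouring input doublewords, which is why the listing saves the last doubleword of every block in %f48 before the next block load overwrites the register bank.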