}
.ENTRY spmm;

spmm:   DO row_loop UNTIL CE;
            I5=I6;                          {I5 = start of Y}
            CNTR=M5;
            DO column_loop UNTIL CE;
                I0=I2;                      {Set I0 to current X row}
                I4=I5;                      {Set I4 to current Y col}
                CNTR=M1;
                MR=0, MX0=DM(I0,M0), MY0=PM(I4,M5);    {Get 1st data}
                DO element_loop UNTIL CE;
element_loop:       MR=MR+MX0*MY0 (SS), MX0=DM(I0,M0), MY0=PM(I4,M5);
                SR=ASHIFT MR1 (HI), MY0=DM(I5,M4);     {Update I5}
                SR=SR OR LSHIFT MR0 (LO);              {Finish shift}
column_loop:    DM(I1,M0)=SR1;                         {Save output}
row_loop:   MODIFY(I2,M1);                  {Update I2 to next X row}
        RTS;
.ENDMOD;
Listing 14.4 Single-Precision Matrix Multiply
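The loop structure of the listing can be checked against a host-side reference model. The following Python sketch is an illustration only, not part of the original program. It assumes that M0 and M4 hold 1, that M1 holds the number of columns in X (which equals the number of rows in Y), that M5 holds the number of columns in Y, that the caller loads the row count into CNTR, and that the shift amount comes from SE. The names spmm_model, to_signed, rows, inner, cols and se are hypothetical. Data values are 16-bit integers in 1.15 format, and saturation is not modeled.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def spmm_model(x, y, rows, inner, cols, se=0):
    """Model Z = X * Y for flat, row-major lists of 1.15 integers.

    x has rows*inner entries and y has inner*cols entries.
    se is the assumed shift amount (positive = left, negative = right).
    """
    z = []
    x_row = 0                          # plays the role of I2
    for _ in range(rows):              # row_loop
        y_col = 0                      # I5 = I6 (start of Y)
        for _ in range(cols):          # column_loop
            xi, yi = x_row, y_col      # I0 = I2, I4 = I5
            mr = 0                     # MR = 0
            for _ in range(inner):     # element_loop
                mr += (x[xi] * y[yi]) << 1   # fractional-mode product
                xi += 1                # step along the X row (M0 assumed 1)
                yi += cols             # step down the Y column (M5)
            y_col += 1                 # dummy read advances I5 (M4 assumed 1)
            sr = to_signed(mr, 32)     # only MR1:MR0 reach the shifter
            sr = sr << se if se >= 0 else sr >> -se
            z.append(to_signed(sr >> 16, 16))   # SR1 is the upper word
        x_row += inner                 # MODIFY(I2,M1)
    return z
```

For example, multiplying 0.5 times the 2x2 identity by a small matrix halves each entry: spmm_model([0x4000, 0, 0, 0x4000], [0x2000, 0x1000, 0x0800, 0x0400], 2, 2, 2) returns [0x1000, 0x0800, 0x0400, 0x0200].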
14.7 RADIX-2 DECIMATION-IN-TIME FFT
The FFT program includes three subroutines. The first subroutine
scrambles the input data (places the data in bit-reversed address order), so
that the FFT output will be in the normal, sequential order. The next
subroutine computes the FFT and the third scales the output data to
maintain the block floating-point data format.
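The scrambling step can be illustrated with a short Python sketch. This is a host-side illustration of bit-reversed address order, not the bit reversal subroutine itself; the names bit_reverse and scramble are hypothetical.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def scramble(data):
    """Return a copy of data in bit-reversed address order.

    The length must be a power of two.
    """
    n = len(data)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [data[bit_reverse(i, bits)] for i in range(n)]
```

For an 8-point buffer, scramble(list(range(8))) returns [0, 4, 2, 6, 1, 5, 3, 7]. Applying scramble twice restores the original order.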
The program is contained in four modules. The main module declares and
initializes data buffers and calls subroutines. The other three modules
contain the FFT, bit reversal, and block floating-point scaling subroutines.
The main module calls the FFT and bit reversal subroutines. The FFT
module calls the data scaling subroutine.
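The idea behind block floating-point scaling can be sketched as follows. This is a generic illustration under stated assumptions, not the scaling subroutine of this program: the whole block shares one exponent, and the block is shifted right whenever its largest value leaves fewer than guard_bits spare sign bits. Two guard bits are assumed here because a radix-2 butterfly output can be more than twice as large as its inputs. The names signed_bits, block_scale, word_bits and guard_bits are hypothetical.

```python
def signed_bits(value):
    """Bits needed to hold value in two's complement, sign bit included."""
    return (value if value >= 0 else ~value).bit_length() + 1


def block_scale(block, exponent, word_bits=16, guard_bits=2):
    """Shift every value in block right, in place, until guard_bits are free.

    Returns the updated block exponent. The represented value of each
    entry is block[i] * 2**exponent both before and after the call,
    apart from the low-order bits lost to the shift.
    """
    needed = max(signed_bits(v) for v in block)
    shift = max(0, needed - (word_bits - guard_bits))
    if shift:
        block[:] = [v >> shift for v in block]   # arithmetic right shift
    return exponent + shift
```

With block = [0x4000, -0x4000, 0x0100] and exponent 0, block_scale shifts by two bits, leaves block as [0x1000, -0x1000, 0x0040], and returns 2.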
The FFT is performed in place; that is, the outputs are written to the same
buffer that the inputs are read from.
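An in-place radix-2 decimation-in-time FFT can be sketched in Python as follows. This is a floating-point illustration of the algorithm's structure, not the fixed-point assembly subroutine; the names bit_reversed and fft_dit_in_place are hypothetical. The input buffer is expected in bit-reversed order, and each butterfly overwrites the two locations it read from.

```python
import cmath


def bit_reversed(data):
    """Return a copy of data in bit-reversed order (power-of-two length)."""
    n = len(data)
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        for b in range(bits):
            r = (r << 1) | ((i >> b) & 1)
        out.append(data[r])
    return out


def fft_dit_in_place(buf):
    """Radix-2 DIT FFT. buf holds bit-reversed input and is overwritten
    with the transform in normal, sequential order."""
    n = len(buf)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1                                   # butterfly span in this stage
    while half < n:
        for start in range(0, n, 2 * half):    # each group in the stage
            for k in range(half):              # each butterfly in the group
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle
                a = buf[start + k]
                b = buf[start + k + half] * w
                buf[start + k] = a + b         # outputs replace inputs
                buf[start + k + half] = a - b
        half *= 2
```

A typical call is buf = bit_reversed([complex(s) for s in samples]) followed by fft_dit_in_place(buf), after which buf holds the spectrum.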
14.7.1 Main Module
The dit_fft_main module is shown in Listing 14.5. N is the number of
points in the FFT (in this example, N=1024) and N_div_2 is used for
specifying the lengths of buffers. To change the number of points in the