Intel XScale® Core Developer’s Manual
January, 2004
Optimization Guide
A.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)
The MRA (MRRC) instruction has an issue latency of 1 cycle, a resource latency of 2 cycles, and a
result latency that depends on which destination register is read: 2 cycles for the first
destination register and 3 cycles for the second.
Consider the code sample:
mra r6, r7, acc0
mra r8, r9, acc0    ; stalls 1 cycle: the previous MRA still occupies the resource
add r1, r1, #1
The code shown above would incur a 1-cycle stall because the second MRA cannot issue until the
2-cycle resource latency of the first MRA has elapsed. The code can be rearranged as shown below
to prevent this stall.
mra r6, r7, acc0
add r1, r1, #1
mra r8, r9, acc0
Similarly, the code shown below would incur a 2-cycle penalty due to the 3-cycle result latency of
the second destination register (r7).
mra r6, r7, acc0
mov r1, r7          ; stalls 2 cycles: r7 is not available until 3 cycles after the MRA
mov r0, r6
add r2, r2, #1
The stalls incurred by the code shown above can be prevented by rearranging the code:
mra r6, r7, acc0
add r2, r2, #1
mov r0, r6
mov r1, r7
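The stall counts quoted for the four MRA sequences above can be reproduced with a small scheduling model. The following Python sketch is illustrative only and is not a cycle-accurate description of the core: it assumes a strictly in-order, single-issue pipeline, takes the MRA latencies from the text, and assumes (this is not stated in this section) that every other instruction has 1-cycle issue and result latencies. The function name count_stalls and the tuple encoding of instructions are hypothetical.

```python
# Latencies for MRA as stated in this section.
MRA_RESOURCE_LATENCY = 2
MRA_RESULT_LATENCY = (2, 3)  # (first destination, second destination)
# ASSUMPTION: all other instructions deliver their result after 1 cycle.
DEFAULT_RESULT_LATENCY = 1


def count_stalls(program):
    """Count stall cycles for a list of (op, dests, srcs) tuples.

    Simplified model: one instruction issues per cycle unless it must
    wait for a source register or for the MRA resource to become free.
    The accumulator operand of MRA is ignored.
    """
    ready = {}     # register -> first cycle at which its value can be read
    mra_free = 0   # first cycle at which another MRA may issue
    cycle = 0      # cycle at which the next instruction issues if unstalled
    stalls = 0
    for op, dests, srcs in program:
        issue = cycle
        for reg in srcs:
            issue = max(issue, ready.get(reg, 0))
        if op == "mra":
            issue = max(issue, mra_free)
        stalls += issue - cycle
        if op == "mra":
            mra_free = issue + MRA_RESOURCE_LATENCY
            for reg, latency in zip(dests, MRA_RESULT_LATENCY):
                ready[reg] = issue + latency
        else:
            for reg in dests:
                ready[reg] = issue + DEFAULT_RESULT_LATENCY
        cycle = issue + 1
    return stalls


# The four sequences from this section, in the order they appear.
BACK_TO_BACK_MRA = [
    ("mra", ("r6", "r7"), ()),
    ("mra", ("r8", "r9"), ()),
    ("add", ("r1",), ("r1",)),
]
INTERLEAVED_MRA = [
    ("mra", ("r6", "r7"), ()),
    ("add", ("r1",), ("r1",)),
    ("mra", ("r8", "r9"), ()),
]
EARLY_USE_OF_RESULT = [
    ("mra", ("r6", "r7"), ()),
    ("mov", ("r1",), ("r7",)),
    ("mov", ("r0",), ("r6",)),
    ("add", ("r2",), ("r2",)),
]
DELAYED_USE_OF_RESULT = [
    ("mra", ("r6", "r7"), ()),
    ("add", ("r2",), ("r2",)),
    ("mov", ("r0",), ("r6",)),
    ("mov", ("r1",), ("r7",)),
]

if __name__ == "__main__":
    for name, program in [
        ("back-to-back MRA", BACK_TO_BACK_MRA),
        ("interleaved MRA", INTERLEAVED_MRA),
        ("early use of result", EARLY_USE_OF_RESULT),
        ("delayed use of result", DELAYED_USE_OF_RESULT),
    ]:
        print(name, count_stalls(program))
```

Under these assumptions the model reports 1, 0, 2, and 0 stall cycles respectively, matching the figures given in the text. Reading the first destination register before the second in the rearranged sequence matters: the first is available one cycle earlier.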
The MAR (MCRR) instruction has an issue latency, a result latency, and a resource latency of 2
cycles each. Because of the 2-cycle issue latency, the pipeline always stalls for 1 cycle following a
MAR instruction, regardless of which instruction comes next. The MAR instruction should,
therefore, be used only where absolutely necessary.
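Unlike the MRA stalls, the MAR stall comes from issue latency, so reordering the surrounding instructions does not remove it. The short Python sketch below illustrates this under stated assumptions: only issue latency is modelled, the MAR value is taken from the text, and every other instruction is assumed to have a 1-cycle issue latency. The name issue_stalls and the sample sequence are hypothetical.

```python
from itertools import permutations

MAR_ISSUE_LATENCY = 2      # cycles, as stated in this section
DEFAULT_ISSUE_LATENCY = 1  # ASSUMPTION for every other instruction


def issue_stalls(ops):
    """Stall cycles caused purely by multi-cycle issue latency.

    Each instruction occupies the issue stage for its issue latency,
    so it delays its successor by (issue latency - 1) cycles.
    """
    stalls = 0
    for op in ops:
        latency = MAR_ISSUE_LATENCY if op == "mar" else DEFAULT_ISSUE_LATENCY
        stalls += latency - 1
    return stalls


# Try every ordering of a three-instruction sequence containing one MAR.
SEQUENCE = ("mar", "add", "mov")
STALLS_PER_ORDERING = {
    order: issue_stalls(order) for order in permutations(SEQUENCE)
}

if __name__ == "__main__":
    for order, stalls in STALLS_PER_ORDERING.items():
        print(" ".join(order), stalls)
```

In this model every ordering costs the same single stall cycle, and each additional MAR adds one more, which is why the only way to avoid the penalty is to avoid the instruction.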