
GR716-DS-UM, May 2019, Version 1.29
www.cobham.com/gaisler
GR716
Table 114 lists the cycles per instruction (assuming local instruction and data memory are used):
Note 1 (JMPL): Assuming the instruction in the JMPL delay slot takes one cycle. Additional cycles spent in the delay slot
will reduce the effective time of the JMPL to 2 or 1 cycles.
A number of conditions can extend an instruction’s duration in the pipeline:
Branch interlock: When a conditional branch or trap is performed 1-2 cycles after an instruction
which modifies the condition codes, 1-2 cycles of delay are added to allow the condition to be
computed. If static branch prediction is enabled, this extra delay is incurred only if the branch is not taken.
Load delay: When data resulting from a load is used shortly after the load, the instruction using it is delayed
to satisfy the pipeline’s load delay. The processor pipeline is configured for a one-cycle load delay.
Hold cycles: When blocking on the store buffer, the pipeline is held still until the data is ready,
effectively extending the execution time of the instruction causing the miss by the corresponding
number of cycles. Note that since the whole pipeline is held still, hold cycles do not mask load delay
or interlock delays.
FPU/Coprocessor: The floating-point unit or coprocessor may need to hold the pipeline or extend a
specific instruction. When this occurs depends on the specific FP/CP unit.
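The load delay described above can be illustrated with a small model. This is only a sketch under simplifying assumptions: it checks just the instruction immediately following a load, and the trace format (is_load, destination register, source registers) and the function name are hypothetical, not part of the processor documentation.

```python
LOAD_DELAY = 1  # one-cycle load delay, as stated in the text


def load_use_stalls(trace):
    """Count extra cycles caused by using loaded data right after the load.

    trace is a list of (is_load, dest, sources) tuples (hypothetical format).
    Only the instruction directly after a load is checked.
    """
    stalls = 0
    prev_load_dest = None
    for is_load, dest, sources in trace:
        # Stall if this instruction reads the register the previous load wrote.
        if prev_load_dest is not None and prev_load_dest in sources:
            stalls += LOAD_DELAY
        prev_load_dest = dest if is_load else None
    return stalls


dependent = [
    (True, "g1", ("o0",)),        # load into g1
    (False, "g2", ("g1", "g3")),  # uses g1 immediately
]
independent = [
    (True, "g1", ("o0",)),        # load into g1
    (False, "g4", ("g5", "g6")),  # unrelated instruction fills the slot
    (False, "g2", ("g1", "g3")),  # g1 is ready by now
]
```

Under this model, placing an independent instruction between the load and its first use removes the extra cycle.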
16.2.3 SPARC Implementor’s ID
Cobham Gaisler is assigned number 15 (0xF) as its SPARC implementor’s identification. This value is
hard-coded into bits 31:28 of the %psr register. The version number for LEON3 is 3, which is
hard-coded into bits 27:24 of the %psr.
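As an illustration of the bit layout above, the following sketch extracts the implementor and version fields from a 32-bit %psr value. The function name is hypothetical, and the sample value sets only the two fields described here.

```python
def decode_psr_id(psr):
    """Return (impl, ver) from a 32-bit %psr value."""
    psr &= 0xFFFFFFFF
    impl = (psr >> 28) & 0xF  # bits 31:28, implementor's ID
    ver = (psr >> 24) & 0xF   # bits 27:24, version number
    return impl, ver


# Sample value with impl = 0xF and ver = 3; all other bits are zero.
impl, ver = decode_psr_id(0xF3000000)
```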
16.2.4 Divide instructions
Full support for SPARC V8 divide instructions is provided (SDIV, UDIV, SDIVCC and UDIVCC). The
divide instructions perform a 64-by-32 bit divide and produce a 32-bit result. Rounding and overflow
detection are performed as defined in the SPARC V8 manual.
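The unsigned case of the 64-by-32 bit divide can be sketched as follows. Two details are assumptions taken from the SPARC V8 architecture rather than from this section, and should be checked against the SPARC V8 manual: the upper 32 bits of the dividend come from the Y register, and an unsigned quotient that does not fit in 32 bits saturates to 0xFFFFFFFF. The function name is hypothetical, and the Python exception merely stands in for the division-by-zero trap.

```python
MASK32 = 0xFFFFFFFF


def udiv_model(y, rs1, divisor):
    """Model an unsigned 64-by-32 bit divide: returns (result, overflow)."""
    divisor &= MASK32
    if divisor == 0:
        # Stand-in for the division-by-zero trap (assumption).
        raise ZeroDivisionError("division by zero")
    # Assumption: 64-bit dividend is Y (upper half) : rs1 (lower half).
    dividend = ((y & MASK32) << 32) | (rs1 & MASK32)
    quotient = dividend // divisor  # truncating division
    overflow = quotient > MASK32
    # Assumption: overflow saturates the 32-bit result.
    return (MASK32 if overflow else quotient), overflow
```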
16.2.5 Multiply instructions
The LEON processor supports the SPARC integer multiply instructions UMUL, SMUL, UMULCC
and SMULCC. These instructions perform a 32x32-bit integer multiply, producing a 64-bit result.
SMUL and SMULCC perform signed multiply while UMUL and UMULCC perform unsigned
multiply. UMULCC and SMULCC also set the condition codes to reflect the result. The multiply
instructions are performed using a 16x16 hardware multiplier which is iterated four times.
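The arithmetic behind building a 32x32-bit product from four 16x16-bit multiplications can be sketched as below for the unsigned case. This shows only the decomposition into partial products; the order and sequencing used by the actual hardware are not described in this section, and the function name is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def umul_via_16x16(a, b):
    """Unsigned 32x32 -> 64-bit multiply built from four 16x16 products."""
    a &= MASK32
    b &= MASK32
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    # Each tuple is one 16x16 multiplication and the weight of its result.
    partial_products = (
        (a_lo, b_lo, 0),
        (a_hi, b_lo, 16),
        (a_lo, b_hi, 16),
        (a_hi, b_hi, 32),
    )
    result = 0
    for x, y, shift in partial_products:
        result += (x * y) << shift
    return result
```

The four iterations correspond to the four partial products, which is consistent with the four-cycle SMUL/UMUL timing in Table 114.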
Table 114. Instruction timing

Instruction               Cycles
JMPL                      3 (see note 1)
JMPL,RETT pair            4
Double load               2
Single store              2
Double store              3
SMUL/UMUL                 4
SDIV/UDIV                 35
Taken Trap                5
Atomic load/store         3
All other instructions    1
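
As a rough illustration of how Table 114 might be used, the sketch below sums the base cycle counts for a sequence of instruction categories. It ignores the interlock, load-delay and hold-cycle effects described earlier, and the dictionary, function name and trace format are hypothetical.

```python
# Base cycle counts copied from Table 114 (local memory assumed).
CYCLES = {
    "JMPL": 3,  # assumes a one-cycle delay-slot instruction
    "JMPL,RETT pair": 4,
    "Double load": 2,
    "Single store": 2,
    "Double store": 3,
    "SMUL/UMUL": 4,
    "SDIV/UDIV": 35,
    "Taken Trap": 5,
    "Atomic load/store": 3,
}
OTHER_CYCLES = 1  # "All other instructions"


def base_cycles(trace):
    """Sum the Table 114 cycle counts for a list of instruction categories."""
    return sum(CYCLES.get(kind, OTHER_CYCLES) for kind in trace)


# One divide, one double store and one other instruction (e.g. an add).
total = base_cycles(["SDIV/UDIV", "Double store", "ADD"])
```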