| ALU Option | Hardware Details | Cycles per Instruction | Result Latency Cycles | Supported Instructions |
| --- | --- | --- | --- | --- |
| 32-bit multiplier | ALU includes 32 x 32-bit multiplier | 1 | +2 | mul, muli, mulxss, mulxsu, mulxuu |
| 16-bit multiplier | ALU includes 3 16 x 16-bit multiplier | 1 | +2 | mul, muli |
| 16-bit multiplier | ALU includes 4 16 x 16-bit multiplier | 2 | +2 | mul, muli, mulxss, mulxsu, mulxuu |
| Hardware divide | ALU includes SRT Radix-2 divide circuit | 35 | +2 | div, divu |
The cycles per instruction value determines the maximum rate at which the ALU can
dispatch instructions and produce each result. The latency value determines when the
result becomes available. If there is no data dependency between the results and
operands for back-to-back instructions, then the latency does not affect throughput.
However, if an instruction depends on the result of an earlier instruction, then the
processor stalls through any result latency cycles until the result is ready.
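The relationship between the two table columns can be sketched with a small calculation. This is a minimal model, assuming a stream of back-to-back instructions of one type with no data dependencies between them; the function name `cycles_for_independent_ops` and the dictionary `ALU_OPTIONS` are hypothetical, and only the cycle figures come from the table above.

```python
# (cycles per instruction, result latency cycles), taken from the table above.
# The dictionary keys are illustrative labels, not official option names.
ALU_OPTIONS = {
    "32-bit multiplier": (1, 2),
    "16-bit multiplier (3 x 16x16)": (1, 2),
    "16-bit multiplier (4 x 16x16)": (2, 2),
    "hardware divide": (35, 2),
}

def cycles_for_independent_ops(count, cycles_per_instruction, latency):
    """Cycles until the last result is available for `count` independent ops.

    Assumption: with no dependencies, each instruction occupies the ALU for
    `cycles_per_instruction` cycles, and the latency is paid only once,
    after the final instruction has been dispatched.
    """
    if count <= 0:
        return 0
    return count * cycles_per_instruction + latency

for name, (cpi, latency) in ALU_OPTIONS.items():
    print(name, cycles_for_independent_ops(10, cpi, latency))
```

Under this model the latency adds a fixed two cycles regardless of how many instructions are issued, which is why it does not affect throughput for independent instructions.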
In the following code example, a multiply operation (with 1 instruction cycle and 2
result latency cycles) is followed immediately by an add operation that uses the result
of the multiply. On the Nios II/f core, the addi instruction, like most ALU instructions,
executes in a single cycle. However, in this code example, execution of the addi
instruction is delayed by two additional cycles until the multiply operation completes.
mul r1, r2, r3 ; r1 = r2 * r3
addi r1, r1, 100 ; r1 = r1 + 100 (Depends on result of mul)
In contrast, the following code does not stall the processor.
mul r1, r2, r3 ; r1 = r2 * r3
or r5, r5, r6 ; No dependency on previous results
or r7, r7, r8 ; No dependency on previous results
addi r1, r1, 100 ; r1 = r1 + 100 (Depends on result of mul)
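The difference between the two sequences can be checked with a simplified stall counter. This is a sketch only, assuming a single-issue, in-order model in which an instruction waits until all of its source registers are ready; the names `Instr` and `count_stalls` are hypothetical and the model ignores every other pipeline effect of the real core. The mul timing (1 cycle, +2 latency) comes from the table, and the other instructions are treated as single-cycle with no extra latency.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    cpi: int = 1       # cycles the instruction occupies the ALU
    latency: int = 0   # extra cycles before the result can be read

def count_stalls(program):
    """Count stall cycles caused by waiting on not-yet-ready operands."""
    ready = {}   # register -> first cycle at which its value is available
    cycle = 0    # earliest cycle at which the next instruction could issue
    stalls = 0
    for ins in program:
        start = max([cycle] + [ready.get(reg, 0) for reg in ins.srcs])
        stalls += start - cycle
        cycle = start + ins.cpi
        ready[ins.dest] = cycle + ins.latency
    return stalls

mul = Instr("mul", "r1", ("r2", "r3"), cpi=1, latency=2)
addi = Instr("addi", "r1", ("r1",))

# First sequence: addi immediately consumes the mul result.
dependent = [mul, addi]

# Second sequence: two independent or instructions fill the latency cycles.
interleaved = [
    mul,
    Instr("or", "r5", ("r5", "r6")),
    Instr("or", "r7", ("r7", "r8")),
    addi,
]

print(count_stalls(dependent))    # 2
print(count_stalls(interleaved))  # 0
```

The two independent or instructions occupy exactly the two latency cycles of the multiply, so the addi finds its operand ready when it issues.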
5.2.2.2. Shift and Rotate Performance
The performance of shift operations depends on the hardware multiply option. When a
hardware multiplier is present, the ALU achieves shift and rotate operations in three or
four clock cycles. Otherwise, the ALU includes dedicated shift circuitry that achieves
one-bit-per-cycle shift and rotate performance.
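One way to see why a multiplier can speed up shifts is the arithmetic identity between shifting and multiplying by a power of two. The sketch below demonstrates that identity on 32-bit values; it is an illustration of the arithmetic only, and it is an assumption, not something this section states, that the ALU implements shifts this way. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def shift_left_via_multiply(x, n):
    """x << n equals the low 32 bits of x * 2**n (0 <= n <= 31)."""
    return ((x & MASK32) * (1 << n)) & MASK32

def shift_right_logical_via_multiply(x, n):
    """x >> n equals the high 32 bits of x * 2**(32 - n) (0 <= n <= 31)."""
    if n == 0:
        return x & MASK32
    return ((x & MASK32) * (1 << (32 - n))) >> 32

def rotate_left_via_multiply(x, n):
    """Rotate left combines the low and high halves of x * 2**n."""
    product = (x & MASK32) * (1 << n)
    return (product & MASK32) | (product >> 32)

x = 0x80000001
print(hex(shift_left_via_multiply(x, 1)))           # 0x2
print(hex(shift_right_logical_via_multiply(x, 1)))  # 0x40000000
print(hex(rotate_left_via_multiply(x, 1)))          # 0x3
```

In each case a single 32 x 32-bit multiply produces the result in a fixed number of steps, whereas a one-bit-per-cycle shifter takes time proportional to the shift amount.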
Related Information
Instruction Performance on page 130
5.2.3. Memory Access
The Nios II/f core provides optional instruction and data caches. The cache size for
each is user-definable, between 512 bytes and 64 KB.
5. Nios II Core Implementation Details
NII-PRG | 2018.04.18
Nios II Processor Reference Guide