Digital Equipment Alpha 21164PC Hardware Reference Manual
29 September 1997 – Subject To Change
Internal Architecture
2–25
Scheduling and Issuing Rules
1. The multiplier is unable to receive data from IEU bypass paths. The instruction issues at the expected time, but its latency is increased by the time it takes for the input data to become available to the multiplier. For example, an IMULL instruction issued one cycle later than an ADDL instruction that produced one of its operands has a latency of 10 (8 + 2). If the IMULL instruction is issued two cycles later than the ADDL instruction, the latency is 9 (8 + 1).
2. When idle, Bcache arbitration predicts a load miss in E0. If a load actually does miss in E0, it is sent to the Bcache immediately. If it hits in the Bcache, and no other event in the CBU affects the operation, the requested data is available for use in 10 or more cycles. Otherwise, the request takes longer (possibly much longer, depending on the state of the CBU and memory). It should be possible to schedule some unrolled code loops for Bcache by prefetching data into the Dcache using LDQ R31, x(Rx).
3. A special bypass provides an effective latency of 0 (zero) cycles for an ICMP or ILOG instruction producing the test operand of an IBR or CMOV instruction. This is true only when the IBR or CMOV instruction issues in the same cycle as the ICMP or ILOG instruction that produced its test operand. In all other cases, the effective latency of ICMP and ILOG instructions is 1 cycle.
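
The timing rules in notes 1 and 3 can be sketched as a small model. This is an illustration only, not part of the manual. The function names are hypothetical, and the rule `3 - issue_gap` is an assumption extrapolated from the two IMULL/ADDL data points in note 1 and from the "up to 2 cycles" cap stated for the multiply classes; it is meant for a single-cycle IEU producer such as ADDL.

```python
def multiplier_added_latency(issue_gap: int) -> int:
    """Extra cycles added to a multiply whose operand was produced by an
    IEU instruction issued `issue_gap` cycles earlier.

    Assumed rule (not stated in the manual): gap 1 -> 2, gap 2 -> 1,
    gap 3 or more -> 0.
    """
    if issue_gap < 1:
        raise ValueError("the multiply must issue at least one cycle after its producer")
    return max(0, 3 - issue_gap)


def imull_latency(issue_gap: int, base_latency: int = 8) -> int:
    """IMULL latency including the added cycles from note 1.

    The base of 8 comes from the "(8 + 2)" and "(8 + 1)" examples in note 1.
    """
    return base_latency + multiplier_added_latency(issue_gap)


def icmp_ilog_effective_latency(consumer_class: str, feeds_test_operand: bool) -> int:
    """Effective latency of an ICMP or ILOG result, per note 3.

    Returns 0 when the consumer is an IBR or CMOV that uses the result as
    its test operand (so it can issue in the same cycle), otherwise 1.
    """
    if consumer_class in {"IBR", "CMOV"} and feeds_test_operand:
        return 0
    return 1
```

For example, `imull_latency(1)` and `imull_latency(2)` reproduce the 10-cycle and 9-cycle cases from note 1; the value for a gap of three or more cycles rests on the assumption stated above.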
Table 2–9 Instruction Latencies (Sheet 2 of 2)

| Class | Latency | Additional Time Before Result Available to Integer Multiply Unit (see note 1) |
|---|---|---|
| IMULQ | Latency=12, plus up to 2 cycles of added latency, depending on the source of the data. Latency until next IMULL, IMULQ, or IMULH instruction can issue (if there are no data dependencies) is 8 cycles plus the number of cycles added to the latency. | 1 cycle |
| IMULH | Latency=14, plus up to 2 cycles of added latency, depending on the source of the data. Latency until next IMULL, IMULQ, or IMULH instruction can issue (if there are no data dependencies) is 8 cycles plus the number of cycles added to the latency. | 1 cycle |
| MVI | Latency=2. | 1 cycle |
| FADD | Latency=4. | — |
| FDIV | Data-dependent latency: 15 to 31 single precision, 22 to 60 double precision. The next floating divide can be issued in the same cycle that the result of the previous divide becomes available, regardless of data dependencies. | — |
| FMUL | Latency=4. | — |
| FCPYS | Latency=4. | — |
| MISC | RPCC, latency=2. TRAPB produces no result. | 1 cycle |
| UNOP | UNOP produces no result. | — |
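
The IMULQ and IMULH entries of Table 2–9 combine a base latency with a variable number of added cycles, and the same added cycles also delay the next multiply. A minimal sketch of that arithmetic follows; the names `BASE_LATENCY`, `MULTIPLY_REISSUE_CYCLES`, and `multiply_timing` are hypothetical, and the caller is assumed to already know how many cycles (0 to 2) were added for the operand source.

```python
# Base result latencies for the multiply classes listed on this sheet.
BASE_LATENCY = {"IMULQ": 12, "IMULH": 14}

# Cycles until the next IMULL/IMULQ/IMULH can issue when there are no
# data dependencies, before any added cycles are counted.
MULTIPLY_REISSUE_CYCLES = 8


def multiply_timing(op: str, added_cycles: int) -> tuple[int, int]:
    """Return (result_latency, cycles_until_next_multiply_can_issue).

    `added_cycles` is the extra latency caused by the operand source;
    the table bounds it at 2.
    """
    if op not in BASE_LATENCY:
        raise KeyError(f"no latency entry for {op!r} on this sheet")
    if not 0 <= added_cycles <= 2:
        raise ValueError("added_cycles must be between 0 and 2")
    result_latency = BASE_LATENCY[op] + added_cycles
    reissue_delay = MULTIPLY_REISSUE_CYCLES + added_cycles
    return result_latency, reissue_delay
```

Under these assumptions, `multiply_timing("IMULQ", 2)` gives a 14-cycle result latency and a 10-cycle wait before the next multiply can issue.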