Sun Microelectronics
17. Grouping Rules and Stalls
A MOVcc based on a floating-point condition code can, however, be in the same
group as an FCMP{E}{s,d} if the two instructions reference different condition
codes. For example, an FCMP that sets fcc0 can be grouped with a MOVcc that
tests fcc1.
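The condition-code rule above can be sketched as a small check. This is only an illustrative model of that single rule, not of the processor's full grouping logic; the function name `can_share_group` is hypothetical, and the range 0 to 3 assumes the four SPARC-V9 floating-point condition codes fcc0 through fcc3.

```python
def can_share_group(fcmp_fcc: int, movcc_fcc: int) -> bool:
    """Model of one rule only: an FCMP{E}{s,d} and a MOVcc on a
    floating-point condition code may share a group when they reference
    different condition codes.

    Arguments are condition-code numbers (0 for fcc0 ... 3 for fcc3).
    The function name and interface are hypothetical.
    """
    for fcc in (fcmp_fcc, movcc_fcc):
        if fcc not in range(4):
            raise ValueError(f"no such condition code: fcc{fcc}")
    return fcmp_fcc != movcc_fcc


# FCMP on fcc0 with MOVcc on fcc1: different codes, grouping allowed.
print(can_share_group(0, 1))
# FCMP on fcc0 with MOVcc on fcc0: same code, this rule does not allow it.
print(can_share_group(0, 0))
```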
Latencies between dependent floating-point and graphics instructions are shown
in Table 17-1, “Latencies for Floating-Point and Graphics Instructions,” on
page 300. Latencies depend on the instruction generating the result (use the left
column of the table to select a row) and the operation using the result (use the
top row of the table to select a column). For example:
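The row-and-column lookup described above can be sketched as a nested mapping keyed first by the instruction generating the result and then by the operation using it. The names `dependent_latency` and `PLACEHOLDER_TABLE` are hypothetical, and the numbers are placeholders chosen only to make the sketch runnable; they are not the values of Table 17-1.

```python
from typing import Mapping

# Outer key: instruction generating the result (left column of the table).
# Inner key: operation using the result (top row of the table).
LatencyTable = Mapping[str, Mapping[str, int]]

# PLACEHOLDER values only. These are NOT the entries of Table 17-1.
PLACEHOLDER_TABLE: LatencyTable = {
    "producer_a": {"consumer_x": 1, "consumer_y": 2},
    "producer_b": {"consumer_x": 3, "consumer_y": 4},
}


def dependent_latency(table: LatencyTable, producer: str, consumer: str) -> int:
    """Select the row by the producing instruction, then the column by
    the consuming operation, and return the latency found there."""
    try:
        return table[producer][consumer]
    except KeyError as exc:
        raise KeyError(f"no table entry for {producer!r} -> {consumer!r}") from exc


print(dependent_latency(PLACEHOLDER_TABLE, "producer_a", "consumer_y"))
```

To use the sketch with real figures, the placeholder mapping would be replaced by one filled in from Table 17-1.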
FDIV{s,d}, FSQRT{s,d}, block load, block store, ST{X}FSR, and LD{X}FSR
instructions wait in the G Stage for the remaining latency of the previous
divide or square root, even if there is no data dependency. An FGA or FGM
instruction (see Table 17-1) that first enters the G Stage one cycle before an
FDIV or FSQRT dependent instruction would be released will be held for one
clock, regardless of data dependency.
FDIV and FSQRT use the floating-point multiplier for final rounding, so an
M-Class operation cannot be dispatched in the third clock before the divide is
finished. A load-use stall that occurs in the third or fourth clock before
normal divide completion will delay completion by a corresponding amount.
FDIV and FSQRT stall earlier instructions with the same rd (including
floating-point loads) for the same time as a source register dependency.
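Two of the divide and square-root rules above can be sketched as simple predicates. This is a rough model under stated assumptions, not a description of the hardware: the function names, the string spellings of the mnemonics, and the idea of passing "clocks remaining" as an integer are all hypothetical. The FGA/FGM hold, the load-use delay, and the rd stall are not modelled.

```python
# Instructions that the text says wait in the G Stage behind an
# outstanding divide or square root, even without a data dependency.
# The string spellings are an assumption of this sketch.
WAITS_FOR_DIVIDE = frozenset({
    "FDIVs", "FDIVd", "FSQRTs", "FSQRTd",
    "block load", "block store",
    "STFSR", "STXFSR", "LDFSR", "LDXFSR",
})


def g_stage_wait(op: str, divide_clocks_remaining: int) -> int:
    """Clocks that `op` waits in the G Stage because of this rule alone.

    A result of 0 means only that this rule imposes no wait; other
    rules (for example data dependencies) may still stall the op.
    """
    if divide_clocks_remaining < 0:
        raise ValueError("remaining latency cannot be negative")
    return divide_clocks_remaining if op in WAITS_FOR_DIVIDE else 0


def m_class_blocked(clocks_before_divide_finish: int) -> bool:
    """True in the third clock before the divide finishes, when the
    multiplier is needed for final rounding and an M-Class operation
    cannot be dispatched."""
    return clocks_before_divide_finish == 3


print(g_stage_wait("FSQRTd", 5), g_stage_wait("FADDs", 5))
print(m_class_blocked(3), m_class_blocked(2))
```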
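Graphics instructions, FdTOi, FxTOs, FdTOs, FDIVs, and FSQRTs lock the
double-precision register containing the single-precision result for data
dependency checking. For example, an FORs that writes f0 locks the register
pair f0 and f1, so a following FANDs that reads f1 is checked as dependent on
it.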
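The locking rule above can be sketched by mapping a single-precision destination register to the even/odd pair that makes up its containing double-precision register. The helper names `locked_pair` and `is_checked_dependent` are hypothetical, and the sketch assumes the SPARC-V9 layout in which single-precision registers f0 through f31 pair up as (f0, f1), (f2, f3), and so on. It applies only to the instructions the text lists as locking the full register.

```python
from typing import Iterable, Tuple


def locked_pair(rd: int) -> Tuple[int, int]:
    """Return the two single-precision register numbers covered by the
    double-precision register that contains single-precision register rd."""
    if not 0 <= rd <= 31:
        raise ValueError(f"not a single-precision register: f{rd}")
    even = rd & ~1  # clear the low bit to find the even half of the pair
    return (even, even + 1)


def is_checked_dependent(result_rd: int, source_regs: Iterable[int]) -> bool:
    """True if any source register of a later instruction falls inside
    the pair locked by the earlier single-precision result."""
    pair = locked_pair(result_rd)
    return any(reg in pair for reg in source_regs)


# A result written to f0 locks f0 and f1.
print(locked_pair(0))
# A later instruction reading f1 is checked as dependent; one reading
# f2 and f3 is not affected by this rule.
print(is_checked_dependent(0, [1]), is_checked_dependent(0, [2, 3]))
```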
Pipeline stages for the example instruction pairs:

FCMP   fcc0, f2, f4     G  E  C  N1  N2  N3  W
MOVcc  fcc1, f6, f8     G  E  C  N1  N2  N3  W

FADDs  f2, f3, f0       G  E  C  N1  N2  N3  W
FMULs  f6, f1, f2       G  E  C  N1  N2  N3

FADDs  f2, f3, f0       G  E  C  N1  N2  N3  W
FMOVs  f6, f1, f2       G  E  C  N1  N2

FORs   f2, f4, f0       G  E  C  N1  N2  N3  W
FANDs  f1, f1, f1       G  E  C  N1  N2  N3  W