IA-32 Intel® Architecture Optimization
Latency and Throughput with Register Operands
IA-32 instruction latency and throughput data are presented in
Table C-2 through Table C-8. The tables include the Streaming SIMD
Extension 3, Streaming SIMD Extension 2, Streaming SIMD Extension,
MMX technology, and most of the commonly used IA-32 instructions.
Instruction latency and throughput of the Pentium 4 processor and of the
Pentium M processor are given in separate columns. Pentium 4
processor instruction timing data is implementation specific; that is, it
can vary between model encoding value = 3 and model < 2. Separate data
sets of instruction latency and throughput are shown in the columns for
CPUID signature 0xF2n and 0xF3n. The notation 0xF2n represents the
hex value of the lower 12 bits of the EAX register reported by the CPUID
instruction with an input value of EAX = 1: ‘F’ indicates that the family
encoding value is 15, ‘2’ indicates that the model encoding is 2, and ‘n’
indicates that it applies to any value of the stepping encoding. Pentium M processor
instruction timing data is shown in the columns represented by CPUID
signature 0x69n. The instruction timing for Pentium M processor with
CPUID signature 0x6Dn is the same as that of 0x69n.
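
The signature notation described above can be illustrated with a short C sketch. It is not taken from the manual: the names cpu_signature, decode_signature, matches_signature and signature_example are hypothetical, and the EAX value in the example is made up. The sketch assumes the caller has already obtained EAX from CPUID with input EAX = 1, and it looks only at the lower 12 bits, ignoring the extended family and model fields in the upper bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the lower 12 bits of EAX returned by CPUID with EAX = 1.
   The type and function names here are illustrative, not Intel's. */
typedef struct {
    unsigned family;   /* bits 11:8 */
    unsigned model;    /* bits 7:4  */
    unsigned stepping; /* bits 3:0  */
} cpu_signature;

/* Split the lower 12 bits of EAX into family, model and stepping. */
static cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    s.family   = (eax >> 8) & 0xF;
    s.model    = (eax >> 4) & 0xF;
    s.stepping = eax & 0xF;
    return s;
}

/* Compare against a signature written as 0xFMn, where n is any stepping.
   family_model holds the two leading hex digits, e.g. 0xF2 for 0xF2n. */
static bool matches_signature(uint32_t eax, unsigned family_model)
{
    return ((eax >> 4) & 0xFF) == family_model;
}

/* Example with a made-up EAX value: family 15, model 2, stepping 9. */
static void signature_example(void)
{
    cpu_signature s = decode_signature(0x00000F29u);
    assert(s.family == 15 && s.model == 2 && s.stepping == 9);
    assert(matches_signature(0x00000F29u, 0xF2));   /* column 0xF2n */
    assert(!matches_signature(0x00000F29u, 0xF3));  /* not 0xF3n    */
}
```

A program choosing which timing column applies could call matches_signature with 0xF2, 0xF3, 0x69 or 0x6D in turn. With GCC or Clang on IA-32, the EAX value might be obtained through the compiler's `<cpuid.h>` helpers; that part is left out so the sketch stays portable.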
Table C-1  Streaming SIMD Extension 3 SIMD Floating-point Instructions

Instruction            Latency¹   Throughput   Execution Unit
CPUID                  0F3n       0F3n         0F3n
ADDSUBPD/ADDSUBPS      5          2            FP_ADD
HADDPD/HADDPS          13         4            FP_ADD,FP_MISC
HSUBPD/HSUBPS          13         4            FP_ADD,FP_MISC
MOVDDUP xmm1, xmm2     4          2            FP_MOVE
MOVSHDUP xmm1, xmm2    6          2            FP_MOVE
MOVSLDUP xmm1, xmm2    6          2            FP_MOVE
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006