
IA-32 Instruction Latency and Throughput
While several items on the above list involve selecting the right
instruction, this appendix focuses on the following issues. These are
listed in an expected priority order, though which item contributes most
to performance will vary by application.
• Maximize the flow of μops into the execution core. IA-32
  instructions which consist of more than four μops require additional
  steps from microcode ROM. These instructions with longer μop
  flows incur a delay in the front end and reduce the supply of μops to
  the execution core. In Pentium 4 and Intel Xeon processors,
  transfers to microcode ROM often reduce how efficiently μops can
  be packed into the trace cache. Where possible, it is advisable to
  select instructions with four or fewer μops. For example, a 32-bit
  integer multiply with a memory operand fits in the trace cache
  without going to microcode, while a 16-bit integer multiply to
  memory does not.
• Avoid resource conflicts. Interleaving instructions so that they don’t
  compete for the same port or execution unit can increase
  throughput. For example, PADDQ and PMULUDQ each have a throughput
  of one issue per two clock cycles. When interleaved, they can
  achieve an effective throughput of one instruction per cycle
  because they use the same port but different execution units.
  Selecting instructions with fast throughput also helps to preserve
  issue port bandwidth and hide latency, and allows for higher
  software performance.
• Minimize the latency of dependency chains that are on the critical
  path. For example, an operation to shift left by two bits executes
  faster when encoded as two adds than when it is encoded as a shift.
  If latency is not an issue, the shift results in a denser byte encoding.
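
The PADDQ/PMULUDQ example in the second item can be checked with a
small calculation. The C sketch below is a toy in-order issue model,
not a description of the actual Pentium 4 scheduler. It assumes one
shared issue port that accepts at most one instruction per cycle and
two execution units that each accept a new instruction only every two
cycles. The names issue_cycles, UNIT_INTERVAL and NUM_UNITS are
hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Toy model parameters (illustrative assumptions, not measured data):
   each execution unit accepts one instruction every UNIT_INTERVAL
   cycles, and all instructions share a single issue port. */
enum { UNIT_INTERVAL = 2, NUM_UNITS = 2 };

/* units[i] names the execution unit (0 or 1) used by instruction i.
   Instructions issue strictly in order. Returns the number of cycles
   needed to issue all n instructions. */
static int issue_cycles(const int *units, int n)
{
    int port_free = 0;                  /* first cycle the port is free  */
    int unit_free[NUM_UNITS] = {0, 0};  /* first cycle each unit is free */
    int last_issue = -1;

    for (int i = 0; i < n; i++) {
        int u = units[i];
        assert(u >= 0 && u < NUM_UNITS);

        /* An instruction issues once both the shared port and its
           own execution unit are available. */
        int t = port_free > unit_free[u] ? port_free : unit_free[u];

        port_free = t + 1;                /* port busy for one cycle   */
        unit_free[u] = t + UNIT_INTERVAL; /* unit busy for two cycles  */
        last_issue = t;
    }
    return last_issue + 1;
}
```

In this model, four instructions that all use unit 0 (units =
{0, 0, 0, 0}) take 7 cycles to issue, while the interleaved sequence
{0, 1, 0, 1} takes 4 cycles, which approaches one instruction per
cycle as the sequence grows.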
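
The shift example in the third item rests on a simple identity: adding
a value to itself doubles it, so two dependent adds multiply by four,
which is the same as shifting left by two bits. The C sketch below only
demonstrates that equivalence on unsigned 32-bit values. It does not
control which instructions a compiler emits, and the function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left by two bits, written as a shift. */
static uint32_t shl2_shift(uint32_t x)
{
    return x << 2;
}

/* The same result written as two dependent additions:
   x + x = 2x, then 2x + 2x = 4x. Unsigned arithmetic wraps modulo
   2^32, so overflow behaves exactly like bits shifted out. */
static uint32_t shl2_adds(uint32_t x)
{
    uint32_t t = x + x;
    return t + t;
}

/* Spot-check that both forms agree, including on wraparound. */
static void check_shl2_equivalence(void)
{
    const uint32_t samples[] = {0u, 1u, 3u, 0x40000000u, 0xFFFFFFFFu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(shl2_adds(samples[i]) == shl2_shift(samples[i]));
}
```

Whether the add form is actually faster depends on the processor's
shift and add latencies. An optimizing compiler is free to replace
either form with the other, so hand-written assembly or inspection of
the generated code is needed to confirm the encoding.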
http://developer.intel.com/software/products/index.htm
include the VTune Performance Analyzer, with its performance-
monitoring capabilities.
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006