
Optimizing for SIMD Integer Applications
• Don’t empty when already empty: If the next instruction uses an MMX register, _mm_empty() incurs a cost with no benefit.
• Group instructions: Try to partition regions that use x87 FP instructions from those that use 64-bit SIMD integer instructions. This eliminates the need for an emms instruction within the body of a critical loop.
• Runtime initialization: Use _mm_empty() during runtime initialization of __m64 and x87 FP data types. This ensures that the register state is reset between data type transitions. See Example 4-1 for coding usage.
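The grouping guideline above can be sketched as follows. This is a minimal illustration, not code from the manual: the helper names add_i32_mmx and mean_of_sums, the 64-element scratch buffer, and the use of memcpy for unaligned loads are all assumptions made for the example. It assumes an x86 target whose compiler provides <mmintrin.h>. All 64-bit SIMD integer work happens in one region, _mm_empty() is issued once at the transition, and the floating-point work follows.

```c
#include <assert.h>
#include <mmintrin.h>   /* MMX intrinsics: __m64, _mm_add_pi32, _mm_empty */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds two int32 arrays, two lanes per MMX operation.
   n must be even. It does NOT call _mm_empty(); the caller owns the transition. */
static void add_i32_mmx(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m64 va, vb, vr;
        memcpy(&va, a + i, sizeof va);      /* alignment-safe 64-bit load */
        memcpy(&vb, b + i, sizeof vb);
        vr = _mm_add_pi32(va, vb);          /* packed 32-bit add (paddd) */
        memcpy(dst + i, &vr, sizeof vr);    /* alignment-safe 64-bit store */
    }
}

/* Hypothetical caller: one SIMD integer region, one FP region, one emms between. */
static double mean_of_sums(const int32_t *a, const int32_t *b, size_t n)
{
    int32_t sums[64];                       /* scratch size is an assumption */
    assert(n > 0 && n <= 64);

    /* Region 1: 64-bit SIMD integer work only; no emms inside the loop. */
    add_i32_mmx(sums, a, b, n);

    /* Transition: clear the MMX state once, before any floating-point code. */
    _mm_empty();

    /* Region 2: floating-point work only. */
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += (double)sums[i];
    return total / (double)n;
}
```

Calling mean_of_sums on {1, 2, 3, 4} and {10, 20, 30, 40} with n = 4 sums to 110 and returns 27.5. On targets where floating-point math is compiled to SSE rather than x87, the call to _mm_empty() is still harmless, but the hazard it guards against applies mainly to x87 code.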
Further, be aware that with the Intel C++ Compiler your code generates an MMX instruction, which uses the MMX registers, in the following situations:
• when using a 64-bit SIMD integer intrinsic from MMX technology, SSE, or SSE2
• when using a 64-bit SIMD integer instruction from MMX technology, SSE, or SSE2 through inline assembly
• when referencing an __m64 data type variable
Additional information on the x87 floating-point programming model can be found in the IA-32 Intel® Architecture Software Developer’s Manual. For more documentation on emms, visit
Example 4-1  Resetting the Register between __m64 and FP Data Types

Incorrect Usage                  Correct Usage
__m64 x = _m_paddd(y, z);        __m64 x = _m_paddd(y, z);
float f = init();                float f = (_mm_empty(), init());
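For reference, the correct-usage column of Example 4-1 can be expanded into a compilable sketch. The body of init(), the wrapper function packed_add_then_init, and the operand values are hypothetical stand-ins chosen for illustration; only the _m_paddd call and the comma-expression initializer come from the example itself. An x86 target with <mmintrin.h> is assumed.

```c
#include <assert.h>
#include <mmintrin.h>   /* __m64, _m_paddd, _mm_set_pi32, _mm_empty, ... */

/* Hypothetical stand-in for the init() call in Example 4-1. */
static float init(void)
{
    return 1.5f;
}

/* Hypothetical wrapper: performs the packed add, reads back both lanes,
   then initializes a float only after the MMX state has been cleared. */
static float packed_add_then_init(int *lo, int *hi)
{
    assert(lo && hi);

    __m64 y = _mm_set_pi32(2, 1);      /* high lane = 2,  low lane = 1  */
    __m64 z = _mm_set_pi32(20, 10);    /* high lane = 20, low lane = 10 */
    __m64 x = _m_paddd(y, z);          /* high lane = 22, low lane = 11 */

    /* Read the results while still in the SIMD integer region. */
    *lo = _mm_cvtsi64_si32(x);                      /* low 32 bits  */
    *hi = _mm_cvtsi64_si32(_mm_srli_si64(x, 32));   /* high 32 bits */

    /* Comma expression: _mm_empty() is evaluated first, then init(),
       so the FP initialization never sees a stale MMX state. */
    float f = (_mm_empty(), init());
    return f;
}
```

The comma operator guarantees left-to-right evaluation, which is why the initializer form works: the register state is reset before init() produces the floating-point value.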
IA-32 Intel® Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006