
Coding for SIMD Architectures
specific optimizations. Where appropriate, the coach displays
pseudo-code to suggest the use of highly optimized intrinsics and
functions in the Intel® Performance Library Suite. Because the VTune
analyzer is designed specifically for all Intel architecture
(IA)-based processors, including the Pentium 4 processor, it can offer
these detailed approaches to working with IA. See “Code Optimization
Options” in Appendix A for more details and an example of code coach
advice.
Determine If Code Benefits by Conversion to SIMD Execution
Identifying code that benefits from SIMD technologies can be
time-consuming and difficult. Likely candidates for conversion are
applications that are highly computation-intensive, such as the
following:
• speech compression algorithms and filters
• speech recognition algorithms
• video display and capture routines
• rendering routines
• 3D graphics (geometry)
• image and video processing algorithms
• spatial (3D) audio
• physical modeling (graphics, CAD)
• workstation applications
• encryption algorithms
• complex arithmetics
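As an illustration, the inner loops of routines like these often reduce to a few arithmetic operations applied to every element of contiguous arrays. The following is a minimal sketch in C, not code from this manual: the function names mix_scalar and mix_simd are hypothetical, and the SIMD path assumes a compiler that provides the SSE intrinsics header xmmintrin.h (it falls back to the scalar loop otherwise).

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Hypothetical example: dst[i] = a[i] * gain + b[i], one element at a time. */
void mix_scalar(float *dst, const float *a, const float *b,
                float gain, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * gain + b[i];
}

/* Same computation, four single-precision elements per iteration when
 * SSE is available. Unaligned loads and stores are used so the sketch
 * makes no assumption about the alignment of the arrays. */
void mix_simd(float *dst, const float *a, const float *b,
              float gain, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vgain = _mm_set1_ps(gain);   /* gain in all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);      /* load a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);      /* load b[i..i+3] */
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_mul_ps(va, vgain), vb));
    }
#endif
    /* Remaining elements (or all of them when SSE is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] * gain + b[i];
}
```

Both functions are intended to compute the same result. The loop is a plausible SIMD candidate because the elements are adjacent in memory and no iteration depends on another.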
Generally, good candidate code contains small, repetitive loops that
operate on sequential arrays of 8-, 16- or 32-bit integers,
single-precision 32-bit floating-point data, or double-precision
64-bit floating-point data (integer and floating-point data items should
be sequential in memory). The repetitiveness of these loops incurs
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966-013US, April 2006