IA-32 Intel® Architecture Optimization
In some situations, the byte count of the data to operate on is known from
context (rather than from a parameter passed in a call). One can then take a
simpler approach than is required for a general-purpose library
routine. For example, if the byte count is also small, using rep
movsb/stosb with a count of less than four can ensure good address
alignment, and loop unrolling can finish the remaining data; using
movsd/stosd can reduce the overhead associated with iteration.
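The approach above can be sketched in C. This is only an illustration under stated assumptions, not the manual's own routine: the helper name copy_small is hypothetical, the byte-wise head loop stands in for rep movsb with a count below four, and the 4-byte loop stands in for movsd. A compiler may or may not emit those instructions for this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes when n is small and known from context.
 * Step 1: copy up to 3 single bytes until dst is 4-byte aligned
 *         (the role played by rep movsb with a count below four).
 * Step 2: copy the bulk 4 bytes at a time (the role played by movsd).
 * Step 3: copy the remaining 0 to 3 bytes.
 * The source and destination ranges are assumed not to overlap. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    /* Number of bytes needed to bring dst up to a 4-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)dst & 3u);
    if (head > n)
        head = n;
    n -= head;
    while (head--)
        *dst++ = *src++;

    /* 4-byte moves; memcpy through a temporary keeps unaligned src legal. */
    for (; n >= 4; n -= 4, dst += 4, src += 4) {
        uint32_t w;
        memcpy(&w, src, 4);
        memcpy(dst, &w, 4);
    }

    /* Tail of at most 3 bytes. */
    while (n--)
        *dst++ = *src++;
}
```

When the count is a compile-time constant, the 4-byte loop is a natural candidate for unrolling, which is where the reduced iteration overhead mentioned above would come from.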
Using a REP prefix with string move instructions can provide high
performance in the situations described above. However, using a REP
prefix with string scan instructions (scasb, scasw, scasd, scasq) or
compare instructions (cmpsb, cmpsw, cmpsd, cmpsq) is not
recommended for high performance. Consider using SIMD instructions
instead.
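As one possible illustration of the SIMD alternative, the sketch below searches a buffer for a byte 16 bytes at a time using SSE2 intrinsics, in place of a rep scasb style scan. It assumes an x86 target with SSE2 enabled; the function name find_byte_sse2 and its return convention (n when the byte is absent) are choices made for this example and do not come from the manual.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: index of the first byte equal to c in buf[0..n),
 * or n if no such byte exists. */
static size_t find_byte_sse2(const unsigned char *buf, size_t n, unsigned char c)
{
    assert(buf != NULL || n == 0);

    const __m128i needle = _mm_set1_epi8((char)c);
    size_t i = 0;

    /* Compare 16 bytes per iteration; unaligned loads keep this safe
     * for any starting address, and no byte past buf[n-1] is read. */
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0) {
            /* Bit j of mask is set when byte j of the chunk matched. */
            size_t j = 0;
            while ((mask & 1) == 0) {
                mask >>= 1;
                j++;
            }
            return i + j;
        }
    }

    /* Scalar loop for the final 0 to 15 bytes. */
    for (; i < n; i++) {
        if (buf[i] == c)
            return i;
    }
    return n;
}
```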
Address Calculations
Use the addressing modes to compute addresses rather than
general-purpose arithmetic instructions. Internally, memory reference
instructions can have four operands:
• relocatable load-time constant
• immediate constant
• base register
• scaled index register
In the segmented model, a segment register may constitute an additional
operand in the linear address calculation. In many cases, several integer
instructions can be eliminated by fully using the operands of memory
references.
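A small C sketch can make the four operands concrete. Everything named here (struct record, effective_address, load_value) is hypothetical and exists only for illustration. The point is that an access such as table[i].value already has the shape constant + base + index * scale, so a compiler targeting IA-32 can typically encode it in a single memory operand instead of separate shift and add instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte element, so the index scale factor is 8. */
struct record {
    int32_t id;
    int32_t value;
};

/* Combine the four operands listed above into one address (flat model):
 *   reloc : relocatable load-time constant (e.g. address of a global table)
 *   imm   : immediate constant (e.g. offset of a field)
 *   base  : contents of the base register
 *   index : contents of the index register, multiplied by scale */
static uintptr_t effective_address(uintptr_t reloc, size_t imm,
                                   uintptr_t base, size_t index, size_t scale)
{
    /* IA-32 addressing modes allow only these scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return reloc + imm + base + index * scale;
}

/* The same computation written the ordinary way; no explicit integer
 * arithmetic is needed in the source. */
static int32_t load_value(const struct record *table, size_t i)
{
    return table[i].value;
}
```

For example, with a table of struct record, effective_address((uintptr_t)table, offsetof(struct record, value), 0, i, sizeof(struct record)) yields the same address as &table[i].value.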
IA-32 Intel Architecture Optimization Reference Manual, Order Number 248966 013US, April 2006