INTEL® CELERON® PROCESSOR SPECIFICATION UPDATE
or IMUL AX, word ptr <memory address> (opcode 0F AF /r)
or IMUL AX, BX, 16 (opcode 6B /r ib)
or IMUL AX, word ptr <memory address>, 16 (opcode 6B /r ib)
or IMUL AX, 8 (opcode 6B /r ib)
or IMUL AX, BX, 1024 (opcode 69 /r iw)
or IMUL AX, word ptr <memory address>, 1024 (opcode 69 /r iw)
or IMUL AX, 1024 (opcode 69 /r iw)
or CBW
3. MOVD MM0, EAX or CVTSI2SS XMM0, EAX
Note that the immediate byte/word values shown (8, 16, 1024) are merely representative; any value in the range
of the operand size is affected. Also note that this erratum may occur with "EAX" replaced by any 32-bit
general-purpose register and "AX" replaced by the corresponding 16-bit register. "BL" or "BX" can be replaced
with any 8-bit or 16-bit general-purpose register, respectively. The CBW and IMUL (opcode F6 /5) instructions
apply to the EAX register only.
In the above example, the XOR or SUB instruction forces EAX to 0. Because the MOVSX and IMUL forms listed
above and the CBW instruction write only the lower 16 bits of EAX (AX), bits 31:16 of EAX should still contain 0.
This implies that when MOVD copies EAX to MM0 (or CVTSI2SS converts EAX), bits 31:16 of the value it reads
should also be 0. In certain scenarios, these bits are not 0, but are replicas of bit 15 (the 16th bit) of AX. This is
noticeable when the value in AX after the MOVSX, IMUL or CBW instruction is negative (i.e., bit 15 of AX is 1).
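The bit-level reasoning above can be checked with a small Python model. This is a sketch, not a description of the processor internals: the helper names (movsx_ax_bl, imul16, write_ax, erratum_value) are hypothetical. write_ax models the architectural rule that a 16-bit register write leaves bits 31:16 of EAX unchanged, and erratum_value models the faulty result, in which those bits become copies of bit 15 of AX.

```python
MASK16 = 0xFFFF


def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= MASK16
    return x - 0x10000 if x & 0x8000 else x


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def movsx_ax_bl(bl: int) -> int:
    """MOVSX AX, BL: sign-extend an 8-bit value to 16 bits."""
    bl &= 0xFF
    return bl | 0xFF00 if bl & 0x80 else bl


def imul16(src: int, imm: int) -> int:
    """IMUL AX, src, imm: signed product truncated to 16 bits."""
    return (to_signed16(src) * imm) & MASK16


def write_ax(eax: int, ax: int) -> int:
    """Architectural 16-bit write: replace bits 15:0, keep bits 31:16."""
    return (eax & 0xFFFF0000) | (ax & MASK16)


def erratum_value(ax: int) -> int:
    """Hypothetical model of the faulty source value: bits 31:16 copy bit 15."""
    ax &= MASK16
    return ax | 0xFFFF0000 if ax & 0x8000 else ax


eax = 0                   # XOR EAX, EAX
ax = movsx_ax_bl(0x85)    # MOVSX AX, BL with BL = 85h gives AX = FF85h
good = write_ax(eax, ax)  # value MOVD should copy to MM0
bad = erratum_value(ax)   # value the erratum may deliver instead
print(f"expected {good:#010x}, erratum {bad:#010x}")
# prints: expected 0x0000ff85, erratum 0xffffff85

# CVTSI2SS treats its source as a signed 32-bit integer.
print(float(to_signed32(good)), float(to_signed32(bad)))
# prints: 65413.0 -123.0
```

With BL = 05h the two helpers return the same value, which matches the statement that a non-negative AX is unaffected.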
When AX is non-negative (bit 15 of AX is 0), MOVD or CVTSI2SS produces the correct result. If AX is negative
(bit 15 of AX is 1), MOVD or CVTSI2SS may produce either the correct or an incorrect result, depending on
when the MOVD or CVTSI2SS instruction executes relative to the MOVSX, IMUL or CBW instruction.
The PINSRW instruction can also fail to load a zero correctly when it is used with a partial-register zeroing
instruction (SUB or XOR):
1. mov edi, 0FFFF8914h
2. xor eax, eax
3. add ax, di
4. xor ah, ah
5. pinsrw mm1, eax, 00h
In this case, the programmer expects MM1 to contain 0014h in its least significant word; this erratum would
cause MM1 to contain 8914h instead. The number of intervening instructions between steps 4 and 5 is the same
as noted between steps 2 and 3 in the sign extension example above.
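The five steps can be replayed in a small Python model to confirm the expected value. This is a sketch: pinsrw and run_sequence are hypothetical names. pinsrw follows the architectural definition for an MMX destination (the low two bits of the immediate select the word to replace), and the "stale" value is simply AX before step 4, which is the value the erratum delivers.

```python
def pinsrw(mm: int, r32: int, imm: int) -> int:
    """PINSRW mm, r32, imm8: insert the low word of r32 at word imm8[1:0]."""
    shift = 16 * (imm & 3)
    word = r32 & 0xFFFF
    return (mm & ~(0xFFFF << shift)) | (word << shift)


def run_sequence(mm1: int = 0):
    """Replay steps 1-5; return (expected low word of MM1, stale AX)."""
    edi = 0xFFFF8914                                # 1. DI = 8914h
    eax = 0                                         # 2. xor eax, eax
    ax = ((eax & 0xFFFF) + (edi & 0xFFFF)) & 0xFFFF # 3. add ax, di
    eax = (eax & 0xFFFF0000) | ax
    stale_ax = ax                                   # AX before AH is cleared
    eax &= 0xFFFF00FF                               # 4. xor ah, ah zeroes bits 15:8
    mm1 = pinsrw(mm1, eax, 0x00)                    # 5. pinsrw mm1, eax, 00h
    return mm1 & 0xFFFF, stale_ax


expected, stale = run_sequence()
print(f"expected {expected:04X}h, erratum {stale:04X}h")
# prints: expected 0014h, erratum 8914h
```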
Implication:
The effect of incorrect execution will vary from unnoticeable, due to the code sequence
discarding the incorrect bits, to an application failure.
Workaround:
There are three possible workarounds for this erratum:
1. Rather than using the MOVSX-MOVD/CVTSI2SS, IMUL-MOVD/CVTSI2SS or CBW-MOVD/CVTSI2SS
pairing to handle one variable at a time, use the sign extension capabilities of MMX technology (PSRAW, etc.)
to operate on multiple variables at once. This will also result in higher performance.
2. Insert another operation that modifies or copies the sign-extended value between the MOVSX/IMUL/CBW
instruction and the MOVD or CVTSI2SS instruction as in the example below:
XOR EAX, EAX (or SUB EAX, EAX)
MOVSX AX, BL (or other MOVSX, other IMUL or CBW instruction)
MOV EAX, EAX (inserted instruction)
MOVD MM0, EAX or CVTSI2SS XMM0, EAX
3. Avoid using SUB or XOR to zero a partial register before any of these three instructions. Instead, use a MOV
with an immediate operand (e.g., "mov ah, 0h").
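Workaround 1 can be sketched at the lane level. The model below assumes a common MMX idiom that this erratum text does not spell out: PUNPCKLBW of a register with itself places each byte in both halves of a word, and PSRAW by 8 then sign-extends it, turning four packed bytes into four packed words. The function names are hypothetical, and this is a Python model of the data movement, not MMX code.

```python
def punpcklbw(dst: int, src: int) -> int:
    """PUNPCKLBW mm, mm/m64: interleave the low four bytes of dst and src.

    Byte i of dst becomes the low byte, and byte i of src the high byte,
    of result word i.
    """
    out = 0
    for i in range(4):
        d = (dst >> (8 * i)) & 0xFF
        s = (src >> (8 * i)) & 0xFF
        out |= (d | (s << 8)) << (16 * i)
    return out


def psraw(mm: int, count: int) -> int:
    """PSRAW mm, imm8: arithmetic right shift of each signed 16-bit word."""
    count = min(count, 15)  # larger counts fill each word with its sign bit
    out = 0
    for i in range(4):
        w = (mm >> (16 * i)) & 0xFFFF
        if w & 0x8000:
            w -= 0x10000  # reinterpret the word as signed
        out |= ((w >> count) & 0xFFFF) << (16 * i)  # Python >> is arithmetic
    return out


def sign_extend_bytes(packed: int) -> int:
    """Hypothetical helper: sign-extend four packed bytes to four words."""
    mm0 = packed & 0xFFFFFFFF   # assumed loaded with MOVD MM0, m32
    mm0 = punpcklbw(mm0, mm0)   # each word now holds its byte twice
    return psraw(mm0, 8)        # shift the high copy down, extending the sign


result = sign_extend_bytes(0x7FFF8505)  # bytes 05h, 85h, FFh, 7Fh
print(f"{result:#018x}")
# prints: 0x007fffffff850005
```

All four values are sign-extended by two packed instructions instead of one MOVSX/MOVD pair per value.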