92
Efficient Implementation of Population Count Function
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Step 3
For the first time, the value in each k-bit field is small enough
that adding two k-bit fields results in a value that still fits in the
k-bit field. Thus the following computation is performed:
y = (x + (x >> 4)) & 0x0F0F0F0F
The result is four 8-bit fields whose lower half has the desired
sum and whose upper half contains "junk" that has to be
masked out. In a symbolic form:
x = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
x >> 4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
Th e W W W W, X X X X , Y Y Y Y, a n d Z Z Z Z va l u e s a re t h e
interesting sums with each at most 1000b, or 8 decimal.
Step 4
The four 4-bit sums can now be rapidly accumulated by means
of a multiply with a "magic" multiplier. This can be derived
from looking at the following chart of partial products:
0p0q0r0s * 01010101 =
:0p0q0r0s
0p:0q0r0s
0p0q:0r0s
0p0q0r:0s
000pxxww:vvuutt0s
Here p, q, r, and s are the 4-bit sums from the previous step, and
vv is the final result in which we are interested. Thus, the final
result:
z = (y * 0x01010101) >> 24
Example:
unsigned int popcount(unsigned int v)
{
unsigned int retVal;
__asm {
MOV EAX, [v]
;v
MOV EDX, EAX
;v
SHR EAX, 1
;v >> 1
AND EAX, 055555555h ;(v >> 1) & 0x55555555
SUB EDX, EAX
;w = v - ((v >> 1) & 0x55555555)
MOV EAX, EDX
;w
SHR EDX, 2
;w >> 2
AND EAX, 033333333h ;w & 0x33333333
AND EDX, 033333333h ;(w >> 2) & 0x33333333
Summary of Contents for Athlon Processor x86
Page 1: ...AMD Athlon Processor x86 Code Optimization Guide TM...
Page 12: ...xii List of Figures AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Page 16: ...xvi Revision History AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Page 202: ...186 Page Attribute Table PAT AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Page 252: ...236 VectorPath Instructions AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Page 256: ...240 Index AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...