
Efficient Implementation of Population Count Function
91
22007E/0—November 1999
AMD Athlon™ Processor x86 Code Optimization
Efficient Implementation of Population Count Function
Population count is an operation that determines the number of
set bits in a bit string. For example, this can be used to
determine the cardinality of a set. The following example code
shows how to efficiently implement a population count
operation for 32-bit operands. The example is written for the
inline assembler of Microsoft Visual C.
Function popcount() implements a branchless computation of
the population count. It is based on a O(log(n)) algorithm that
successively groups the bits into groups of 2, 4, 8, 16, and 32,
while maintaining a count of the set bits in each group. The
algorithms consist of the following steps:
Step 1
Partition the integer into groups of two bits. Compute the
population count for each 2-bit group and store the result in the
2-bit group. This calls for the following transformation to be
performed for each 2-bit group:
00b -> 00b
01b -> 01b
10b -> 01b
11b -> 10b
If the original value of a 2-bit group is v, then the new value will
be v - (v >> 1). In order to handle all 2-bit groups simultaneously,
it is necessary to mask appropriately to prevent spilling from
one bit group to the next lower bit group. Thus:
w = v - ((v >> 1) & 0x55555555)
Step 2
Add the population count of adjacent 2-bit group and store the
sum to the 4-bit group resulting from merging these adjacent
2-bit groups. To do this simultaneously to all groups, mask out
the odd numbered groups, mask out the even numbered groups,
and then add the odd numbered groups to the even numbered
groups:
x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
Each 4-bit field now has value 0000b, 0001b, 0010b, 0011b, or
0100b.
Содержание Athlon Processor x86
Страница 1: ...AMD Athlon Processor x86 Code Optimization Guide TM...
Страница 12: ...xii List of Figures AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 16: ...xvi Revision History AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 60: ...44 Code Padding Using Neutral Code Fillers AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 92: ...76 Push Memory Data Carefully AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 122: ...106 Take Advantage of the FSINCOS Instruction AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 156: ...140 AMD Athlon Processor Microarchitecture AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 176: ...160 Write Combining Operations AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 202: ...186 Page Attribute Table PAT AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 252: ...236 VectorPath Instructions AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...
Страница 256: ...240 Index AMD Athlon Processor x86 Code Optimization 22007E 0 November 1999...