Cryptographic Performance on the 2nd Generation Intel® Core™ processor family
processors based on Intel® microarchitecture code named Westmere, we
maximize performance with 4 buffers.
Performance
The performance results provided in this section were measured on two
processors supporting Intel® AES-NI:
• Intel® Core™ i5-650 processor at a frequency of 3.20 GHz, based on
the Intel® microarchitecture code named Westmere
• Intel® Core™ i7-2600 processor at a frequency of 3.40 GHz, based on
the 2nd Generation Intel® Core™ processor family
The fact that the two processors have different clock frequencies does not
affect the comparison, as we have normalized the performance results to
cycles.
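To illustrate why normalizing to cycles takes the clock frequency out of the comparison, the following minimal sketch converts between throughput and cycles per byte. The function names and the 10 cycles/byte cost are illustrative assumptions chosen to keep the arithmetic simple; they are not measurements from this paper.

```python
def cycles_per_byte(bytes_per_second: float, frequency_hz: float) -> float:
    """Convert a throughput into a frequency-independent cost in cycles/byte."""
    return frequency_hz / bytes_per_second


def throughput(cost_cycles_per_byte: float, frequency_hz: float) -> float:
    """Bytes per second achieved at a given clock for a given cycles/byte cost."""
    return frequency_hz / cost_cycles_per_byte


# Hypothetical cost, chosen only to make the arithmetic easy to follow.
COST = 10.0

for name, freq in (("3.20 GHz", 3.20e9), ("3.40 GHz", 3.40e9)):
    rate = throughput(COST, freq)
    # The throughput changes with the clock; the recovered cost does not.
    print(name, rate / 1e6, "MB/s ->", cycles_per_byte(rate, freq), "cycles/byte")
```

The same routine yields a higher byte rate on the faster clock, yet the cost expressed in cycles per byte is identical, which is why results reported in cycles can be compared directly across the two processors.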
The tests, conducted by Intel, were run with Intel® Turbo Boost Technology
off, and represent the performance without Intel® Hyper-Threading
Technology (Intel® HT Technology) on a single core.
The multi-buffer code bases for AES, MD5, SHA1, and SHA256 were measured on
64-byte fixed-size data buffers without a scheduler. The AES-128 CBC Encrypt
implementation used pre-expanded keys. AES with 128-bit keys is a common
usage; results for other key sizes, such as 192-bit and 256-bit keys, will
be similar and scale accordingly.
The modular exponentiation code bases were measured on 512-bit and 1024-bit
keys. These algorithms form the basis of one of the most critical server
workloads, RSA signing. RSA Sign performs a decryption operation that can
be implemented efficiently with the Chinese Remainder Theorem (CRT). With
the CRT implementation, RSA2048 requires 1024-bit modular exponentiations
and RSA1024 requires 512-bit modular exponentiations.
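The relationship between the RSA modulus size and the size of the modular exponentiations can be sketched as follows. This is a minimal illustration, not the measured implementation: the primes are two known Mersenne primes picked only to avoid prime generation (they are unsuitable for real keys and give a 1128-bit modulus, not RSA2048), the message value is hypothetical, and the recombination step uses Garner's formula as one common way to apply the CRT.

```python
# Toy parameters: two known Mersenne primes, used only so the sketch needs no
# prime generation. They are NOT suitable for real RSA keys.
p = 2**521 - 1
q = 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

# CRT parameters, precomputed once per key.
dp = d % (p - 1)
dq = d % (q - 1)
q_inv = pow(q, -1, p)


def sign_plain(m: int) -> int:
    """One full-size modular exponentiation modulo n."""
    return pow(m, d, n)


def sign_crt(m: int) -> int:
    """Two roughly half-size exponentiations, recombined with Garner's formula."""
    s_p = pow(m % p, dp, p)
    s_q = pow(m % q, dq, q)
    h = (q_inv * (s_p - s_q)) % p
    return s_q + h * q


message = 0x1234567890ABCDEF  # hypothetical message representative, m < n
signature = sign_crt(message)

print("CRT result matches plain result:", signature == sign_plain(message))
print("signature verifies:", pow(signature, e, n) == message)
print("bit lengths (n, p, q):", n.bit_length(), p.bit_length(), q.bit_length())
```

Each exponentiation in `sign_crt` works modulo one prime factor, with a reduced exponent, instead of modulo the full `n`. With two equal-size primes, this is what turns an RSA2048 signature into two 1024-bit modular exponentiations.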
We present results here with warm data. When a test for warm data is called,
it is first run numerous times to warm up the cache.
Timing is measured with the rdtsc() function, which returns the processor
time stamp counter (TSC). The TSC is the number of clock cycles since the
last reset. 'TSC_initial' is the TSC value recorded before the function is
called. The function is then called the specified number of times on data
buffers of a given size. After the runs are complete, rdtsc() is called
again to record the new cycle count, 'TSC_final'. The effective cycle count
for the called routine is computed using
# of cycles = (TSC_final - TSC_initial) / (number of iterations).
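The warm-up and averaging procedure described above can be sketched as follows. Python has no direct access to the TSC, so this sketch substitutes `time.perf_counter_ns()` for rdtsc() and reports nanoseconds rather than cycles. The helper name `average_cost`, the warm-up count, the iteration count, and the use of `hashlib.sha256` as the routine under test are all assumptions made for illustration.

```python
import hashlib
import time


def average_cost(func, data: bytes, iterations: int, warmup: int = 1000) -> float:
    """Average cost of func(data) per call, in timer ticks (nanoseconds here)."""
    # Warm-up runs: bring code and data into the cache before timing starts.
    for _ in range(warmup):
        func(data)

    t_initial = time.perf_counter_ns()  # stands in for TSC_initial
    for _ in range(iterations):
        func(data)
    t_final = time.perf_counter_ns()    # stands in for TSC_final

    # Same formula as above: (final - initial) / (number of iterations).
    return (t_final - t_initial) / iterations


buffer_64 = bytes(64)  # 64-byte buffer, matching the buffer size used above
ns_per_call = average_cost(lambda b: hashlib.sha256(b).digest(), buffer_64, 100_000)
print(f"{ns_per_call:.1f} ns per call")
```

As with the formula above, the average includes loop and call overhead, so a large iteration count is used to amortize the cost of reading the timer itself. A nanosecond result could be converted to cycles by multiplying by the clock frequency in GHz, but only under the assumption that the clock is fixed during the run.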