324952
Overview
Cryptographic algorithms, such as secure hashing and encryption, are used in
networking, storage and other applications. Because the amount of data being
processed is large and growing rapidly, there is an ever-increasing need for
very high-performance implementations of these algorithms. The Intel®
microarchitecture code named Westmere already delivers excellent performance,
as demonstrated in [2], [3] and [4]. The 2nd Generation Intel® Core™ processor
family brings a further substantial performance boost on cryptographic
algorithms.
We examine the performance of a representative set of algorithms: modular
exponentiation, which forms the basis of most public key cryptographic
protocols; AES encryption in CBC mode for private (symmetric) key
encryption; and the MD5, SHA1 and SHA256 secure hashing algorithms for
authentication. To compare the performance of an algorithm on the two
processor families, we use the most optimized implementation of that
algorithm for each processor.
Improving Cryptographic Processing
Modular exponentiation forms the basis of almost all public key algorithms in
current use, such as RSA, DSA and Diffie-Hellman. In [3], we demonstrate the
fastest implementation of modular exponentiation on Intel processors. The 2nd
Generation Intel® Core™ processor family improves the performance of the
multiply and adc (add with carry) instructions, resulting in much faster
modular exponentiation. Our implementation runs in constant time, which
protects it against branch- and cache-based side-channel attacks, and in [3]
we demonstrate that it is substantially faster on Intel processors than the
best-known publicly available implementation, in OpenSSL.
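The constant-time property can be illustrated with a small sketch. The C code below is not the implementation from [3], which operates on multi-precision operands such as RSA moduli; it assumes a single odd 64-bit modulus below 2^63 and uses a Montgomery ladder, a swapped-in technique chosen because it performs the same multiply and square for every exponent bit and selects operands with masks rather than branches. Montgomery multiplication also avoids a data-dependent division; on x86-64, compilers typically lower its 128-bit products and sums to multiply and add/adc instructions. The names modexp_ct, mont_mul, neg_inv64 and cswap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: single 64-bit modulus (assumption), not the multi-precision code from [3]. */
typedef unsigned __int128 u128;  /* GCC/Clang extension on 64-bit targets */

/* Returns -n^{-1} mod 2^64 for odd n using Newton iteration. */
static uint64_t neg_inv64(uint64_t n) {
    uint64_t inv = n;              /* n * n == 1 (mod 8): correct to 3 bits */
    for (int i = 0; i < 5; i++)
        inv *= 2 - n * inv;        /* each step doubles the number of correct bits */
    return 0 - inv;
}

/* Montgomery product a * b * 2^-64 mod n; requires a, b < n < 2^63. */
static uint64_t mont_mul(uint64_t a, uint64_t b, uint64_t n, uint64_t ninv) {
    u128 t = (u128)a * b;
    uint64_t m = (uint64_t)t * ninv;                   /* makes t + m*n divisible by 2^64 */
    uint64_t r = (uint64_t)((t + (u128)m * n) >> 64);  /* r < 2n; no overflow since n < 2^63 */
    uint64_t d = r - n;
    uint64_t mask = 0 - (d >> 63);                     /* all ones iff r < n (d wrapped) */
    return d + (n & mask);                             /* branch-free conditional subtract */
}

/* Swaps *a and *b when bit == 1, without a data-dependent branch. */
static void cswap(uint64_t *a, uint64_t *b, uint64_t bit) {
    uint64_t t = (0 - bit) & (*a ^ *b);
    *a ^= t;
    *b ^= t;
}

/* base^e mod n via a Montgomery ladder: always 64 iterations, each doing one
   multiply and one square, regardless of the exponent bits.
   Preconditions: n odd, 1 < n < 2^63, base < n. */
uint64_t modexp_ct(uint64_t base, uint64_t e, uint64_t n) {
    assert((n & 1) && n > 1 && n < ((uint64_t)1 << 63));  /* checks public n only */
    uint64_t ninv = neg_inv64(n);
    /* These divisions involve only the public modulus. */
    uint64_t one = (uint64_t)(((u128)1 << 64) % n);   /* 2^64 mod n: Montgomery form of 1 */
    uint64_t r2 = (uint64_t)(((u128)one * one) % n);  /* 2^128 mod n */
    uint64_t x0 = one;                                /* invariant: x1 = x0 * base */
    uint64_t x1 = mont_mul(base, r2, n, ninv);        /* base in Montgomery form */
    for (int i = 63; i >= 0; i--) {
        uint64_t bit = (e >> i) & 1;
        cswap(&x0, &x1, bit);
        x1 = mont_mul(x0, x1, n, ninv);
        x0 = mont_mul(x0, x0, n, ninv);
        cswap(&x0, &x1, bit);
    }
    return mont_mul(x0, 1, n, ninv);  /* convert out of Montgomery form */
}
```

For example, modexp_ct(4, 13, 497) returns 445. A compiler may still turn masked code back into branches, so production constant-time code is usually checked at the assembly level.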
Hashing and private key encryption can be implemented with the multi-buffer
technique described in [4] for best performance on Intel processors.
Processing multiple buffers in parallel can improve performance in two basic
ways: the buffers can be processed together with SIMD instructions, or the
processing of independent buffers can be interleaved to hide the latency of
data dependencies within a single buffer (for example, AES-CBC encryption of
one buffer is inherently serial, because each block depends on the previous
ciphertext block). In [4], we describe how multi-buffer techniques result in
the best performance for AES encryption and the SHA1 secure hashing
algorithm, compared to the best single-buffer implementations. The same holds
for the MD5 and SHA256 algorithms included in this study. [4] also describes
a job scheduler that manages multiple buffers of varying sizes, providing a
fully generalized solution to the multi-buffer problem for all usage models.
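As an illustration of the SIMD flavor of multi-buffer processing, the sketch below hashes several independent messages with SHA1 in lockstep. It is a portable C approximation under stated assumptions, not the code from [4]: state and message words are stored as [word][lane] arrays so that each inner loop over lanes applies the same operation to every buffer, which a compiler can map onto SIMD instructions (with LANES = 4, four 32-bit lanes fit in one 128-bit SSE register). LANES, sha1_mb_block and sha1_mb_short are hypothetical names, and sha1_mb_short accepts only messages of at most 55 bytes so that each fits in a single padded block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4  /* hypothetical: four 32-bit lanes, as in one 128-bit SSE register */

static uint32_t rol(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Structure-of-arrays state: h[i][l] is word i of lane l's digest. */
typedef struct { uint32_t h[5][LANES]; } sha1_mb_state;

static void sha1_mb_init(sha1_mb_state *s) {
    static const uint32_t iv[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                                   0x10325476u, 0xC3D2E1F0u};
    for (int i = 0; i < 5; i++)
        for (int l = 0; l < LANES; l++)
            s->h[i][l] = iv[i];
}

/* Compresses one 64-byte block from each lane. All lanes execute the same
   operation sequence, so every inner lane loop is a candidate SIMD operation. */
static void sha1_mb_block(sha1_mb_state *s, const uint8_t *blk[LANES]) {
    uint32_t w[80][LANES], v[5][LANES];  /* v holds working variables a..e */
    for (int t = 0; t < 16; t++)
        for (int l = 0; l < LANES; l++) {
            const uint8_t *p = blk[l] + 4 * t;  /* big-endian message words */
            w[t][l] = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                      (uint32_t)p[2] << 8 | p[3];
        }
    for (int t = 16; t < 80; t++)
        for (int l = 0; l < LANES; l++)
            w[t][l] = rol(w[t-3][l] ^ w[t-8][l] ^ w[t-14][l] ^ w[t-16][l], 1);
    memcpy(v, s->h, sizeof v);
    for (int t = 0; t < 80; t++)
        for (int l = 0; l < LANES; l++) {
            uint32_t a = v[0][l], b = v[1][l], c = v[2][l], d = v[3][l], e = v[4][l];
            uint32_t f, k;  /* round function and constant depend only on t */
            if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            v[4][l] = d;
            v[3][l] = c;
            v[2][l] = rol(b, 30);
            v[1][l] = a;
            v[0][l] = rol(a, 5) + f + e + k + w[t][l];
        }
    for (int i = 0; i < 5; i++)
        for (int l = 0; l < LANES; l++)
            s->h[i][l] += v[i][l];
}

/* Hashes LANES messages of at most 55 bytes each (one padded block per lane). */
void sha1_mb_short(const uint8_t *msg[LANES], const size_t len[LANES],
                   uint8_t out[LANES][20]) {
    uint8_t buf[LANES][64];
    const uint8_t *blk[LANES];
    for (int l = 0; l < LANES; l++) {
        assert(len[l] <= 55);
        memset(buf[l], 0, 64);
        memcpy(buf[l], msg[l], len[l]);
        buf[l][len[l]] = 0x80;                    /* padding marker bit */
        uint64_t bits = (uint64_t)len[l] * 8;
        for (int i = 0; i < 8; i++)               /* big-endian bit length */
            buf[l][63 - i] = (uint8_t)(bits >> (8 * i));
        blk[l] = buf[l];
    }
    sha1_mb_state s;
    sha1_mb_init(&s);
    sha1_mb_block(&s, blk);
    for (int l = 0; l < LANES; l++)
        for (int i = 0; i < 5; i++)
            for (int j = 0; j < 4; j++)
                out[l][4 * i + j] = (uint8_t)(s.h[i][l] >> (24 - 8 * j));
}
```

Hashing "abc" in any lane produces the standard SHA1 digest a9993e364706816aba3e25717850c26c9cd0d89d. Optimized implementations use explicit SIMD instructions and handle multi-block messages; this sketch only shows the lockstep data layout.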
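The job scheduler from [4] is not reproduced here; the following is a minimal sketch of one way such a manager could work, assuming hypothetical names (job_t, mb_mgr_t, mb_mgr_submit, mb_mgr_flush) and measuring job lengths in blocks. Jobs are placed in free lanes; once every lane is busy, all lanes advance together until the shortest job completes, and that job is returned so its lane can be refilled. A flush call drains partially filled lanes when no more jobs arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LANES 4  /* hypothetical: one lane per buffer slot in the SIMD kernel */

/* A hashing or encryption job; only the remaining length matters here. */
typedef struct {
    int id;
    size_t blocks_left;  /* blocks still to process (hypothetical unit) */
} job_t;

typedef struct {
    job_t *lane[NUM_LANES];  /* NULL marks a free lane */
} mb_mgr_t;

/* Runs the occupied lanes together until the shortest job finishes, then
   releases that job. Returns NULL if every lane is empty. */
static job_t *mb_mgr_run(mb_mgr_t *m) {
    size_t min = SIZE_MAX;
    int idx = -1;
    for (int l = 0; l < NUM_LANES; l++)
        if (m->lane[l] && m->lane[l]->blocks_left < min) {
            min = m->lane[l]->blocks_left;
            idx = l;
        }
    if (idx < 0)
        return NULL;
    /* Stand-in for calling the multi-buffer kernel `min` times on all lanes. */
    for (int l = 0; l < NUM_LANES; l++)
        if (m->lane[l])
            m->lane[l]->blocks_left -= min;
    job_t *done = m->lane[idx];
    m->lane[idx] = NULL;
    return done;
}

/* Queues a job. Returns a finished job once all lanes are busy, else NULL. */
job_t *mb_mgr_submit(mb_mgr_t *m, job_t *j) {
    int free_lanes = 0;
    for (int l = 0; l < NUM_LANES; l++) {
        if (!m->lane[l] && j) {
            m->lane[l] = j;
            j = NULL;
        }
        if (!m->lane[l])
            free_lanes++;
    }
    assert(j == NULL);  /* a lane is always free on entry */
    return free_lanes ? NULL : mb_mgr_run(m);
}

/* Drains partially filled lanes when no more jobs arrive. */
job_t *mb_mgr_flush(mb_mgr_t *m) { return mb_mgr_run(m); }
```

With lanes holding jobs of 5, 2, 7 and 3 blocks, the fourth submission returns the 2-block job after advancing every lane by 2 blocks. Advancing by the shortest remaining length keeps every lane busy for the whole kernel call, at the cost of returning jobs out of submission order.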