324952 

Overview 

Cryptographic algorithms, such as secure hashing or encryption, occur in networking, storage and other applications. Since the amount of data being processed is large and increasing at a rapid rate, there is an ever-increasing need for very high performance implementations of these algorithms. The Intel® microarchitecture code named Westmere is capable of excellent performance as demonstrated in [2], [3] and [4]. However, the introduction of the 2nd Generation Intel® Core™ processor family brings an additional substantial boost in performance on cryptographic algorithms.

We examine the performance of a representative set of algorithms: modular exponentiation, which forms the basis of most public key cryptographic protocols; AES encryption in CBC mode for private (symmetric) key encryption; and the MD5, SHA1 and SHA256 secure hashing algorithms for authentication. To compare the performance of an algorithm across processor families, we use the most optimized implementation of that algorithm for each processor.

Improving Cryptographic Processing  

Modular exponentiation forms the basis of almost all prevalent public key algorithms, such as RSA, DSA and Diffie-Hellman. In [3], we demonstrate the fastest implementation of modular exponentiation on Intel processors. The 2nd Generation Intel® Core™ processor family improves the performance of the multiply and adc (add with carry) instructions, resulting in much faster modular exponentiation. Our implementation is constant-time, and therefore safe from branch-based and cache-based side-channel attacks. In [3] we also demonstrate that it is substantially faster on Intel processors than the best-known publicly available implementation in OpenSSL.
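To make the constant-time property concrete, here is a minimal sketch of fixed-window modular exponentiation. It is not the implementation from [3]; the function names, the 4-bit window and the `exp_bits` parameter are assumptions chosen for illustration. Python integers are not constant-time, so the sketch only shows the control flow: the sequence of squarings, multiplies and table reads is the same for every exponent of a given length.

```python
def ct_select(table, index):
    """Return table[index] while touching every entry of the table.

    Illustrative only: a real implementation would do this masking on
    fixed-width machine words so the memory access pattern is
    independent of the secret index.
    """
    result = 0
    for i, entry in enumerate(table):
        mask = -int(i == index)   # all ones when i == index, otherwise 0
        result |= entry & mask
    return result


def modexp_fixed_window(base, exponent, modulus, exp_bits, window=4):
    """Compute base**exponent % modulus with a fixed operation sequence.

    Assumes 0 <= exponent < 2**exp_bits. The names and the window size
    are hypothetical, not taken from the paper.
    """
    # Precompute base**0 .. base**(2**window - 1) mod modulus.
    table = [1 % modulus]
    for _ in range((1 << window) - 1):
        table.append(table[-1] * base % modulus)

    # Round the exponent length up to a whole number of windows.
    n_windows = -(-exp_bits // window)
    digit_mask = (1 << window) - 1

    acc = 1 % modulus
    for w in reversed(range(n_windows)):
        for _ in range(window):
            acc = acc * acc % modulus   # always `window` squarings
        digit = (exponent >> (w * window)) & digit_mask
        # Always one multiply, even when the digit is 0 (multiplies by 1).
        acc = acc * ct_select(table, digit) % modulus
    return acc
```

For a 512-bit exponent, a call such as `modexp_fixed_window(b, e, n, 512)` performs 512 squarings and 128 multiplies whatever the bit pattern of `e`, which is what removes exponent-dependent branches from the loop.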

Hashing and private key encryption can be implemented using the multi-buffer technique described in [4] for best performance on Intel processors. Processing multiple buffers in parallel can improve performance in two basic ways: by processing the buffers with SIMD instructions, or by interleaving work on independent buffers to reduce the limits imposed by data dependencies. In [4], we describe how multi-buffer techniques result in the best performance for AES encryption and the SHA1 secure hashing algorithm, compared to the best single-buffer implementations. This is also the case for the MD5 and SHA256 algorithms, which are included in this study. In [4], we also describe a job scheduler that manages multiple buffers of varying sizes, giving a fully generalized solution to the multi-buffer problem for all usage models.
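The scheduling idea can be sketched as follows. This is not the scheduler from [4]; the function name, the lane count of 4 and the refill policy are assumptions made for illustration. Each "lane" stands in for one SIMD lane or one interleaved instruction stream, and Python's `hashlib` does the actual hashing, so the sketch shows only how buffers of different sizes are advanced in lockstep and how a finished lane is refilled with the next job.

```python
import hashlib

BLOCK = 64   # SHA-1 processes its input in 64-byte blocks
LANES = 4    # assumed number of buffers processed side by side


def multi_buffer_sha1(jobs, lanes=LANES):
    """Hash every buffer in `jobs`, advancing up to `lanes` buffers per round."""
    # Stack of (job_id, data); pop() hands jobs out in submission order.
    pending = list(enumerate(jobs))[::-1]
    # Each slot is [job_id, data, offset, hasher], or None when the lane is idle.
    active = [None] * lanes
    digests = [None] * len(jobs)

    while True:
        # Refill idle lanes so that as many lanes as possible stay busy.
        for i in range(lanes):
            if active[i] is None and pending:
                job_id, data = pending.pop()
                active[i] = [job_id, data, 0, hashlib.sha1()]
        if all(slot is None for slot in active):
            break

        # One lockstep round: every busy lane consumes one block of its buffer.
        for i, slot in enumerate(active):
            if slot is None:
                continue
            job_id, data, offset, hasher = slot
            hasher.update(data[offset:offset + BLOCK])
            slot[2] = offset + BLOCK
            if slot[2] >= len(data):
                # Buffer exhausted: record the digest and free the lane.
                digests[job_id] = hasher.digest()
                active[i] = None
    return digests
```

Because short buffers finish early and their lanes are refilled immediately, the lanes stay occupied even when buffer sizes vary widely. In a real SIMD implementation the inner loop would be a single vectorized compression step across all lanes rather than one call per lane.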

Cryptographic Performance on the 2nd Generation Intel Core processor family, White Paper, January 2011. Vinodh Gopal, Jim Guilford, Wajdi Feghali, Erdinc Ozturk, Gil Wolrich, Kirk Yap, Sean Gulley and Martin Dixon.
