This project presents a comprehensive approach to accelerate secure hash cryptography algorithms, specifically MD5 and SHA-1, implemented on the Tensilica Xtensa processor. By utilizing SystemC for hardware descriptions and custom instruction sets, significant performance improvements were achieved, attaining speeds of up to 320 Mbps. This integration with the OpenSSL application illustrates the effectiveness of applying architectural exploration and compound instructions. While the process required extensive source code transformations, it ultimately yielded results comparable to hand-crafted implementations, demonstrating that competitive designs can be developed with diminished hardware expertise.
CS343 Project Presentation: Secure Hash Acceleration • Winnie Cheng, Alvin Cheung, Paul Hartke • June 4, 2003
Project Overview • Accelerate commonly used secure hash algorithms relative to a standalone processor • Focus on MD5 and SHA-1 • Use two different implementation methodologies • Tensilica Xtensa processor and SystemC • Integrate the implementations into a real application • Open-source OpenSSL package selected as the target application • It uses a number of encryption algorithms • Evaluate in a working system rather than with synthetic benchmarks
SHA-1 Basic Round repeated 80 times
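As a reference point for the round being accelerated, here is a minimal software sketch of SHA-1 following the published standard. It is not the project's code; the function names (`rotl`, `sha1_round`, `sha1_compress`, `sha1`) are illustrative. `sha1_round` is the basic round that runs 80 times per 64-byte block.

```python
import hashlib  # used only to cross-check the sketch
import struct

MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1_round(t, a, b, c, d, e, w_t):
    """One SHA-1 basic round for step index t (0..79)."""
    if t < 20:
        f, k = (b & c) | (~b & d), 0x5A827999
    elif t < 40:
        f, k = b ^ c ^ d, 0x6ED9EBA1
    elif t < 60:
        f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    else:
        f, k = b ^ c ^ d, 0xCA62C1D6
    temp = (rotl(a, 5) + (f & MASK) + e + k + w_t) & MASK
    return temp, a, rotl(b, 30), c, d

def sha1_compress(state, block):
    """Process one 64-byte block: expand to 80 words, then run 80 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        a, b, c, d, e = sha1_round(t, a, b, c, d, e, w[t])
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))

def sha1(data):
    """Pad the message, then fold every block into the 160-bit state."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = data + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(data) * 8)
    for i in range(0, len(padded), 64):
        state = sha1_compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)
```

Comparing `sha1(msg)` with `hashlib.sha1(msg).digest()` is a quick sanity check. The rotates, the logical function, and the multi-operand add in `sha1_round` are the operations a compound instruction would fold together.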
Tensilica Processor Extensions • Create compound instructions to perform more of the algorithm per clock cycle • Baseline: 25 instructions/byte of input data at a 200 MHz clock gives 64 Mbps • Reduce to 5 instructions per byte • 5 instructions/byte of input data at a 200 MHz clock gives 320 Mbps • The 5-cycle figure comes from the critical path of the operations at a 200 MHz clock
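The throughput figures on this slide can be reproduced with a short calculation. This sketch assumes one instruction issued per clock cycle (the `ipc` parameter, an assumption the slide does not state); the function name is illustrative.

```python
def throughput_mbps(clock_hz, instructions_per_byte, ipc=1.0):
    """Hash throughput in megabits per second.

    ipc is instructions per cycle; 1.0 is an assumption, not a figure
    taken from the slides.
    """
    bytes_per_second = clock_hz * ipc / instructions_per_byte
    return bytes_per_second * 8 / 1e6

baseline = throughput_mbps(200e6, 25)  # 8 MB/s of input data
extended = throughput_mbps(200e6, 5)   # 40 MB/s of input data
print(baseline, extended)
```

Under that assumption, cutting the instruction count per byte from 25 to 5 gives a fivefold throughput gain, matching the 64 Mbps and 320 Mbps figures.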
Custom Instruction Sharing • Sharing hardware between instructions appears attractive • The rounds of both algorithms are dominated by adder trees, shifts, and logical functions • However, the actual overlap between specific groups of operations was minimal • This results in separate instructions for each algorithm
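To illustrate why sharing is harder than it looks, the sketch below puts one first-round step of each algorithm side by side, following the published MD5 and SHA-1 definitions. The function names are illustrative and this is not the project's TIE code. Both steps use the same selection function, but the rotate amounts, the number of add operands, and the state update all differ.

```python
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def ch(x, y, z):
    """(x AND y) OR (NOT x AND z): the first-round function of both algorithms."""
    return ((x & y) | (~x & z)) & MASK

def md5_step(a, b, c, d, m, k, s):
    """One MD5 first-round step: four-operand add, variable rotate by s, final add."""
    new_b = (b + rotl((a + ch(b, c, d) + m + k) & MASK, s)) & MASK
    return d, new_b, b, c

def sha1_step(a, b, c, d, e, w, k):
    """One SHA-1 first-round step: fixed rotates by 5 and 30, five-operand add."""
    temp = (rotl(a, 5) + ch(b, c, d) + e + w + k) & MASK
    return temp, a, rotl(b, 30), c, d
```

In MD5 the rotate sits between two additions and its amount changes from step to step. In SHA-1 the rotates are fixed and feed a single wide addition, so a datapath tuned for one step does not map cleanly onto the other.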
Architectural Exploration with SystemC • Objective is to take the same high-level MD5/SHA-1 C source code and directly generate a hardware implementation • Then compare it to existing hand-written Verilog implementations and to the TIE-extended processor
SystemC Limitations • Original source code is not directly usable by SystemC • Pointers are not synthesizable, which requires rewriting the original source • Minimal architectural transformations are performed • No loop fusion • No automatic loop unrolling exploration
Successive Design Iterations • The iterative flow required successive source code transformations to achieve better size and area • Scheduling analysis indicated target areas for improvement • Areas of low utilization • Excessive resource dependencies • In the end, the final source code gave results close to the hand-written Verilog implementation • However, the final code bore very little resemblance to the original C source and instead resembled the hand-written Verilog
SystemC Implementation Observations • Successive iterations asymptotically approached the area/performance of the hand-coded design • Implementation time is about the same as for an experienced Verilog designer, but no extensive hardware expertise is required • Bus interface and device drivers are still required to interface with the processor • These come “for free” with the TIE implementation
OpenSSL Integration Methodology • Wrote custom SHA-1 / MD5 routines with Tensilica extensions and compiled them to Xtensa ELF files • Created a wrapper for the Xtensa ISS (instruction set simulator) to run the hash routines • Statically linked the wrapper ISS into OpenSSL • When OpenSSL calls SHA-1 or MD5, the system traps down into an emulated function that in turn executes the operation on the wrapped simulator
OpenSSL Integration Challenges • Original approach was to statically link in the custom ISS using the OpenSSL “Engine” hardware accelerator interface • OpenSSL supports dynamic loading of custom encryption engines and lets the user choose which engine to use for a particular encryption routine • But the ISS uses dynamic libraries that cannot be statically linked in • So we kept the ISS as an executable, ran it as a separate process outside OpenSSL, and returned results via external files • The OpenSSL engine interface is not completely developed and does not fully support SSL functionality • So instead of using the engine interface, we replaced the original OpenSSL SHA-1 / MD5 routines with our implementations that invoke the ISS
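The separate-process arrangement described above can be sketched as follows. Everything here is a stand-in: `FAKE_ISS` is a small hypothetical script that plays the role of the simulator executable, and the file names are invented. The real ISS command line is not given in the slides.

```python
import hashlib
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the ISS executable (assumption): it hashes the
# input file and writes the hex digest to the output file.
FAKE_ISS = (
    "import hashlib, sys\n"
    "data = open(sys.argv[1], 'rb').read()\n"
    "open(sys.argv[2], 'w').write(hashlib.sha1(data).hexdigest())\n"
)

def sha1_via_external_process(data):
    """Replacement hash routine: hand the message to a separate process
    through a file and read the digest back from another file."""
    with tempfile.TemporaryDirectory() as workdir:
        in_path = os.path.join(workdir, "message.bin")
        out_path = os.path.join(workdir, "digest.txt")
        with open(in_path, "wb") as f:
            f.write(data)
        # Launch the "simulator" as its own process; fail loudly on error.
        subprocess.run(
            [sys.executable, "-c", FAKE_ISS, in_path, out_path],
            check=True,
        )
        with open(out_path) as f:
            return bytes.fromhex(f.read().strip())
```

A caller uses `sha1_via_external_process(msg)` exactly as it would an in-process hash. The cost is a process launch and two file transfers per call, which is acceptable for functional integration but not for timing measurements.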
Conclusions • Neither the Tensilica nor the SystemC flow was a fully automatic tool • However, both led to implementations competitive with a hand implementation • The key advantage is that designs can be implemented with much less expertise • Especially much less hardware design expertise