
GMU SHA Core Interface & Hash Function Performance Metrics


rkittrell

Presentation Transcript


  1. GMU SHA Core Interface & Hash Function Performance Metrics

  2. Interface

  3. Why Interface Matters?
     • Pin limit: total number of I/O ports ≤ total number of FPGA I/O pins
     • Support for the maximum throughput: time to load the next message block ≤ time to process the current block
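Both conditions on this slide can be checked with simple arithmetic. The following is a minimal sketch, assuming the input port transfers one w-bit word per clock cycle; the function names and all numeric values are hypothetical and do not come from the slides.

```python
def load_cycles(block_size_bits: int, word_bits: int) -> int:
    """Clock cycles to load one message block, assuming one w-bit word per cycle."""
    return -(-block_size_bits // word_bits)  # ceiling division


def interface_keeps_up(block_size_bits: int, word_bits: int,
                       block_processing_cycles: int) -> bool:
    """True if loading the next block takes no longer than processing the current one."""
    return load_cycles(block_size_bits, word_bits) <= block_processing_cycles


def fits_pin_budget(port_widths: list[int], fpga_io_pins: int) -> bool:
    """True if the total number of I/O ports does not exceed the FPGA's I/O pins."""
    return sum(port_widths) <= fpga_io_pins


if __name__ == "__main__":
    # Hypothetical core: 512-bit blocks, 65 cycles per block.
    print(interface_keeps_up(512, 32, 65))  # 16 load cycles vs. 65 processing cycles
    print(interface_keeps_up(512, 4, 65))   # 128 load cycles vs. 65 processing cycles
    # Ports: clk, rst, din (w), dout (w), and four handshake signals, with w = 64.
    print(fits_pin_budget([1, 1, 64, 64, 1, 1, 1, 1], 200))
```

A narrower word size saves pins but lengthens the load time, so the two conditions pull in opposite directions when choosing w.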

  4. Interface: Two possible solutions
     [Diagram labels: SHA core, msg_bitlen, message, end_of_msg, zero_word]
     • Length of the message communicated at the beginning
       + easy to implement
       + passive source circuit
       − area overhead for the counter of message bits
     • Dedicated end-of-message port
       + no need for internal message bit counter
       − more intelligent source circuit required

  5. SHA Core: Interface & Typical Configuration
     [Block diagram: Input FIFO → SHA core → Output FIFO, each driven by clk and rst, connected by w-bit data buses. Labeled signals: ext_idata, idata, din, dout, odata, ext_odata; fifoin_write, fifoin_full, fifoin_read, fifoin_empty; src_ready, src_read; dst_ready, dst_write; fifoout_write, fifoout_full, fifoout_read, fifoout_empty.]
     • The SHA core is an active component; the surrounding FIFOs are passive and widely available.
     • The input interface is separate from the output interface.
     • Processing the current block, reading the next block, and storing the result for the previous message can all be done in parallel.

  6. SHA Core Interface
     [Diagram: SHA core with clk and rst inputs, w-bit din and dout data ports, and handshake signals src_ready, src_read, dst_ready, dst_write.]

  7. SHA Core Interface + Surrounding FIFOs
     [Block diagram: Input FIFO → SHA core → Output FIFO, each driven by clk and rst, connected by w-bit data buses. Labeled signals: ext_idata, idata, din, dout, odata, ext_odata; fifoin_write, fifoin_full, fifoin_read, fifoin_empty; src_ready, src_read; dst_ready, dst_write; fifoout_write, fifoout_full, fifoout_read, fifoout_empty.]

  8. Operation of FIFO

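  9. Communication Protocol for Unpadded Messages
     [Two diagrams of w-bit word streams]
     a) msg_bitlen, message
     b) seg_0_bitlen, seg_0, seg_1_bitlen, seg_1, . . . , seg_n-1_bitlen, seg_n-1, zero_word
     The diagram labels also include a second zero_word and two dashed fill markers (−−−−−); their exact placement is not recoverable from the transcript.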
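The two unpadded-message formats can be sketched as word streams. This is an illustrative sketch, not the authors' implementation: it assumes messages are whole bytes, words are big-endian, the final word of a message or segment is zero-filled, and format b) ends with a single all-zero word. The helper names are hypothetical.

```python
def to_words(data: bytes, word_bits: int) -> list[int]:
    """Split data into w-bit words (big-endian), zero-filling the last word."""
    word_bytes = word_bits // 8
    padded = data + b"\x00" * (-len(data) % word_bytes)
    return [int.from_bytes(padded[i:i + word_bytes], "big")
            for i in range(0, len(padded), word_bytes)]


def frame_with_length(message: bytes, word_bits: int = 32) -> list[int]:
    """Format a): msg_bitlen word followed by the message words."""
    return [len(message) * 8] + to_words(message, word_bits)


def frame_with_segments(segments: list[bytes], word_bits: int = 32) -> list[int]:
    """Format b): each segment preceded by its bit length, then a zero_word."""
    words: list[int] = []
    for seg in segments:
        words.append(len(seg) * 8)          # seg_i_bitlen
        words.extend(to_words(seg, word_bits))  # seg_i
    words.append(0)                         # zero_word (assumed terminator)
    return words


if __name__ == "__main__":
    print([hex(w) for w in frame_with_length(b"abc")])
    print([hex(w) for w in frame_with_segments([b"abcd", b"e"])])
```

Format a) needs the total length before the first data word is sent, whereas format b) lets the source emit segments as they become available.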

  10. SHA Core Interface with Additional Faster I/O Clock
      [Diagram: SHA core with clk, io_clk, and rst inputs, w-bit din and dout data ports, and handshake signals src_ready, src_read, dst_ready, dst_write.]

  11. SHA Core Interface with Two Clocks + Surrounding FIFOs
      [Block diagram: Input FIFO → SHA core → Output FIFO, driven by clk, io_clk, and rst, connected by w-bit data buses. Labeled signals: ext_idata, idata, din, dout, odata, ext_odata; fifoin_write, fifoin_full, fifoin_read, fifoin_empty; src_ready, src_read; dst_ready, dst_write; fifoout_write, fifoout_full, fifoout_read, fifoout_empty.]

  12. Communication Protocol for Padded Messages Without Message Splitting
      [Diagram of a w-bit word stream: msg_len_ap | last = 1, msg_len_bp, message]
      msg_len_ap: message length after padding [bits]
      msg_len_bp: message length before padding [bits]

  13. Communication Protocol for Padded Messages With Message Splitting
      [Diagram of a w-bit word stream: seg_0_len_ap | last=0, seg_0, seg_1_len_ap | last=0, seg_1, . . . , seg_n-1_len_ap | last=1, seg_n-1_len_bp, seg_n-1]
      seg_i_len_ap: segment i length after padding* [bits]
      seg_i_len_bp: segment i length before padding [bits]
      * For all i < n-1, segment i length after padding is assumed to be a multiple of the message block size, b (characteristic to each function), and thus also of the word size, w. The last segment cannot consist of only padding bits. It must include at least one message bit.
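The segment rules for padded messages can be expressed as a small framing routine. This is a sketch under stated assumptions: the slides do not give the bit layout of the "len_ap | last" word, so placing the last flag in the least significant bit is an assumption, as is requiring the final segment's padded length to be a multiple of w. All function names are hypothetical.

```python
def header_word(len_ap_bits: int, last: bool, word_bits: int = 32) -> int:
    """Pack 'len_ap | last' into one word; the flag position (LSB) is assumed."""
    if not 0 <= len_ap_bits < 1 << (word_bits - 1):
        raise ValueError("length does not fit in the header word")
    return (len_ap_bits << 1) | int(last)


def frame_padded(segments, block_bits: int, word_bits: int = 32) -> list[int]:
    """segments: list of (len_ap_bits, len_bp_bits, payload_words) tuples."""
    words: list[int] = []
    for i, (len_ap, len_bp, payload) in enumerate(segments):
        last = i == len(segments) - 1
        if not last and len_ap % block_bits:
            raise ValueError("non-final segment must be a multiple of the block size")
        if last and len_bp < 1:
            raise ValueError("last segment must contain at least one message bit")
        if len_ap % word_bits or len(payload) != len_ap // word_bits:
            raise ValueError("payload does not match the padded length")
        words.append(header_word(len_ap, last, word_bits))
        if last:
            words.append(len_bp)  # seg_n-1_len_bp is sent only for the last segment
        words.extend(payload)
    return words


if __name__ == "__main__":
    # Two hypothetical 512-bit segments; the second holds 24 message bits plus padding.
    stream = frame_padded([(512, 512, [1] * 16), (512, 24, [2] * 16)], block_bits=512)
    print(len(stream), stream[0], stream[17], stream[18])
```

With a single segment the same routine produces the layout without message splitting: one header word with last = 1, the length before padding, then the message.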

  14. Performance Metrics

  15. Performance Metrics - Speed
      • Throughput for Long Messages [Mbit/s]
      • Throughput for Short Messages [Mbit/s]
      • Execution Time for Short Messages [ns]
      These metrics allow easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).

  16. Performance Metrics - Speed
      • Time to hash N blocks of message [cycles] = Htime(N): the exact formula comes from analysis of a block diagram, confirmed by functional simulation.
      • Minimum Clock Period [ns] = T: taken from a place & route and/or static timing analysis report file.

  17. Time to Hash N Blocks of the Message [clock cycles]

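  18. Performance Metrics - Speed
      Minimum time to hash N blocks of message [ns] = Htime(N) ⋅ T
      Maximum Throughput (for long messages):
      $$\text{Throughput} = \frac{\text{block\_size}}{T \cdot (\text{Htime}(N+1) - \text{Htime}(N))} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
      Effective maximum throughput for short messages: [formula not preserved in the transcript]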
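The long-message throughput formula can be evaluated directly once Htime(N) and T are known. In this sketch the Htime(N) expression, the clock period, and the block size are hypothetical placeholders; a real design would take Htime(N) from its block diagram and T from the timing report.

```python
def htime(n_blocks: int) -> int:
    """Hypothetical cycle count: 3 cycles of overhead plus 65 cycles per block."""
    return 3 + 65 * n_blocks


def min_hash_time_ns(n_blocks: int, clock_period_ns: float) -> float:
    """Minimum time to hash N blocks [ns] = Htime(N) * T."""
    return htime(n_blocks) * clock_period_ns


def max_throughput_mbps(block_size_bits: int, clock_period_ns: float, n: int = 1) -> float:
    """block_size / (T * (Htime(N+1) - Htime(N))), converted from bit/ns to Mbit/s."""
    block_processing_time = htime(n + 1) - htime(n)  # cycles per additional block
    return 1000.0 * block_size_bits / (clock_period_ns * block_processing_time)


if __name__ == "__main__":
    T = 5.0  # hypothetical minimum clock period in ns
    print(min_hash_time_ns(2, T))
    print(max_throughput_mbps(512, T))
```

Because Htime(N) is linear in N here, the difference Htime(N+1) − Htime(N) is the same for every N, which is why the formula reduces to the block processing time.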

  19. Performance Metrics - Speed
      Maximum Throughput (for long messages):
      $$\text{Throughput} = \frac{\text{block\_size}}{T \cdot \text{block\_processing\_time}}$$
      • block_size: from specification
      • T: from place & route report and/or static timing analysis report
      • block_processing_time: from analysis of block diagram and/or functional simulation

  20. Performance Metrics - Area
      For the basic, folded, and unrolled architectures, we force these vectors to look as follows through the synthesis and implementation options:
      [Vector figure; recoverable entries: 0, 0, 0, 0, Area]

  21. Choice of Optimization Target
      Primary Optimization Target: Throughput to Area Ratio
      Features:
      • practical: good balance between speed and cost
      • very reliable guide through the entire design process, facilitating
        - the choice of high-level architecture
        - implementation of basic components
        - choice of tool options
      • leads to high-speed, close-to-maximum-throughput designs

  22. Our Design Flow
      [Flow diagram with the following boxes: Interface Specification; Controller Template; Datapath Block Diagram; Controller ASM Chart; Library of Basic Components; VHDL Code; Formulas for Throughput & Hash Time; Max. Clock Freq.; Resource Utilization; Throughput, Area, Throughput/Area, Hash Time for Short Messages]

  23. How to compare hardware speed vs. software speed?
      EBASH reports (http://bench.cr.yp.to/results-hash.html)
      • In graphs: Time(n) = time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, …, 2048, 4096
      • In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg
      $$\text{Performance for long message} = \frac{\text{Time}(4096) - \text{Time}(2048)}{2048}$$

  24. How to compare hardware speed vs. software speed?
      $$\text{Throughput [Gbit/s]} = \frac{8\ \text{bits/byte} \cdot \text{clock frequency [GHz]}}{\text{Performance for long message [cycles/byte]}}$$
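The conversion from software cycle counts to a throughput figure comparable with hardware results can be sketched as follows. The cycle counts and clock frequency in the example are made-up values for illustration, not eBASH measurements, and the function names are hypothetical.

```python
def long_message_cycles_per_byte(time_4096: float, time_2048: float) -> float:
    """Performance for long message = (Time(4096) - Time(2048)) / 2048 [cycles/byte]."""
    return (time_4096 - time_2048) / 2048


def software_throughput_gbps(cycles_per_byte: float, clock_ghz: float) -> float:
    """Throughput [Gbit/s] = 8 bits/byte * clock frequency [GHz] / cycles per byte."""
    return 8.0 * clock_ghz / cycles_per_byte


if __name__ == "__main__":
    # Hypothetical measurements: Time(2048) = 22000 cycles, Time(4096) = 42480 cycles.
    cpb = long_message_cycles_per_byte(42480, 22000)
    print(cpb)
    print(software_throughput_gbps(cpb, clock_ghz=2.5))
```

Taking the difference between two long message sizes cancels the fixed per-message overhead, mirroring the Htime(N+1) − Htime(N) difference used for the hardware throughput.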
