Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions

Presentation Transcript

  1. Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions Ramkumar Jayaseelan, Haibin Liu, Tulika Mitra School of Computing, National University of Singapore {ramkumar, liuhb, tulika}@comp.nus.edu.sg Presented by Alex Oumantsev

  2. Exploiting Forwarding to Improve Data Bandwidth of Instruction-Set Extensions • Introduce the material • Related Work • Proposed Architecture • Compilation Toolchain • Experimental Evaluation • Conclusion

  3. Application-Specific instruction-set extensions (Custom Instructions) • Extend the instruction-set architecture • Balance performance and time-to-market • Frequently used computation patterns • Custom Functional Units • Parallelization and chaining of operations • Processor Support – RISC-style • Altera Nios-II • Tensilica Xtensa

  4. Base Processor – Custom Instruction mismatch • RISC-style • Fixed-length instructions • Two input operands per instruction • Custom Instructions • Complex • Multiple inputs per operation

  5. Number of Inputs per Custom Instruction

  6. Data Forwarding • Present on a typical RISC processor • Register Bypassing • Supplies data to a Functional Unit from buffer • Resolves Data hazards between instructions • Input operands for Custom Instruction • Use existing Logic

  7. Related Work • Design Space Exploration • Data Bandwidth • Nios-II Internal Register Files • Extra cycles wasted on explicit MOV • Xilinx MicroBlaze: Fast Simplex Link • put and get instructions • Relaxing register file port constraints • Fixed length instruction problem

  8. Proposed Architecture • MIPS-like 5 stage pipeline

  9. Data Forwarding • CUST instruction draws 2 inputs from Forwarding • Able to take up to 4 inputs • Modification – Do not read from Register in ID if Forwarding
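
The operand split on this slide can be illustrated with a small sketch. It assumes two register-file read ports and two forwarding paths that carry the results of the two preceding instructions (as stated on the Predictable Forwarding slide). The function name `classify_operands` and the register names are hypothetical, not taken from the presentation.

```python
def classify_operands(sources, prior_dests, reg_ports=2, fwd_paths=2):
    """Split a custom instruction's source registers by where they are read.

    sources     -- source registers of the custom instruction (up to 4)
    prior_dests -- destination registers of the two preceding instructions
                   (None for an instruction that writes no register)
    Assumption: an operand is forwardable only if one of the two preceding
    instructions produced it.
    """
    forwarded, from_regfile = [], []
    for reg in sources:
        if reg in prior_dests and len(forwarded) < fwd_paths:
            forwarded.append(reg)      # taken from the forwarding logic
        else:
            from_regfile.append(reg)   # needs a register-file read port
    fits = len(from_regfile) <= reg_ports
    return forwarded, from_regfile, fits


# Hypothetical example: r3 and r4 were just produced, r1 and r2 are older.
print(classify_operands(["r1", "r2", "r3", "r4"], ["r4", "r3"]))
```

When `fits` is false, the instruction needs more register reads than the ports allow, which is the case the toolchain slides handle by inserting a MOV.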

  10. Instruction Encoding • Transparent to regular instructions • Minimize number of bits for operands • Nios-II Example • Use 11 bits of OPX field • OPD defines operands from forwarding • COP specifies the custom instruction
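
A rough sketch of packing the two subfields into the 11-bit OPX field follows. The slide does not give the bit widths or the bit order, so the 4-bit OPD / 7-bit COP split, the placement of OPD in the high bits, and the reading of OPD as one flag per operand are all assumptions made purely for illustration. `pack_opx` and `unpack_opx` are hypothetical helper names.

```python
OPX_BITS = 11  # from the slide: 11 bits of the OPX field are used


def pack_opx(opd, cop, opd_bits=4):
    """Pack OPD and COP into one OPX value.

    ASSUMPTION: OPD occupies the high `opd_bits` bits and COP the rest;
    the real layout is not given in the transcript.
    """
    cop_bits = OPX_BITS - opd_bits
    if not 0 <= opd < (1 << opd_bits):
        raise ValueError("OPD does not fit in its field")
    if not 0 <= cop < (1 << cop_bits):
        raise ValueError("COP does not fit in its field")
    return (opd << cop_bits) | cop


def unpack_opx(opx, opd_bits=4):
    """Inverse of pack_opx under the same assumed layout."""
    cop_bits = OPX_BITS - opd_bits
    return opx >> cop_bits, opx & ((1 << cop_bits) - 1)


# Hypothetical: operands 0 and 1 come from forwarding, custom opcode 5.
opx = pack_opx(0b0011, 5)
print(opx, unpack_opx(opx))
```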

  11. Predictable Forwarding • Two prior instructions can be used • Problems with Multicycle and Cache Miss • Create bubbles in the pipeline • Can’t rely on forwarding • Modify to send Stall signal to all stages • Pauses the pipeline till ready • No need for NOP instruction
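
The toy model below illustrates why bubbles make forwarding unreliable and why stalling every stage restores it. It is a deliberate simplification, not the presented hardware: the forwarding state is reduced to two latches (EX/MEM and MEM/WB), only multicycle execute operations are modeled, and the names `shift` and `run` are hypothetical.

```python
def shift(latches, result):
    """Advance one cycle: a new EX result enters EX/MEM, the old EX/MEM
    value moves to MEM/WB, and the old MEM/WB value leaves the latches."""
    ex_mem, _mem_wb = latches
    return (result, ex_mem)


def run(program, stall_whole_pipeline):
    """program: list of (dest_register, ex_cycles).

    Returns the (EX/MEM, MEM/WB) contents visible to the instruction that
    follows the program, i.e. what it could obtain through forwarding.
    """
    latches = (None, None)
    for dest, ex_cycles in program:
        for _ in range(ex_cycles - 1):
            # Extra execute cycles of a multicycle operation.
            if not stall_whole_pipeline:
                # Later stages keep moving, so a bubble enters EX/MEM
                # and older results drain out of the latches.
                latches = shift(latches, None)
            # With a stall signal to all stages, the latches hold.
        latches = shift(latches, dest)
    return latches


program = [("r1", 1), ("r2", 3)]  # r2 comes from a 3-cycle operation
print(run(program, stall_whole_pipeline=False))  # r1 is no longer available
print(run(program, stall_whole_pipeline=True))   # both r2 and r1 available
```

In this model the stalled pipeline always presents the results of the two prior instructions, so the compiler can decide statically which operands are forwarded.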

  12. Multicycle Delays

  13. Cache Miss Delays

  14. Compilation Toolchain • Compiler cooperation needed • Determine if operand can be forwarded • Encode custom instruction correctly • Schedule to maximize forwarding

  15. Compilation Toolchain • IR Scheduling • Pattern Identification • Identify all possible patterns for custom instructions • Pattern Selection • Heuristic: pattern priority = speedup × frequency • Instruction Scheduling • Find optimal scheduling with forwarding • Forwarding Check and MOV Insertion • Insert a MOV from a register to itself (MOV rX, rX) if needed
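
Two of the steps above, pattern selection and MOV insertion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' toolchain: the greedy, non-overlapping selection rule, the data layouts, and the names `select_patterns` and `insert_movs` are hypothetical. The MOV pass assumes two register read ports and forwarding from the two preceding instructions, and it does not try to find an optimal schedule.

```python
def select_patterns(candidates):
    """Pick patterns greedily by priority = speedup * frequency.

    candidates: dicts with 'name', 'speedup', 'frequency' and 'nodes'
    (set of IR node ids). ASSUMPTION: a pattern is skipped if it overlaps
    nodes already covered by a higher-priority pattern.
    """
    ranked = sorted(candidates,
                    key=lambda p: p["speedup"] * p["frequency"],
                    reverse=True)
    chosen, covered = [], set()
    for p in ranked:
        if covered.isdisjoint(p["nodes"]):
            chosen.append(p["name"])
            covered |= p["nodes"]
    return chosen


def insert_movs(schedule, reg_ports=2, window=2):
    """schedule: list of (opcode, dest, sources) tuples.

    Before each CUST instruction, insert `MOV rX, rX` until at most
    `reg_ports` operands still need a register-file read; the MOV puts
    the value back on the forwarding path.
    """
    out = []
    for op, dest, srcs in schedule:
        if op == "CUST":
            unique = list(dict.fromkeys(srcs))
            if len(unique) > reg_ports + window:
                raise ValueError("too many distinct operands")
            while True:
                recent = [d for _, d, _ in out[-window:]]
                missing = [s for s in unique if s not in recent]
                if len(missing) <= reg_ports:
                    break
                reg = missing[0]
                out.append(("MOV", reg, (reg,)))
        out.append((op, dest, srcs))
    return out


schedule = [
    ("ADD", "r1", ("r5", "r6")),
    ("SUB", "r2", ("r5", "r6")),
    ("CUST", "r8", ("r1", "r2", "r3", "r4")),
]
print(insert_movs(schedule))  # r1 and r2 are forwarded, so no MOV is added
```

Note that an inserted MOV pushes earlier producers one slot further from the custom instruction, which is why the pass rechecks the window after every insertion.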

  16. Experimental Evaluation • SimpleScalar tool set used • Constraint of max 4 inputs and one output • Selected benchmarks

  17. Speedup • Speedup (%) = (CycleOrigin / CycleEx - 1) × 100 • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)
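
As a quick worked example of the speedup formula on this slide, the sketch below evaluates it for made-up cycle counts. The numbers are hypothetical and are not results from the presentation; `speedup_percent` is an illustrative name.

```python
def speedup_percent(cycles_original, cycles_extended):
    """Speedup in percent: (CycleOrigin / CycleEx - 1) * 100."""
    return (cycles_original / cycles_extended - 1) * 100


# Hypothetical counts: 1500 cycles on the base processor,
# 1000 cycles with custom instructions.
print(speedup_percent(1500, 1000))  # 50.0
```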

  18. Energy Consumption • Energy used by Registers • Ideal – 4 Read Ports from Registers • Forwarding – Discussed solution (may have MOV) • MOV – Nios-II implemented solution (forces MOV)

  19. Conclusion • Compiler modification • Minor pipeline modification • Data Forwarding used for MISO custom instructions • Overcome limited register ports • Compatible instruction encoding • Near-ideal speedup