CPRE 583 Reconfigurable Computing Lecture 23: Wed 11/16/2011 (High-level Acceleration Approaches)

Instructor: Dr. Phillip Jones (phjones@iastate.edu), Reconfigurable Computing Laboratory, Iowa State University, Ames, Iowa, USA. http://class.ee.iastate.edu/cpre583/


Presentation Transcript


  1. CPRE 583 Reconfigurable Computing Lecture 23: Wed 11/16/2011 (High-level Acceleration Approaches) Instructor: Dr. Phillip Jones (phjones@iastate.edu) Reconfigurable Computing Laboratory Iowa State University Ames, Iowa, USA http://class.ee.iastate.edu/cpre583/

  2. Announcements/Reminders • HW3: will be assigned as extra credit • Exam 2 • Reminder: pushed back to the Friday after Thanksgiving week • Weekly Project Updates due: Fridays (midnight)

  3. Project Grading Breakdown • 50% Final Project Demo • 30% Final Project Report • 20% of your project report grade will come from your 5-6 project updates, due Fridays at midnight • 20% Final Project Presentation

  4. Project Ideas: Relevant conferences • MICRO • Supercomputing • HPCA • IPDPS • FPL • FPT • FCCM • FPGA • DAC • ICCAD • ReConFig • RTSS • RTAS • ISCA

  5. Projects: Target Timeline • Teams Formed and Topic: Mon 10/10 • Project idea in PowerPoint, 3-5 slides • Motivation (why is this interesting, useful) • What will be the end result • High-level picture of final product • Project team list: Name, Responsibility • High-level Plan/Proposal: Fri 10/14 • PowerPoint, 5-10 slides (presentation to class Wed 10/19) • System block diagrams • High-level algorithms (if any) • Concerns • Implementation • Conceptual • Related research papers (if any)

  6. Projects: Target Timeline • Work on projects: 10/19 - 12/9 • Weekly update reports • More information on updates will be given • Presentations: Finals week • Present / Demo what is done at this point • 15-20 minutes (depends on number of projects) • Final write up and Software/Hardware turned in: Day of final (TBD)

  7. Initial Project Proposal Slides (5-10 slides) • Project team list: Name, Responsibility (who is project leader) • Team size: 3-4 (5 case-by-case) • Project idea • Motivation (why is this interesting, useful) • What will be the end result • High-level picture of final product • High-level Plan • Break project into milestones • Provide initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving. Issues will pop up to cause the schedule to slip. • System block diagrams • High-level algorithms (if any) • Concerns • Implementation • Conceptual • Research papers related to your project idea

  8. Weekly Project Updates • The current state of your project write-up • Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation sections • The current state of your Final Presentation • Your Initial Project Proposal presentation (due Wed 10/19) should make a good starting point for your Final Presentation • What things are working and not working • What roadblocks you are running into

  9. Common Questions

  10. Overview • Discuss some high-level approaches for accelerating applications.

  11. What you should learn • Start to get a feel for approaches for accelerating applications.

  12. Profiling Applications • Finding bottlenecks • Profiling tools • gprof: http://www.cs.nyu.edu/~argyle/tutorial.html • Valgrind
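As a rough illustration of finding a bottleneck, the sketch below profiles a toy program and reports the function with the most self time. It uses Python's standard `cProfile` and `pstats` modules as a stand-in for gprof or Valgrind, which the slide actually names; `slow_kernel`, `fast_setup`, `application`, and `hottest_function` are hypothetical names invented for this example.

```python
import cProfile
import pstats


def slow_kernel(n):
    # Deliberately heavy loop: the intended bottleneck (hypothetical workload).
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_setup(n):
    # Cheap work that should barely register in the profile.
    return list(range(n))


def application():
    fast_setup(100)
    return slow_kernel(200_000)


def hottest_function(func):
    """Profile func() and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (prim_calls, calls, tottime, cumtime, callers);
    # index 2 is the time spent inside the function itself.
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print("Bottleneck candidate:", hottest_function(application))
```

The function that dominates self time is the natural first candidate for hardware acceleration. With gprof the same idea applies to a C program: compile with `-pg`, run the program, then inspect the flat profile.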

  13. Pipelining • How many ns to process 100 input vectors, assuming each LUT has a 1 ns delay? • [Figure: input vector <A,B,C,D> feeding four 4-LUTs, each paired with a DFF, producing the output] • How many ns to process 100 input vectors, assuming a 1 ns clock and 1 DFF delay per output? • [Figure: the same four 4-LUT / DFF circuit]
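One way to work the two pipelining questions is sketched below. It assumes the four 4-LUTs form a single chain (the figure did not survive in this transcript, so that is an assumption), that the unpipelined circuit must finish one vector before accepting the next, and that the pipelined version has a register after every LUT. The function names are hypothetical.

```python
def unpipelined_ns(num_vectors, stages=4, lut_delay_ns=1):
    # Each vector must ripple through every LUT before the next one starts.
    return num_vectors * stages * lut_delay_ns


def pipelined_ns(num_vectors, stages=4, clock_ns=1):
    # The first result appears after `stages` clocks (pipeline fill);
    # after that, one result completes on every clock.
    return (stages + num_vectors - 1) * clock_ns


if __name__ == "__main__":
    slow = unpipelined_ns(100)
    fast = pipelined_ns(100)
    print(f"unpipelined: {slow} ns, pipelined: {fast} ns, speedup: {slow / fast:.2f}x")
```

Under these assumptions the unpipelined circuit needs 400 ns and the pipelined one 103 ns. For long input streams the speedup approaches the number of pipeline stages.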

  14. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner

  15. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Grid so far, top row to bottom row: [. . .] [. . .] [1 . .]

  16. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Grid so far, top row to bottom row: [. . .] [1 . .] [1 1 .]

  17. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Grid so far, top row to bottom row: [1 . .] [1 2 .] [1 1 1]

  18. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Grid so far, top row to bottom row: [1 3 .] [1 2 3] [1 1 1]

  19. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Completed grid, top row to bottom row: [1 3 6] [1 2 3] [1 1 1]

  20. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Completed grid, top row to bottom row: [1 3 6] [1 2 3] [1 1 1] • How many ns to process if a CPU can process one cell per clock (1 ns clock)?

  21. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Completed grid, top row to bottom row: [1 3 6] [1 2 3] [1 1 1] • How many ns to process if an FPGA can obtain maximum parallelism each clock (1 ns clock)?

  22. Pipelining (Systolic Arrays) Dynamic Programming • 1. Start with base case: lower left corner • 2. Formula for computing numbering cells • 3. Final result in upper right corner • Completed grid, top row to bottom row: [1 3 6] [1 2 3] [1 1 1] • What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix)
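The dynamic programming questions above can be explored with the sketch below. It assumes the recurrence implied by the values on the slides (each interior cell is the sum of the cell to its left and the cell below it, with 1s along the bottom row and left column); the slides do not state the formula explicitly. It also assumes one clock per cell on the CPU and one clock per anti-diagonal on the FPGA, since cells on the same anti-diagonal do not depend on each other. All names are hypothetical.

```python
def fill_wavefront(n):
    """Fill an n x n grid one anti-diagonal at a time; row 0 is the bottom row.

    Returns the grid and the number of wavefront steps (parallel clocks).
    """
    grid = [[0] * n for _ in range(n)]
    steps = 0
    for d in range(2 * n - 1):
        # All cells with row + col == d depend only on diagonal d - 1,
        # so an ideal parallel implementation computes them in one clock.
        for r in range(max(0, d - n + 1), min(d, n - 1) + 1):
            c = d - r
            if r == 0 or c == 0:
                grid[r][c] = 1  # base cases along the bottom row and left column
            else:
                grid[r][c] = grid[r - 1][c] + grid[r][c - 1]  # below + left
        steps += 1
    return grid, steps


def cpu_clocks(n):
    return n * n  # one cell per clock


def fpga_clocks(n):
    return 2 * n - 1  # one anti-diagonal per clock


def speedup(n):
    return cpu_clocks(n) / fpga_clocks(n)


if __name__ == "__main__":
    grid, steps = fill_wavefront(3)
    print("upper right:", grid[2][2], "wavefront steps:", steps)
    print(f"100x100 speedup: {speedup(100):.2f}x")
```

Under these assumptions the 3x3 example takes 9 ns on the CPU and 5 ns on the FPGA, and the general speedup is N^2 / (2N - 1), which is roughly 50x for a 100x100 matrix.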

  23. Dr. James Moscola (Example) • [Figure: example RNA model for the sequence g a c c a g, with nodes ROOT0 (states S0, IL1, IR2), MATP1 (states MP3, ML4, MR5, D6, IL7, IR8), MATL2 (states ML9, D10, IL11), and END3 (state E12)]

  24. Example RNA Model • [Figure: the same RNA model as on the previous slide]

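  25. Baseline Architecture Pipeline • [Figure: pipeline of per-state processing elements in the order E12, IL11, D10, ML9, IR8, IL7, D6, MR5, ML4, MP3, IR2, IL1, S0, grouped under nodes END3, MATL2, MATP1, ROOT0, and fed by a residue pipeline carrying u g g c g a c a c c c]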

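  26. Processing Elements • [Figure: processing element for state ML4. It forms the sums IL7,3,2 + ML4_t(7), IR8,3,2 + ML4_t(8), ML9,3,2 + ML4_t(9), and D10,3,2 + ML4_t(10), and adds the emission score selected by the input residue x_i from ML4_e(A), ML4_e(C), ML4_e(G), ML4_e(U), giving ML4,3,3 = .22. The result is stored in the ML4 score table indexed by j and d, whose visible entries are -INF, -INF, .40, -INF, .44, .30, -INF, .30, .72, .22]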

  27. Baseline Results for Example Model • Comparison to Infernal software • Infernal run on Intel Xeon 2.8GHz • Baseline architecture run on Xilinx Virtex-II 4000 • occupied 88% of logic resources • run at 100 MHz • Input database of 100 Million residues • Bulk of time spent on I/O (41.434s)

  28. Expected Speedup on Larger Models • Speedup estimated ... • using 100 MHz clock • for processing database of 100 Million residues • Speedups range from 500x to over 13,000x • larger models with more parallelism exhibit greater speedups

  29. Distributed Memory • [Figure: an ALU with a cache, shown alongside a PE with four BRAMs]

  30. Short Overview of Good Reference • Achieving High Performance with FPGA-Based Computing • Reading #11 • Martin C. Herbordt, 2007

  31. Next Class • Evolvable Hardware

  32. Questions/Comments/Concerns • Write down • Main point of lecture • One thing that’s still not quite clear • OR • If everything is clear, then give an example of how to apply something from lecture
