This paper discusses stream programming on FPGAs, focusing on stream folding to maximize throughput while meeting area and latency constraints. Opportunities, problems, and results are presented.
A Computing Origami: Folding Streams in FPGAs • DAC 2009, California, USA • S. M. Farhad, PhD Student, University of Sydney
Outline • Motivation • Stream programming • FPGA • Problem • Stream Folding • Results • Conclusion
Stream Programming Paradigm • Programs are expressed as stream graphs • Streams: sequences of data elements • Actors: functions applied to streams • Independent actors with explicit communication • Regular and repeating computation • [Figure: input stream → actor/filter → output stream]
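A minimal sketch of the paradigm, assuming Python generators as a stand-in for streams: each actor is a function from an input stream to an output stream, and actors share no state, communicating only through the stream passed between them. The names `Actor`, `scale`, `pair_sum`, and `pipeline` are hypothetical illustrations, not taken from the slides or from any stream language.

```python
from typing import Callable, Iterable, Iterator

# An actor maps an input stream to an output stream (illustrative type alias).
Actor = Callable[[Iterable[int]], Iterator[int]]


def scale(k: int) -> Actor:
    """Stateless actor: each firing pops one element and pushes element * k."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for x in stream:
            yield x * k
    return run


def pair_sum(stream: Iterable[int]) -> Iterator[int]:
    """Each firing pops two elements and pushes their sum (fixed 2:1 rate)."""
    it = iter(stream)
    for a in it:
        b = next(it, None)
        if b is None:  # incomplete firing at the end of the stream: drop it
            return
        yield a + b


def pipeline(*actors: Actor) -> Actor:
    """Connect actors in a linear stream graph; each edge is a stream."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for actor in actors:
            stream = actor(stream)
        return iter(stream)
    return run


graph = pipeline(scale(3), pair_sum)
print(list(graph(range(6))))  # [3, 15, 27]
```

The fixed pop/push rates of each actor are what make the computation regular and repeating: the same firing pattern recurs for every chunk of input.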
FPGA • FPGAs are widely available as programmable coprocessors • Opportunities to exploit FPGA-based acceleration • Target applications: multimedia, networking, graphics, and security codes
Problem • Maximize throughput subject to area and latency constraints • Resolve bottleneck actors • Replicated filters do not require resynthesis
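One possible way to write this problem down is sketched below. It is a reading of the slides under stated assumptions, not the authors' model: r_f is the number of copies of filter f, w_f is its work factor (f.latency · S.runs(f)), A_f is f.area, Λ(r) is the end-to-end latency, and Λ_max is a hypothetical name for the latency bound. Throughput Θ is assumed to be set by the bottleneck filter, and total area is assumed linear in the number of copies.

```latex
\Theta(r) = \min_{f \in S} \frac{r_f}{w_f}, \qquad w_f = \texttt{f.latency} \cdot \texttt{S.runs(f)}

\max_{r} \ \Theta(r) \quad \text{s.t.} \quad
\sum_{f \in S} A_f \, r_f \le \mathit{AREA}, \qquad
\Lambda(r) \le \Lambda_{\max}, \qquad
r_f \in \mathbb{Z}_{\ge 1}, \qquad
r_f = 1 \ \text{if } f \text{ has state}

% Relaxation: ignore integrality and set r_f = s \, w_f for every f.
% Then every ratio r_f / w_f equals s, so \Theta = s, and the area bound
% becomes s \sum_f A_f w_f \le \mathit{AREA}; stateful filters need s \, w_f \le 1.
s^{\star} = \min\!\left( \frac{\mathit{AREA}}{\sum_{f \in S} A_f \, w_f},\ \min_{f \text{ stateful}} \frac{1}{w_f} \right)
```

Under these assumptions, s* is exactly the `scaling` factor computed in the area/throughput folding algorithm, with `designPointArea` playing the role of the sum of A_f w_f.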
Area/Throughput Design Folding
foreach filter f in S do
    workFactor[f] = f.latency · S.runs(f)
    designPointArea += f.area · workFactor[f]
scaleLimit = min over f with f.hasState of (1 / workFactor[f])
scaling = min(AREA / designPointArea, scaleLimit)
foreach filter f in S do
    replication[f] = workFactor[f] · scaling
while area(replication) > AREA do
    replication = reduceThroughput(replication)
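A runnable sketch of the folding steps above, under assumptions the slides do not state: latency is in cycles, total area is linear in the number of copies, fractional replication factors are rounded up to whole copies (stateful filters pinned to one copy), and `reduce_throughput` is a hypothetical greedy policy that removes the copy whose loss lowers the bottleneck ratio least. `Filter`, `work_factor`, `area`, `throughput`, and `fold` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Filter:
    name: str
    latency: float   # cycles per firing (assumed unit)
    area: float      # area of one hardware copy (assumed unit)
    runs: int        # firings per steady-state iteration, i.e. S.runs(f)
    has_state: bool  # stateful filters cannot be replicated


def work_factor(f: Filter) -> float:
    return f.latency * f.runs


def area(filters: List[Filter], replication: Dict[str, int]) -> float:
    # Assumption: area is linear in copies; splitter/joiner overhead ignored.
    return sum(f.area * replication[f.name] for f in filters)


def throughput(filters: List[Filter], replication: Dict[str, int]) -> float:
    # Assumed bottleneck model: steady-state iterations per cycle.
    return min(replication[f.name] / work_factor(f) for f in filters)


def reduce_throughput(filters: List[Filter], replication: Dict[str, int]) -> Dict[str, int]:
    # Hypothetical greedy policy: drop one copy from the filter that keeps the
    # highest copies-to-work ratio afterwards, so throughput falls the least.
    candidates = [f for f in filters if replication[f.name] > 1]
    if not candidates:
        raise ValueError("area budget too small for one copy of every filter")
    victim = max(candidates, key=lambda f: (replication[f.name] - 1) / work_factor(f))
    reduced = dict(replication)
    reduced[victim.name] -= 1
    return reduced


def fold(filters: List[Filter], area_budget: float) -> Dict[str, int]:
    work = {f.name: work_factor(f) for f in filters}
    design_point_area = sum(f.area * work[f.name] for f in filters)
    # A stateful filter must stay at one copy, so work * scaling <= 1.
    scale_limit = min((1 / work[f.name] for f in filters if f.has_state), default=math.inf)
    scaling = min(area_budget / design_point_area, scale_limit)
    # Assumption: round fractional factors up to whole copies (at least 1);
    # stateful filters are pinned to 1 to avoid floating-point round-up.
    replication = {
        f.name: 1 if f.has_state else max(1, math.ceil(work[f.name] * scaling))
        for f in filters
    }
    while area(filters, replication) > area_budget:
        replication = reduce_throughput(filters, replication)
    return replication


filters = [Filter("src", 1, 10, 1, True), Filter("fir", 4, 20, 2, False),
           Filter("sink", 3, 5, 1, False)]
print(fold(filters, 100))  # {'src': 1, 'fir': 4, 'sink': 2}
```

With a budget of 100, rounding up first overshoots (area 120), and one call to `reduce_throughput` removes a `fir` copy, leaving a bottleneck throughput of 0.5 iterations per cycle under the assumed model.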
Calculating Latency • Target: FPGAs coupled to host processors • Initiation interval (DMA) • Although replication improves throughput, it often increases latency • Major factors in latency variation: • Non-periodic data arrival • Data-token reordering • Local congestion
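The reordering factor can be illustrated with a toy timing model. Everything here is assumed for illustration and is not from the slides: token i is dispatched round-robin to replica i mod k, each replica serves one token at a time, and results must leave in input order. The hypothetical function `replicated_latencies` returns, per token, its latency and the time it spent waiting behind an earlier, slower token.

```python
from typing import List, Tuple


def replicated_latencies(arrivals: List[float], service: List[float],
                         copies: int) -> List[Tuple[float, float]]:
    """Return (latency, reorder_wait) per token for `copies` round-robin replicas."""
    free = [0.0] * copies        # time at which each replica becomes idle
    last_out = float("-inf")     # departure time of the previous token
    result = []
    for i, (a, s) in enumerate(zip(arrivals, service)):
        r = i % copies                 # round-robin dispatch (assumed)
        start = max(a, free[r])        # wait for the token and for the replica
        finish = start + s
        free[r] = finish
        # In-order output: a token cannot leave before the token ahead of it.
        out = max(finish, last_out)
        last_out = out
        result.append((out - a, out - finish))
    return result


print(replicated_latencies([0, 1, 2, 3], [4, 1, 1, 1], copies=2))
```

With two copies, the per-token results are (4, 0), (3, 2), (3, 0), and (2, 1): latency varies from token to token, and token 1 waits two cycles for token 0 even though its own replica finished early.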
Latency-Constrained Design Folding
latConf = null; T = ∞
while throughput(thrConf) ≤ T do
    if feasibleImprovement(thrConf) then
        candidates = simAnnealing(thrConf, T)
        foreach candidate in candidates do
            if throughput(candidate) < T then
                latConf = candidate
                T = throughput(latConf)
    thrConf = reduceThroughput(thrConf)
return latConf
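The loop above can be sketched in Python with its helpers passed in as callables, since the slides do not define them. Assumptions: `period` stands for `throughput(...)` read as time per steady-state iteration (lower is better), which is what makes the comparisons against T = ∞ consistent; `sim_annealing` stands for any search that returns configurations meeting the latency bound; and `reduce_throughput` returns `None` once no further folding is possible, which guarantees termination. All parameter names are hypothetical.

```python
import math
from typing import Callable, Iterable, Optional, TypeVar

Config = TypeVar("Config")


def latency_constrained_fold(
    thr_conf: Optional[Config],
    period: Callable[[Config], float],
    feasible_improvement: Callable[[Config], bool],
    sim_annealing: Callable[[Config, float], Iterable[Config]],
    reduce_throughput: Callable[[Config], Optional[Config]],
) -> Optional[Config]:
    lat_conf = None
    best = math.inf  # T in the pseudocode: best period found so far
    # Stop once the throughput-folded design can no longer beat the best one.
    while thr_conf is not None and period(thr_conf) <= best:
        if feasible_improvement(thr_conf):
            # Candidates are assumed to satisfy the latency bound already.
            for candidate in sim_annealing(thr_conf, best):
                if period(candidate) < best:
                    lat_conf = candidate
                    best = period(lat_conf)
        thr_conf = reduce_throughput(thr_conf)  # fold further
    return lat_conf
```

For instance, with configurations modeled as copy counts, `period(c) = 12 / c`, `reduce_throughput(c) = c - 1` (returning `None` at `c = 1`), and a stand-in search that returns the counts `c` with `3 * c <= 7`, starting from `thr_conf = 4` returns 2.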