This presentation describes how an FPGA-based coprocessor attached to a PC can accelerate compute-intensive scientific analyses, using processor-memory networks based on Steiner systems. The target problems, drawn from biology and chemistry, involve large volumes of low-resolution data that are reused heavily. The coprocessor runs many parallel processing elements (PEs), and the main obstacle is input bandwidth: a wide-word memory cannot feed all PEs every cycle. The proposed solution combines a distribution network, which lets each stored value feed many PEs, with Steiner-system-based selection of the vector sets held in memory. Open problems include constructing suitable Steiner systems, or set coverings where no Steiner system exists.
Processor Memory Networks Based on Steiner Systems
Tom VanCourt and Martin C. Herbordt
ECE Department, Boston University
Introduction
• Problem: “Big Science” has brought genomes to the desktop, but the primary analysis engine is still the PC!
• Our goal: bring “Big Computation” to the desktop.
• Proposed solution: a PC or workstation plus an FPGA-based coprocessor.
Desired Problem Characteristics
• Low-resolution data: values have few bits
• Large but manageable data sets: thousands of genes, thousands of samples (at a time)
• High-dimensional parameter set: must be searched or enumerated to identify the solution
• Simple performance criteria (score functions): evaluated for each candidate parameter vector
• Decomposable search strategy: multiple small problems to solve in parallel
• Heavy reuse of data: combinations, permutations, orientations
Biology/Chemistry Problems
• Sequence alignment: approximate string matching, dynamic programming
• Molecule interactions (voxel model): 3-axis rotation, 3D convolution
• Microarray data analysis: typically 10 to 10² samples, 10³ to 10⁴ genes per sample
• Hidden Markov models: Baum-Welch training, soft Viterbi decoding
• Compute-intensive statistical analysis: bootstraps, sampling background models
Sample Task-Specific Processor
[Figure: score function f applied to candidate vectors Va, Vb, Vc]
• Input: vectors V1, V2, … Vv
• Query: which set of t vectors maximizes f(Va, Vb, …)?
• Architecture: parallel PEs on an FPGA coprocessor
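As a software reference for the query above, here is a minimal sketch that scores every size-t subset exhaustively. The score function `triple_agreement` is a hypothetical stand-in (not from the slides) that counts positions where all three low-resolution vectors agree; the FPGA PEs would evaluate many candidates in parallel, whereas this sketch scores them one at a time.

```python
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[int]

def best_subset(vectors: Sequence[Vector], t: int,
                f: Callable[..., float]) -> tuple[float, tuple[int, ...]]:
    """Score every size-t subset of `vectors` with f; return (best score, indices)."""
    best_score, best_idx = float("-inf"), ()
    for idx in combinations(range(len(vectors)), t):
        score = f(*(vectors[i] for i in idx))
        if score > best_score:
            best_score, best_idx = score, idx
    return best_score, best_idx

def triple_agreement(a: Vector, b: Vector, c: Vector) -> int:
    """Hypothetical score: number of positions where all three vectors agree."""
    return sum(1 for x, y, z in zip(a, b, c) if x == y == z)

if __name__ == "__main__":
    data = [[1, 2, 3, 4], [1, 2, 3, 0], [1, 2, 0, 0], [5, 6, 7, 8]]
    print(best_subset(data, 3, triple_agreement))  # (2, (0, 1, 2))
```

The number of candidates grows as C(v, t), which is why the slides look for heavy data reuse across PEs rather than fetching each candidate's inputs separately.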
Problem: Input Bandwidth
• Assume ~128 PEs × 3 inputs per PE = 384 values per cycle; × 4 bits per value = 1,536 bits per cycle
• Wide-word solution: INFEASIBLE
  – Would need a ~400-ported RAM to supply the 384 input values needed each cycle
  – Data would be fetched faster than the host can load it
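The slide's arithmetic, written as a small helper so other PE counts or data widths can be tried. The function name `input_demand` is hypothetical; it only multiplies out the per-cycle demand of a naive design in which every PE input is fetched separately.

```python
def input_demand(num_pes: int, inputs_per_pe: int, bits_per_value: int) -> tuple[int, int]:
    """Values and bits a naive wide-word memory must deliver every cycle."""
    values_per_cycle = num_pes * inputs_per_pe
    bits_per_cycle = values_per_cycle * bits_per_value
    return values_per_cycle, bits_per_cycle

if __name__ == "__main__":
    values, bits = input_demand(num_pes=128, inputs_per_pe=3, bits_per_value=4)
    print(values, bits)  # 384 1536
```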
C(k, 3) Distribution Network
• X memories: k data values supply all PEs
  – k = 9: 84 PEs, 252 PE X inputs, 28× reuse
  – k = 10: 120 PEs, 360 PE X inputs, 36× reuse
• Generates all size-3 subsets of the k data values
[Figure: Vector Data Memory feeding X1–X10 through the distribution network to PE0, PE1, … PE119]
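A minimal sketch of the wiring this slide describes, assuming one PE per size-3 subset of the k X memories (the function name is hypothetical). It reproduces the slide's counts: C(k, 3) PEs, 3·C(k, 3) PE inputs, and a fan-out per X memory of C(k−1, 2), which is 28 for k = 9 and 36 for k = 10.

```python
from collections import Counter
from itertools import combinations

def distribution_network(k: int, t: int = 3):
    """Wire PE i to the X memories listed in wiring[i]: one PE per size-t subset."""
    wiring = list(combinations(range(k), t))
    num_pes = len(wiring)
    pe_inputs = num_pes * t
    # Fan-out of each X memory; by symmetry every memory feeds the same number of PEs.
    fanout = Counter(x for subset in wiring for x in subset)
    assert len(set(fanout.values())) == 1
    reuse = fanout[0]
    return wiring, num_pes, pe_inputs, reuse

if __name__ == "__main__":
    for k in (9, 10):
        _, pes, inputs, reuse = distribution_network(k)
        print(f"k={k}: {pes} PEs, {inputs} PE X inputs, {reuse}x reuse")
```

Only k values are read per cycle instead of 3·C(k, 3), so the memory bandwidth problem shrinks by the reuse factor.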
Vector Data Memory
• Steiner system S(v, k, t): a collection of size-k subsets of v objects such that every size-t subset is contained in exactly one size-k subset
• t = 3 (triplets), k = number of X memories, v = total genes
• Host selects vector sets {m1, m2, …} via RAM content
  – ~100 vectors per X memory
  – 10⁵–10⁶ sets of vector choices
[Figure: Vector Select RAM (addressed by indxSEL) supplying indices indx1, indx2, … indxm to memories X1, X2, … Xm]
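To make the definition concrete, this sketch checks whether a list of blocks is an S(v, k, t) and builds one small known example, S(3, 4, 8): the 4-subsets of {0, …, 7} whose elements XOR to zero (the planes of the affine geometry AG(3, 2)). The mapping to hardware (v = 8 genes, k = 4 X memories) is an illustrative assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_steiner_system(v: int, k: int, t: int, blocks) -> bool:
    """True if every t-subset of {0..v-1} lies in exactly one of the k-subset blocks."""
    points = set(range(v))
    if any(len(set(b)) != k or not set(b) <= points for b in blocks):
        return False
    counts = Counter(s for b in blocks for s in combinations(sorted(b), t))
    return all(counts[s] == 1 for s in combinations(range(v), t))

def sqs8():
    """S(3, 4, 8): the 4-subsets of {0..7} whose elements XOR to 0."""
    return [b for b in combinations(range(8), 4) if b[0] ^ b[1] ^ b[2] ^ b[3] == 0]

if __name__ == "__main__":
    blocks = sqs8()
    print(len(blocks), is_steiner_system(8, 4, 3, blocks))  # 14 True
```

Under that reading, each block is one choice of k vectors for the X memories; loading the 14 blocks in turn presents every one of the C(8, 3) = 56 triples to the distribution network exactly once, with no triple scored twice.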
Memory and Data Distribution
• Two-level data reuse
  – Temporal reuse by the Vector Select RAM
  – Spatial reuse by the distribution network
• Whole VDM duplicated for double buffering: overlaps reloading with reading
[Figure: vector data memory (VDM) of k memories feeding the distribution network; divide v objects into subsets of size k so that every size-3 subset is in exactly one size-k subset]
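A minimal behavioral model of the double-buffered VDM, assuming each Vector Select word holds one address per X memory. The class and method names are hypothetical. The PEs read the front copy (temporal reuse: the same stored vector can be selected by many words) while the host fills the back copy; `swap()` exchanges the roles so reloading overlaps with reading.

```python
from typing import List, Optional, Sequence

class DoubleBufferedVDM:
    """Hypothetical model of a duplicated vector data memory (VDM)."""

    def __init__(self, k: int, depth: int) -> None:
        # banks[copy][x][addr]: two copies of k X memories, each `depth` vectors deep
        self.banks: List[List[List[Optional[Sequence[int]]]]] = [
            [[None] * depth for _ in range(k)] for _ in range(2)
        ]
        self.front = 0  # copy currently read by the PEs

    def host_write(self, x: int, addr: int, vector: Sequence[int]) -> None:
        """Host reloads the back copy without disturbing the PEs."""
        self.banks[1 - self.front][x][addr] = vector

    def read_select_word(self, indices: Sequence[int]) -> list:
        """One Vector Select word: one address per X memory, read from the front copy."""
        front = self.banks[self.front]
        return [front[x][addr] for x, addr in enumerate(indices)]

    def swap(self) -> None:
        self.front = 1 - self.front

if __name__ == "__main__":
    vdm = DoubleBufferedVDM(k=3, depth=2)
    for x in range(3):
        vdm.host_write(x, 0, [x, x])       # fill the back copy
    vdm.swap()                              # make it readable
    vdm.host_write(0, 0, [9, 9])            # next reload goes to the other copy
    print(vdm.read_select_word([0, 0, 0]))  # [[0, 0], [1, 1], [2, 2]]
```

The k vectors returned for each select word would then go to the distribution network, which supplies the spatial level of reuse.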
Conditions for Success
• Data must be strings/vectors
  – If scalar, the Vector Select RAM would be enough … host-to-PE transfer would be enough
• Longer vectors are better
  – A longer vector means more time per Vector Select word … longer time between reloads
• Narrow data words are better
  – Fewer bits per vector, so more vectors for the same bandwidth
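A rough, hypothetical bandwidth model that illustrates the last two conditions. It assumes (these are not from the slides) that each Vector Select word is held for one cycle per vector element and that every reload rewrites both the VDM and the Vector Select RAM. Under those assumptions the host must supply about k·n·b/W + k·⌈log₂ n⌉/L bits per cycle, where n is vectors per X memory, b bits per value, W select words, and L vector length.

```python
import math

def host_bits_per_cycle(k: int, vectors_per_x: int, vector_len: int,
                        bits_per_value: int, select_words: int) -> float:
    """Hypothetical host bandwidth needed to refill one copy while the other is read.

    Assumes one vector element per cycle per Select word and that each reload
    rewrites both the VDM and the Vector Select RAM."""
    index_bits = max(1, math.ceil(math.log2(vectors_per_x)))
    vdm_bits = k * vectors_per_x * vector_len * bits_per_value
    select_ram_bits = select_words * k * index_bits
    cycles_between_reloads = select_words * vector_len
    return (vdm_bits + select_ram_bits) / cycles_between_reloads

if __name__ == "__main__":
    base = dict(k=10, vectors_per_x=100, bits_per_value=4, select_words=10**5)
    for vector_len in (100, 1000, 10000):
        print(vector_len, host_bits_per_cycle(vector_len=vector_len, **base))
```

In this model the Vector Select RAM term shrinks as vectors get longer, and the VDM term shrinks as data words get narrower, matching the direction of the two bullets above.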
Open Problems
• Steiner systems are special cases
  – k-sets that contain each t-set exactly once
  – Theory guarantees large numbers of cases
• Set-covering problem in other cases
  – k-sets that contain each t-set at least once
• Finding collections of k-sets is hard
  – Believed NP-hard
  – Constructive forms of existence theorems?
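For parameters with no Steiner system, the slides frame the task as set covering. As one illustration (a standard greedy set-cover heuristic, not the authors' method), this sketch picks k-sets until every t-set is covered at least once, and compares the result with the simple counting bound ⌈C(v, t)/C(k, t)⌉. For example, no S(3, 4, 9) exists, so `greedy_covering(9, 4, 3)` must use more than the bound of 21 blocks. Function names are hypothetical.

```python
from itertools import combinations
from math import comb

def greedy_covering(v: int, k: int, t: int) -> list[tuple[int, ...]]:
    """Greedy heuristic: k-subsets of {0..v-1} covering every t-subset at least once.

    Not optimal; exhaustive candidate scoring limits it to small v."""
    uncovered = set(combinations(range(v), t))
    candidates = list(combinations(range(v), k))
    blocks = []
    while uncovered:
        # Pick the k-set that covers the most still-uncovered t-sets.
        best = max(candidates,
                   key=lambda b: sum(s in uncovered for s in combinations(b, t)))
        blocks.append(best)
        uncovered.difference_update(combinations(best, t))
    return blocks

def lower_bound(v: int, k: int, t: int) -> int:
    """Counting bound: each k-set covers at most C(k, t) of the C(v, t) t-sets."""
    return -(-comb(v, t) // comb(k, t))

if __name__ == "__main__":
    blocks = greedy_covering(9, 4, 3)
    print(len(blocks), "blocks; lower bound", lower_bound(9, 4, 3))
```

Each extra block beyond the bound means some triples are scored more than once, which is wasted PE work compared with an exact Steiner system.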