Processor Memory Networks Based on Steiner Systems

Tom VanCourt and Martin C. Herbordt
ECE Department, Boston University

Introduction
• Problem: “Big Science” brought genomes to the desktop
  The primary analysis engine is still the PC!
• Our goal: bring “Big Computation” to the desktop
• Proposed solution: a PC/workstation plus an FPGA-based coprocessor

Desired Problem Characteristics
• Low-resolution data
  Data have few bits
• Large but manageable data sets
  Thousands of genes, thousands of samples (at a time)
• High-dimensional parameter set
  Must be searched or enumerated to identify the solution
• Simple performance criteria (score functions)
  Evaluated for each candidate parameter vector
• Decomposable search strategy
  Multiple small problems to solve in parallel
• Heavy reuse of data
  Combinations, permutations, orientations

Biology/Chemistry Problems
• Sequence alignment
  Approximate string matching, dynamic programming
• Molecule interactions: voxel model
  3-axis rotation, 3D convolution
• Microarray data analysis
  Typical: 10 to 10^2 samples, 10^3 to 10^4 genes per sample
• Hidden Markov models
  Baum-Welch training, soft Viterbi decoding
• Compute-intensive statistical analysis
  Bootstraps, sampling background models

Sample Task-Specific Processor
[Figure: vectors Va, Vb, Vc feeding the score function f]
• Input: vectors V1, V2, …, Vv
• Query: which set of t vectors maximizes f(Va, Vb, …)?
• Architecture: parallel PEs on an FPGA coprocessor
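As a software reference point, the query can be modeled as an exhaustive search that scores every size-t subset of the input vectors once; that is the work the parallel PEs would divide among themselves. This is a minimal sketch, not the authors' implementation: the function names and the `agreement` score are hypothetical stand-ins for f.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Vector = Sequence[int]

def best_t_set(vectors: Sequence[Vector], t: int,
               f: Callable[..., float]) -> Tuple[Tuple[int, ...], float]:
    """Return the indices of the t vectors that maximize f, plus that score."""
    best_idx: Tuple[int, ...] = ()
    best_score = float("-inf")
    # Score every size-t subset exactly once (brute force).
    for idx in combinations(range(len(vectors)), t):
        score = f(*(vectors[i] for i in idx))
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score

def agreement(*vs: Vector) -> int:
    """Hypothetical score: positions where all vectors hold the same value."""
    return sum(1 for column in zip(*vs) if len(set(column)) == 1)

data = [
    [0, 1, 2, 3],
    [0, 1, 2, 0],
    [3, 1, 2, 3],
    [0, 1, 2, 3],
]
print(best_t_set(data, 3, agreement))  # ((0, 1, 3), 3)
```

On the FPGA the loop body becomes one PE per subset, so the question of the next slides is how to feed all of those PEs at once.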

Problem: Input Bandwidth
• Assume ~128 PEs × 3 inputs per PE = 384 values per cycle
  384 values × 4 bits per value = 1536 bits per cycle
• Wide-word solution: INFEASIBLE
  384 input values needed per cycle: a ~400-ported RAM?
  Data fetched faster than the host can load it
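The bandwidth figures above are simple products. The sketch below reproduces them; the helper name is hypothetical, and the 100 MHz clock is an assumed value used only to convert bits per cycle into bits per second, not a figure from the slides.

```python
def input_bits_per_cycle(num_pes: int, inputs_per_pe: int, bits_per_value: int) -> int:
    """Bits that must reach the PE array each cycle if every input is fetched fresh."""
    return num_pes * inputs_per_pe * bits_per_value

bits = input_bits_per_cycle(num_pes=128, inputs_per_pe=3, bits_per_value=4)
print(bits)  # 1536

# Hypothetical clock rate, chosen only for the unit conversion.
CLOCK_HZ = 100_000_000
print(bits * CLOCK_HZ / 1e9, "Gbit/s")  # 153.6 Gbit/s
```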

C(k, 3) Distribution Network
• X memory: k data values supply the PEs
  k = 9: 84 PEs, 252 PE X inputs, 28× reuse
  k = 10: 120 PEs, 360 PE X inputs, 36× reuse
• Generates all size-3 subsets of the k data values
[Figure: Vector Data Memory outputs X1 to X10 feeding PE0, PE1, …, PE119]
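The network can be modeled by enumerating the size-3 subsets of the k X-memory outputs and giving each subset its own PE. Each output then appears in C(k-1, 2) subsets, which is exactly the reuse factor listed above. A minimal sketch; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def distribution_network(k: int, t: int = 3):
    """Entry p lists the X-memory outputs wired to PE p (one PE per t-subset)."""
    return list(combinations(range(k), t))

def reuse_factor(k: int, t: int = 3) -> int:
    """Number of PE inputs driven by each X-memory output: C(k-1, t-1)."""
    return comb(k - 1, t - 1)

for k in (9, 10):
    wiring = distribution_network(k)
    pe_inputs = len(wiring) * 3
    print(k, len(wiring), pe_inputs, pe_inputs // k, reuse_factor(k))
# 9 84 252 28 28
# 10 120 360 36 36
```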

Vector Data Memory
• Steiner system S(v, k, t)
  Divide v objects into subsets of size k, so that every size-t subset is in exactly one size-k subset
• t = 3 (triplets), k = number of X memories, v = total number of genes
• Host selects vector sets {m1, m2, …} via RAM content
  ~100 vectors per X memory
  10^5 to 10^6 sets of vector choices
[Figure: Vector Select RAM, addressed by indxSEL, supplies indices indx1, indx2, …, indxm to memories X1, X2, …, Xm]
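The defining property is easy to check by brute force. Since every triplet lies in exactly one block, a Steiner system with these parameters has C(v, 3)/C(k, 3) blocks. The sketch below verifies a small textbook example with v = 8, k = 4, t = 3 (usually written S(3, 4, 8) in the combinatorics literature), built from the 4-subsets of {0, …, 7} whose elements XOR to zero. The example and helper name are illustrative, not taken from the slides.

```python
from itertools import combinations
from math import comb

def is_steiner_system(blocks, v: int, k: int, t: int) -> bool:
    """True if every size-t subset of range(v) lies in exactly one block."""
    counts = {s: 0 for s in combinations(range(v), t)}
    for block in blocks:
        # Each block must be k distinct elements drawn from range(v).
        if len(set(block)) != k or not all(0 <= x < v for x in block):
            return False
        for s in combinations(sorted(block), t):
            counts[s] += 1
    return all(c == 1 for c in counts.values())

# Planes of the affine geometry AG(3, 2): 4-subsets of {0..7} with XOR zero.
blocks = [b for b in combinations(range(8), 4) if b[0] ^ b[1] ^ b[2] ^ b[3] == 0]

print(len(blocks), comb(8, 3) // comb(4, 3))  # 14 14
print(is_steiner_system(blocks, v=8, k=4, t=3))  # True
```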

Memory and Data Distribution
• Two-level data reuse
  Temporal reuse by the Vector Select RAM
  Spatial reuse by the Distribution Network
• Whole VDM duplicated
  Double buffering overlaps reloading and reading
[Figure: vector data memory (VDM) of k memories; divide v objects into subsets of size k so that every size-3 subset is in exactly one size-k subset]
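The duplicated VDM can be pictured as a ping-pong pair of banks: the PEs read the active bank through Vector Select words (temporal reuse of the stored vectors) while the host fills the other bank. This is a behavioral sketch under that assumption; the class and method names are hypothetical and the model ignores timing.

```python
from typing import List, Sequence, Tuple

Vector = List[int]

class DoubleBufferedVDM:
    """Hypothetical behavioral model of a duplicated vector data memory.

    Two copies (banks) of the X memories exist. PEs read only the active
    bank, the host writes only the shadow bank, and swap() exchanges them.
    """

    def __init__(self, num_memories: int):
        self.banks: List[List[List[Vector]]] = [
            [[] for _ in range(num_memories)] for _ in range(2)
        ]
        self.active = 0

    def load_shadow(self, memory: int, vectors: Sequence[Sequence[int]]) -> None:
        """Host write path: fill one X memory of the bank the PEs are not reading."""
        self.banks[1 - self.active][memory] = [list(v) for v in vectors]

    def swap(self) -> None:
        """Make the freshly loaded bank visible to the PEs."""
        self.active = 1 - self.active

    def select(self, indices: Sequence[int]) -> Tuple[Vector, ...]:
        """Apply one Vector Select word: take vector indices[i] from X memory i."""
        bank = self.banks[self.active]
        return tuple(bank[i][idx] for i, idx in enumerate(indices))

vdm = DoubleBufferedVDM(num_memories=2)
vdm.load_shadow(0, [[1, 2], [3, 4]])
vdm.load_shadow(1, [[5, 6]])
vdm.swap()
print(vdm.select([1, 0]))     # ([3, 4], [5, 6])
vdm.load_shadow(0, [[9, 9]])  # reload in progress; PE reads unaffected
print(vdm.select([1, 0]))     # ([3, 4], [5, 6])
```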

Conditions for Success
• Data must be strings/vectors
  If the data were scalar, the Vector Select RAM would be enough … host-to-PE transfer would be enough
• Longer vectors are better
  Longer vector → more time per Vector Select word … longer time between reloads
• Narrow data words are better
  Fewer bits per vector, so more vectors for the same bandwidth
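A rough model makes these trade-offs concrete. It assumes each X memory streams one vector element per cycle, so a Vector Select word stays in use for a whole vector, and that the shadow VDM must be refilled within one pass through the select RAM. Those assumptions and all function names are hypothetical. The example reuses k = 10, ~100 vectors per X memory, 4-bit values and 10^5 select words from earlier slides, plus an assumed vector length of 1000.

```python
def select_reads_per_cycle(vector_length: int) -> float:
    """Vector Select RAM reads per cycle: one word per full vector streamed."""
    return 1 / vector_length

def reload_interval_cycles(select_words: int, vector_length: int) -> int:
    """Cycles needed to work through every Vector Select word once."""
    return select_words * vector_length

def host_bits_per_cycle(num_memories: int, vectors_per_memory: int,
                        vector_length: int, bits_per_element: int,
                        select_words: int) -> float:
    """Host bandwidth needed to refill the shadow VDM within one reload interval."""
    bits_to_load = num_memories * vectors_per_memory * vector_length * bits_per_element
    return bits_to_load / reload_interval_cycles(select_words, vector_length)

L = 1000  # assumed vector length in elements; not from the slides
print(reload_interval_cycles(10**5, L))           # 100000000
print(host_bits_per_cycle(10, 100, L, 4, 10**5))  # 0.04
print(select_reads_per_cycle(L))                  # 0.001
```

In this model the vector length cancels out of the host bandwidth; its benefit appears as fewer select-RAM reads and longer intervals between reloads, while narrower elements shrink the host bandwidth directly.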

Open Problems
• Steiner systems are special cases
  k-sets that contain each t-set exactly once
  Theory guarantees large numbers of cases
• Set-covering problem in other cases
  k-sets that contain each t-set at least once
• Finding collections of k-sets is hard
  Believed NP-hard
  Constructive forms of existence theorems?
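When no Steiner system exists, a covering can still be built heuristically. For example, v = 7, k = 4, t = 3 has none, because C(7, 3) = 35 is not divisible by C(4, 3) = 4. The sketch below uses a plain greedy set-cover heuristic, a swapped-in technique rather than the authors' method, with no optimality guarantee; the function name is hypothetical.

```python
from itertools import combinations
from math import ceil, comb

def greedy_covering(v: int, k: int, t: int):
    """Greedy heuristic: pick k-subsets of range(v) until every t-subset
    lies in at least one of them. Not guaranteed to use the fewest blocks."""
    uncovered = set(combinations(range(v), t))
    candidates = list(combinations(range(v), k))
    blocks = []
    while uncovered:
        # Choose the candidate block covering the most still-uncovered t-sets.
        best = max(candidates,
                   key=lambda b: sum(1 for s in combinations(b, t) if s in uncovered))
        blocks.append(best)
        uncovered.difference_update(combinations(best, t))
    return blocks

blocks = greedy_covering(7, 4, 3)
# Compare with the simple counting lower bound ceil(C(v, t) / C(k, t)).
print(len(blocks), ceil(comb(7, 3) / comb(4, 3)))
```

Unlike a Steiner system, the resulting blocks may contain some triplets more than once, which costs redundant PE work.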
