The Cornell University Computer Systems Laboratory, led by Prof. José Martínez, is developing hardware accelerators for belief propagation algorithms on embedded SoCs for retail, automotive, home, and mobile applications. The approach targets high-speed, low-power message passing in graphical models such as Bayesian networks and Markov Random Fields. The goal is fast, programmable accelerators that improve image processing, speech recognition, and data mining workloads, and with them the user experience, across these platforms.
Accelerating Belief Propagation in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory, Cornell University
http://www.csl.cornell.edu/
The Cornell Team
• Prof. José Martínez (PI), Prof. Rajit Manohar @ Computer Systems Lab
• Prof. Tsuhan Chen @ Advanced Multimedia Processing Lab
• MS/Ph.D. students
  • Yuan Tian, MS ’13
  • Skand Hurkat
  • Xiaodong Wang
The Cornell Project
• Provide hardware accelerators for belief propagation algorithms on embedded SoCs (retail/car/home/mobile)
• High speed
• Very low power
• Self-optimizing
• Highly programmable
[Figure: inference algorithm + graph → BP accelerator within SoC → result]
What is belief propagation? Belief propagation is a message passing algorithm for performing inference on graphical models, such as Bayesian networks or Markov Random Fields
What is belief propagation?
• Labelling problem
• Energy as a measure of convergence
• Minimize energy (MAP label estimation)
• Exact results for trees
  • Converges in exactly two iterations
• Approximate results for graphs with loops
  • Yields “good” results in practice
  • Minimum over large neighbourhoods
  • Close to optimal solution
(A sketch of a single message update follows this list.)
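To make the minimization concrete, here is a minimal sketch, not code from the project, of one min-sum message update for a pairwise MRF node with four neighbours, as in grid-structured early-vision problems. The label count, the truncated-linear smoothness cost V, and all parameter values are illustrative assumptions.

```cpp
#include <array>
#include <algorithm>
#include <cstdlib>
#include <limits>

// Illustrative min-sum message update for one edge (p -> q) of a pairwise MRF.
// Assumptions (not from the slides): L labels, unary cost D_p, a truncated-linear
// smoothness cost V, and three incoming messages m_in[k] from p's other neighbours.
constexpr int L = 16;  // number of labels (hypothetical)

using Msg = std::array<float, L>;

// Truncated-linear smoothness cost, a common choice in early-vision MRFs.
inline float V(int lp, int lq, float lambda = 1.0f, float trunc = 4.0f) {
    return lambda * std::min(static_cast<float>(std::abs(lp - lq)), trunc);
}

// m_{p->q}(lq) = min over lp of [ D_p(lp) + V(lp, lq) + sum of other incoming messages at lp ]
Msg update_message(const Msg& D_p, const std::array<Msg, 3>& m_in) {
    Msg h{};  // aggregated cost at node p for each candidate label lp
    for (int lp = 0; lp < L; ++lp)
        h[lp] = D_p[lp] + m_in[0][lp] + m_in[1][lp] + m_in[2][lp];

    Msg out{};
    for (int lq = 0; lq < L; ++lq) {
        float best = std::numeric_limits<float>::max();
        for (int lp = 0; lp < L; ++lp)
            best = std::min(best, h[lp] + V(lp, lq));
        out[lq] = best;  // minimized cost passed to neighbour q for label lq
    }
    return out;
}
```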
Not all “that” alien to embedded
• Remember the Viterbi algorithm?
• Used extensively in digital communications
What does this mean?
• Every mobile device uses Viterbi decoders
  • Error correction codes (e.g., turbo codes)
  • Mitigating inter-symbol interference (ISI)
• An increasing number of mobile applications involve belief propagation
• More general belief propagation accelerators can greatly improve the user experience on mobile devices
(See the Viterbi sketch below.)
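As a reminder of the connection, the Viterbi recursion is exactly min-sum belief propagation on a chain-structured model. The sketch below is illustrative only; the state count and the negative-log-probability cost tables (emit, trans) are hypothetical, not taken from the slides.

```cpp
#include <vector>
#include <array>
#include <algorithm>
#include <limits>

// Illustrative Viterbi recursion over a chain with S hidden states.
// Hypothetical costs: emit[t][s] is -log P(obs_t | s), trans[sp][s] is -log P(s | sp).
// The forward pass below is a min-sum message sweep along the chain.
constexpr int S = 4;

std::vector<int> viterbi(const std::vector<std::array<float, S>>& emit,
                         const std::array<std::array<float, S>, S>& trans) {
    const int T = static_cast<int>(emit.size());
    std::vector<std::array<float, S>> cost(T);  // best path cost ending in state s at time t
    std::vector<std::array<int, S>> back(T);    // back-pointers for traceback

    cost[0] = emit[0];
    for (int t = 1; t < T; ++t) {
        for (int s = 0; s < S; ++s) {
            float best = std::numeric_limits<float>::max();
            int arg = 0;
            for (int sp = 0; sp < S; ++sp) {
                float c = cost[t - 1][sp] + trans[sp][s];
                if (c < best) { best = c; arg = sp; }
            }
            cost[t][s] = best + emit[t][s];
            back[t][s] = arg;
        }
    }
    // Trace back the minimum-cost state sequence.
    std::vector<int> path(T);
    path[T - 1] = static_cast<int>(
        std::min_element(cost[T - 1].begin(), cost[T - 1].end()) - cost[T - 1].begin());
    for (int t = T - 1; t > 0; --t) path[t - 1] = back[t][path[t]];
    return path;
}
```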
Target markets
Retail/Car/Home/Mobile
• Image processing
  • De-noising
  • Segmentation
  • Object detection
• Gesture recognition
• Handwriting recognition
  • Improved recognition through context identification
• Speech recognition
  • Hidden Markov models are key to speech recognition
Servers
• Data mining tasks
  • Part-of-speech tagging
  • Information retrieval
  • “Knowledge graph”-like applications
• Machine learning based tasks
  • Constructive machine learning
  • Recommendation systems
• Scientific computing
  • Protein structure inference
Hardware accelerator for BP
[Figure: inference algorithm + graph → BP accelerator within SoC → result]
Work done so far
Software
• General purpose MRF inference library
  • Support for arbitrary graphs
  • Floating point math
  • Parallel techniques for faster inference
• Library optimized for grid graphs
  • Optimized data structures
  • Template can use any data type
  • Multiple inference techniques optimized for early vision
  • Stereo matching in 200 ms
Hardware
• High-level synthesis of message update unit
  • Vivado HLS (C-to-gates) tool used to synthesize the message update unit on a ZedBoard
  • ∼2x improvement in inference speed on CPU+FPGA compared to CPU-only inference
  • Fixed point math (a hypothetical fixed-point sketch follows this list)
• GraphGen collaboration
  • Ongoing work
  • Stereo matching task mapped to multiple platforms
  • 10x speedup on GPU w.r.t. CPU-only implementation
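The slides mention a fixed-point message update unit synthesized with Vivado HLS; the actual design is not shown here. The following is only a plausible sketch, under assumed Q8.8 fixed-point costs, 16 labels, and a truncated-linear smoothness term, of what such an HLS-friendly kernel could look like (fixed loop bounds, static arrays, saturating output).

```cpp
#include <cstdint>
#include <cstdlib>
#include <algorithm>

// Hypothetical fixed-point message update kernel in the style a C-to-gates flow
// (e.g., Vivado HLS) could synthesize. Costs are Q8.8 fixed point in int16_t and
// accumulated in int32_t to avoid overflow. Loop bounds are compile-time constants
// so the tool can unroll or pipeline them.
constexpr int LABELS = 16;
constexpr int16_t LAMBDA = 1 << 8;   // smoothness weight 1.0 in Q8.8 (assumed)
constexpr int16_t TRUNC  = 4 << 8;   // truncation threshold 4.0 in Q8.8 (assumed)

void update_message_fx(const int16_t data_cost[LABELS],
                       const int16_t msg_in0[LABELS],
                       const int16_t msg_in1[LABELS],
                       const int16_t msg_in2[LABELS],
                       int16_t msg_out[LABELS]) {
    int32_t h[LABELS];  // aggregated cost at the node for each label
    for (int lp = 0; lp < LABELS; ++lp)
        h[lp] = int32_t(data_cost[lp]) + msg_in0[lp] + msg_in1[lp] + msg_in2[lp];

    for (int lq = 0; lq < LABELS; ++lq) {
        int32_t best = INT32_MAX;
        for (int lp = 0; lp < LABELS; ++lp) {
            int32_t diff   = std::abs(lp - lq) * LAMBDA;        // |lp - lq| * lambda in Q8.8
            int32_t smooth = std::min(diff, int32_t(TRUNC));    // truncated-linear cost
            best = std::min(best, h[lp] + smooth);
        }
        msg_out[lq] = int16_t(std::min(best, int32_t(INT16_MAX)));  // saturate to Q8.8
    }
}
```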
GraphGen synthesis of BP-M
• BP-M update (log-space messages) implemented using GraphGen (Intel/CMU/UW)
• GPU implementation 10x faster than CPU-based implementation
• Ongoing work on an FPGA-based implementation and on implementing the hierarchical update
Cornell Publications (2013 only)
• 3x Comp. Vision & Pattern Recognition (CVPR)
• 3x Asynchronous VLSI (ASYNC)
• 2x Intl. Symp. Computer Architecture (ISCA)
• 1x Intl. Conf. Image Processing (ICIP)
• 1x ASPLOS (w/ GraphGen folks, under review)
Year 3 Plans
• GraphGen extensions for BP applications
  • Multiple inference techniques
• Extraction of a “BP ISA”
  • Ops on arbitrary graphs
  • Efficient representation
• Amplification work on UAV ensembles
  • Self-optimizing, collaborative SoCs
• One-day “graph” workshop with GraphGen + UIUC
Accelerating Belief Propagation in Hardware
Skand Hurkat and José Martínez
Computer Systems Laboratory, Cornell University
http://www.csl.cornell.edu/