Real-time Signal Processing on Embedded Systems

Presentation Transcript

  1. Real-time Signal Processing on Embedded Systems (Advanced Cutting-edge Research Seminar I & III)

  2. Practical Applications • Pedestrian Detection • FPGA-based system • Pedestrian Tracking • GPU-based system

  3. Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features

  4. Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion

  6. Pedestrian detection on automotive systems • Challenges: • Various appearances of pedestrians: clothes' shape and color, pose, etc. → Template-based or simple gradient-based methods do not achieve high-accuracy recognition • Viewpoint movement: all objects in an image are moving → Background subtraction or frame subtraction cannot be used • A robust recognition method suitable for pedestrians is required

  7. Pedestrian detection algorithms • Recent trend: • Combination of gradients and histograms • Gradient: robust to illumination and color changes • Histogram: robust to deformation • Examples • Histograms of oriented gradients (HOG) • Co-occurrence histograms of oriented gradients (CoHOG)* • HOG-based method • Uses pairs of oriented gradients • One of today's best algorithms for pedestrian detection • However, real-time execution is difficult to achieve with a software implementation (e.g., a few seconds are required to process a 320x240 image) → Specialized hardware is needed for real-time processing * T. Watanabe, S. Ito, and K. Yokoi, "Co-occurrence histograms of oriented gradients for pedestrian detection," PSIVT2009

  8. Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion

  10. Pedestrian detection using CoHOG • Calculate gradient orientations • Divide the image into small regions (blocks) • Pick up pairwise pixels • Calculate co-occurrence histograms of oriented gradients • Repeat for various positions of pixel pairs (called offsets; 31 offset variations, e.g. offset 1, offset 2) • The histograms form the CoHOG feature vector • The feature vector is classified by SVM
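The per-block, per-offset step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the central-difference gradient (the proposed hardware uses Sobel filters), the 8 orientation bins (chosen to match the 8x8 = 64 matrix size quoted later) and the handling of border pixels are all choices made for this sketch, not details taken from the slides.

```python
import math

NUM_BINS = 8  # assumption: 8 orientation labels, giving an 8x8 co-occurrence matrix


def orientation_image(gray):
    """Quantise the gradient direction of every interior pixel into NUM_BINS labels.

    Border pixels keep label 0 (a simplification of this sketch).
    """
    h, w = len(gray), len(gray[0])
    labels = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central difference, not Sobel
            gy = gray[y + 1][x] - gray[y - 1][x]
            angle = math.atan2(gy, gx) % (2 * math.pi)
            labels[y][x] = int(angle / (2 * math.pi) * NUM_BINS) % NUM_BINS
    return labels


def cooccurrence_histogram(labels, block, offset):
    """Count orientation pairs (i, j) for pixel pairs separated by `offset`.

    `block` is (x0, y0, x1, y1) with exclusive upper bounds; `offset` is (dx, dy).
    Pairs whose second pixel falls outside the image are skipped.
    """
    x0, y0, x1, y1 = block
    dx, dy = offset
    h, w = len(labels), len(labels[0])
    hist = [[0] * NUM_BINS for _ in range(NUM_BINS)]
    for y in range(y0, y1):
        for x in range(x0, x1):
            px, py = x + dx, y + dy
            if 0 <= px < w and 0 <= py < h:
                hist[labels[y][x]][labels[py][px]] += 1
    return hist
```

Repeating `cooccurrence_histogram` for every block and every offset, and concatenating the results, would give the full feature vector that is passed to the SVM.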

  11. Detection procedure: sliding window approach • Feature vectors are extracted in scan-line order. • The image size or the window size is scaled to detect pedestrians at other scales.
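A minimal sketch of the scan-line sliding window and the scale loop follows. The window size, stride and scale factor are hypothetical parameters introduced for illustration; the slides do not state the values used.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield the top-left corner of each sub-window in scan-line order."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y


def scaled_sizes(img_w, img_h, factor, min_w, min_h):
    """Yield successively smaller image sizes until the window no longer fits."""
    w, h = img_w, img_h
    while w >= min_w and h >= min_h:
        yield w, h
        w, h = int(w / factor), int(h / factor)
```

Running `sliding_windows` once per size produced by `scaled_sizes` (with `min_w`, `min_h` set to the window size) covers every position at every scale; this corresponds to scaling the image rather than the window.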

  12. Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion

  13. Parallel execution of CoHOG feature calculation • A large number of co-occurrence histograms must be calculated → All histograms can be calculated in parallel • Offsets (31 variations): 31 parallel threads • Blocks (6x12 = 72): • Horizontal: 6 parallel threads • Vertical: 12 parallel threads • Large parallelism: we execute 31 parallel offsets and 6 horizontal block-threads = 186 parallel threads • Processing performance is drastically improved!

  14. Merging histogram calculation and SVM prediction • The dimensionality of the CoHOG feature vector is very high • Matrix size: 8x8 = 64 • 64 × 31 offsets × 72 blocks (6x12) = about 140k dimensions • Large memory is required to store the feature vector • Many multiplications must be executed during SVM prediction: f(x) = sign(w・x + b) • Our proposal: execute histogram calculation and SVM prediction simultaneously

  15. Merging histogram calculation and SVM prediction • Straightforward approach • Histogram calculation: scan the image and add +1 to the bin (i, j) corresponding to each pixel pair; the histogram is generated • SVM prediction: multiply each bin by the weighting vector value wi,j and sum the products; the inner product is calculated for SVM prediction

  16. Merging histogram calculation and SVM prediction • Proposed method • Histogram calculation and SVM prediction in one pass: scan the image and, for each pixel pair (i, j), directly accumulate the weighting vector value wi,j • Large memory to store histograms and many multipliers for SVM prediction are unnecessary • Circuit size can be drastically reduced!
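The equivalence between the straightforward approach and the merged approach can be checked with a small Python sketch. It assumes a linear SVM and a single block and offset; `pairs` stands for the orientation pairs (i, j) observed while scanning, and all names are illustrative rather than taken from the slides.

```python
def svm_score_two_pass(pairs, weights, bias):
    """Straightforward approach: build the histogram, then take the inner product."""
    bins = len(weights)
    hist = [[0] * bins for _ in range(bins)]
    for i, j in pairs:
        hist[i][j] += 1  # +1 to the corresponding bin
    score = sum(hist[i][j] * weights[i][j]
                for i in range(bins) for j in range(bins))
    return score + bias


def svm_score_merged(pairs, weights, bias):
    """Merged approach: accumulate w[i][j] directly, with no histogram and no multiply."""
    acc = bias
    for i, j in pairs:
        acc += weights[i][j]
    return acc
```

Both functions return the same value because each bin count n contributes n × w[i][j] in the first version and w[i][j] added n times in the second. With integer (or fixed-point) weights the results are bit-identical; with floating-point weights they may differ by rounding.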

  17. Proposed architecture • Modules: gradient orientation image generator; combined module for histogram calculation and SVM prediction • Block diagram components: input image, frame buffer (WxH), line buffers, Sobel filter (horizontal), Sobel filter (vertical), orientation classifier, shift registers, sub-window data, weighting vector ROMs (31 offsets, 6 blocks), accumulator, controller, results

  18. Proposed architecture • Modules: gradient orientation image generator; combined module for histogram calculation and SVM prediction • Parallel execution • 31 offsets × 6 blocks = 186 parallel threads • Merging histogram calculation and SVM prediction • No histogram memory and no multipliers • Only weighting vector ROMs and an accumulator • Block diagram components: input image, frame buffer (WxH), line buffers, Sobel filter (horizontal), Sobel filter (vertical), orientation classifier, shift registers, sub-window data, weighting vector ROMs (31 offsets, 6 blocks), accumulator, controller, results • An efficient hardware architecture is successfully designed by using the proposed methods

  19. Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion

  20. FPGA implementation • Implementation result • Target FPGA: Xilinx Virtex-5 XC5VLX330T-2 • Max delay: 5.997 ns (max frequency: 167 MHz) • Capable of real-time processing of a 320x240 video sequence at 38 fps • Our system can process 139,166 sub-windows per second • Intel Core i7 3.2 GHz: about 1,100 sub-windows per second → More than 100 times faster!
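The figures on this slide can be cross-checked with a little arithmetic. The sketch below derives the clock frequency from the maximum delay and the speedup from the two throughput numbers; the sub-windows-per-frame value is an inference that assumes the whole throughput is spent on a 38 fps stream, which the slide does not state explicitly.

```python
def implementation_figures(max_delay_ns, fpga_rate, cpu_rate, fps):
    """Derive frequency, speedup and per-frame workload from the reported numbers."""
    return {
        "max_frequency_mhz": 1000.0 / max_delay_ns,  # 1 / period, in MHz
        "speedup": fpga_rate / cpu_rate,             # FPGA vs. CPU throughput
        "subwindows_per_frame": fpga_rate / fps,     # assumption: full rate at `fps`
    }


figures = implementation_figures(5.997, 139_166, 1_100, 38)
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

The results are roughly 166.75 MHz, a speedup of about 126, and about 3,660 sub-windows per frame, consistent with the "167 MHz" and "more than 100 times faster" claims.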

  21. Pedestrian detection system • FPGA board • Receives input images from the host PC and returns pedestrian detection results • Xilinx Virtex-5 LX330T FPGA • PCI Express endpoint • DDR2 memory • Host PC • Transfers images captured by a camera and displays detection results • CPU: Intel Core i7 3.2 GHz • Camera: USB webcam (640x480 resolution) • Connection between host PC and FPGA board: PCI Express

  22. Outline • Introduction • Pedestrian detection using CoHOG features • Proposed hardware architecture • Parallel execution • Merging histogram calculation and SVM prediction • FPGA implementation • Conclusion

  23. Conclusion • A high-performance and efficient hardware architecture for CoHOG-based pedestrian detection is proposed • Effectively exploits parallelism in the CoHOG algorithm → 186 parallel threads are realized • Drastically reduces circuit area (memory and multipliers) by executing histogram calculation and SVM prediction simultaneously • The FPGA implementation is more than 100 times faster than a CPU → Capable of real-time processing of a 320x240 video sequence at 38 fps

  24. Parallel Implementation of Pedestrian Tracking Using Multiple Cues on GPGPU

  25. Outline • Introduction • Pedestrian Tracking using Multiple Cues • Parallel Implementation on NVIDIA GPU • Conclusion

  27. Introduction • Pedestrian recognition: a combination of 2 steps • Detection: scan the entire input image • Tracking: track the pedestrians over the frames

  28. Introduction • Pedestrian Tracking • Particle Filter • HSV color histogram (K. Okuma et al., ECCV2004): HSV histogram within the rectangle • Simple background: tracking succeeds • Complex background: tracking fails

  29. Introduction • Color information: red shirt on gray ground vs. red car on gray ground (HSV histogram of each) • Shape information • Combining both color and shape information

  30. Introduction • The contributions of this paper • New pedestrian tracking algorithm using both color and shape information based on particle filters • Parallel implementation on GPGPU for real-time processing

  31. Outline • Introduction • Pedestrian Tracking using Multiple Cues • Parallel Implementation on NVIDIA GPU • Conclusion

  32. Particle Filter (pedestrian tracking) • Prediction: scatter particles • Measurement: measure the pedestrian likelihood • Re-sampling: eliminate low-likelihood particles and replicate high-likelihood particles • Diagram labels: current frame, particle, time t-1, time t
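One iteration of the prediction, measurement and re-sampling cycle can be sketched as follows. This is a generic particle filter step under stated assumptions: particles are 2-D positions, the motion model is a Gaussian random walk, and `likelihood` is any caller-supplied function returning a non-negative score. None of these details are specified in the slides.

```python
import random


def particle_filter_step(particles, likelihood, noise_std, rng):
    """Run one predict / measure / re-sample cycle and return the new particle set."""
    # Prediction: scatter each particle with Gaussian noise (assumed motion model).
    predicted = [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
                 for x, y in particles]
    # Measurement: evaluate the pedestrian likelihood of every particle.
    weights = [likelihood(p) for p in predicted]
    if sum(weights) <= 0:
        return predicted  # no usable measurement; keep the scattered particles
    # Re-sampling: low-likelihood particles tend to vanish, high ones are replicated.
    return rng.choices(predicted, weights=weights, k=len(predicted))
```

A typical call would be `particle_filter_step(particles, my_likelihood, 2.0, random.Random(0))`, repeated once per video frame.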

  33. Particle Filter (pedestrian tracking) • To define the pedestrian likelihood in the measurement step, we use: • Shape information: HOG feature • Color information: HSV histogram

  34. Histograms of Oriented Gradients • Represent object shape information • Calculate the gradient orientation • Aggregate the gradient orientations of each block • Map the vector onto the HOG feature space • The discriminant border between pedestrian and non-pedestrian is learned beforehand by SVM
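A rough sketch of the two ingredients, a per-block orientation histogram and the signed distance of a feature vector to a linear discriminant border, is given below. The 9 unsigned orientation bins, the magnitude-weighted voting and the central-difference gradient are common HOG conventions assumed here, not parameters given in the slides; `w` and `b` stand for a linear SVM trained beforehand.

```python
import math


def hog_block_histogram(gray, block, num_bins=9):
    """Accumulate gradient magnitudes into unsigned-orientation bins for one block."""
    x0, y0, x1, y1 = block  # exclusive upper bounds
    hist = [0.0] * num_bins
    for y in range(max(y0, 1), min(y1, len(gray) - 1)):
        for x in range(max(x0, 1), min(x1, len(gray[0]) - 1)):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            hist[int(angle / math.pi * num_bins) % num_bins] += math.hypot(gx, gy)
    return hist


def signed_distance(feature, w, b):
    """Signed distance from `feature` to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(v * v for v in w))
    return (sum(f * v for f, v in zip(feature, w)) + b) / norm
```

In this sketch a positive distance would fall on the pedestrian side of the border and a negative one on the non-pedestrian side, depending on how the SVM labels were assigned during training.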

  35. HSV Histogram • Represents object color information • Convert the input image into an HSV image (hue, saturation, value) • Calculate an HSV histogram • Calculate the Bhattacharyya distance between the HSV histogram and the reference HSV histogram in the HSV feature space
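The three steps can be sketched with the Python standard library. The bin counts are arbitrary, and the distance is written as sqrt(1 - BC), where BC is the Bhattacharyya coefficient; this form is common in color-based particle filters but is an assumption here, since the slides do not say which definition is used.

```python
import colorsys
import math


def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Build a normalised joint HSV histogram from (r, g, b) pixels in 0..255."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def bhattacharyya_distance(p, q):
    """Return sqrt(1 - BC) for two normalised histograms of equal length."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))
```

Identical histograms give a distance of 0 and histograms with no overlapping bins give 1, so a smaller distance to the reference histogram indicates a more likely match.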

  36. Pedestrian tracking using multiple cues • Measurement step: the pedestrian likelihood combines two cues • HOG feature space: pedestrian vs. non-pedestrian • HSV feature space: reference HSV histogram (existing algorithm) • The cues are combined with a weighted coefficient in [0, 1]
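The slide only states that a weighted coefficient in [0, 1] balances the two cues; the exact formula is not given. One plausible reading, shown purely as an assumption, is a convex combination of a HOG-based and an HSV-based likelihood. The function and parameter names are hypothetical.

```python
def combined_likelihood(hog_likelihood, hsv_likelihood, alpha):
    """Blend the two cues; alpha = 1 uses HOG only, alpha = 0 uses HSV only.

    Assumption: a linear blend. The original work may combine the cues differently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * hog_likelihood + (1.0 - alpha) * hsv_likelihood
```

Setting `alpha` to 0 or 1 reproduces the single-cue trackers, which makes it easy to compare against the HSV-only and HOG-only baselines.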

  37. Tracking results • HOG+HSV (our proposed algorithm) • HSV only (K. Okuma et al., ECCV2004) • HOG only

  38. Outline • Introduction • Pedestrian Tracking using Multiple Cues • Parallel Implementation on NVIDIA GPU • Conclusion

  39. NVIDIA GPU architecture • Streaming multiprocessors (SM) • 32-bit scalar processors (SP) • Shared memory • Read-only cache • Device memory • In the case of the Tesla C1060: • 4 GB device memory • 30 streaming multiprocessors (240 SPs in total) • 1.3 GHz processor clock

  40. Implementation strategy • Run the measurement process on the GPU • It accounts for almost 99% of the computation time

  41. Implementation strategy • Allocate each particle to an SM • Each particle is processed independently

  42. Implementation strategy • Exploit pixel-level parallelism on the SPs • Synchronization among SPs is fast

  43. HSV likelihood calculation • Allocate each particle calculation to an SM • Calculate the HSV histogram on the SPs, one line per SP • Sum all the histograms • Calculate the Bhattacharyya distance to the reference HSV histogram • Transfer the results to the CPU memory
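The per-line decomposition can be illustrated on the CPU. The sketch below is plain Python, not GPU code: each row's partial histogram plays the role of the work one SP would do, and the final column-wise sum plays the role of the reduction. `bin_image` is assumed to hold a precomputed histogram bin index for every pixel of one particle's rectangle.

```python
def line_histogram(bin_row, num_bins):
    """Partial histogram of one image line (the work assigned to one SP)."""
    hist = [0] * num_bins
    for bin_index in bin_row:
        hist[bin_index] += 1
    return hist


def particle_histogram(bin_image, num_bins):
    """Sum the per-line partial histograms into one histogram for the particle."""
    partials = [line_histogram(row, num_bins) for row in bin_image]
    return [sum(column) for column in zip(*partials)]
```

Because the lines are independent, the partial histograms can be computed concurrently and only the final sum needs synchronization, which is where the fast synchronization among SPs within an SM is useful.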

  44. HOG likelihood calculation • Allocate each particle calculation to an SM • Calculate the gradient and angle on the SPs • Calculate the HOG histogram on the SPs, a few pixels per SP • Sum the histograms • Calculate the distance to the discriminant border between pedestrian and non-pedestrian in the HOG feature space • Transfer the results to the CPU memory

  45. Processing time • GPU: NVIDIA Tesla C1060 • Number of multiprocessors: 30 • Total number of scalar processors: 240 • Compared with an Intel Core i7 965 @ 3.2 GHz: 13.9 times faster • 113.6 fps

  46. Outline • Introduction • Pedestrian Tracking using Multiple Cues • Parallel Implementation on NVIDIA GPU • Conclusion

  47. Conclusion • A pedestrian tracking algorithm using HSV and HOG features is proposed • Real-time processing can be achieved by the parallel implementation on an NVIDIA GPU

  48. Report subject (not mandatory) • What do you think about the advance of signal processing on embedded systems in the future? • Please submit the report by email to miya@is.naist.jp. • Please write your student ID and name. • Deadline: Feb 3rd 17:00

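  49. Report assignment (not mandatory) • Freely describe your thoughts on the future of signal processing on embedded systems (applications, things you would like to do, anything is OK) • Submit to miya@is.naist.jp • State your ID and name clearly in the body of the email. • Deadline: Feb 3rd 17:00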