An FPGA Co-Processor for Statistical Pattern Recognition Applications

Presentation Transcript

  1. An FPGA Co-Processor for Statistical Pattern Recognition Applications • Jason Isaacs and Simon Y. Foo • Machine Intelligence Laboratory, FAMU-FSU College of Engineering, Department of Electrical and Computer Engineering

  2. Project Goal • To develop and implement a real-time image content analysis system using an FPGA Co-processor. Isaacs 248

  3. Outline • Pattern Recognition • Image Database • System Layout • Image Content Analysis • Hardware Implementation • Conclusions • Future Work

  4. Pattern Recognition Overview • Pattern Recognition: “the act of taking raw data and taking an action based on the category of the pattern.” • Common Applications: speech recognition, fingerprint identification (biometrics), DNA sequence identification. • Related Terminology: • Machine Learning: the ability of a machine to improve its performance based on previous results. • Machine Understanding: acting on the intentions of the user generating the data. • Related Fields: artificial intelligence, signal processing, and discipline-specific research (e.g., target recognition, speech recognition, natural language processing).

  5. Design Flow • Flowchart: Start → Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier → End • Key issues: • “There is no data like more data.” • Perceptually-meaningful features? • How do we find the best model? • How do we estimate parameters? • How do we evaluate performance?

  6. Common Misconceptions • I got 100% accuracy on... • Almost any algorithm works some of the time, but few real-world problems have ever been completely solved. • Training on the evaluation data is forbidden. • Once you use evaluation data, you should discard it. • My algorithm is better because... • Statistical significance and experimental design play a big role in determining the validity of a result. • There is always some probability a random choice of an algorithm will produce a better result.

  7. System Layout [diagram labels: View Source <…jpg> URL Gigabit Ethernet Spider Dual P4 - XP Analyze and Classify 32/64 bit PCI Store Original Image and Class Vector]

  8. Classification System [diagram labels: Spider (Webbot) Text Content Classifier Image Classifier Image Text Search Hyperlinks HTML Download Video Classifier Video Audio Audio Classifier WEB URL URL List URL Feature Vector] • Current research is focused on the RED path.

  9. Image Database: Web-Mining for Images • Images are an important class of data. • The Web is presently regarded as the largest global multimedia data repository, encompassing different types of images in addition to other multimedia data types. • To search the web for images, a crawler (also called a spider, mobile agent, or bot) is utilized. • Example HTML fragment: src="home_page/images/rover_spin.jpg" alt="" width="124" height="70"></a><a href="images/home_page/pgt_in_use.jpg"><img src="images/home_page/pgt_in_use_small.jpg" • The agent searches HTML documents for strings of type jpg, gif, and tif, then stores each image and its URL.
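The string search described on this slide can be sketched in a few lines. This is only an illustration, not the authors' getImages program: the regular expression, the function name `find_image_urls`, and the `example.com` base URL are assumptions made for the example.

```python
import re
from urllib.parse import urljoin

# Matches src="..." or href="..." values that end in .jpg, .gif, or .tif.
IMAGE_REF = re.compile(
    r'''(?:src|href)\s*=\s*["']([^"']+\.(?:jpg|gif|tif))["']''',
    re.IGNORECASE,
)


def find_image_urls(html, page_url):
    """Return absolute image URLs referenced in html, in order, without repeats."""
    found = []
    for match in IMAGE_REF.finditer(html):
        absolute = urljoin(page_url, match.group(1))  # resolve relative paths
        if absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    # Fragment modeled on the slide's HTML; the base URL is hypothetical.
    snippet = (
        '<img src="home_page/images/rover_spin.jpg" alt="" width="124" height="70"></a>'
        '<a href="images/home_page/pgt_in_use.jpg">'
        '<img src="images/home_page/pgt_in_use_small.jpg">'
    )
    for url in find_image_urls(snippet, "http://example.com/"):
        print(url)
```

A real crawler would also fetch each page, follow hyperlinks, and store the downloaded image next to its URL; those steps are omitted here.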

  10. Web Mining Example: Software Process
      [root@Nebula getURL]# ./getImages
      Enter URL:
      ./getURL > out.txt
      images/index_01.jpg
      images/index_02_new_2.jpg
      images/index_03.jpg
      images/index_04.jpg
      images/index_05.jpg
      images/index_06.jpg
      images/index_07.jpg
      images/index_08_new.jpg
      images/index_01.jpg length: 19
      ./getURL > images/engA.jpg
      images/index_02_new_2.jpg length: 25
      ./getURL > images/engB.jpg
      images/index_03.jpg length: 19
      ./getURL > images/engC.jpg
      images/index_04.jpg length: 19
      ./getURL > images/engD.jpg
      images/index_05.jpg length: 19
      ./getURL > images/engE.jpg
      images/index_06.jpg length: 19
      ./getURL > images/engF.jpg
      images/index_07.jpg length: 19
      ./getURL > images/engG.jpg
      images/index_08_new.jpg length: 23
      ./getURL > images/engH.jpg

  11. Web Mining Example Images • Example results from our “getImages” software are shown to the right. • These are from the website (more interesting than the ones from our engineering site). • Can prove useful when looking for faces or particular objects, such as the space shuttle. • We are able to search a particular group of sites, randomly search all known sites (not limited to the US or Western Europe), or search all pages within a certain domain, say

  12. Example Image Objects • These are sample objects that could be the targets of a specific search. These particular objects are from the COIL database. • They are used to train the analysis system.

  13. Image Analysis Implementation: Model for Image Recognition [diagram labels: SIGNAL PREPROCESSING X* FEATURE EXTRACTION Y PATTERN RECOGNITION W* MATCHED VECTOR W Q Stored Patterns Observed input, RGB image X Recognized Image] • Feature extraction is the process of determining a vector Y that represents an observed input X well enough to enable accurate pattern recognition. For this process, a mapping takes place such that X* is mapped to a vector Y.

  14. 5x5 Scaled Spatial Filters Used for Feature Extraction
      % Gabor Filter 1
      gabor1 = [-16 -19 -20 -19 -16;...
                -36 -43 -46 -43 -36;...
                  0   0   0   0   0;...
                 36  43  46  43  36;...
                 16  19  20  19  16];
      gaborDiv = 1/1000;
      mask = zeros(5,5,1);
      mask(:,:,1) = gabor1;
      maskDiv = [gaborDiv];
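As a rough illustration of why the mask is stored as integers with a separate divisor, the sketch below accumulates integer products over one 5x5 window and applies the 1/1000 scale once at the end. The Python names (`GABOR1`, `GABOR_DIVISOR`, `mask_response`) and the test patches are assumptions for the example; only the coefficient values come from the slide.

```python
# Integer coefficients copied from the slide's gabor1 mask.
GABOR1 = [
    [-16, -19, -20, -19, -16],
    [-36, -43, -46, -43, -36],
    [0, 0, 0, 0, 0],
    [36, 43, 46, 43, 36],
    [16, 19, 20, 19, 16],
]
GABOR_DIVISOR = 1000  # corresponds to gaborDiv = 1/1000 on the slide


def mask_response(patch, mask=GABOR1, divisor=GABOR_DIVISOR):
    """Multiply-accumulate one 5x5 pixel window against the integer mask.

    All products are integer; the scale factor is applied a single time.
    """
    total = sum(mask[r][c] * patch[r][c] for r in range(5) for c in range(5))
    return total / divisor


if __name__ == "__main__":
    flat = [[100] * 5 for _ in range(5)]                 # uniform region
    edge = [[0] * 5, [0] * 5] + [[100] * 5 for _ in range(3)]  # dark top, bright bottom
    print(mask_response(flat))  # the mask sums to zero, so a flat patch gives 0.0
    print(mask_response(edge))  # positive response to the horizontal edge
```

Keeping the multiplies in integers is what lets each tap map onto a hardware multiplier; the single division can then be handled as a final scaling step.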

  15. Wavelet Review • Wavelet Transform: $W(a,b) = \int f(t)\,\psi\!\left(\frac{t-b}{a}\right)dt$, where $\psi(t)$ is the “mother” wavelet. • [Figure: wavelet shown at two scales, labeled Scale1 and Scale2] • The Wavelet Transform has variable window lengths that allow it greater flexibility when analyzing signals. Therefore, it becomes an attractive tool for signal analysis.

  16. Wavelet Review • Given a basis function: • The dilation operation is indicated by: • Then, a mother wavelet is defined by:

  17. FIR Coefficients for Daubechies “7” • g(n): high pass filter • h(n): low pass filter

  18. FIR Implementation • Approximation is down-sampled and input to next level. • Detail is stored as coefficients.
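The two bullets above describe a cascaded analysis filter bank. The sketch below shows that structure under stated assumptions: the Daubechies coefficients are not in this transcript, so two-tap Haar filters stand in for h(n) and g(n), and the function names are hypothetical.

```python
import math

# Assumption: Haar filters stand in for the Daubechies h(n) and g(n),
# whose coefficient values are not given in the transcript.
HAAR_LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # h(n): low pass
HAAR_HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # g(n): high pass


def analyze_level(signal, h, g):
    """One level: sliding dot product with h and g, keeping every 2nd output."""
    approx, detail = [], []
    for start in range(0, len(signal) - len(h) + 1, 2):  # step 2 = down-sampling
        window = signal[start:start + len(h)]
        approx.append(sum(c * x for c, x in zip(h, window)))
        detail.append(sum(c * x for c, x in zip(g, window)))
    return approx, detail


def decompose(signal, h, g, levels):
    """Feed each approximation into the next level; store each level's detail."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = analyze_level(approx, h, g)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    approx, details = decompose([4, 4, 2, 2, 6, 6, 8, 8], HAAR_LOW, HAAR_HIGH, 2)
    print("final approximation:", approx)
    print("detail coefficients per level:", details)
```

With longer filters the same loop applies, although boundary handling (padding or periodic extension) would need to be chosen; this sketch simply drops incomplete windows.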

  19. The Spectral Histogram Representation • Properties • A spectral histogram is translation invariant. • A spectral histogram is a nonlinear operator. • With sufficient filters, a spectral histogram can uniquely represent any image up to a translation. • All the images sharing a spectral histogram define an equivalence class. • Preprocessing step in classification • Choose N image filter kernels to convolve with the image. • Perform the convolutions, generating N resultant responses. • For each response, generate a response image histogram. • Concatenate the histograms and send the result to the classifier.
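The four preprocessing steps can be sketched in software as follows. This is a minimal reference model under assumptions: the function names, the "valid" border handling, and the bin edges are chosen for the example and are not taken from the slides.

```python
def convolve_valid(image, kernel):
    """2-D convolution over positions where the kernel fits entirely in the image."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    return [
        [
            sum(
                kernel[kh - 1 - i][kw - 1 - j] * image[r + i][c + j]  # flipped kernel
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def histogram(response, edges):
    """Count response values per bin; edges are ascending lower bin boundaries."""
    counts = [0] * (len(edges) + 1)
    for row in response:
        for value in row:
            counts[sum(value >= e for e in edges)] += 1
    return counts


def spectral_histogram(image, kernels, edges):
    """Filter with each kernel, histogram each response, concatenate the histograms."""
    feature = []
    for kernel in kernels:
        feature.extend(histogram(convolve_valid(image, kernel), edges))
    return feature


if __name__ == "__main__":
    image = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
    kernels = [[[1]], [[1, -1]]]  # intensity filter and a horizontal difference
    print(spectral_histogram(image, kernels, edges=[5, 15]))
```

Because only the counts of filter responses are kept, shifting the image content leaves the feature largely unchanged, which is the translation invariance listed among the properties.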

  20. The Spectral Histogram Representation • 1st step: choose N image filter kernels to convolve with the image. • Filter kernels are chosen carefully from several image filter banks, including the intensity filter δ(x,y), differencing or gradient filters, Laplacian of Gaussian filters: • where t determines the scale of the filter, and finally the Gabor filter, defined by sine and cosine components: • 2nd step: perform the convolutions, generating N resultant responses. • To calculate each response pixel value, roughly m x n multiplies and adds must be performed, where m x n is the dimension of the chosen kernel. Here m = n. • Thus for an M x N image a total of roughly $\sum_k M \cdot N \cdot n_k^2$ multiplies and adds must be performed, where the subscript k denotes the kth filter and $n_k$ is the side length of its kernel.

  21. Feature Vector • Our feature vector is composed of the spectral histograms of the images resulting from filtering. • The feature vector is laid out as follows: Gabor Features | Haar Features | LoG Features | Wavelet Features

  22. Pattern Recognition: Neural Decision Tree • After the feature vectors have been created, they are sent back to the host PC and tested against a Neural Decision Tree to determine the presence of selected objects or textures, e.g., faces, cars, or brick.

  23. Artificial Neural Network Model [diagram: Feedforward Neural Network Model; input layer x0 … x80 (Feature Vector); hidden layer S0 … S7; output layer Y0 … Yk (Number of Branches at Node n); indices i, j, k] • Each node in the tree consists of an artificial neural network that is trained to separate the input into k classes. As the tree is traversed, the leaf nodes represent objects or textures of interest.
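A minimal sketch of how such a tree might be traversed is given below. It assumes each internal node holds a small one-hidden-layer network whose largest output selects the branch; the class name, the tanh activation, the hand-set weights, and the two-element feature vector are all illustrative assumptions (the slide's networks take 81 inputs and have 8 hidden units).

```python
import math


def feedforward(x, hidden_weights, output_weights):
    """One hidden layer with tanh units followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in hidden_weights]
    return [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]


class Node:
    """Internal nodes carry network weights and children; leaves carry a label."""

    def __init__(self, label=None, hidden_weights=None, output_weights=None, children=()):
        self.label = label
        self.hidden_weights = hidden_weights
        self.output_weights = output_weights
        self.children = list(children)


def classify(node, x):
    """Descend from the root, following the branch with the largest network output."""
    while node.children:
        scores = feedforward(x, node.hidden_weights, node.output_weights)
        node = node.children[scores.index(max(scores))]
    return node.label


# Hypothetical two-level tree with hand-set weights (no training shown).
texture_node = Node(
    hidden_weights=[[0.0, 1.0]],
    output_weights=[[1.0], [-1.0]],
    children=[Node(label="car"), Node(label="brick")],
)
root = Node(
    hidden_weights=[[1.0, 0.0]],
    output_weights=[[1.0], [-1.0]],
    children=[Node(label="face"), texture_node],
)

if __name__ == "__main__":
    print(classify(root, [1.0, 0.0]))
    print(classify(root, [-1.0, 2.0]))
```

In the described system the weights at each node would come from training, and the number of outputs at a node equals the number of branches leaving it.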

  24. Other Pattern Recognition Techniques • Density Estimation • Histogram Approach • Parzen-window method • Kn-Nearest-Neighbor Estimation • Principal Components Analysis • Fisher Linear Discriminant • MDA • Our future work aims at creating a library of generic modules implementing all of these discrimination techniques. These methods were supposed to have been completed prior to this submission but have been delayed.

  25. Summary of These Techniques • Kn-Nearest-Neighbor Estimation • To estimate p(x) from n training samples, we center a cell about x and let it grow until it captures kn samples, where kn is some specified function of n. • These samples are the kn nearest neighbors of x. • If the density is high near x, the cell will be relatively small, which gives good resolution. • Component Analysis and Discriminants • How to reduce excessive dimensionality? Answer: combine features. • Linear methods project high-dimensional data onto a lower-dimensional space. • Principal Components Analysis (PCA) seeks the projection that best represents the data in a least-squares sense. • Fisher Linear Discriminant seeks the projection that best separates the data in a least-squares sense.
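The kn-nearest-neighbor estimate can be illustrated in one dimension, where the "cell" is an interval and the estimate is p(x) ≈ k / (n · V) with V the interval length. The function name and sample values below are assumptions for the example, and k is passed in directly rather than computed as a function of n.

```python
def knn_density(x, samples, k):
    """Estimate p(x) as k / (n * V), growing an interval around x until it holds k samples."""
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]   # distance to the k-th nearest sample
    volume = 2.0 * radius       # length of the interval [x - radius, x + radius]
    return k / (len(samples) * volume)


if __name__ == "__main__":
    samples = [0.0, 0.1, 0.2, 0.3, 2.0, 4.0]
    print(knn_density(0.15, samples, k=3))  # small cell in the dense region
    print(knn_density(3.0, samples, k=3))   # large cell in the sparse region
```

The cell shrinks where samples are dense and widens where they are sparse, which is the resolution property noted on the slide. The sketch does not guard against a zero radius, which occurs when k samples coincide with x.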

  26. Summary of These Techniques Continued • Generalized Linear Discriminant Functions • The linear discriminant function g(x) can be written as • By adding d(d+1)/2 additional terms involving the products of pairs of components of x, we obtain the quadratic discriminant function • The separating surface defined by g(x)=0 is a second-degree or hyperquadric surface. • By continuing to add terms such as we can obtain the class of polynomial discriminant functions.
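The equations on this slide did not survive in the transcript. The sketch below assumes the usual textbook forms, a linear part w0 + Σ w_i·x_i plus one weight for each product x_i·x_j with i ≤ j, which yields exactly the d(d+1)/2 additional terms mentioned above. Function names and the example weights are hypothetical.

```python
from itertools import combinations_with_replacement


def quadratic_terms(x):
    """All products x_i * x_j with i <= j: d(d+1)/2 terms for a d-dimensional x."""
    return [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]


def quadratic_discriminant(x, w0, w, w_pairs):
    """Assumed form: g(x) = w0 + sum_i w_i x_i + sum_{i<=j} w_ij x_i x_j."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return linear + sum(wp * t for wp, t in zip(w_pairs, quadratic_terms(x)))


if __name__ == "__main__":
    # Hypothetical weights giving g(x) = x1^2 + x2^2 - 1, whose zero set is the unit circle.
    w0, w, w_pairs = -1.0, [0.0, 0.0], [1.0, 0.0, 1.0]
    print(quadratic_discriminant([0.0, 0.0], w0, w, w_pairs))  # inside the circle: negative
    print(quadratic_discriminant([2.0, 0.0], w0, w, w_pairs))  # outside the circle: positive
```

The circle is one example of the second-degree separating surfaces the slide refers to; adding higher-order products in the same way would give polynomial discriminant functions.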

  27. So, Why Move to Hardware? • Speed of classification is limited in software, and with such a large database (the Web), the faster the better. • For example, given a 128x128 8-bit gray scale image, generating the spectral histogram with 5x5 filters requires roughly 410k multiplies and 410k adds per filter (128 x 128 x 25 = 409,600), or about 4.1 million of each for all 10 filters. • This is the main computational bottleneck. • A general purpose microprocessor can only perform one or two multiply/adds simultaneously (depending on the processor). • Some FPGAs allow for up to 88 simultaneous multiply operations and many adds to be performed in one or two clock cycles. • The filtering algorithm is inherently parallelizable and therefore well suited for a pipelined hardware implementation.
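The operation count for this example can be checked with a few lines of arithmetic. The function names are hypothetical, and the cycle figure is only an idealized lower bound that assumes all 88 multipliers are kept busy every cycle and ignores memory access and pipeline fill.

```python
def filter_bank_macs(rows, cols, kernel_size, num_filters):
    """Multiply-accumulate count for square kernels applied at every pixel."""
    per_filter = rows * cols * kernel_size * kernel_size
    return per_filter, per_filter * num_filters


def ideal_cycles(total_macs, parallel_multipliers):
    """Ceiling of total work over available multipliers (no memory stalls modeled)."""
    return -(-total_macs // parallel_multipliers)


if __name__ == "__main__":
    per_filter, total = filter_bank_macs(128, 128, 5, 10)
    print(per_filter)               # 409600 multiplies (and adds) per 5x5 filter
    print(total)                    # 4096000 for all 10 filters
    print(ideal_cycles(total, 88))  # 46546 cycles with 88 parallel multipliers
```

A processor issuing one multiply-add per cycle would need on the order of 4.1 million cycles for the same work, which is the gap the hardware implementation targets.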

  28. Target Hardware: Avnet’s Virtex II Pro Board • Uses Virtex II Pro XC2VP20. • Many options for I/O. • 32-bit PCI bus has data throughput of over 100 MB per second.

  29. Hardware vs. Software Tradeoffs • Not all tasks see such a drastic speedup in hardware. • Memory Accesses • Only one address per clock cycle can be read in SDRAM, Flash, or SRAM. • We require more than 32 bits per action, so we waste time reading data. • It is possible to store more data in BRAM to create an initial data stack that would overcome future read times. • Combine hardware and software for optimal ease of design and speed of execution. • Need to determine the optimal compromise.

  30. Hardware Designs: Preliminary Test Designs and Final Implementations

  31. 11x11 Filter Model Top Level • This bank of four 11x11 filters was the first test design. We felt that an 11x11 kernel would allow for the best representation of our filter bank set.

  32. Filter Model: One Filter Bank

  33. 11x11 Filter MAC System

  34. Filter Model: Filter MAC System • An addressable shift register (ASR) implements the input delay buffer. The address port runs n times faster than the data port, where n is the number of filter taps. • The filter coefficients are stored in a ROM configured to use block memory. • A down sampler reduces the capture register sample period to the output sample period. The block is configured with latency to obtain the most efficient hardware implementation. The down sampling rate is equal to the coefficient array length. • A comparator generates the reset and enable pulse for the accumulator and capture register. The pulse is asserted when the address is 0 and is delayed to account for pipeline stages.
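The behavior of this multiply-accumulate (MAC) filter can be modeled in software as a check on the hardware design. The sketch below is a behavioral model only, written under assumptions: the function name is hypothetical, pipeline latency is not modeled, and the inner loop stands in for the address counter that runs n times faster than the data port.

```python
from collections import deque


def mac_fir(samples, coefficients):
    """Behavioral model of a single-MAC FIR filter with len(coefficients) taps."""
    taps = len(coefficients)
    delay_line = deque([0] * taps, maxlen=taps)  # addressable shift register (ASR)
    outputs = []
    for sample in samples:              # data port: one new sample per output period
        delay_line.appendleft(sample)   # newest sample sits at address 0
        accumulator = 0                 # reset pulse asserted at address 0
        for address in range(taps):     # address port: `taps` steps per sample
            # ROM coefficient times the addressed delay-line entry
            accumulator += coefficients[address] * delay_line[address]
        outputs.append(accumulator)     # capture register, down-sampled by `taps`
    return outputs


if __name__ == "__main__":
    print(mac_fir([1, 0, 0, 0], [1, 2, 3]))  # impulse response reproduces the coefficients
    print(mac_fir([1, 1, 1], [1, 2, 3]))     # running sums while the delay line fills
```

An 11x11 or 5x5 image filter extends the same idea to two dimensions, with line buffers supplying the rows of the window.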

  35. Device Utilization Summary: Four 11x11 Image Filters
      • Selected Device: 2vp20ff896-6
      • Number of Slices: 7913 out of 9280 85%
      • Number of Slice Flip Flops: 10644 out of 18560 57%
      • Number of 4 input LUTs: 8770 out of 18560 47%
      • Number of bonded IOBs: 67 out of 556 12%
      • Number of GCLKs: 1 out of 16 6%
      • TIMING REPORT
      • Clock Information:
      • Clock Signal | Clock buffer (FF name) | Load
      • clk | BUFGP | 15322
      • Timing Summary:
      • Speed Grade: -6
      • Minimum period: 4.542ns (Maximum Frequency: 220.192MHz)
      • Minimum input arrival time before clock: 3.006ns
      • Maximum output required time after clock: 3.615ns
      • Maximum combinational path delay: No path found
      The device utilization of the four-filter 11x11 bank left little room for other logic on our target device. Since we felt that an 11x11 kernel would allow for the best representation of our filter bank set, we decided to target additional devices to leave our options open.

  36. Device Utilization Summary: Six 11x11 Image Filters with New Target
      • Selected Device: 4vsx55ff1148-11
      • Number of Slices: 9543 out of 24576 38%
      • Number of Slice Flip Flops: 11616 out of 49152 23%
      • Number of 4 input LUTs: 9816 out of 49152 19%
      • Number of bonded IOBs: 99 out of 642 15%
      • Number of GCLKs: 1 out of 32 3%
      • Number of DSP48s: 66 out of 512 12%
      • TIMING REPORT
      • Clock Information:
      • Clock Signal | Clock buffer (FF name) | Load
      • clk | BUFGP | 18732
      • Timing Summary:
      • Speed Grade: -11
      • Minimum period: 6.632ns (Maximum Frequency: 150.790MHz)
      • Minimum input arrival time before clock: 3.217ns
      • Maximum output required time after clock: 3.546ns
      • Maximum combinational path delay: No path found
      The device utilization of this six-filter 11x11 bank left more room for other logic on our new target device. However, we did not possess this device and therefore had to consider our in-house options. Thus, we moved toward a more V2P20-friendly design.

  37. 5x5 Spectral Histogram System Top: The Best Fit Option

  38. Device Utilization Summary: 5x5 with 10 Histograms
      • Selected Device: 2vp20ff896-6
      • Number of Slices: 8775 out of 9280 94%
      • Number of Slice Flip Flops: 10768 out of 18560 58%
      • Number of 4 input LUTs: 10274 out of 18560 55%
      • Number of bonded IOBs: 343 out of 556 61%
      • Number of MULT18X18s: 50 out of 88 56%
      • Number of GCLKs: 1 out of 16 6%
      • TIMING REPORT
      • Clock Information:
      • Clock Signal | Clock buffer (FF name) | Load
      • clk | BUFGP | 16755
      • Timing Summary:
      • Speed Grade: -6
      • Minimum period: 4.758ns (Maximum Frequency: 210.172MHz)
      • Minimum input arrival time before clock: 2.987ns
      • Maximum output required time after clock: 6.322ns
      • Maximum combinational path delay: No path found
      Note that a pipelined implementation without explicit use of the embedded multipliers exceeds the available slices, at 108% utilization.

  39. 5x5 Filter Systems for Spectral Histogram

  40. 5x5 Gabor Filter Subsystem

  41. Histogram Subsystem

  42. Mcode Block for Histogram Bin-Sorter
      function [bin10,bin9,bin8,bin7,bin6,bin5,bin4,bin3,bin2,bin1] = xhist(input1)
      bin10 = 0; bin9 = 0; bin8 = 0; bin7 = 0; bin6 = 0;
      bin5 = 0; bin4 = 0; bin3 = 0; bin2 = 0; bin1 = 0;
      if input1 >= 224; bin10 = 1;
      elseif input1 >= 180; bin9 = 1;
      elseif input1 >= 158; bin8 = 1;
      elseif input1 >= 136; bin7 = 1;
      elseif input1 >= 114; bin6 = 1;
      elseif input1 >= 92; bin5 = 1;
      elseif input1 >= 70; bin4 = 1;
      elseif input1 >= 48; bin3 = 1;
      elseif input1 >= 26; bin2 = 1;
      else bin1 = 1; end;
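A software reference model of the bin sorter can help when checking simulation waveforms. The Python sketch below mirrors the thresholds in the Mcode block; note that it returns the one-hot flags in ascending order (bin1 first), the reverse of the Mcode output order, and that `accumulate` is a hypothetical helper standing in for the per-bin counters.

```python
# Lower bounds of bin2 .. bin10, taken from the Mcode block's comparisons.
THRESHOLDS = [26, 48, 70, 92, 114, 136, 158, 180, 224]


def xhist(value):
    """Return one-hot flags [bin1, ..., bin10] for a single filter response value."""
    index = sum(value >= t for t in THRESHOLDS)  # number of thresholds met, 0..9
    flags = [0] * 10
    flags[index] = 1
    return flags


def accumulate(values):
    """Sum the one-hot flags over a stream of values to form the 10-bin histogram."""
    counts = [0] * 10
    for value in values:
        for i, flag in enumerate(xhist(value)):
            counts[i] += flag
    return counts


if __name__ == "__main__":
    print(xhist(200))                       # falls in bin9 (180 <= value < 224)
    print(accumulate([0, 25, 26, 223, 224, 255]))
```

Counting how many thresholds a value meets gives the same result as the elseif chain, because the thresholds are strictly increasing.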

  43. ModelSim Waveform Snapshot • Histogram results for Gabor Filter 2, with the bin ranges shown on the previous slide. Note that there is a 16 clock cycle delay before the bin sort result is posted.

  44. Simulation Filter Results

  45. Simulation Histogram Results

  46. Conclusions/Future Work • In addition to the other pattern recognition techniques mentioned above, we intend to optimize the PC/FPGA interfacing to create our own low-cost integrated system. • Our problems currently reside in the PCI interface design shipped with the Avnet Development Board. We are working hard to resolve this issue, but in the end we may have to consider another board. • We also wish to time the results (how many images can we process per second?) to determine whether the system is real-time. • Possibly move to a board with better interfacing tools, as well as faster interfacing via PCI-X, PCI Express, or DMA capabilities. • Finally, optimize the computational efficiency of the image analysis algorithm, i.e., consider a multi-stage pipeline with more efficient memory access algorithms. • The ultimate goal is to do real-time search and recognition utilizing FPGAs as co-processors.