
Presentation Transcript


  1. School of Engineering, University of Guelph. An Adaptive Implementation of a Dynamically Reconfigurable K-Nearest Neighbor Classifier On FPGA (2012). Hanaa M. Hussain, Khaled Benkrid, School of Engineering, Edinburgh University, Edinburgh, Scotland, U.K. {h.hussain, k.benkrid}@ed.ac.uk. Huseyin Seker, Bio-Health Informatics Research Group, De Montfort University, Leicester, England, U.K. hseker@dmu.ac.uk. Dunia Jamma, PhD Student. Prof. Shawki Ariebi, Course Instructor.

  2. Outline • Introduction • Background of KNN • KNN and FPGA • The proposed architectures • Dynamic Partial Reconfigurable (DPR) part • The achievements • Advantages and Disadvantages • Conclusion

  3. Introduction • K-nearest neighbour (KNN) is a supervised classification technique • Applications of KNN: data mining, image processing of satellite and medical images, etc. • KNN is known to be robust and simple to implement when dealing with data of small size • KNN performs slowly when the data are large and have high dimensions • The KNN classifier is sensitive to the parameter K, the number of nearest neighbours • The label of a new query is selected by voting among those K nearest points.

  4. Figure: example classifications with a 1-Nearest Neighbor rule and a 3-Nearest Neighbor rule.

  5. KNN Distance Methods • The Manhattan distance is used to compute the distance between the new query and the trained samples • The Manhattan distance is chosen in this work for its simplicity and lower cost compared to the Euclidean distance • D = Σ |Xi - Yi|, where Xi is the new query's matrix, Yi is the trained samples' matrix, and K is the number of samples.
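The paper's RTL is not reproduced in the slides, but a minimal Verilog sketch of a sequential Manhattan-distance accumulator illustrates what a Dist PE computes: one |x - y| term is added to the running sum per clock cycle. The module name, port names, and the widths B and ACCW are assumptions chosen for illustration.

```verilog
// Minimal sketch (not the authors' RTL): a sequential Manhattan-distance
// accumulator. Each clock cycle it adds one |x - y| term to the running sum.
module manhattan_acc #(
    parameter B    = 8,   // sample wordlength (assumed)
    parameter ACCW = 16   // accumulator width (assumed)
)(
    input  wire            clk,
    input  wire            rst,   // synchronous clear before a new query
    input  wire            en,    // a valid (x, y) element pair this cycle
    input  wire [B-1:0]    x,     // element of the new query
    input  wire [B-1:0]    y,     // element of the trained sample
    output reg  [ACCW-1:0] dist   // running Manhattan distance
);
    // absolute difference of the current element pair
    wire [B-1:0] diff = (x > y) ? (x - y) : (y - x);

    always @(posedge clk) begin
        if (rst)
            dist <= {ACCW{1'b0}};
        else if (en)
            dist <= dist + diff;
    end
endmodule
```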

  6. KNN and FPGA • KNN classifiers can benefit from the parallelism offered by FPGAs • Distance computation is the time-consuming part, so it is parallelized • The authors propose two adaptive FPGA architectures (A1 and A2) of the KNN classifier, and compare each of them with an equivalent implementation running on a general purpose processor (GPP) • They also propose a novel dynamic partial reconfiguration (DPR) architecture of the KNN classifier in which the parameter K can be changed at run time

  7. Used tools • Hardware implementation: • The hardware implementation targeted the ML403 platform board, which carries a Xilinx XC4VFX12 FPGA chip • JTAG cable • Xilinx PlanAhead 12.2 tool along with the Xilinx partial reconfiguration (DPR) flow • Software implementation: • Matlab (R2009b) bioinformatics toolbox • An Intel Pentium Dual-Core E5300 workstation running at 2.60 GHz with 3 GB RAM • Verilog is used as the hardware description language

  8. The used data (figure: layout of the training matrix Y and the query vector X) • M: number of training samples • N: number of training vectors • L: class label • Y: the trained data • X: the new query

  9. The proposed architectures • The KNN classifier has been divided into three modular blocks (distance computation, KNN finder, and query label finder) plus FIFOs • A1 architecture: M Dist PEs and K KNN PEs, for a total of PE = M + K + 1 • A2 architecture: N Dist PEs and N KNN PEs, for a total of PE = 2N + 1 (a small worked example follows)
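As a worked example of the PE-count formulas above (the numbers are illustrative, not taken from the paper): with M = 100 training samples and K = 3, A1 instantiates 100 + 3 + 1 = 104 PEs; with N = 10 training vectors, A2 instantiates 2 × 10 + 1 = 21 PEs.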

  10. The functionality of PEs (figure: PE datapath showing the previous accumulative distance, Yi, the incoming and stored distance/label pairs Dist1/L1 and Dist2/L2, and the Min and Max outputs)

  11. Distance computation • The distance computations are made in parallel every clock cycle • The latency of a Dist PE is M cycles • A1: the throughput is one distance result every clock cycle • A2: the throughput is one distance result every M clock cycles (figure: timing over a complete training pass)

  12. Dist PE inner architecture

  13. K-Nearest Neighbour Finder • This block becomes active after M clock cycles • The function of this block is completed after M + N clock cycles
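The slides do not include the KNN-finder RTL; the following minimal Verilog sketch shows one plausible PE for this block, assuming the common keep-minimum/pass-maximum scheme: each PE holds the smallest distance/label pair it has seen and forwards the larger pair to the next PE, so a chain of K PEs ends up holding the K nearest neighbours. The module and port names and the widths DW and LW are assumptions.

```verilog
// Minimal sketch (assumed scheme, not the authors' RTL): one KNN-finder PE.
// A chain of K such PEs retains the K smallest distances and their labels.
module knn_pe #(
    parameter DW = 16,  // distance wordlength (assumed)
    parameter LW = 4    // label wordlength (assumed)
)(
    input  wire          clk,
    input  wire          rst,        // re-initialise before a new query
    input  wire          valid,      // incoming distance/label pair is valid
    input  wire [DW-1:0] dist_in,
    input  wire [LW-1:0] label_in,
    output reg  [DW-1:0] dist_out,   // forwarded (larger) distance
    output reg  [LW-1:0] label_out
);
    reg [DW-1:0] dist_keep;   // smallest distance held by this PE
    reg [LW-1:0] label_keep;

    always @(posedge clk) begin
        if (rst) begin
            dist_keep  <= {DW{1'b1}};  // "infinity"
            label_keep <= {LW{1'b0}};
            dist_out   <= {DW{1'b1}};
            label_out  <= {LW{1'b0}};
        end else if (valid) begin
            if (dist_in < dist_keep) begin
                // keep the new minimum, pass the previously held pair along
                dist_out   <= dist_keep;
                label_out  <= label_keep;
                dist_keep  <= dist_in;
                label_keep <= label_in;
            end else begin
                dist_out  <= dist_in;
                label_out <= label_in;
            end
        end
    end
endmodule
```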

  14. Dynamic Partial Reconfigurable part (DPR) • The value of the K parameter is dynamically reconfigured, while N, M, B, and C stay fixed for a given classification problem • Two cores (A1): • Distance computation core - static • KNN core (KNN PE, Label PE) - dynamic • The size of the reconfigurable partition (RP) is made large enough to accommodate the logic resources required by the largest K • Advantages: savings in reconfiguration time and power • Difficulties: • Resource limitations, the cost, and the verification of the interfaces between the static region and the RP for all reconfigurable modules (RMs)

  15. The achievements • This DPR implementation offers a 5x speed-up in the reconfiguration time of the KNN classifier on FPGA

  16. Advantages • Flexibility: the user can select the most appropriate architecture for the targeted application (available resources, performance, cost) • Enhanced performance • Parallelism: classification speed-up • DPR: reduced reconfiguration time • Efficiency in terms of KNN performance, thanks to the DPR of K • Use of the Manhattan distance (simplicity and lower cost)

  17. Disadvantages • The amount of resources used • The speed-up achieved for the DPR part (5x) is modest compared to the resources and effort involved • Area constraints in the A2 architecture and the DPR • The latency introduced by the pipelined manner of producing the results

  18. Conclusion • Efficient design for different KNN classifier applications • Two architectures, A1 and A2, from which the user can choose • A1 can be used to target applications whereby N >> M, whereas A2 targets applications whereby N << M • DPR part (the reconfiguration could instead be done through ICAP) • Achievements compared to the GPP: • Speed-up of 76x for A1 and 68x for A2 • Speed-up of 5x in reconfiguration time with DPR

  19. Any questions?

  20. Extra Slides

  21. Memory • Each FIFO is associated with one distance PE • The query vector gets streamed to the PEs and stored in registers, since it is required every clock cycle • Where: B is the sample wordlength, M is the number of samples, N is the number of training vectors
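For illustration only, a minimal synchronous FIFO such as the one sketched below could hold the training data for one distance PE; the module name, the depth parameter, and the distributed-RAM read style are assumptions, not the authors' memory design.

```verilog
// Minimal sketch (assumed, not the authors' design): a simple synchronous
// FIFO of width B and depth DEPTH, feeding one distance PE with training
// samples in order.
module sample_fifo #(
    parameter B     = 8,   // sample wordlength (assumed)
    parameter DEPTH = 16,  // number of samples stored, e.g. M (assumed)
    parameter AW    = 4    // address width, ceil(log2(DEPTH))
)(
    input  wire         clk,
    input  wire         wr_en,    // push one training sample
    input  wire [B-1:0] wr_data,
    input  wire         rd_en,    // pop the next sample for the Dist PE
    output wire [B-1:0] rd_data
);
    reg [B-1:0]  mem [0:DEPTH-1];
    reg [AW-1:0] wr_ptr = {AW{1'b0}};
    reg [AW-1:0] rd_ptr = {AW{1'b0}};

    always @(posedge clk) begin
        if (wr_en) begin
            mem[wr_ptr] <= wr_data;
            wr_ptr      <= wr_ptr + 1'b1;
        end
        if (rd_en)
            rd_ptr <= rd_ptr + 1'b1;
    end

    // combinational read (distributed RAM style)
    assign rd_data = mem[rd_ptr];
endmodule
```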

  22. Class Label Finder • The block consists mainly of C counters, each associated with one of the class labels • The hardware resources depend on the user-defined parameters K and C • The architecture of this block is identical in both A1 and A2
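A minimal sketch of such a counter-based voter is shown below, assuming the K nearest labels arrive one per cycle and the query is assigned the label whose counter is largest; the names and widths are illustrative, not taken from the paper.

```verilog
// Minimal sketch (assumed, not the authors' RTL): class-label voting with
// C counters. Each incoming nearest-neighbour label increments its counter;
// query_label tracks the label with the highest count so far.
module label_finder #(
    parameter C  = 4,   // number of class labels (assumed)
    parameter LW = 2,   // label width, ceil(log2(C))
    parameter KW = 4    // counter width, large enough to count to K
)(
    input  wire          clk,
    input  wire          rst,         // clear the vote before a new query
    input  wire          valid,       // one of the K nearest labels arrives
    input  wire [LW-1:0] label_in,
    output reg  [LW-1:0] query_label  // current majority label
);
    reg [KW-1:0] count [0:C-1];
    reg [KW-1:0] best_count;
    integer i;

    always @(posedge clk) begin
        if (rst) begin
            for (i = 0; i < C; i = i + 1)
                count[i] <= {KW{1'b0}};
            best_count  <= {KW{1'b0}};
            query_label <= {LW{1'b0}};
        end else if (valid) begin
            count[label_in] <= count[label_in] + 1'b1;
            if (count[label_in] + 1'b1 > best_count) begin
                best_count  <= count[label_in] + 1'b1;
                query_label <= label_in;
            end
        end
    end
endmodule
```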

  23. A2 Architecture • N FIFOs are used to store the training set, each of them having a depth of M • The class labels get streamed and stored in registers within the distance PEs • A2 requires more CLB slices than A1 when N, M, and K are the same • The first distance result becomes ready after all samples are processed, i.e., after M clock cycles

  24. DPR for K • Maximum bandwidth for JTAG is 66 Mbps • Maximum bandwidth for ICAP is 3.2 Gbps • ICAP is more than 48x faster than JTAG
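To spell out the 48x figure: 3.2 Gbps / 66 Mbps ≈ 48.5. As an illustration with an assumed (not from the paper) partial bitstream of 200 KB, i.e. 1.6 Mb: reconfiguration would take about 1.6 Mb / 66 Mbps ≈ 24 ms over JTAG, but only about 1.6 Mb / 3.2 Gbps ≈ 0.5 ms over ICAP.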

  25. Dynamic Partial Reconfigurable part (DPR) • JTAG was used (BW = 66 Mbps) • Using ICAP instead would decrease the reconfiguration time (BW = 3.2 Gbps)
