This presentation describes speeding up hierarchical clustering by implementing its most expensive parts in hardware. Clustering algorithms are computationally intensive (roughly O(n²d) or worse), which makes them slow on datasets with millions of items. The goal is to reduce processing time through hardware acceleration on an FPGA. The talk covers existing work, the absence of top-down hierarchical clustering implementations in hardware, and possible future work on high-volume datastreams. The approach could benefit applications in biology, internet data analysis, and other domains.
Application Performance through Hardware Acceleration
Dan Legorreta, Moshe Looks, Shobana Padmanabhan
CSE 560, Oct 2005
[Hierarchical] Clustering [in Hardware]
• Clustering
  ◦ Assign points in a space to non-overlapping clusters
  ◦ Minimize intra-cluster distances
  ◦ Maximize inter-cluster distances
• Hierarchical clustering
  ◦ Cluster the clusters; generates a tree (dendrogram) showing the hierarchical structure of the data
  ◦ Agglomerative (bottom-up) or partitioning/divisive (top-down)
• Why do it in hardware?
  ◦ Clustering is often applied to biology or internet data with millions of items to cluster and thousands of dimensions
  ◦ Clustering may be applied to high-volume datastreams
  ◦ Clustering algorithms are slow: ~O(n²d) or worse (n items, d dimensions)
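The cost claim above can be made concrete with a minimal, naive agglomerative (bottom-up) sketch. It assumes single linkage and squared Euclidean distance, neither of which the slides specify, and the function names are hypothetical. Building the pairwise distance matrix is the O(n²d) step; the merge loop as written is slower still, which is the "or worse" part.

```python
import math
from typing import Dict, List, Tuple

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance: O(d) work per pair of points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerative_single_linkage(points: List[Point]) -> List[Tuple[int, int, float]]:
    """Naive bottom-up (agglomerative) clustering with single linkage.

    Returns the merge history (the dendrogram) as (cluster_a, cluster_b, distance)
    tuples. Input points have ids 0..n-1; each merged cluster gets the next id
    n, n+1, ... in merge order.
    """
    n = len(points)
    # Pairwise distance matrix: n * n * d work, the O(n^2 d) term.
    dist = [[sq_dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Active clusters: cluster id -> indices of the points it contains.
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}
    merges: List[Tuple[int, int, float]] = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Exhaustive search for the closest pair of active clusters.
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, math.sqrt(d)))
        next_id += 1
    return merges
```

For the points [[0, 0], [0, 1], [5, 5], [5, 6]], the sketch merges 0 with 1 (new id 4), then 2 with 3 (new id 5), and finally 4 with 5 at distance √41.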
What’s Been Done?
• K-means, the most popular flat clustering algorithm, has been implemented in hardware:
  ◦ M. Estlick, M. Leeser, J. Theiler, and J. J. Szymanski, “Algorithmic Transformations in the Implementation of K-means Clustering on Reconfigurable Hardware” (FPGA 2001).
  ◦ 17 citations, including other hardware implementations of flat clustering algorithms
• Hierarchical clustering
  ◦ M. Y. Niamat, D. Bitter, and M. M. Jamali, “FPGA Implementation of Hierarchical Clustering Algorithms” (ISCAS 1998).
  ◦ Simple agglomerative clustering on 8 Xilinx 4003APC84 FPGAs
  ◦ They only coded it in VHDL and simulated it; no results were reported!
  ◦ No other papers found
• No known experimental results or implementations of top-down hierarchical clustering in hardware!
Liquid Architecture Platform
[Figure: the clustering application is compiled with gcc on the workstation and runs on the LEON core on the FPX FPGA (I-cache, D-cache, cache controller, AHB address/data bus, memory controller to SRAM/SDRAM, command controller, control software interface).]
• LEON: SPARC V8-compatible, open soft core
Application Runtime
[Figure: non-intrusive, cycle-accurate profiling from the hardware implementation; a statistics module on the FPGA records request timings and reports them to the workstation.]
• Profiling result: the dot product accounts for 70% of the runtime
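One plausible reason the dot product dominates, assuming Euclidean distance (the slides do not name the metric): the squared distance expands into dot products, ‖x − y‖² = x·x − 2 x·y + y·y, so once each point's x·x is cached, every pairwise distance costs one d-length dot product. A minimal sketch; `dot` and `pairwise_sq_dists` are hypothetical names, not taken from the project's code.

```python
from typing import List

Point = List[float]


def dot(a: Point, b: Point) -> float:
    """d multiply-accumulates: the kernel a hardware unit would take over."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def pairwise_sq_dists(points: List[Point]) -> List[List[float]]:
    """All pairwise squared Euclidean distances, expressed through dot products."""
    n = len(points)
    norms = [dot(p, p) for p in points]  # n dot products, cached and reused below
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # ||x - y||^2 = x.x - 2 x.y + y.y : one new dot product per pair
            d = norms[i] - 2.0 * dot(points[i], points[j]) + norms[j]
            out[i][j] = out[j][i] = max(d, 0.0)  # clamp tiny negative rounding error
    return out
```

With this formulation, n(n − 1)/2 of the n(n + 1)/2 dot products are the cross terms, so speeding up `dot` speeds up nearly all of the distance computation.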
Improve Performance through Hardware Implementation
[Figure: the same platform with a dot-product unit (+ dot product) added on the FPGA next to the LEON core and attached through the APB.]
Hardware Acceleration
[Figure: the workstation program, compiled with gcc, runs on LEON on the FPX FPGA and uses the dot-product unit attached via the APB.]
Dot Product Implementation
[Figure: the dot-product circuit on the FPGA is connected to LEON through the APB as memory-mapped registers at 0x800000D0, 0x800000D4 and 0x800000D8 (each drawn with fields #1, #2, #3 and bitV) and 0x800000DC (labeled stat, re, result).]
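To show how software drives a memory-mapped APB device, here is a behavioral model in Python. Only the four addresses come from the slide; the role assigned to each register, the accumulate-on-write protocol, and all class and function names are hypothetical assumptions, not the project's actual register map.

```python
# Addresses from the slide; register ROLES below are ASSUMED for illustration.
REG_A = 0x800000D0       # assumed: operand a element
REG_B = 0x800000D4       # assumed: operand b element; a write triggers a multiply-accumulate
REG_CTRL = 0x800000D8    # assumed: writing a value with bit 0 set clears the accumulator
REG_RESULT = 0x800000DC  # assumed: accumulated result (read-only)


class DotProductDevice:
    """Hypothetical behavioral model of a memory-mapped dot-product unit."""

    def __init__(self) -> None:
        self.a = 0
        self.acc = 0
        self.bus_writes = 0  # each write models one APB transaction
        self.bus_reads = 0

    def write(self, addr: int, value: int) -> None:
        self.bus_writes += 1
        if addr == REG_A:
            self.a = value
        elif addr == REG_B:
            self.acc += self.a * value
        elif addr == REG_CTRL:
            if value & 1:
                self.acc = 0
        else:
            raise ValueError(f"unmapped or read-only address {addr:#x}")

    def read(self, addr: int) -> int:
        self.bus_reads += 1
        if addr == REG_RESULT:
            return self.acc
        raise ValueError(f"unreadable address {addr:#x}")


def hw_dot(dev: DotProductDevice, a: list, b: list) -> int:
    """Software side: clear, stream element pairs over the bus, read the result."""
    dev.write(REG_CTRL, 1)
    for x, y in zip(a, b):
        dev.write(REG_A, x)
        dev.write(REG_B, y)
    return dev.read(REG_RESULT)
```

Under this assumed protocol, a length-d dot product costs 2d + 1 bus writes and one read, each a separate APB transaction; per-element bus traffic of this kind is the sort of overhead that motivates looking at a co-processor interface.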
Plan
• Changes:
  ◦ An APB device with memory-mapped registers, instead of changing the compiler.
  ◦ Because of the APB overhead, we also plan to look at the co-processor interface.
• New schedule:
  ◦ APB implementation, including the dot product, this week.
  ◦ Co-processor interface, as much as possible, from next week on.