
About Me

About Me. Swaroop Butala, MSCS, graduating in Dec 09. Specialization: Systems and Databases. Interests: learning new technologies; application of technology to financial sectors. Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning.


Presentation Transcript


  1. About Me Swaroop Butala • MSCS – graduating in Dec 09 • Specialization: Systems and Databases • Interests: • Learning new technologies • Application of technology to financial sectors

  2. Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning • Author: Inderjit S. Dhillon, Department of Computer Sciences, University of Texas, Austin • Presented by: Swaroop Butala, Fall 2008

  3. Clustering and Current Solutions (1) • Clustering: grouping a collection of objects so that similar objects fall in the same group • Supports future navigation and searches

  4. Clustering and Current Solutions (2) • Current Solutions • K-means • Fuzzy C-means • Hierarchical clustering • Document Clustering • Word Clustering
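K-means is the first of the existing methods listed above. For comparison, here is a minimal sketch of Lloyd's k-means in Python with NumPy. The function name `kmeans` and the parameters `n_iter` and `seed` are our own illustrative choices. They are not taken from the presentation or the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: alternate an assignment step and a centroid-update step."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points, chosen at random, as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of the points assigned to it
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Consider two well-separated groups of points, for example `kmeans(np.array([[0, 0], [0, 1], [10, 10], [10, 11]]), 2)`. The first two points should share one label and the last two the other. Which group gets label 0 depends on the random initialisation.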

  5. Document Clustering • Problem • Vector Space Model • Extract Unique Content-Bearing Words • Word-by-Document Matrix • Existing Solutions: • K-means Algorithm • Self-Organizing Maps • Computationally Prohibitive
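The slide only names the vector space model and the word-by-document matrix. Below is a minimal sketch of building such a matrix. It assumes raw term counts and a caller-supplied stopword list as a crude stand-in for selecting "content-bearing" words. The paper's actual word selection and weighting are not described here, and the name `word_by_document_matrix` is hypothetical.

```python
import re
from collections import Counter

def word_by_document_matrix(docs, stopwords=frozenset()):
    """Build a word-by-document count matrix (rows = words, columns = documents)."""
    # Tokenise each document into lowercase alphabetic words, dropping stopwords.
    tokenised = [
        [w for w in re.findall(r"[a-z]+", d.lower()) if w not in stopwords]
        for d in docs
    ]
    # Vocabulary of unique words, sorted so that row indices are deterministic.
    vocab = sorted({w for toks in tokenised for w in toks})
    counts = [Counter(toks) for toks in tokenised]
    # A[i][j] = number of times word i occurs in document j.
    A = [[c[w] for c in counts] for w in vocab]
    return vocab, A
```

Take `docs = ["the cat sat", "the dog sat on the cat"]` with stopwords `{"the", "on"}`. The vocabulary is then `["cat", "dog", "sat"]` and the matrix is `[[1, 1], [0, 1], [1, 1]]`.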

  6. Word Clustering • Words are clustered on the basis of the documents in which they co-occur • Words that typically occur together in documents should be associated with similar concepts • Uses • Automatic classification of documents
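One way to make "words that co-occur in the same documents" measurable is to compare two words' rows of the word-by-document matrix. The sketch below uses cosine similarity. This is a common choice, but it is our assumption rather than something the slide specifies. The tiny matrix is hypothetical data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Rows of a small hypothetical word-by-document matrix (columns = 4 documents).
A = {
    "stock":  [3, 2, 0, 0],
    "market": [2, 3, 0, 0],
    "gene":   [0, 0, 4, 1],
}
```

Here `stock` and `market` score 12/13 (about 0.92) because they appear in the same documents. `stock` and `gene` score 0.0 because they share no documents.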

  7. Co-clustering Documents and Words • Novel Idea • Duality of word and document clustering • Use of Bipartite Graphs • The clustering problem can now be posed as a partitioning problem • Solution: • Spectral Co-Clustering algorithm

  8. Bipartite Graph (1) • Edges connect words to documents only • No edges between words or between documents

  9. Bipartite Graph (2) • Adjacency Matrix
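The adjacency-matrix figure did not survive in this transcript. Consider a bipartite graph with m words and n documents, with the word vertices ordered first. Its adjacency matrix has the standard block form below, where A is the m × n word-by-document matrix. The zero diagonal blocks reflect that there are no word-word or document-document edges. This is our reconstruction of the standard form, not a copy of the original slide.

```latex
M = \begin{bmatrix} 0 & A \\ A^{\top} & 0 \end{bmatrix},
\qquad A \in \mathbb{R}^{m \times n}, \qquad
A_{ij} = \text{weight of the edge between word } i \text{ and document } j
```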

  10. The Partitioning Problem • Minimum-cut vertex partitions in bipartite graphs • Finding the optimal solution is NP-complete • Existing heuristics: the KL (Kernighan-Lin) and FM (Fiduccia-Mattheyses) algorithms • The spectral algorithm gives a good global solution • Better solutions than the KL and FM algorithms
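The cut of a two-way partition is the total weight of the edges running between the two vertex subsets. The plain-Python sketch below builds the block adjacency matrix from a word-by-document matrix and evaluates the cut of one candidate partition. The helper names `bipartite_adjacency` and `cut` are hypothetical. Evaluating a single partition is cheap. The difficulty lies in searching over all balanced partitions, which is where KL, FM, and spectral methods come in.

```python
def bipartite_adjacency(A):
    """Block adjacency matrix [[0, A], [A^T, 0]]: word vertices first, then documents."""
    m, n = len(A), len(A[0])
    M = [[0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = A[i][j]   # edge word i -- document j
            M[m + j][i] = A[i][j]   # symmetric entry
    return M

def cut(M, part):
    """Total weight of edges crossing a two-way vertex partition.

    M    -- symmetric adjacency matrix (list of lists)
    part -- part[v] is 0 or 1, the side vertex v is assigned to
    """
    n = len(M)
    # Each crossing edge (i, j) with i < j is counted exactly once.
    return sum(M[i][j] for i in range(n) for j in range(i + 1, n)
               if part[i] != part[j])
```

Take `A = [[2, 0], [1, 3]]`, with vertices ordered word 0, word 1, document 0, document 1. The partition `[0, 1, 0, 1]` groups word 0 with document 0 and word 1 with document 1, so it cuts only the weight-1 edge. Separating all words from all documents (`[0, 0, 1, 1]`) cuts every edge, for a total of 6.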

  11. Graph Partitioning • Goal: find equally sized vertex subsets such that the cut is minimized • Eigenvectors serve as relaxed (real-valued) optimal partition vectors, since the discrete problem is NP-complete • The Bipartitioning Algorithm
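Here is a sketch of spectral bipartitioning, following the normalized-SVD formulation in Dhillon's paper as we understand it:

1. Scale A by the inverse square roots of the word and document degrees.
2. Take the second left and right singular vectors.
3. Rescale them back.
4. Split the combined 1-D vector into two clusters.

Taking the SVD of the smaller m × n matrix yields the same partition vectors as the eigenproblem on the full (m + n)-vertex graph. Several details are our assumptions: the function name, the simple two-centre k-means used for the final 1-D step, and the requirement of no all-zero rows or columns. Consult the paper for the exact algorithm.

```python
import numpy as np

def spectral_bipartition(A):
    """Co-cluster the rows (words) and columns (documents) of A into two groups.

    A -- nonnegative m x n word-by-document matrix with no all-zero rows/columns.
    Returns (word_labels, doc_labels), each an array of 0/1 cluster ids.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)                      # word degrees (row sums)
    d2 = A.sum(axis=0)                      # document degrees (column sums)
    # Normalised matrix An = D1^{-1/2} A D2^{-1/2}.
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    # NumPy returns singular values in descending order; take the second pair.
    U, s, Vt = np.linalg.svd(An)
    u2, v2 = U[:, 1], Vt[1, :]
    # Map back to graph vertices: z2 = [D1^{-1/2} u2 ; D2^{-1/2} v2].
    z2 = np.concatenate([u2 / np.sqrt(d1), v2 / np.sqrt(d2)])
    # Split the 1-D values z2 into two groups with a simple two-centre k-means.
    c = np.array([z2.min(), z2.max()])
    for _ in range(100):
        labels = (np.abs(z2 - c[1]) < np.abs(z2 - c[0])).astype(int)
        new_c = np.array([z2[labels == k].mean() if np.any(labels == k) else c[k]
                          for k in (0, 1)])
        if np.allclose(new_c, c):
            break
        c = new_c
    m = A.shape[0]
    return labels[:m], labels[m:]
```

Consider a matrix with two dense word-document blocks joined by a couple of weak entries. The words and documents of each block should land in the same cluster. Which cluster is labelled 0 is arbitrary.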

  12. Conclusions • A novel idea of co-clustering words and documents together is proposed • A real-valued relaxation of the optimal (discrete) partitioning problem is provided • The algorithm works well on real examples

  13. Critique • The actual motivation for combining document and word clustering is not stated • The solution is not guaranteed to be optimal, since the partitioning problem is NP-complete

  14. Questions?
