TagLearner: A P2P Classifier Learning System

TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents
Haimonti Dutta (1), Xianshu Zhu (2), Tushar Muhale (2), Hillol Kargupta (2), Kirk Borne (3), Codrina Lauth (4), Florian Holz (5), and Gerhard Heyer (5)
(1) Columbia University; (2) University of Maryland, Baltimore County; (3) George Mason University; (4) Fraunhofer Institute for Intelligent Analysis and Information Systems; (5) University of Leipzig

Outline
• Introduction and Motivation
• Related Work
• TagLearner
• Distributed Classifier-learning Algorithm
• Experiments
• Conclusion and Future Work

Introduction
• Large online document repositories:
  - Online newspapers, digital libraries, etc.
  - Growing in size
• Text categorization on the repositories:
  - No automated text classification mechanism
  - Performed manually by authorities such as librarians, which is impractical

Introduction (cont.)
• Collaborative tagging
  - Del.icio.us, Flickr, Google Image Labeler
  - Recruits web users to add tags to a resource
  - Helps to harness people's knowledge
• Pros and cons
  - Improves web search results and helps with classification
  - Not supported by most online text repositories
  - Lack of control
  - Absence of standard keywords
  - Tagging errors caused by misspellings
  - Harder to manage due to increased content diversity

Motivation
• Provide an automated classification service
• Utilize the collaborative effort of users
• Collaborative tagging in a peer-to-peer network
• Works without the repositories' support
P2P classifier learning system

Related Work
• Collaborative tagging:
  - Recommendation systems (Tso-Sutter et al.)
  - Web search (Yahia et al.)
  - Classification accuracy (Brooks et al.)
• Distributed linear programming:
  - Distributed simplex algorithm (Dutta et al.)

TagLearner: A P2P Classifier Learning System

TagLearner
Service provider: provides the P2P classifier learning service
• Registers the service by creating a tagging group
• Maintains a tagging group for this service:
  - Predefined labels used for tagging
  - Features for classification
  - Group members
  - Learnt classifier model

TagLearner
Client-side browser plugin
• Interface:
  - Join or leave the tagging group
  - Tag the web documents
• Distributed classifier learning algorithm

Classifier Design by Linear Programming
[Figure: data points labeled Class 1 and Class 2]
• The classification problem can be framed as a linear programming problem
• Notation: the feature vector of the k-th instance and the weight vector W
• We want to find a W that separates Class 1 from Class 2
• W can be found by minimizing the error
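As a minimal sketch of what "finding a W" means here, the snippet below scores each instance with the dot product of W and its feature vector and counts misclassifications. The sign convention (positive score means Class 1), the toy data, and the names `predict` and `error_count` are assumptions for illustration and do not come from the slides.

```python
from typing import Sequence, Tuple

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if the linear score W . x is positive, otherwise 2 (assumed convention)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 2

def error_count(w: Sequence[float],
                samples: Sequence[Tuple[Sequence[float], int]]) -> int:
    """Count the instances whose predicted class differs from their label."""
    return sum(1 for x, label in samples if predict(w, x) != label)

# Hypothetical toy data: (feature vector, class label).
# The last feature is a constant 1.0 so that W can carry a bias term.
samples = [
    ((2.0, 1.0, 1.0), 1),
    ((1.5, 2.0, 1.0), 1),
    ((-1.0, -0.5, 1.0), 2),
    ((-2.0, -1.0, 1.0), 2),
]

good_w = (1.0, 1.0, 0.0)    # separates the toy data
bad_w = (-1.0, -1.0, 0.0)   # flips every prediction

print(error_count(good_w, samples))
print(error_count(bad_w, samples))
```

A weight vector with zero errors separates the two classes; the linear program on the following slide is one way to search for such a vector.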

Classifier Design by Linear Programming
• Linear program: maximize an objective function subject to linear constraints
• Use the simplex method to solve it!
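The formulas on this slide did not survive extraction. For orientation only, one common textbook way to pose a two-class linear classifier as a linear program is sketched below; it is not necessarily the formulation the authors used. The symbols $x_k$ (feature vector of the k-th instance), $y_k \in \{+1, -1\}$ (Class 1 or Class 2), $\xi_k$ (per-instance error) and $N$ (number of instances) are assumptions introduced here.

```latex
\begin{aligned}
\text{maximize}\quad   & -\sum_{k=1}^{N} \xi_k \\
\text{subject to}\quad & y_k \, W^{\top} x_k \;\ge\; 1 - \xi_k, \qquad k = 1, \dots, N, \\
                       & \xi_k \;\ge\; 0, \qquad k = 1, \dots, N.
\end{aligned}
```

Maximizing the negated sum of the $\xi_k$ is the same as minimizing the total error, and each instance contributes one constraint, which is what makes the problem easy to split across users.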

Distributed Linear Programming
• Distributed data
• Each user only has a collection of constraints
• The objective function and the constraints are arranged in a simplex tableau
[Equation residue: a linear constraint in w1, w2, w3 with right-hand side 0.5]

Distributed Simplex Algorithm
[Figure: User A, User B, User C, User D]
• Each user has different constraints, but all users optimize the same objective function.
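To make the data layout concrete, here is a small sketch in which each user holds only its own constraints and a candidate weight vector is accepted only if every user reports that its local constraints hold. The constraint form (a . w <= b), the numbers, and the function names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A constraint (a, b) is read as a . w <= b (assumed form).
Constraint = Tuple[Sequence[float], float]

def locally_feasible(constraints: List[Constraint],
                     w: Sequence[float],
                     tol: float = 1e-9) -> bool:
    """Check one user's constraints without looking at anyone else's data."""
    return all(sum(ai * wi for ai, wi in zip(a, w)) <= b + tol
               for a, b in constraints)

def globally_feasible(peers: Dict[str, List[Constraint]],
                      w: Sequence[float]) -> bool:
    """A point is feasible for the whole problem only if every user accepts it."""
    return all(locally_feasible(cs, w) for cs in peers.values())

# Hypothetical split of constraints across the four users.
peers: Dict[str, List[Constraint]] = {
    "A": [((3.0, 1.0), 0.5)],
    "B": [((2.0, 4.0), 0.5)],
    "C": [((7.0, 1.0), 0.5)],
    "D": [((3.0, 2.0), 0.5), ((6.5, 1.0), 0.5)],
}

print(globally_feasible(peers, (0.05, 0.05)))
print(globally_feasible(peers, (0.1, 0.0)))
```

In this sketch only a yes/no answer leaves each user, not the constraints themselves; whether TagLearner exchanges exactly this information is not stated in the slides.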

Distributed Simplex Algorithm
[Figure: User A, User B, User C, User D]

Distributed Simplex Algorithm
[Figure: User A, User B, User C, User D]
• Ratios computed by the users: 0.5/3 = 1/6, 0.5/2 = 1/4, 0.5/7 = 1/14, 0.5/3 = 1/6, 0.5/6.5 = 1/13
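The ratios on this slide look like the minimum ratio test of the simplex method, where the row with the smallest ratio of right-hand side to pivot-column entry leaves the basis. The sketch below assumes that reading: each user finds its own smallest ratio and only those local minima are compared. Which user holds which ratio is not recoverable from the slide, so the assignment in `peer_pivot_entries` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Optional

RHS = Fraction("0.5")  # right-hand side shared by all rows on the slide

# Hypothetical assignment of pivot-column entries to users.
peer_pivot_entries: Dict[str, List[Fraction]] = {
    "A": [Fraction(3)],
    "B": [Fraction(2)],
    "C": [Fraction(7)],
    "D": [Fraction(3), Fraction("6.5")],
}

def local_min_ratio(rhs: Fraction, entries: List[Fraction]) -> Optional[Fraction]:
    """Smallest rhs / entry over this user's rows with a positive entry."""
    ratios = [rhs / a for a in entries if a > 0]
    return min(ratios) if ratios else None

# Step 1: every user computes its local minimum independently.
local = {peer: local_min_ratio(RHS, entries)
         for peer, entries in peer_pivot_entries.items()}

# Step 2: only the local minima are compared to find the global minimum.
candidates = {peer: r for peer, r in local.items() if r is not None}
winner = min(candidates, key=candidates.get)
global_min = candidates[winner]

print(winner, global_min)  # prints "C 1/14"
```

Exact fractions are used so the results can be compared directly with the values on the slide.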

Experimental Results
• Distributed Data Mining Toolkit (DDMT)
• "NSF Research Awards Abstracts 1990-2003" data set from the UCI Machine Learning Repository
• We only consider abstracts belonging to Earth and Mathematical sciences
• Features used for classification do not rely on collaboratively generated annotations

Experiments (cont.) Figure 1. Communication cost versus the number of nodes in the network

Experiments (cont.)

Conclusion and Future Work
• Conclusion:
  - P2P classifier learning system prototype
  - Scalable distributed classification algorithm based on linear programming
• Future work:
  - Extend the classification algorithm to multi-class classification problems
  - Improve classification accuracy

Thank you! Questions?
