
COLLABORATIVE CLASSIFIER AGENTS


Presentation Transcript


  1. COLLABORATIVE CLASSIFIER AGENTS
  Studying the Impact of Learning in Distributed Document Classification
  Weimao Ke, Javed Mostafa, and Yueyu Fu
  {wke, jm, yufu}@indiana.edu
  Laboratory of Applied Informatics Research, Indiana University, Bloomington

  2. Related Areas
  • Text Classification (i.e. categorization)
  • Information Retrieval
  • Digital libraries
  • Indexing, cataloging, filtering, etc.
  • Distributed Text Classification
  • Multi-Agent Modeling
  • Machine Learning:
    • Learning Algorithms
    • Multi-Agent Modeling
  • Modeling Agent Collaboration

  3. Traditional Centralized Classification
  [Diagram: a single centralized classifier holding all the knowledge. Legend: Classifier, Knowledge]

  4. Centralized?
  • Global repository is rarely realistic
  • Scalability
  • Intellectual property restrictions
  • …

  5. Knowledge Distribution
  [Diagram: distributed repositories labeled Arts, Sciences, History, and Politics]

  6. Distributed Classification without Collaboration
  [Diagram: multiple classifiers, each with its own knowledge; documents are assigned to classifiers independently. Legend: Classifier, Knowledge, Doc Assignment]

  7. Distributed Classification with Collaboration
  [Diagram: multiple classifiers, each with its own knowledge; documents are assigned to classifiers, which collaborate with one another. Legend: Classifier, Knowledge, Doc Assignment, Collaboration]

  8. Research Questions
  • Motivation: Why distributed text classification?
    • Knowledge is distributed; there is no global knowledge repository (e.g. individual digital libraries)
    • Advantages of distributed methods: fault tolerance, adaptability, flexibility, privacy, etc.
    • Agents simulate distributed classifiers
  • Problems/Questions
    • Agents have to learn and collaborate. But how?
    • How effective and efficient is agent collaboration?

  9. Distributed + Collaboration
  [Diagram: a Doc Distributor sends documents to agents; the number on each agent denotes its collaboration range; arrows show collaboration requests and responses]
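To make the collaboration in this figure concrete, the following is a minimal Python sketch, with hypothetical names (Agent, handle, known_classes), of the routing idea: an agent classifies a document locally when its own knowledge covers it, and otherwise forwards a collaboration request to a neighbor, spending at most a fixed number of hops (its collaboration range). The learning algorithms on the following slides decide which neighbor to ask; this sketch simply tries neighbors in order.

```python
# Hypothetical sketch of the collaboration loop suggested by the diagram:
# classify locally if possible, otherwise forward a collaboration request
# to a neighboring agent, spending at most `hops_left` hops (the "#"
# collaboration range in the figure).

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    known_classes: set                       # categories covered by this agent's local knowledge
    neighbors: list = field(default_factory=list)

    def classify_locally(self, doc):
        """Stand-in for a real classifier: succeeds only for known categories."""
        return doc["category"] if doc["category"] in self.known_classes else None

    def handle(self, doc, hops_left):
        """Return (label, classifying agent) or (None, self) if no one within range can help."""
        label = self.classify_locally(doc)
        if label is not None or hops_left == 0:
            return label, self.name
        for neighbor in self.neighbors:      # collaboration request / response
            label, who = neighbor.handle(doc, hops_left - 1)
            if label is not None:
                return label, who
        return None, self.name


if __name__ == "__main__":
    arts = Agent("arts", {"arts"})
    history = Agent("history", {"history"}, neighbors=[arts])
    print(history.handle({"category": "arts"}, hops_left=2))   # ('arts', 'arts')
```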

  10. Methodology
  • Compare:
    • Traditional/centralized approach (upper bound)
    • Distributed approach without collaboration (lower bound)
    • Distributed approach with collaboration
  • Two learning/collaboration algorithms:
    • Algorithm 1: Pursuit Learning (see the sketch below)
    • Algorithm 2: Nearest Centroid Learning
  • Two parameters:
    • r: exploration rate
    • g: maximum collaboration range
  • Evaluation measures:
    • Effectiveness: precision, recall, F measure
    • Efficiency: time for classification
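The slides name Pursuit Learning but do not give its update rule; below is a compact sketch of a standard pursuit-learning automaton applied to choosing a collaborator: keep a probability vector over candidate collaborators, estimate each one's reward from feedback, and repeatedly push the probabilities toward the currently best-estimated choice. The variable names (learning_rate, explore_rate) and the epsilon-style use of the exploration rate r are illustrative assumptions, not details taken from the paper.

```python
# Sketch of a standard pursuit-learning automaton for picking a collaborator.

import random


class PursuitLearner:
    def __init__(self, n_actions, learning_rate=0.1, explore_rate=0.1):
        self.p = [1.0 / n_actions] * n_actions    # action-selection probabilities
        self.reward_est = [0.0] * n_actions       # running reward estimates
        self.counts = [0] * n_actions
        self.learning_rate = learning_rate
        self.explore_rate = explore_rate          # assumed role of "r" on this slide

    def choose(self):
        if random.random() < self.explore_rate:
            return random.randrange(len(self.p))  # explore a random collaborator
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        # Running average of the reward observed for the chosen collaborator.
        self.counts[action] += 1
        self.reward_est[action] += (reward - self.reward_est[action]) / self.counts[action]
        # "Pursue" the best-estimated collaborator: move p toward its unit vector.
        best = max(range(len(self.p)), key=lambda a: self.reward_est[a])
        for a in range(len(self.p)):
            target = 1.0 if a == best else 0.0
            self.p[a] += self.learning_rate * (target - self.p[a])
```

A reward here would be feedback that a forwarded document reached a classifier able to handle it, which matches how the summary slide describes knowledge being acquired through reinforcements based on collaborations.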

  11. Experiment & Evaluation • Reuters Corpus Volumes 1 (RCV1) • Training set: 6,394 documents • Test set: 2,500 documents • Feature selection: 4,084 unique terms • Evaluation measures • Precision = a / (a + b) • Recall = a / (a + c) • F1 = 2 * P * R / (P + R)

  12. Experimental Run (without collaboration)
  • Knowledge division: the training knowledge is split among 1, 2, 4, …, up to 37 agents (see the sketch below)
  • 1 agent (fully centralized) serves as the upper-bound baseline
  • 37 agents without collaboration serve as the lower-bound baseline
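As a hedged illustration of the knowledge division, the sketch below partitions training documents among N agents so that each agent holds only part of the corpus; grouping by topic label is an assumption, since the slide does not say how the split was made.

```python
# Hypothetical knowledge-division step: split the training corpus among n_agents.

from collections import defaultdict
from itertools import cycle


def divide_knowledge(training_docs, n_agents):
    """Assign each topic's documents to one of n_agents partitions, round-robin by topic."""
    by_topic = defaultdict(list)
    for doc in training_docs:
        by_topic[doc["topic"]].append(doc)
    partitions = [[] for _ in range(n_agents)]
    for agent_idx, topic in zip(cycle(range(n_agents)), sorted(by_topic)):
        partitions[agent_idx].extend(by_topic[topic])
    return partitions


docs = [{"topic": "arts"}, {"topic": "history"}, {"topic": "arts"}, {"topic": "politics"}]
print(divide_knowledge(docs, 2))   # arts + politics go to agent 0, history to agent 1
```

With n_agents = 1 the single partition corresponds to the centralized upper bound; with 37 agents the knowledge is maximally fragmented, giving the lower bound.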

  13. Results – Effectiveness
  [Chart: effectiveness of the baselines without collaboration]

  14. Experimental Run (with collaboration)
  • Knowledge division: 1, 2, 4, …, up to 37 agents, with the same upper-bound and lower-bound baselines
  • With collaboration:
    • Pursuit learning
    • Nearest centroid learning
  • Parameters:
    • Exploration rate
    • Max collaboration range

  15. Results – Effectiveness of Learning
  [Chart: effectiveness of learning; annotations: PL, optimal zone, random]

  16. Results – Classification efficiency

  17. Results – Efficiency vs. Effectiveness

  18. Summary
  • Classification effectiveness decreases dramatically when knowledge becomes increasingly distributed.
  • Pursuit Learning
    • Efficient – classifies without analyzing content
    • Effective, although not content sensitive
    • "The Pursuit Learning approach did not depend on document content. By acquiring knowledge through reinforcements based on collaborations, this algorithm was able to build paths for documents to find relevant classifiers effectively and efficiently."
  • Nearest Centroid Learning (see the sketch below)
    • Inefficient – analyzes content
    • Effective
  • Future work
    • Other text collections
    • Knowledge overlap among the agents
    • Local neighborhood
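For contrast with the content-free Pursuit Learning above, here is a minimal nearest-centroid classifier over term-weight vectors, the content-sensitive (and therefore slower) style of learning named in this summary; the cosine-similarity choice and the dictionary-based vectors are assumptions rather than details from the slides.

```python
# Minimal nearest-centroid classification over sparse term-weight vectors.

import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of term-weight dicts into one centroid vector."""
    summed = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            summed[term] += weight
    return {term: weight / len(vectors) for term, weight in summed.items()}


def cosine(u, v):
    dot = sum(u.get(term, 0.0) * weight for term, weight in v.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train(labelled_docs):
    """Build one centroid per class from (vector, label) pairs."""
    by_class = defaultdict(list)
    for vec, label in labelled_docs:
        by_class[label].append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def classify(doc_vec, centroids):
    """Assign the class whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc_vec, centroids[label]))


centroids = train([({"art": 1.0, "paint": 1.0}, "arts"),
                   ({"war": 1.0, "treaty": 1.0}, "history")])
print(classify({"paint": 1.0}, centroids))   # arts
```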
