
Innovative Information Filtering Models for Extracting Valuable Data from Vast Datasets

In an era of overwhelming data, traditional methods fall short in helping users find valuable information. This project explores a variety of filtering models to enhance information searches across different domains such as movies, music, legal citations, and more. We analyze diverse filtering criteria including content-based, context-based, and user-preference-based methods. By leveraging metadata architectures and user interaction models, this research aims to streamline the information retrieval process, allowing users to efficiently filter out irrelevant data and access valuable content.

Presentation Transcript


  1. Value filtering • Problem • Vast amount of data • Traditional method is not enough • Hard to find the “valuable” information

  2. Filter Goal • Help users with information search • Attach “value” to documents • “Filter out” garbage information

  3. Outline • Filtering model • User-Filter interaction model • Information domain • Movie, Music, Web, Legal Cite, ... • Filtering criteria • Content-based, Context-based, ... • Metadata architecture • “Value” distribution

  4. Filtering model • Query/Response model [Diagram labels: Filter, Query, Information, Response]

  5. Filtering model • Push model [Diagram labels: Information Source, Information Source, Filter, Useful Information]
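
The query/response and push models on slides 4 and 5 can be contrasted in a small sketch. Everything here is hypothetical and not taken from the slides: the `Filter` class, its method names, the substring match, and the idea of a numeric value function with a threshold are illustrative assumptions only.

```python
class Filter:
    """Toy filter (hypothetical) that scores items with a caller-supplied value function."""

    def __init__(self, value_fn, threshold):
        self.value_fn = value_fn      # assumed: maps an item to a numeric "value"
        self.threshold = threshold    # assumed: minimum value worth showing
        self.store = []               # information available for queries
        self.subscribers = []         # callbacks used by the push model

    def add_information(self, item):
        self.store.append(item)

    def query(self, text):
        """Query/response: return stored items that match the query and clear the threshold."""
        return [
            item for item in self.store
            if text.lower() in item.lower() and self.value_fn(item) >= self.threshold
        ]

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def push(self, item):
        """Push: a source sends an item; forward it to subscribers only if it is valuable enough."""
        if self.value_fn(item) >= self.threshold:
            for callback in self.subscribers:
                callback(item)
            return True
        return False
```

In the query/response case the user initiates each exchange by calling `query`; in the push case the information source initiates it by calling `push`, and the user only sees what survives the filter.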

  6. Filtering model • Guide model [Diagram caption: "This way, please!"]

  7. Information domain • Different domain -> different method • Legal cite, Entertainment, Newsgroup, ... • Utility/cost differs by domain, e.g., legal cite vs. defense system vs. movie • Technological barrier differs by domain, e.g., text document vs. movie

  8. Filtering criteria • Content-based • Access history-based • Context-based • User-preference based

  9. Content-based • Similarity of document to user’s interest • Keyword vector, LSI (Latent Semantic Indexing) • Most systems • Search Engines • Tapestry (Xerox Palo Alto), FAB (Stanford), WebWatcher (CMU), …
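
As a rough illustration of the keyword-vector approach, the sketch below scores documents against a user's stated interest with cosine similarity over term counts. The function names, the whitespace tokenizer, and the use of raw term frequency are assumptions made for this example; the slides do not say how any of the listed systems build or compare their vectors.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Build a term-frequency vector from lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(interest, documents):
    """Return (score, document) pairs, most similar to the interest first."""
    profile = keyword_vector(interest)
    scored = [(cosine_similarity(profile, keyword_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_documents("legal citation search", docs)` places documents that share more keywords with the interest string nearer the top; a document with no shared keywords scores 0.0.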

  10. Access history-based • Analysis of user’s access pattern • Personal access pattern • Personalized access history of each user • Letizia (MIT Media Lab) • Global access pattern • Global access pattern to each document • WebWatcher (CMU), Path-profile (Microsoft), KSS (Stanford), ...
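
The distinction between personal and global access patterns can be sketched with simple counters. This is a minimal illustration under stated assumptions: the log format of `(user, document)` pairs, the function names, and the "most accessed but not yet seen" recommendation rule are all hypothetical and are not how Letizia, WebWatcher, or the other listed systems are described in the slides.

```python
from collections import Counter, defaultdict


def build_access_counts(log):
    """Aggregate (user, document) access events into personal and global counts."""
    personal = defaultdict(Counter)   # per-user access history
    global_counts = Counter()         # access counts per document across all users
    for user, doc in log:
        personal[user][doc] += 1
        global_counts[doc] += 1
    return personal, global_counts


def recommend_unseen(user, personal, global_counts, k=3):
    """Suggest up to k globally most accessed documents that the user has not opened."""
    seen = personal.get(user, {})
    candidates = [(count, doc) for doc, count in global_counts.items() if doc not in seen]
    # Sort by descending count, breaking ties alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in candidates[:k]]
```

The personal counts capture what one user keeps returning to, while the global counts capture what the whole population finds worth visiting; the recommendation rule combines the two.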

  11. Context-based • Hyper-link Context • Mostly for Guide Model • KSS (Stanford), WebGlimpse (U of Arizona), WebWatcher (CMU) • Social-Network Context • Referral Web (AT&T Lab)

  12. User preference-based • Personal profile matching: find users with similar taste • Global preference measure: find globally famous/popular documents • Explicit feedback: the user’s explicit feedback on a document • Implicit feedback: automatic extraction of the user’s preferences

  13. User preference-based • GroupLens (U of Minnesota), Ringo (MIT): explicit user-profile matching (user voting) • Google (Stanford): implicit global preference (hyperlinks) • PHOAKS (AT&T Lab): semi-implicit global preference (Netnews)
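
Personal profile matching from explicit votes can be sketched as finding the user whose ratings correlate most strongly with the target user's. Pearson correlation over co-rated items is one common choice and is used here as an assumption; the slides do not state which similarity measure GroupLens or Ringo used. The data layout (a dict of `user -> {item: rating}`) and function names are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = sorted(ratings_a.keys() & ratings_b.keys())
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def most_similar_user(target, all_ratings):
    """Return (other_user, correlation) for the user most correlated with target."""
    others = [
        (user, pearson(all_ratings[target], ratings))
        for user, ratings in all_ratings.items()
        if user != target
    ]
    return max(others, key=lambda pair: pair[1])
```

Once a similar-taste user is found, documents that user rated highly and the target has not seen become candidates for recommendation.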

  14. Metadata architecture • Architecture for “Value” exchange • PICS (Platform for Internet Content Selection) • W3C standardization effort • Inspired by “Adult Site” filtering • Publisher, Rater, Filter • User selects Rater • Stanford ComMentor

  15. PICS (PICS-1.1 "http://ra/v1.0/" labels on "1994.11.05" until "1995.12.31" for "http://c.s/g.html" by "John Doe" ratings (violence 3 sex 2))
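
To show how a filter might consume such a label, the sketch below extracts the fields from the simplified single-label form shown on slide 15 and checks the ratings against a user's limits. It is not a full PICS-1.1 parser: the function names, the regular expressions, and the rule that an unrated category counts as 0 are assumptions for illustration only.

```python
import re

# The label from the slide, written with straight quotes.
LABEL = ('(PICS-1.1 "http://ra/v1.0/" labels on "1994.11.05" until "1995.12.31" '
         'for "http://c.s/g.html" by "John Doe" ratings (violence 3 sex 2))')


def parse_simple_label(label):
    """Extract fields from the simplified single-label form shown on the slide."""
    # Normalize typographic quotes so the same patterns work on pasted text.
    text = label.replace("\u201c", '"').replace("\u201d", '"')
    service = re.search(r'\(PICS-1\.1\s+"([^"]*)"', text)
    if service is None:
        raise ValueError("not a PICS-1.1 label")
    fields = {"service": service.group(1)}
    for option in ("on", "until", "for", "by"):
        match = re.search(r'\b%s\s+"([^"]*)"' % option, text)
        if match:
            fields[option] = match.group(1)
    ratings = re.search(r'\bratings\s+\(([^)]*)\)', text)
    tokens = ratings.group(1).split() if ratings else []
    # Ratings are alternating category/value tokens.
    fields["ratings"] = {name: float(value) for name, value in zip(tokens[0::2], tokens[1::2])}
    return fields


def allowed(fields, limits):
    """True if every limited category is rated at or below the user's limit.

    Assumption: a category missing from the label is treated as rated 0.
    """
    return all(fields["ratings"].get(cat, 0) <= limit for cat, limit in limits.items())
```

With this split, the rater publishes the label, and the user's filter only needs the parsed ratings and a table of personal limits to decide whether to show the page.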
