Innovative Information Filtering Models for Extracting Valuable Data from Vast Datasets

Value filtering • Problem • Vast amount of data • Traditional methods are not enough • Hard to find the “valuable” information

Filter Goal • Help users with information search • Attach “value” to documents • “Filter out” garbage information

Outline • Filtering model • User-Filter interaction model • Information domain • Movie, Music, Web, Legal Cite, ... • Filtering criteria • Content-based, Context-based, ... • Metadata architecture • “Value” distribution

Filtering model • Query/Response model • [Diagram labels: Filter, Query, Information, Response]

Filtering model • Push model • [Diagram labels: Information Source, Information Source, Filter, Useful Information]

Filtering model • Guide model • [Diagram labels: “This way, please!”, “?”]

Information domain • Different domain -> different method • Legal cite, Entertainment, Newsgroup, ... • Utility/cost, e.g. Legal cite : Defense System : Movie • Technological barrier, e.g. Text document : Movie

Filtering criteria • Content-based • Access history-based • Context-based • User-preference based

Content-based • Similarity of document to user’s interest • Keyword vector, LSI (Latent Semantic Indexing) • Most systems • Search Engines • Tapestry (Xerox Palo Alto), FAB (Stanford), WebWatcher (CMU), …
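The keyword-vector idea can be illustrated with a small sketch. This is not the method of any system named on the slide; it assumes a plain bag-of-words term-frequency vector and cosine similarity, and the function names (`keyword_vector`, `cosine_similarity`, `rank_documents`) are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Turn text into a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse keyword vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(interest, documents):
    """Score each document against the user's stated interest, best first."""
    profile = keyword_vector(interest)
    scored = [(cosine_similarity(profile, keyword_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

A filter built this way would pass along only documents whose score exceeds some threshold; the threshold is a tuning choice the slides do not specify.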

Access history-based • Analysis of user’s access pattern • Personal access pattern • Personalized access history of each user • Letizia (MIT Media Lab) • Global access pattern • Global access pattern to each document • WebWatcher (CMU), Path-profile (Microsoft), KSS (Stanford), ...
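A global access pattern can be sketched as page-to-page transition counts gathered over many users' sessions. This is only an illustrative assumption about how such a pattern might be stored, not the approach of WebWatcher, Path-profile, or KSS; `build_transition_counts` and `suggest_next` are hypothetical names.

```python
from collections import Counter, defaultdict


def build_transition_counts(sessions):
    """Count how often each page is followed by each other page across all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def suggest_next(counts, page, k=3):
    """Return up to k pages most often visited right after `page`."""
    followers = counts.get(page, Counter())
    return [target for target, _ in followers.most_common(k)]
```

For example, given sessions such as `["home", "movies", "reviews"]`, a guide-style filter could call `suggest_next(counts, "home")` to highlight the links most visitors followed from that page.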

Context-based • Hyper-link Context • Mostly for Guide Model • KSS (Stanford), WebGlimpse (U of Arizona), WebWatcher (CMU) • Social-Network Context • Referral Web (AT&T Lab)

User preference-based • Personal profile matching • Find users with similar taste • Global preference measure • Find globally famous/popular documents • Explicit feedback • User’s explicit feedback on the document • Implicit feedback • Automatic extraction of user’s preference

User preference-based • GroupLens (U of Minnesota), Ringo (MIT) • Explicit user-profile matching (user voting) • Google (Stanford) • Implicit Global preference (Hyperlink) • PHOAKS (AT&T Lab) • Semi implicit Global preference (Netnews)
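Explicit user-profile matching can be sketched as follows: users vote on documents, and a prediction for one user is a weighted average of votes from users with similar taste. The sketch assumes Pearson correlation as the similarity measure and uses only positively correlated neighbours; it is an illustration, not a reproduction of any listed system, and the names `pearson` and `predict` are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0  # too little overlap to judge taste
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def predict(user, item, all_ratings):
    """Similarity-weighted average of other users' votes on `item`.

    Assumption: only neighbours with positive correlation contribute.
    Returns None when no such neighbour has rated the item.
    """
    numerator = denominator = 0.0
    for other, ratings in all_ratings.items():
        if other == user or item not in ratings:
            continue
        weight = pearson(all_ratings[user], ratings)
        if weight > 0:
            numerator += weight * ratings[item]
            denominator += weight
    return numerator / denominator if denominator else None
```

With votes on a 1 to 5 scale, a user whose past votes match another user's exactly would inherit that user's vote on an unseen item, while votes from users with opposite taste are ignored.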

Metadata architecture • Architecture for “Value” exchange • PICS (Platform for Internet Content Selection) • W3C standardization effort • Inspired by “Adult Site” filtering • Publisher, Rater, Filter • User selects Rater • Stanford ComMentor

PICS • (PICS-1.1 "http://ra/v1.0/" labels on "1994.11.05" until "1995.12.31" for "http://c.s/g.html" by "John Doe" ratings (violence 3 sex 2))
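To show how a Filter might consume a Rater's label, here is a minimal sketch that reads the simplified label above and checks it against user-chosen limits. It is not a full PICS-1.1 parser: it assumes a single label with quoted option values and integer or decimal ratings, and `parse_label` and `allowed` are hypothetical helper names.

```python
import re

LABEL = ('(PICS-1.1 "http://ra/v1.0/" labels on "1994.11.05" '
         'until "1995.12.31" for "http://c.s/g.html" by "John Doe" '
         'ratings (violence 3 sex 2))')


def parse_label(text):
    """Extract version, rating service, options, and ratings from one simple label."""
    header = re.match(r'\(\s*PICS-(?P<version>[\d.]+)\s+"(?P<service>[^"]*)"', text)
    if header is None:
        raise ValueError("not a PICS label")
    result = {"version": header["version"], "service": header["service"]}
    # Each option in this simplified form is a keyword followed by a quoted value.
    for option in ("on", "until", "for", "by"):
        match = re.search(r'\b%s\s+"([^"]*)"' % option, text)
        if match:
            result[option] = match.group(1)
    ratings = re.search(r'ratings\s*\(([^)]*)\)', text)
    tokens = ratings.group(1).split() if ratings else []
    # Ratings alternate category name and numeric value.
    result["ratings"] = {name: float(value)
                         for name, value in zip(tokens[0::2], tokens[1::2])}
    return result


def allowed(label, limits):
    """True if every limited category is at or below the user's limit.

    Assumption: a category missing from the label counts as 0.
    """
    return all(label["ratings"].get(category, 0) <= limit
               for category, limit in limits.items())
```

A filter configured with `{"violence": 2}` would block the document described by `LABEL`, since the rater assigned it violence 3.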
