Recall Systems: Efficient Learning and Use of Category Indices Omid Madani With Wiley Greiner, David Kempe, and Mohammad Salavatipour
Overview • Problems and motivation • Proposal: recall systems • Experiments • Related work and conclusions
Massive Learning • Lots of … • Instances (millions, unbounded …) • Dimensions (1000s and beyond) • Categories (1000s and beyond) • Two questions: • How to categorize quickly? • How to efficiently learn an efficient categorizer?
Yahoo! Page Topics (Y! Directory) • [figure: a fragment of the Yahoo! directory topic tree, with categories such as Arts&Humanities, Business&Economy, Recreation&Sports, Photography, Education, College Basketball, …] • Over 100,000 categories in the Yahoo! directory • Given a page, quickly categorize… • Larger for vision, text prediction, … (millions and beyond)
Efficiency • Two phases (unless truly online): • Learning • Classification time/deployment • Resource requirements: • Memory • Time • Sample efficiency
Idea • Cues in the input may quickly narrow down the possibilities => “index” the categories • Like a search engine, but learn a good index • Goal here: the index reduces the possible classes; classifiers are then applied for precise classification
Summary Findings • Very fast: • Train time: learned in minutes on thousands of instances/categories • 10s of online classifiers trained on each instance (not 1000s) • Index doesn’t hurt classifier accuracy!
Recognition System • Instance x → Recall System → reduced set of candidate categories → Classifier Application → categories for x
The Problem: Tripartite Graph • [figure: tripartite graph linking features f1–f5 to instances x1–x7 to categories c1–c4]
Output: An Index • [figure: bipartite graph between features f1–f5 and concepts c1–c5] • The set of edges E = the “COVER”
Using the Index • Given instance x, retrieve the candidate set of concepts C(x) = { c : (f, c) ∈ E for some feature f of x } • A concept is retrieved when a disjunction of its index features is satisfied (see the sketch below)
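A minimal sketch of this retrieval step in Python, assuming the cover is stored as an inverted index from features to the concepts they point to; the names `cover` and `retrieve` are illustrative, not from the talk:

```python
from collections import defaultdict

# Inverted index over the cover: feature -> set of concepts (the edges of E).
cover = defaultdict(set)

def retrieve(x):
    """Candidate concepts for instance x (an iterable of features).

    A concept enters the candidate set as soon as any one of its index
    features fires, i.e. its disjunction of features is satisfied.
    """
    candidates = set()
    for f in x:
        candidates |= cover.get(f, set())
    return candidates
```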
Terminology • False positive: The retrieved concept shouldn’t have been retrieved (irrelevant) • False negative: The concept should have been retrieved, but was not (missed)
Learning to Index • Let's learn the cover (the edges) • Online and mistake-driven • A mistake means: • A false-negative concept, or • Too many false positives
The Indexer Algorithm • For each concept c keep a sparse vector Vc, initially 0 • Begin with an empty cover • On each instance x (see the sketch below): • Retrieve candidate concepts • Update Vc for each false-negative c (promotion) • If fp-count > tolerance, update Vc for each false-positive c (demotion) • Update the index accordingly • Update the classifiers
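A hedged sketch of the per-instance loop, reusing `retrieve()` from the earlier sketch; `promote()` and `demote()` follow the update rules on the next two slides. `TOLERANCE = 100` matches the experimental setup; the remaining names are illustrative:

```python
TOLERANCE = 100  # max false positives allowed before demotion kicks in

def indexer_step(x, true_concepts):
    candidates = retrieve(x)
    false_negatives = set(true_concepts) - candidates   # missed concepts
    false_positives = candidates - set(true_concepts)   # irrelevant retrievals
    for c in false_negatives:
        promote(c, x)                 # mistake: promote each missed concept
    if len(false_positives) > TOLERANCE:
        for c in false_positives:
            demote(c, x)              # mistake: too many false positives
    return candidates                 # the classifiers are applied to these
```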
Use Feature Weights • For each concept c keep a sparse vector Vc, initially 0 • An (i, j)-edge exists in the cover iff Vcj(fi) ≥ the inclusion threshold
Updating the Vectors • Increase/decrease the weights in Vc of the features appearing in x by the learning rate • In promotion, if a feature is not yet present in Vc, initialize it to 1 or 1/df • In demotion, ignore absent (zero) features • Max-normalize the weights (optional) • Update the index • Takes O(|x| + |Vc|) per instance
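A sketch of one plausible multiplicative reading of these updates, using the learning rate and inclusion threshold from the experiments slide and the `cover` index from the retrieval sketch; `sync_cover` and the exact demotion rule are assumptions, not the talk's verbatim algorithm:

```python
from collections import defaultdict

ETA, THETA = 1.2, 0.1    # learning rate, inclusion threshold (experiments slide)
V = defaultdict(dict)    # concept -> sparse weight vector over features

def sync_cover(c):
    # Keep the index consistent: edge (f, c) is in the cover iff V[c][f] >= THETA.
    for f, w in V[c].items():
        if w >= THETA:
            cover[f].add(c)
        else:
            cover[f].discard(c)

def promote(c, x, df=None):
    for f in x:
        if f not in V[c]:
            V[c][f] = 1.0 / df[f] if df else 1.0   # initialize to 1 (or 1/df)
        V[c][f] *= ETA                             # increase by the learning rate
    top = max(V[c].values())
    for f in V[c]:
        V[c][f] /= top                             # optional max-normalization
    sync_cover(c)

def demote(c, x):
    for f in x:
        if f in V[c]:                              # absent (zero) features ignored
            V[c][f] /= ETA                         # decrease by the learning rate
    sync_cover(c)
```

Each call touches only the features of x and the entries of Vc, matching the O(|x| + |Vc|) cost per instance.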
Analysis • Under a distribution X on instances, a given cover E induces: • a false-positive rate fp-rate(E) = E_{x∼X}[fp-count on x] • a false-negative rate fn-rate(E) = E_{x∼X}[fn-count on x]
Analysis • If fp-rate(E) ≤ fp and fn-rate(E) ≤ fn, we say the cover is an (fp, fn)-cover • Is there an algorithm that converges efficiently to an (fp, fn)-cover? • We can show this for the max-norm algorithms, given the existence of a (0,0)-cover and the tolerance set to 0
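In symbols, a reconstruction from the definitions above (the expectation form of the rates is an assumption consistent with the previous slide):

```latex
E \text{ is an } (\epsilon_{\mathrm{fp}}, \epsilon_{\mathrm{fn}})\text{-cover}
\iff
\underbrace{\mathbb{E}_{x \sim X}\big[\text{fp-count on } x\big]}_{\mathrm{fp\text{-}rate}(E)} \le \epsilon_{\mathrm{fp}}
\;\text{ and }\;
\underbrace{\mathbb{E}_{x \sim X}\big[\text{fn-count on } x\big]}_{\mathrm{fn\text{-}rate}(E)} \le \epsilon_{\mathrm{fn}}
```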
Convergence of Max-Norm • The max-norm algorithm converges to a (0,0)-cover, given one exists and the tolerance is set to 0 • The max-norm algorithm makes O(KL) mistakes for a concept with K pure features and average instance length L
Pure Features • A pure feature f for c: if f occurs, the instance belongs to c • A pure feature never gets “punished” (demoted) for its concept • It takes O(L) mistakes to drive the other, irrelevant features out of the index
Complexity Results • Deciding the existence of an (fp, fn)-cover is NP-hard (when fp > 0; fn can remain 0) • Approximation is also NP-hard! • Why, then, is the algorithm successful in practice?!
Variations • Some alternatives: • Use of weights for ranking • Other update policies • Additive updates • Use of other norms, or no norm • Batch versus online • …
Recognition System (recap) • Instance x → Recall System → reduced set of candidate categories → Classifier Application → categories for x
The Classifiers • (Possibly binary) classifiers: one for each concept • The classifiers are learned with online learning algorithms
Learners Used • Need online algorithms • Experimented with: • Perceptron • Winnow • Committees of these (voted perceptrons, etc.)
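A minimal sketch of the per-concept classifiers in the perceptron variant listed above, trained only on the candidates the index recalls (plus the instance's true concepts); it assumes `retrieve()` from the indexing sketches, and `W`, `B`, and `classify_step` are illustrative names:

```python
from collections import defaultdict

W = defaultdict(dict)     # concept -> sparse perceptron weight vector
B = defaultdict(float)    # concept -> bias term

def score(c, x):
    return sum(W[c].get(f, 0.0) for f in x) + B[c]

def classify_step(x, true_concepts):
    # Only the recalled candidates see the instance: 10s of classifier
    # updates per instance rather than 1000s.
    for c in retrieve(x) | set(true_concepts):
        y = 1.0 if c in true_concepts else -1.0
        if y * score(c, x) <= 0:          # mistake-driven perceptron update
            for f in x:
                W[c][f] = W[c].get(f, 0.0) + y
            B[c] += y
```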
Questions • Small tolerance (10s, 100s) enough? • Convergence? Overhead (speed & memory)? • Overall performance? (together with classifier training and testing)
Size Statistics • 3 large text-categorization corpora: • The big new Reuters corpus (Rose et al.) • An ads dataset (internal) • ODP = Open Directory Project (web pages and their categories)
Experimental Setup • Split the data into 70% train and 30% test • Same split used for all experiments • Algorithm parameters: • Tolerance = 100 • Learning rate = 1.2 • Inclusion threshold = 0.1 • 2.4 GHz machine with 64 GB RAM
Reuters With Classifiers • [table omitted: all three domains, but a subset of the classes]
Indexer's Performance • [plots: fp-rate and fn-rate at pass i, for Reuters, Ads, and ODP]
Indexer's Timings • [table omitted; m = minutes, h = hours]
Performance With Classifiers I • [table: Reuters; “No” = index NOT used, “Yes” = index used]
With Classifiers II • [plots: F1 score (harmonic mean of precision and recall) at pass i, for Reuters (50 sample categories), Ads (76 sample categories), and ODP (108 sample categories)]
Error Plot • [plot: total, false-negative, and false-positive errors]
W and fp-rate Convergence • [plot: versus number of instances]
Fn-rate vs. Tolerance • [plot: fn-rate versus tolerance]
Fp-rate vs. Tolerance • [plot: fp-rate versus tolerance]
Index Size Statistics After 20 Passes • [table omitted]
High Out-degree Features • In Reuters: • “woodmark” (out-degree 10): Wooden Furniture, Measuring Precision Instruments, Electronic Active Components, … • “prft” (out-degree 64) • “shr” (out-degree 59)
Related Work • Fast classification candidates: • hierarchical learning, trees (kd, metric, ball, vp, cover, ..), • inverted indices (search engines!) • Fast learning candidates: • Nearest neighbors • Naïve Bayes • Generative models • Hierarchical learning • Feature selection/reduction
Related • Fast visual categorization in biological systems (e.g. Thorpe et al.) • Psychology of concepts (e.g. Murphy '02) • Associative memory, speed-up learning, blackboard systems, models of aspects of mind/brain
Summary • Problem: efficiently learn and classify when categories abound • Proposed the recall system: an index that serves as a filter • Efficiently learned the filter: quickly learned a quick system!
Current/Future • Evaluation on other domains • Language modeling, prediction • Vision … • Extend the techniques • Ranking (easier than labeling; got very promising results) • Learn “staged” versions • Concept discovery • Understand better: • Why do such efficient algorithms work? • Why should good covers exist? What tolerance? • Strengthen the convergence analysis
Acknowledgements • Thanks to Thomas Pierce for helping us with the Nutch engine • The Y!R ML group (DeCoste and Keerthi) for discussions