
Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Yoram Singer. Based on joint work with: Koby Crammer (UPenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI), Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI).


Presentation Transcript


  1. Online Learning by Projecting: From Theory to Large Scale Web-spam filtering. Yoram Singer. Based on joint work with: Koby Crammer (UPenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI), Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI). UT Austin AIML Seminar, Jan. 27, 2005

  2. Online Binary Classification • “No animal eats bees” → True • “Pearls melt in vinegar” → False • “Dr. Seuss finished Dartmouth” → True • “There are weapons of mass destruction in Iraq” → ?

  3. Binary Classification • Instances (documents, signals): $x_t \in \mathbb{R}^n$ • Labels (true/false, good/bad): $y_t \in \{+1, -1\}$ • Classification and prediction: $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t)$ • Mistakes and losses: a mistake occurs whenever $\hat{y}_t \neq y_t$

  4. Online Binary Classification • Initialize your classifier ($w_1 = 0$) • For t = 1,2,3,…,T,… • Receive an instance: $x_t$ • Predict label: $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t)$ • Receive true label: $y_t$ [suffer “loss”/error] • Update classifier ($w_t \to w_{t+1}$) • Goal: suffer small losses while learning
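As a minimal sketch, the protocol above can be written as a generic loop; the `update` argument is a placeholder for the rules developed in the following slides, and all names here are illustrative rather than from the talk:

```python
import numpy as np

# Online binary classification protocol: predict with the current linear
# classifier, observe the true label, then hand the example to an update
# rule (plugged in later, e.g. the Passive-Aggressive update).
def online_loop(examples, update, dim):
    w = np.zeros(dim)                      # initialize the classifier
    mistakes = 0
    for x, y in examples:                  # receive an instance x_t
        y_hat = np.sign(np.dot(w, x))      # predict its label
        if y_hat != y:                     # receive y_t, suffer error
            mistakes += 1
        w = update(w, x, y)                # update the classifier
    return w, mistakes
```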

  5. Why Online? • Adaptive • Simple to implement • Fast, small memory footprint • Can be converted to batch learning (O2B) • Formal guarantees • But: might not be as effective as a well-designed batch learning algorithm

  6. Linear Classifiers & Margins • The prediction is formed as follows: $\hat{y} = \mathrm{sign}(w \cdot x)$ • The margin of an example $(x, y)$ w.r.t. $w$ is $y(w \cdot x)$: a positive margin means a correct prediction, a negative margin a mistake

  7. Separability Assumption

  8. Classifier Update - Passive Mode

  9. Prediction & Margin Errors

  10. Hinge Loss • $\ell(w; (x, y)) = \max\{0,\ 1 - y(w \cdot x)\}$: zero when the margin is at least 1, growing linearly as the margin decreases

  11. Version Space • In case of a prediction mistake, the new classifier must reside in the version space: the set of vectors attaining zero hinge loss on the current example, $\{w : y_t(w \cdot x_t) \geq 1\}$

  12. Aggressive Mode • Mistake ⇒ $w_t$ is projected onto the feasible (dual) space

  13. Passive-Aggressive Update • $w_{t+1} = \arg\min_w \frac{1}{2}\|w - w_t\|^2$ s.t. $\ell(w; (x_t, y_t)) = 0$ • Closed form: $w_{t+1} = w_t + \tau_t y_t x_t$ with $\tau_t = \ell_t / \|x_t\|^2$
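A sketch of this closed-form update, written to plug into the loop above (names are illustrative):

```python
import numpy as np

# Passive-Aggressive update: project w_t onto the half-space of vectors
# with zero hinge loss on (x_t, y_t). Passive when the loss is already
# zero, aggressive (a full projection) otherwise.
def pa_update(w, x, y):
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss of w on (x, y)
    tau = loss / np.dot(x, x)                 # projection step size
    return w + tau * y * x
```

With the loop above: `w, mistakes = online_loop(stream, pa_update, dim)`.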

  14. Three Decision Problems: A Unified View • Classification • Regression • Uniclass

  15. The Generalized PA Algorithm • Each example induces a set of consistent hypotheses (half-space, hyper-slab, ball, for classification, regression, and uniclass respectively) • The new vector $w_{t+1}$ is set to be the projection of $w_t$ onto the set of consistent hypotheses
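A sketch of the three projections, assuming the standard consistency sets for each problem (function names and the epsilon parameters are illustrative):

```python
import numpy as np

def project_classification(w, x, y):
    # project onto the half-space {w : y (w . x) >= 1}
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    return w + (loss / np.dot(x, x)) * y * x

def project_regression(w, x, y, eps):
    # project onto the hyper-slab {w : |w . x - y| <= eps}
    loss = max(0.0, abs(np.dot(w, x) - y) - eps)
    return w - np.sign(np.dot(w, x) - y) * (loss / np.dot(x, x)) * x

def project_uniclass(w, x, eps):
    # project onto the ball {w : ||w - x|| <= eps}; in uniclass the
    # vector w itself chases the target points
    dist = np.linalg.norm(w - x)
    return w if dist <= eps else w + (1.0 - eps / dist) * (x - w)
```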

  16. Loss Bound (Classification) • If there exists $u$ such that $y_t(u \cdot x_t) \geq 1$ for all $t$ • Then $\sum_t \ell_t^2 \leq B^2 \|u\|^2$, where $B = \max_t \|x_t\|$ • PA makes a bounded number of mistakes (each mistake has $\ell_t \geq 1$)

  17. Proof Sketch • Define: $\Delta_t = \|w_t - u\|^2 - \|w_{t+1} - u\|^2$ • Upper bound: $\sum_t \Delta_t \leq \|w_1 - u\|^2 = \|u\|^2$ • Lower bound: since $w_{t+1}$ is the projection of $w_t$ onto a convex set containing $u$, $\Delta_t \geq \|w_t - w_{t+1}\|^2 \geq \ell_t^2 / L^2$, where $L$ is the Lipschitz constant of the loss (Lipschitz condition)

  18. Proof Sketch (Cont.) • Combining upper and lower bounds: $\sum_t \ell_t^2 \leq L^2 \|u\|^2$ • L = B for classification and regression • L = 1 for uniclass

  19. Unrealizable Case ???

  20. Unrealizable Case (Classification) • PA-I: $\tau_t = \min\{C,\ \ell_t / \|x_t\|^2\}$ • PA-II: $\tau_t = \ell_t / (\|x_t\|^2 + \frac{1}{2C})$
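A sketch of the two relaxed updates, with the aggressiveness parameter C capping how far a single noisy example can move the classifier:

```python
import numpy as np

def pa1_update(w, x, y, C):
    # PA-I: clip the projection step at C
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    tau = min(C, loss / np.dot(x, x))
    return w + tau * y * x

def pa2_update(w, x, y, C):
    # PA-II: soften the step with a 1/(2C) regularization term
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    tau = loss / (np.dot(x, x) + 1.0 / (2.0 * C))
    return w + tau * y * x
```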

  21. (Not-really) Aggressive Updates

  22. Mistake Bound for PA-I • Loss suffered by PA-I on round t: $\ell_t = \ell(w_t; (x_t, y_t))$ • Loss suffered by any fixed vector $u$: $\ell_t^* = \ell(u; (x_t, y_t))$ • #Mistakes made by PA-I is at most: $\max\{B^2, 1/C\}\left(\|u\|^2 + 2C \sum_t \ell_t^*\right)$

  23. Loss Bound for PA-II • Loss suffered by PA-II on round t: $\ell_t = \ell(w_t; (x_t, y_t))$ • Loss suffered by any fixed vector $u$: $\ell_t^* = \ell(u; (x_t, y_t))$ • Cumulative loss ($\sum_t \ell_t^2$) of PA-II is at most: $\left(B^2 + \frac{1}{2C}\right)\left(\|u\|^2 + 2C \sum_t (\ell_t^*)^2\right)$

  24. Beyond Binary Decision Problems • Applications and generalizations of PA: • Multiclass categorization • Topic ranking and filtering • Hierarchical classification • Sequence learning (Markov Networks) • Segmentation of sequences • Learning of pseudo-metrics

  25. Movie Recommendation System

  26. Recommending by Projecting • Project: compute $w \cdot x$ • Apply thresholds $b_1 \leq b_2 \leq b_3 \leq b_4$ to map the projection to one of the rank levels (1, 2, 3, 4) • Projection & thresholds are learned from ratings

  27-32. PRank Update (figure sequence) • The thresholds $b_1 \leq \dots \leq b_{k-1}$ split the line of projections $w \cdot x$ into k rank levels (1-5 in the figure) • A correctly ranked example projects into its correct rank interval • On an error, the offending thresholds (e.g. {2, 3} in the figure) and the projection are both moved so as to place $w \cdot x$ in the correct rank interval
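A sketch of PRank prediction and update, assuming integer ranks 1..k represented by k-1 finite thresholds (names are illustrative):

```python
import numpy as np

def prank_predict(w, b, x):
    # predicted rank: the first threshold the score falls below
    score = np.dot(w, x)
    for r, thresh in enumerate(b):
        if score < thresh:
            return r + 1                     # ranks are 1-based
    return len(b) + 1

def prank_update(w, b, x, y):
    score = np.dot(w, x)
    violated = []
    for r, thresh in enumerate(b):
        y_r = 1 if y > r + 1 else -1         # true side of threshold r
        if y_r * (score - thresh) <= 0:      # score on the wrong side
            violated.append((r, y_r))
    # move w and every violated threshold toward consistency
    w = w + sum(y_r for _, y_r in violated) * x
    for r, y_r in violated:
        b[r] -= y_r
    return w, b
```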

  33. EachMovie Database • 74,424 registered viewers • 1,648 listed movies • Viewers rated subsets of movies • Demo: online movie recommendation

  34. PA@Google: Web Spam Filtering [With Vineet Gupta] • Query: “hotels palo alto” • Spammers: • Cardinal Hotel - PaloAlto - Reviews of Cardinal Hotel... PaloAlto, California 94301 United States. Deals on PaloAltohotels. ... More PaloAltohotels. ... Research other PaloAltohotels. Is this hotel not right for you? ... www.tripadvisor.com/Hotel_Review-g32849-d79154-… • PaloAltoHotels - Cheap Hotels - PaloAltoHotels... Book PaloAltoHotels Online or Call Toll Free 1-800-359-7234. ... Keywords: PaloAltoHotel Discounts - Cheap Hotels in PaloAlto. Hotels In PaloAlto. ... www.hotelsbycity.com/california/hotels-palo-alto-…

  35. Enhancements for Web Spam • Various “signals” → features • Design of special kernels • Multi-tier feedback (label): • +2 navigational site (e.g. www.stanford.edu) • +1 on topic • -1 off topic • -2 nuke the spammer • Loss is sensitive to site label • Algorithmic modifications due to scale: • Online-to-batch conversions • Re-projections of old examples • Part of a recent revision to search (Google3)

  36. Web Spam Filtering - Results • Specific queries and domains are heavily spammed: • Over 50% of the returned URLs for travel searches • Certain countries are more spam-prone • Training set size: over half a million domains • Training time: 2 hours to 5 days • Test set size: the entire web crawled by Google (over 100 million domains) • A few hours to filter all domains on hundreds of CPUs • Current reduction achieved (estimate): 50% of spammers

  37. Summary • Unified online framework for decision problems • Simple and efficient algorithms (“kernelizable”) • Analyses for realizable and unrealizable cases • Numerous applications • Batch learning conversions & generalization • Generalizations using general Bregman projections • Approximate projections for large scale problems • Applications of PA to other decision problems

  38. Related Work • Projections Onto Convex Sets (POCS): • Y. Censor & S.A. Zenios, “Parallel Optimization” (Hildreth’s projection algorithm), Oxford UP, 1997 • H.H. Bauschke & J.M. Borwein, “On Projection Algorithms for Solving Convex Feasibility Problems”, SIAM Review, 1996 • Online Learning: • M. Herbster, “Learning additive models online with fast evaluating kernels”, COLT 2001 • J. Kivinen, A. Smola, and R.C. Williamson, “Online learning with kernels”, IEEE Trans. on SP, 2004

  39. Relevant Publications • Online Passive Aggressive Algorithms, CDSS’03 CSKSS’05 • Family of Additive Online Algorithms for Category Ranking, CS’03 • Ultraconservative Online Algorithms for Multiclass Problems, CS’02 CS’03 • On the algorithmic implementation of Multiclass SVM, CS’03 • PRanking with Ranking, CS’01 CS’04 • Large Margin Hierarchical Classification, DKS’04 • Learning to Align Polyphonic Music, SKS’04 • Online and Batch Learning of Pseudo-metrics, SSN’04 • The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees, DSS’04 • A Temporal Kernel-Based Model for Tracking Hand-Movements from Neural Activities, SCPVS’04

  40. Hierarchical Classification: Motivation • Phonetic transcription of DECEMBER • Gross error: T ix s eh m bcl b er • Small errors: d AE s eh m bcl b er, d ix s eh NASAL bcl b er

  41. Phonetic Hierarchy (figure: tree over the phoneme set) • PHONEMES → Sonorants, Silences, Obstruents • Sonorants → Nasals (n, m, ng), Liquids, Vowels (Front, Center, Back) • Obstruents → Affricates (jh, ch), Plosives, Fricatives

  42. Common Constructions • Ignore the hierarchy: solve as a flat multiclass problem • A greedy approach: solve a multiclass problem at each node

  43. Hierarchical Classifier (figure: tree with prototypes W0-W10 at its nodes) • Assume the labels form a tree with node set $\{0, \dots, k\}$ and instances $x \in \mathbb{R}^n$ • Associate a prototype $W_v$ with each label $v$ • Classification rule: $\hat{y} = \arg\max_v \sum_{u \in P(v)} W_u \cdot x$, where $P(v)$ is the path from the root to $v$

  44. Hierarchical Classifier (cont.) • Define $\overline{W}_v = \sum_{u \in P(v)} W_u$: the effective prototype of label $v$ is the sum of the node prototypes along its root-to-v path, so nearby labels share most of their prototype
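A sketch of this classification rule, assuming the hierarchy is given as a parent map (with `parent[root] = None`); all names are illustrative:

```python
import numpy as np

def path_to_root(v, parent):
    # nodes on the path from v up to (and including) the root
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def hieron_predict(x, W, parent, labels):
    # the score of label v sums the prototypes along its root path
    def score(v):
        return sum(np.dot(W[u], x) for u in path_to_root(v, parent))
    return max(labels, key=score)
```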

  45. A Metric Over Labels • A given hierarchy defines a metric $\gamma(a, b)$ over the set of labels via the graph distance between $a$ and $b$ in the tree
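Continuing the sketch above, the graph distance can be computed from the two root paths:

```python
def tree_distance(a, b, parent):
    # graph distance in the tree: the two root paths overlap exactly on
    # the common ancestors, so the distance is what remains of each path
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    common = len(set(pa) & set(pb))
    return (len(pa) - common) + (len(pb) - common)
```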

  46. From PA to Hieron • Replace the simple margin constraint with a tree-based margin constraint: $\overline{W}_y \cdot x - \overline{W}_{\hat{y}} \cdot x \geq \sqrt{\gamma(y, \hat{y})}$ • $y$: correct label • $\hat{y}$: predicted label

  47. Hieron - Update (figure: the full tree of prototypes w1-w10)

  48. Hieron - Update (figure: only the prototypes on the two disagreeing paths, e.g. w6, w7, w10, are changed)
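A sketch of the resulting PA-style update, continuing the code above; the step size here is reconstructed under the assumption that the tree-based margin is one linear constraint over the stacked prototypes, so only nodes in the symmetric difference of the two paths move:

```python
def hieron_update(x, y, y_hat, W, parent):
    path_y = set(path_to_root(y, parent))
    path_hat = set(path_to_root(y_hat, parent))
    margin = sum(np.dot(W[u], x) for u in path_y) - \
             sum(np.dot(W[u], x) for u in path_hat)
    gamma = tree_distance(y, y_hat, parent)
    loss = max(0.0, np.sqrt(gamma) - margin)   # tree-based hinge loss
    diff = path_y ^ path_hat                   # nodes that must move
    if loss == 0.0 or not diff:
        return
    # PA step size: the constraint's gradient norm squared over the
    # stacked prototypes is |diff| * ||x||^2
    tau = loss / (len(diff) * np.dot(x, x))
    for u in path_y - path_hat:
        W[u] = W[u] + tau * x
    for u in path_hat - path_y:
        W[u] = W[u] - tau * x
```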

  49. Sample Run on Synthetic Data

  50. Experiments with Hieron • Datasets used • Compared two models: • Hieron with knowledge of the correct hierarchy • Hieron without knowledge of the correct hierarchy (flat)
