
Bayesian Filtering




  1. Bayesian Filtering • Team Glyph: Debbie Bridygham, Pravesvuth Uparanukraw, Ronald Ko, Rihui Luo, Thuong Luu

  2. Background • A strong need exists to identify “bad” items in a population and remove them -- examples: spam, unsolicited IMs, etc. • Filtering often results in an “arms race” that requires rapid response • An “arms race” favors inherently adaptive methods over static ones

  3. Benefits of Filters • Less unwanted traffic, thus less wasted space on clients & servers • Greater use of internet services due to reduced customer frustration • Provide some protection against dangerous traffic: scams, phishing attacks, viruses, etc.

  4. Downsides of Filtering • Excluding even one legitimate item (i.e., a false positive) is less desirable than letting 10 or more illegitimate items pass. • Reducing the percentage of undesirable traffic often causes legitimate traffic to be excluded as well.

  5. Cost of Filtering • Manual filtering has become prohibitively expensive • Maintenance of static filters costs time & money • Time spent maintaining keywords or updating software delays response • The “arms race” often results in ever-escalating costs

  6. Methodologies • Manual filtering is prohibitive in terms of time • Static filtering based on heuristics and keywords does not adapt except via manual updates • Bayesian filtering is dynamic, adapting with each new item scanned and/or marked

  7. What is Bayesian Filtering? • Uses a naïve Bayes classifier, which is built on Bayes' theorem • The classifier allows items to be adaptively categorized using probabilities and has a low rate of false positives • Best-known use is in spam filtering; often credited to initial work by Paul Graham (“A Plan for Spam”) in 2002

  8. Naïve Bayes Classifier • Uses Bayes' theorem with the assumption that features (e.g., the words in a message) are independent of one another given the class (rarely true), thus “naïve” • The classifier can start with initial assumptions, i.e., probabilities that words occur in legitimate or illegitimate messages • It is trained over time and adapts. If the final probability reaches some threshold, an item is rejected. Superior to keyword filtering.
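
As a rough illustration of the classifier described on this slide, the sketch below multiplies per-word likelihoods under the independence assumption (in log space, to avoid underflow) and rejects a message once the resulting probability reaches a threshold. The function names, the word tables, the 0.5 prior, and the 0.9 threshold are all hypothetical choices made for this example; they do not come from the presentation.

```python
import math

def spam_posterior(words, p_word_given_spam, p_word_given_ham, prior_spam=0.5):
    """Return Pr(spam | words), treating words as independent given the class."""
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        # Skip words missing from either table (a simplification for this sketch).
        if word in p_word_given_spam and word in p_word_given_ham:
            log_spam += math.log(p_word_given_spam[word])
            log_ham += math.log(p_word_given_ham[word])
    diff = log_ham - log_spam
    if diff > 700:  # exp() would overflow; the posterior is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

def is_rejected(words, p_word_given_spam, p_word_given_ham, threshold=0.9):
    """Reject the item when the spam probability reaches the threshold."""
    return spam_posterior(words, p_word_given_spam, p_word_given_ham) >= threshold

# Hypothetical initial assumptions: how often each word occurs per class.
p_spam = {"guarantee": 0.25, "meeting": 0.01}
p_ham = {"guarantee": 0.005, "meeting": 0.2}

rejected_alone = is_rejected(["guarantee"], p_spam, p_ham)
rejected_with_context = is_rejected(["guarantee", "meeting"], p_spam, p_ham)
```

With these made-up tables, a message containing only “guarantee” crosses the threshold, while adding a strongly legitimate word such as “meeting” pulls the probability back below it.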

  9. Bayes' Theorem • First presented in 1763, posthumously, based on work by mathematician Thomas Bayes • Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B) • Specifies the relationship between conditional probabilities • Currently has practical use in many fields
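
The formula on this slide can be evaluated directly once Pr(B) is expanded with the law of total probability, Pr(B) = Pr(B|A)·Pr(A) + Pr(B|not A)·Pr(not A). The following is a minimal sketch; the function name and the numbers in the usage lines are hypothetical and chosen only to show the arithmetic.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)."""
    # Pr(B) by total probability over A and not-A.
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("Pr(B) is zero, so Pr(A|B) is undefined")
    return p_b_given_a * prior_a / evidence

# Hypothetical inputs: A = "message is spam", B = "message contains a given word".
# Assume 40% of mail is spam, and the word appears in 25% of spam
# and 0.5% of legitimate mail.
p_spam_given_word = posterior(0.4, 0.25, 0.005)
```

Under these assumed inputs the result is 0.1 / 0.103, roughly 0.97; the point is only that a word far more common in one class than the other shifts the posterior strongly toward that class.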

  10. Bayesian Filtering Usage • Uses user input to develop individual statistics • Probability matrix changes over time based on scanned messages and user decisions • Matrix is used to calculate the probability that a message is unwanted • Matrix adapts quickly to new input, resulting in surprisingly good results
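
One way to picture the probability matrix is as a pair of per-word counters that are updated every time the user marks a message. The sketch below is an assumption about how such a table could be kept, not the presenters' implementation; the class name `WordStats`, its methods, and the neutral 0.5 score for unseen words are all hypothetical.

```python
from collections import Counter

class WordStats:
    """Per-user word statistics that adapt as messages are marked."""

    def __init__(self):
        self.spam_messages = 0
        self.ham_messages = 0
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of legitimate messages containing it

    def train(self, words, is_spam):
        """Update the table from one message and the user's decision about it."""
        unique_words = set(words)  # count each word once per message
        if is_spam:
            self.spam_messages += 1
            self.spam_counts.update(unique_words)
        else:
            self.ham_messages += 1
            self.ham_counts.update(unique_words)

    def word_spam_probability(self, word, unknown=0.5):
        """Estimate how strongly a word indicates spam, given the data so far."""
        if self.spam_messages == 0 or self.ham_messages == 0:
            return unknown  # not enough data for either class yet
        spam_rate = self.spam_counts[word] / self.spam_messages
        ham_rate = self.ham_counts[word] / self.ham_messages
        if spam_rate + ham_rate == 0:
            return unknown  # word never seen
        return spam_rate / (spam_rate + ham_rate)

stats = WordStats()
stats.train(["guarantee", "offer"], is_spam=True)
stats.train(["meeting", "offer"], is_spam=False)
```

Because every call to `train` changes the counts, the per-word estimates shift with each user decision, which is the adaptive behavior the slide describes.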

  11. Example Matrix

  12. Example • Suppose the word “guarantee” occurs in 500 of 2000 spam emails, but in only 5 of 1000 non-spam emails • Assuming spam and non-spam are equally likely beforehand, the probability of spam for this word is then (500 / 2000) / ((500 / 2000) + (5 / 1000)) = 0.98 • This probability is combined with those of the other words in the message to compute a probability that the entire message is spam.
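
The arithmetic in this example, and one possible way to combine several per-word probabilities, can be sketched as follows. The slide does not say which combination rule is used; the rule below, prod(p) / (prod(p) + prod(1 - p)), is the one described in Graham's “A Plan for Spam” and is used here as an assumption. The function names and the two extra word probabilities (0.90 and 0.20) are hypothetical.

```python
import math

def word_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Per-word spam probability from occurrence rates, with equal priors."""
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    return spam_rate / (spam_rate + ham_rate)

def combine(probabilities):
    """Combine per-word probabilities, treating the words as independent."""
    spam_product = math.prod(probabilities)
    ham_product = math.prod(1.0 - p for p in probabilities)
    # Assumes the inputs are not a mix of exact 0.0 and 1.0 values,
    # which would make both products zero.
    return spam_product / (spam_product + ham_product)

# The slide's numbers: 500 of 2000 spam emails, 5 of 1000 non-spam emails.
p_guarantee = word_spam_probability(500, 2000, 5, 1000)

# Two further, made-up word probabilities from the same message.
message_score = combine([p_guarantee, 0.90, 0.20])
```

Here `p_guarantee` evaluates to 0.25 / 0.255, which rounds to the 0.98 shown on the slide. A real filter would typically also clamp per-word values away from 0 and 1 so that a single word cannot decide the outcome on its own.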

  13. Bayesian Poisoning • Attempts to fool Bayesian filtering (BF) systems by adding irrelevant words (often hidden) • Type I attacks attempt to get messages through the filter -- they can be active or passive, with active attacks producing feedback to the sender via a “Web Bug” or other means • Type II attacks attempt to cause false positives, i.e., force desirable messages to be rejected

  14. Poisoning Effectiveness • Passive attacks are rarely effective because filters are individual and the sender gets no feedback • Active attacks can initially be highly effective if recipients' systems load the “Web Bugs” • All attacks lose effectiveness as the filter adjusts to incoming traffic

  15. Products that use Bayesian Filtering

  16. Summary • BF adapts to individual needs • BF is highly effective • BF adapts more quickly than other solutions • BF is resistant to “poisoning”

  17. References • [1] Sahami, M., et al. “A Bayesian Approach to Filtering Junk E-Mail”, 1998 • [2] Graham, Paul. “A Plan for Spam”, 2002 • [3] Graham-Cumming, John. “Does Bayesian poisoning exist?”, 2006

  18. References, cont. • [4] Naive Bayes Classifier, Wikipedia, 2007 • [5] Bayes Theorem, Wikipedia, 2007
