
Fighting Spam: An Innovative Enhancement to Outlook Express




Presentation Transcript


  1. Fighting Spam: An Innovative Enhancement to Outlook Express Zhengxiang Pan & Yuanbo Guo

  2. Target: Outlook Express • Current anti-spam functionality in OE: • Blocked senders list • Mail rules • Limitations: • Limited rule-based filtering • Difficulty in generating rules • Lack of flexibility • Not adaptive: spam mutates! • Free -> F r e e -> F*r*e*e
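To make the "Free -> F r e e -> F*r*e*e" point concrete, here is a minimal Python sketch (not taken from the paper) contrasting a hypothetical literal keyword rule with a hypothetical "patched" rule that strips non-letters first. Both function names are illustrative only.

```python
import re

def keyword_rule(text: str, keyword: str = "free") -> bool:
    # Hypothetical hand-written rule: fire if the keyword appears verbatim (case-insensitive).
    return keyword in text.lower()

def normalized_rule(text: str, keyword: str = "free") -> bool:
    # Hypothetical patched rule: keep only letters, so spaces and punctuation
    # inserted between letters no longer hide the word.
    letters_only = re.sub(r"[^a-z]", "", text.lower())
    return keyword in letters_only

for sample in ["Free offer", "F r e e offer", "F*r*e*e offer"]:
    print(f"{sample!r}: literal={keyword_rule(sample)}, normalized={normalized_rule(sample)}")
# 'Free offer': literal=True, normalized=True
# 'F r e e offer': literal=False, normalized=True
# 'F*r*e*e offer': literal=False, normalized=True
```

The patch catches these two variants, but it is just another hand-written rule: a spammer can switch to a disguise it does not cover (for example "Fr33", which normalizes to "fr"), and the letter stripping also fires on innocent text such as "if reel". Keeping up means writing new rules by hand, which is the lack of adaptivity the slide describes.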

  3. What did we design? • An Intelligent Spam Identification Component (ISIC) that uses IDSS (intelligent decision support system) techniques, specifically case-based reasoning (CBR) • Absorbs ideas from rule-based and statistical filters • Features dynamic attribute selection and heuristic-guided case base maintenance

  4. Case Representation • Attribute-value pairs • Possible values: Yes and No • Two sets of attributes: • 51 predefined attributes • About specific properties of an email • Selected from http://www.spamassassin.org • 100 dynamically determined attributes • About word occurrences in the email
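A case, as described above, is a set of Yes/No attribute-value pairs plus a spam/non-spam solution. The following is a minimal Python sketch of that representation under stated assumptions: `Case`, `make_case`, and the `PREDEFINED` table are hypothetical names, and the two predefined checks are heavily simplified stand-ins that reuse attribute names from the slides' example case; they do not reproduce SpamAssassin's actual rule definitions. A real case would carry all 51 predefined and 100 word attributes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical, simplified stand-ins for two of the 51 predefined attributes.
# Illustrative only: not SpamAssassin's real definitions.
PREDEFINED: Dict[str, Callable[[Dict[str, str]], bool]] = {
    "TO_EMPTY": lambda h: h.get("To", "").strip() == "",
    "FROM_AND_TO_SAME": lambda h: (
        h.get("From", "").strip().lower() != ""
        and h.get("From", "").strip().lower() == h.get("To", "").strip().lower()
    ),
}

@dataclass
class Case:
    attributes: Dict[str, bool]      # attribute name -> Yes (True) / No (False)
    is_spam: Optional[bool] = None   # solution part; None until the email is classified

def make_case(headers: Dict[str, str], body: str, vocabulary: List[str],
              is_spam: Optional[bool] = None) -> Case:
    # Predefined attributes: properties of the email's headers.
    attrs = {name: check(headers) for name, check in PREDEFINED.items()}
    # Dynamically determined attributes: does each selected word occur in the body?
    words_in_body = set(re.findall(r"[a-z]+", body.lower()))
    attrs.update({word: (word.lower() in words_in_body) for word in vocabulary})
    return Case(attrs, is_spam)

case = make_case({"From": "a@example.com", "To": "a@example.com"},
                 "Free guaranteed debt relief!",
                 ["Free", "Guaranteed", "Debt", "Hello"])
print(case.attributes)
# {'TO_EMPTY': False, 'FROM_AND_TO_SAME': True, 'Free': True, 'Guaranteed': True, 'Debt': True, 'Hello': False}
```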

  5. Predefined Attributes - Examples

  6. Dynamically Determined Attributes • Attribute selection • Use the odds ratio as the indicator of a word's predictive power for the categories (spam, non-spam) and rank words by it • Select the top 50 words from each of the spam and non-spam vocabularies as attributes (details in the paper)
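The slides do not give the exact odds-ratio formula, so the sketch below assumes the common textbook form, OR(w) = p(1 - q) / ((1 - p)q), where p and q are Laplace-smoothed estimates of how often word w occurs in emails of the target class and of the other class. The function names (`doc_freq`, `odds_ratio`, `select_attributes`) and the smoothing choice are assumptions, not the paper's method.

```python
from collections import Counter
from typing import List, Set

def doc_freq(docs: List[Set[str]]) -> Counter:
    # How many emails each word occurs in (presence only, not raw counts).
    counts: Counter = Counter()
    for words in docs:
        counts.update(words)
    return counts

def odds_ratio(word: str, df_c: Counter, n_c: int, df_o: Counter, n_o: int) -> float:
    # ASSUMPTION: Laplace-smoothed P(word | class) and P(word | other class).
    p = (df_c[word] + 1) / (n_c + 2)
    q = (df_o[word] + 1) / (n_o + 2)
    return (p * (1 - q)) / ((1 - p) * q)

def select_attributes(spam_docs: List[Set[str]], ham_docs: List[Set[str]],
                      k: int = 50) -> List[str]:
    df_s, df_h = doc_freq(spam_docs), doc_freq(ham_docs)
    n_s, n_h = len(spam_docs), len(ham_docs)
    # Rank each class's own vocabulary by how strongly a word indicates that class.
    top_spam = sorted(df_s, key=lambda w: odds_ratio(w, df_s, n_s, df_h, n_h), reverse=True)[:k]
    top_ham = sorted(df_h, key=lambda w: odds_ratio(w, df_h, n_h, df_s, n_s), reverse=True)[:k]
    # Union of both lists (at most 2k attributes), keeping order and dropping duplicates.
    return list(dict.fromkeys(top_spam + top_ham))

spam = [{"free", "debt", "hello"}, {"free", "guaranteed"}, {"free", "debt"}]
ham = [{"hello", "meeting"}, {"meeting", "agenda", "free"}]
print(select_attributes(spam, ham, k=2))  # ['debt', 'free', 'meeting', 'agenda']
```

With the default k = 50 this yields up to the 100 dynamically determined attributes mentioned on the Case Representation slide.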

  7. An Example Case • Case 1: • (predefined attributes) …, CHARSET_FARAWAY = No, TO_EMPTY = Yes, FROM_AND_TO_SAME = Yes, LOTS_OF_CC_LINE = Yes, MISSING_HEADERS = Yes, … • (dynamically selected attributes) Free = Yes, Guaranteed = Yes, Debt = Yes, Hello = No, … • (solution) Spam = Yes

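  8. Similarity Measurement • Simple Matching Coefficient (SMC), based on Hamming distance, between a new email P and a stored case C described by N attributes, where X_i and Y_i are the values of the i-th attribute in P and C:
$$\mathrm{SIM}_H(P, C) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{EQ}(X_i, Y_i), \qquad \mathrm{EQ}(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i = Y_i \\ 0 & \text{otherwise} \end{cases}$$
Equivalently, SIM_H(P, C) = 1 - d_H(P, C) / N, where d_H is the Hamming distance (the number of attributes on which P and C differ).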
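SIM_H translates directly into code. This minimal Python sketch (the function name `sim_h` is hypothetical) uses only the attributes spelled out for Case 1 on the slides; the elided "…" attributes are left out, so N = 9 here rather than the full attribute count.

```python
from typing import Dict

def sim_h(p: Dict[str, bool], c: Dict[str, bool]) -> float:
    """SIM_H(P, C): fraction of the N shared attributes on which P and C agree."""
    if p.keys() != c.keys():
        raise ValueError("P and C must be described by the same attributes")
    agreements = sum(1 for a in p if p[a] == c[a])  # sum of EQ(X_i, Y_i)
    return agreements / len(p)

# Only the attributes listed for Case 1 on the slides (the "…" ones are omitted).
case_1 = {"CHARSET_FARAWAY": False, "TO_EMPTY": True, "FROM_AND_TO_SAME": True,
          "LOTS_OF_CC_LINE": True, "MISSING_HEADERS": True,
          "Free": True, "Guaranteed": True, "Debt": True, "Hello": False}
# A hypothetical new email that differs from Case 1 on two attributes.
new_email = dict(case_1, TO_EMPTY=False, Hello=True)
print(round(sim_h(new_email, case_1), 3))  # 0.778 (7 of 9 attributes agree)
```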

  9. Case Retrieval • K-nearest-neighbor-like algorithm • For a new email P, compute its similarity SIM_H to each case in the case base and pick the K cases with the largest SIM_H values • If the majority of those cases are labeled spam, the new email is classified as spam; otherwise, as non-spam • e.g. K = 5
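The retrieval step above amounts to a K-nearest-neighbor vote under SIM_H. Below is a minimal Python sketch of it; `classify`, `LabeledCase`, and the three-attribute toy case base (attributes "a", "b", "c") are hypothetical illustrations, not the ISIC code.

```python
from typing import Dict, List, Tuple

LabeledCase = Tuple[Dict[str, bool], bool]  # (attribute values, is_spam)

def sim_h(p: Dict[str, bool], c: Dict[str, bool]) -> float:
    # Simple matching coefficient over the shared attribute set.
    return sum(1 for a in p if p[a] == c[a]) / len(p)

def classify(p: Dict[str, bool], case_base: List[LabeledCase], k: int = 5) -> bool:
    # Rank stored cases by similarity to the new email and keep the K most similar.
    nearest = sorted(case_base, key=lambda case: sim_h(p, case[0]), reverse=True)[:k]
    spam_votes = sum(1 for _, is_spam in nearest if is_spam)
    # Spam only if spam cases form a strict majority of the K neighbours.
    return spam_votes > len(nearest) / 2

T, F = True, False
case_base = [
    ({"a": T, "b": T, "c": T}, True),
    ({"a": T, "b": T, "c": F}, True),
    ({"a": T, "b": F, "c": T}, True),
    ({"a": F, "b": F, "c": F}, False),
    ({"a": F, "b": F, "c": T}, False),
    ({"a": F, "b": T, "c": F}, False),
]
print(classify({"a": T, "b": T, "c": T}, case_base))  # True
print(classify({"a": F, "b": F, "c": F}, case_base))  # False
```

The slides do not say how ties in similarity are broken; here Python's stable sort simply keeps case-base order. An odd K such as 5 avoids tied votes.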

  10. Case Base Maintenance • Initially, the spam and non-spam case bases each hold 200 cases • When a case base reaches 300 cases, it is restored to its initial size by removing cases that are: • Old (to keep the cases fresh so that they reflect current spam trends) • Close to the "Center Case" (to increase the variety of cases) • "Center Case" is a new concept introduced by the authors and defined in the paper • Attribute selection is then redone based on the current cases
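The maintenance heuristic can be sketched as follows, under loudly stated assumptions. The defaults limit=300 and target=200 come from the slides. The paper's own definition of "Center Case" is not reproduced here, so this sketch assumes a stand-in: the per-attribute majority value across the case base. The way age and closeness are combined (a linear score weighted by a hypothetical `alpha`) is also an assumption, as are the names `StoredCase`, `center_case`, and `maintain`.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class StoredCase:
    attributes: Dict[str, bool]
    is_spam: bool
    added_at: int  # insertion time stamp; smaller means older

def sim_h(p: Dict[str, bool], c: Dict[str, bool]) -> float:
    return sum(1 for a in p if p[a] == c[a]) / len(p)

def center_case(cases: List[StoredCase]) -> Dict[str, bool]:
    # ASSUMPTION: majority value of each attribute; the paper defines "Center Case" itself.
    return {a: 2 * sum(c.attributes[a] for c in cases) > len(cases)
            for a in cases[0].attributes}

def maintain(cases: List[StoredCase], limit: int = 300, target: int = 200,
             alpha: float = 0.5) -> List[StoredCase]:
    """Prune one case base (spam or non-spam) once it reaches `limit` cases."""
    if len(cases) < limit:
        return cases
    center = center_case(cases)
    oldest = min(c.added_at for c in cases)
    newest = max(c.added_at for c in cases)
    span = (newest - oldest) or 1

    def removal_score(c: StoredCase) -> float:
        age = (newest - c.added_at) / span         # 1.0 = oldest, 0.0 = newest
        closeness = sim_h(c.attributes, center)    # 1.0 = identical to the center case
        return alpha * age + (1 - alpha) * closeness  # ASSUMPTION: linear weighting

    kept = sorted(cases, key=removal_score)[:target]  # lowest scores survive
    # Attribute selection would be redone on the kept cases at this point (not shown).
    return sorted(kept, key=lambda c: c.added_at)

T, F = True, False
base = [StoredCase({"x": T, "y": T}, True, 0), StoredCase({"x": T, "y": T}, True, 1),
        StoredCase({"x": F, "y": T}, True, 2), StoredCase({"x": T, "y": F}, True, 3)]
print([c.added_at for c in maintain(base, limit=4, target=2)])  # [2, 3]
```

In the toy run the two oldest cases are also identical to the center case, so both criteria agree on removing them; with real data, `alpha` trades freshness against variety.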

  11. Architecture: components of the ISIC system • Outlook Express API • GUI • Case Base • Case Base Manager • Classifier • Attribute Selector • Parser • Email Repository Manager • Email Repository

  12. Using the Enhanced Outlook Express • Same UI as OE

  13. Conclusion • Highlights: • Localized & easy to construct • Personalized • Easy to use • Adaptive • Limitations: • The initial cases limit personalization • Not for standalone use: runs on top of the existing OE
