Arron La Joey Lei David Cortez

Spam Filtering Team Arron La Joey Lei David Cortez

Problem • How to differentiate emails • Decide if an email is spam or non-spam • Gather a diverse knowledge base to develop an unbiased spam filter

Techniques for Implementation • A hash table with “nearest neighbor approach” • Nearest neighbor approach with extra data • Bayesian or Neural Networks

“Nearest Neighbor Approach” • The hash table will contain important and common words that may indicate whether an email is spam • [Figure: an e-mail classified as Non-Spam or Spam]
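
A minimal sketch of how a word hash table could feed a nearest-neighbor decision. The indicator words, the training emails, and the helper names `to_vector` and `nearest_label` are hypothetical; the slides do not specify the vocabulary or the distance measure, so Euclidean distance is assumed here.

```python
import math

# Hypothetical indicator words mapped to vector positions (the "hash table").
INDICATOR_WORDS = {"free": 0, "winner": 1, "money": 2, "meeting": 3, "report": 4}


def to_vector(text):
    """Count how often each indicator word occurs in the text."""
    vec = [0] * len(INDICATOR_WORDS)
    for token in text.lower().split():
        idx = INDICATOR_WORDS.get(token.strip(".,!?:;"))
        if idx is not None:
            vec[idx] += 1
    return vec


def nearest_label(text, labeled_examples):
    """Return the label of the training email closest to `text`."""
    query = to_vector(text)
    closest = min(labeled_examples,
                  key=lambda ex: math.dist(query, to_vector(ex[0])))
    return closest[1]


# Made-up training data, for illustration only.
TRAINING = [
    ("Free money! You are a winner", "spam"),
    ("Agenda for the project meeting and status report", "non-spam"),
]

print(nearest_label("Claim your free money now", TRAINING))
```

With a larger training set, the same idea extends to taking a vote among the k closest emails rather than only the single nearest one.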

Nearest Neighbor Approach with Extra Data • The extra data consists of the following: • Size of the email • Content/subject line • Punctuation-to-word ratio • IP addresses
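
As an illustration, some of the extra data listed above could be computed as follows. The function name `extra_features` and the dictionary keys are hypothetical, and IP-address handling is left out because the slides do not say how it would be used.

```python
import string


def extra_features(subject, body):
    """Compute a few of the extra features for one email (assumed definitions)."""
    words = body.split()
    punctuation = sum(1 for ch in body if ch in string.punctuation)
    return {
        # Size of the email body in bytes.
        "size": len(body.encode("utf-8")),
        # Length of the subject line in characters.
        "subject_length": len(subject),
        # Punctuation marks per word; 0.0 for an empty body.
        "punct_to_word_ratio": punctuation / len(words) if words else 0.0,
    }


print(extra_features("Hi", "Buy now!!! Cheap!!!"))
```

These values could be appended to the word-count vector before the nearest-neighbor distance is computed, ideally after scaling so that the email size does not dominate the distance.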

Bayesian Network Approach • Create two hash tables that tally the number of occurrences of each word in spam and non-spam emails • Create a third hash table that stores the spam probability of each word
probability(word) {
    let g = 2 * (# of hashNonSpam(word))
    let b = # of hashSpam(word)
    if (g + b) > 5 then
        max(0.1, min(0.99, min(1, b / numOfSpam) / (min(1, g / numOfNonSpam) + min(1, b / numOfSpam))))
}
numOfSpam = # of spam emails
numOfNonSpam = # of non-spam emails
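
The per-word probability could be written in Python roughly as below. This is a sketch, not the team's implementation: it assumes the numerator is the spam frequency `b / num_spam` (matching the term in the denominator), that both email counts are positive, and that words seen too rarely get no probability (`None`). The function and parameter names are hypothetical.

```python
def word_probability(word, spam_counts, non_spam_counts, num_spam, num_non_spam):
    """Estimate the probability that an email containing `word` is spam."""
    g = 2 * non_spam_counts.get(word, 0)  # non-spam occurrences, doubled
    b = spam_counts.get(word, 0)          # spam occurrences
    if g + b <= 5:
        return None  # too little evidence for this word
    spam_freq = min(1.0, b / num_spam)
    non_spam_freq = min(1.0, g / num_non_spam)
    # Clamp to [0.1, 0.99], the bounds given on the slide.
    return max(0.1, min(0.99, spam_freq / (non_spam_freq + spam_freq)))


# Made-up counts, for illustration only.
spam_counts = {"free": 8, "viagra": 10}
non_spam_counts = {"free": 1, "meeting": 5}
for w in ("free", "viagra", "meeting", "rare"):
    print(w, word_probability(w, spam_counts, non_spam_counts, 10, 10))
```

Doubling the non-spam count biases the estimate toward non-spam, which lowers the chance of flagging a legitimate email.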

Bayesian Network Approach (Continued) • To check an email: take the 20 words whose probabilities are farthest from 0.5 (0.5 marks a neutral word) • Combine the probabilities a, b, ..., v of those 20 words using Bayes' rule:
$$\mathrm{prob}(\text{email}) = \frac{ab \cdots v}{ab \cdots v + (1 - a)(1 - b) \cdots (1 - v)}$$
• If prob(email) > 0.9, classify the email as spam
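
A sketch of the checking step: pick the words farthest from 0.5 and combine their probabilities with the formula above. The names `spam_score` and `is_spam` are hypothetical, and the input is assumed to be the list of per-word probabilities for one email, each already clamped away from 0 and 1.

```python
import math


def spam_score(word_probs, top_n=20):
    """Combine the `top_n` most decisive word probabilities into one score."""
    # Most decisive means farthest from the neutral value 0.5.
    decisive = sorted(word_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    prod = math.prod(decisive)
    inv_prod = math.prod(1.0 - p for p in decisive)
    return prod / (prod + inv_prod)


def is_spam(word_probs, threshold=0.9):
    """Apply the 0.9 cutoff from the slide."""
    return spam_score(word_probs) > threshold


print(is_spam([0.99, 0.99, 0.1]))
print(is_spam([0.1, 0.1, 0.5]))
```

An email with no scored words gets a score of 0.5 under this sketch, so it falls below the cutoff and is treated as non-spam.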

Methods of Evaluation • Create training and testing data sets to determine effectiveness • Use the results to compare the implementations with one another • Compare the implementations with other well-known techniques
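
One possible way to set up such an evaluation is sketched below. The split fraction, the metric choice (accuracy and false-positive rate), and the names `split_data` and `evaluate` are assumptions; the slides do not say which measures the team used.

```python
import random


def split_data(examples, test_fraction=0.25, seed=0):
    """Shuffle labeled emails and split them into training and testing sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(classify, test_set):
    """Score a classifier (text -> 'spam' or 'non-spam') on labeled emails."""
    tp = fp = tn = fn = 0
    for text, label in test_set:
        predicted = classify(text)
        if predicted == "spam" and label == "spam":
            tp += 1
        elif predicted == "spam":
            fp += 1  # legitimate email wrongly flagged
        elif label == "spam":
            fn += 1  # spam that slipped through
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }
```

Running `evaluate` with each implementation on the same test set gives directly comparable numbers; the false-positive rate is reported separately because losing a legitimate email usually costs more than letting one spam through.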

Other Techniques of Implementation • Blacklist domains/emails • “White list” domains • Authenticity checking • Header/context analysis • Checksum technology • User input learning (Spam/Non-Spam button) • Classifying non-spam
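
A few of these techniques are simple enough to sketch as a pre-filter that runs before any statistical classifier. Everything here is illustrative: the domain names, the `prefilter` function, and the use of SHA-256 as the checksum are assumptions, not part of the presentation.

```python
import hashlib

# Hypothetical lists; a real system would load these from maintained sources.
BLACKLIST = {"spam.example"}
WHITELIST = {"trusted.example"}
KNOWN_SPAM_CHECKSUMS = set()


def sender_domain(address):
    """Return the lowercase domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def body_checksum(body):
    """Checksum of the message body (SHA-256 chosen for illustration)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def prefilter(sender, body):
    """Return 'spam', 'non-spam', or 'unknown' from list and checksum lookups."""
    domain = sender_domain(sender)
    if domain in WHITELIST:
        return "non-spam"
    if domain in BLACKLIST:
        return "spam"
    if body_checksum(body) in KNOWN_SPAM_CHECKSUMS:
        return "spam"
    return "unknown"  # defer to the statistical filter


def report_spam(body):
    """User pressed the Spam button: remember this exact message body."""
    KNOWN_SPAM_CHECKSUMS.add(body_checksum(body))
```

An exact checksum only catches byte-identical copies, so spam that varies even slightly per recipient still has to be handled by the other techniques.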

References • Paul Graham, “A Plan for Spam,” August 2002, www.paulgraham.com/spam • Paul Graham, “Better Bayesian Filtering,” 2003 Spam Conference, www.paulgraham.com/better
