UGR Project - Haoyu Li, Brittany Edwards, Wei Zhang, under Xiaoxiao Xu and Arye Nehorai. Machine Learning Basics with Applications to Email Spam Detection
General background information about the process of machine learning
The process of email detection • Motivation of this project • Pre-processing of data • Classifier Models • Evaluation of classifiers
Motivation of this project • Spam email has plagued nearly every personal email account • 60% of January 2004 emails were spam • Fraud & Phishing • Spam vs. Ham email
The process of email detection • Motivation of this project • Pre-processing of data • Classifier Models • Evaluation of classifiers
Pre-processing of data • Convert capital letters to lowercase • Remove numbers and extra white space • Remove punctuation • Remove stop-words • Delete terms with length greater than 20 (a sketch of these steps follows)
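A minimal Python sketch of the pre-processing steps listed above; the tiny stop-word list and the example email are illustrative assumptions, not the project's actual list or data:

```python
import re

# Tiny illustrative stop-word list; a real run would use a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "to", "of", "in", "for"}

def preprocess(email_text):
    """Apply the slide's pre-processing steps to one email body."""
    text = email_text.lower()                          # capital letters -> lowercase
    text = re.sub(r"\d+", " ", text)                   # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)                # remove punctuation
    terms = text.split()                               # also collapses extra white space
    terms = [t for t in terms if t not in STOP_WORDS]  # remove stop-words
    return [t for t in terms if len(t) <= 20]          # drop terms longer than 20 characters

print(preprocess("WIN $1,000,000 NOW!!! Click the link to claim your prize."))
# ['win', 'now', 'click', 'link', 'claim', 'your', 'prize']
```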
Pre-processing of data • Original Email
Pre-processing of data • After pre-processing
Pre-processing of data • Extract Terms
Pre-processing of data • Reduce Terms • Keep terms with length < 20
The process of email detection • Motivation of this project • Pre-processing of data • Classifier Models • Evaluation of classifiers
Different classification methods • K Nearest Neighbor (KNN) • Naive Bayes Classifier • Logistic Regression • Decision Tree Analysis
What is K Nearest Neighbor • Use the k "closest" samples (nearest neighbors) to perform classification (see the sketch below)
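A minimal from-scratch sketch of KNN classification on term-count vectors, using Euclidean distance and a majority vote; the feature vectors and the value of k below are illustrative assumptions, not the project's actual data:

```python
from collections import Counter

def knn_classify(query_vec, train_vecs, train_labels, k=3):
    """Label a query vector by majority vote among its k closest training
    samples, using Euclidean distance on term-count vectors."""
    dists = [
        (sum((q - t) ** 2 for q, t in zip(query_vec, vec)) ** 0.5, label)
        for vec, label in zip(train_vecs, train_labels)
    ]
    nearest = sorted(dists, key=lambda d: d[0])[:k]     # k closest training samples
    votes = Counter(label for _, label in nearest)      # majority vote among them
    return votes.most_common(1)[0][0]

# Toy term-count vectors, e.g. counts of ["free", "win", "meeting"] -- illustrative only.
train_vecs = [[3, 2, 0], [4, 1, 0], [0, 0, 2], [0, 1, 3]]
train_labels = ["spam", "spam", "ham", "ham"]
print(knn_classify([2, 2, 0], train_vecs, train_labels, k=3))  # -> 'spam'
```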
Initial outcome and strategies for improvement • KNN accuracy was ~64% - very low • The KNN classifier does not fit our project well • The term-list is still too large • Try a different classification method and check whether its evaluation results beat the KNN results • Continue to reduce the size of the term-list by removing terms that are not meaningful
Steps for improvement • Removed sparsity • Reduced the length threshold • Created a hashtable • Used an alternative classifier: the Naive Bayes Classifier (see the sketch below)
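A minimal sketch of a multinomial Naive Bayes spam classifier with add-one (Laplace) smoothing, one common way to realize the "Naive Bayes Classifier" bullet above; the toy documents, labels, and smoothing choice are assumptions for illustration, not the project's exact model:

```python
import math
from collections import Counter

def train_nb(documents, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.
    documents: list of term lists; labels: 'spam' or 'ham' per document."""
    classes = set(labels)
    vocab = {t for doc in documents for t in doc}
    priors, term_counts, totals = {}, {}, {}
    for c in classes:
        docs_c = [d for d, l in zip(documents, labels) if l == c]
        priors[c] = len(docs_c) / len(documents)             # P(class)
        term_counts[c] = Counter(t for d in docs_c for t in d)
        totals[c] = sum(term_counts[c].values())
    return priors, term_counts, totals, vocab

def classify_nb(doc, priors, term_counts, totals, vocab):
    """Return the class with the highest log-posterior for a term list."""
    best, best_score = None, float("-inf")
    for c in priors:
        score = math.log(priors[c])
        for t in doc:
            if t in vocab:                                   # ignore unseen terms
                score += math.log((term_counts[c][t] + 1) / (totals[c] + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy training data -- illustrative only.
docs = [["win", "free", "prize"], ["free", "cash"], ["meeting", "today"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(classify_nb(["free", "prize"], *model))  # -> 'spam'
```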
Hashtable • Calculate a hash key for each term in the term-list • When a collision occurs, use separate chaining (see the sketch below)
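A minimal sketch of a hashtable for term counts that resolves collisions with separate chaining; the bucket count and the simple polynomial hash function are illustrative assumptions, not the project's exact choices:

```python
class ChainedHashTable:
    """Hashtable for term counts; collisions are resolved with separate
    chaining, i.e. each bucket holds a list (chain) of (term, count) pairs."""

    def __init__(self, n_buckets=101):
        self.buckets = [[] for _ in range(n_buckets)]

    def _hash(self, term):
        # Simple polynomial hash over characters (illustrative choice).
        h = 0
        for ch in term:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def add(self, term):
        chain = self.buckets[self._hash(term)]
        for i, (t, count) in enumerate(chain):
            if t == term:                     # term already in this chain: bump its count
                chain[i] = (t, count + 1)
                return
        chain.append((term, 1))               # empty bucket or collision: append to the chain

    def count(self, term):
        for t, c in self.buckets[self._hash(term)]:
            if t == term:
                return c
        return 0

table = ChainedHashTable()
for term in ["free", "win", "free", "meeting"]:
    table.add(term)
print(table.count("free"))  # -> 2
```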
Secondary Results • Classification accuracy increases from 62% to 82.36%
Suggestions for further improvement • Revise pre-processing • Apply additional classifiers
Thank you • Questions?