NLPhishDetect: Detecting Phishing the Natural Language Way

NLPhishDetect: Detecting Phishing the Natural Language Way • By Nabil Hossain • Advisor: Dr. Rakesh Verma

Phishing? What’s that? • The fraudulent practice of sending e-mails that masquerade as a trustworthy entity in order to induce individuals to reveal personal information • Information that phishers are generally looking for: • usernames, passwords, and credit card details from • bank accounts • online payment service accounts, e.g. eBay, Amazon, PayPal • In 2005, computer users in the USA lost $929M in total to phishing • In 2007, the losses were estimated at $3.2 billion

Sample phishing email • Note that this email attempts to trigger an action from the user

NLPhishDetect • Phishing emails often • create a sense of urgency • promise a sum of money as a reward, e.g. for completing surveys • Main idea: Phishing emails are designed to trigger an action from the user • NLPhishDetect is designed to distinguish between ‘actionable’ and ‘informational’ emails • Analyze all the information in an email: header, links, text • Analyze the email text using Natural Language Processing techniques • Use contextual information from the mailbox to detect phishing
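The ‘actionable’ versus ‘informational’ distinction can be illustrated with a very rough sketch. It is written in Python for brevity (the prototype itself is in Perl), and it is only a keyword-based stand-in for the NLP techniques used by NLPhishDetect. The cue lists and the names `count_cues` and `is_actionable` are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical cue lists for illustration only; they are not taken from the talk.
URGENCY_CUES = ("immediately", "urgent", "within 24 hours", "suspended", "expire")
ACTION_CUES = ("click", "verify", "confirm", "update", "log in", "reply")
REWARD_CUES = ("reward", "prize", "survey", "gift card")


def count_cues(text, cues):
    """Count how many of the given cue phrases occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(1 for cue in cues if re.search(r"\b" + re.escape(cue), lowered))


def is_actionable(text):
    """Treat an email as 'actionable' if it requests an action AND applies
    pressure through urgency or a promised reward; otherwise 'informational'."""
    asks_action = count_cues(text, ACTION_CUES) > 0
    pressure = count_cues(text, URGENCY_CUES) + count_cues(text, REWARD_CUES)
    return asks_action and pressure > 0
```

A real text analysis would need more than keyword matching (for example, parsing who is asked to do what), which is why this is only a sketch of the idea.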

Prototype Implementation • Goal: Analyze and filter emails before they reach the user’s mailbox

NLPhishDetect • Implemented entirely in Perl • NLPhishDetect uses 3 heuristics: • headerAnalysis(): analyzes the message header • linkAnalysis(): analyzes links in the body • textAnalysis(): analyzes the email text • Each heuristic examines part of the email message and generates a score • NLPhishDetect combines the scores from these 3 heuristics to compute the final email score, which classifies the email as phishing or legitimate
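One way the three heuristic scores might be combined is sketched below in Python (the prototype itself is in Perl). The talk does not say how the scores are combined, so the weighted average, the `weights` and `threshold` parameters, and the assumption that each score lies in [0, 1] are all hypothetical.

```python
def combine_scores(header_score, link_score, text_score,
                   weights=(1.0, 1.0, 1.0), threshold=0.5):
    """Combine three per-heuristic scores into one label.

    Assumption (not from the talk): each score is in [0, 1], where higher
    means more suspicious, and the final score is a weighted average.
    """
    scores = (header_score, link_score, text_score)
    total = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    label = "phishing" if total >= threshold else "legitimate"
    return total, label


# Usage: the three inputs stand in for the outputs of headerAnalysis(),
# linkAnalysis(), and textAnalysis().
score, label = combine_scores(0.9, 0.8, 0.7)
```

Keeping the heuristics separate and combining them only at the end means each one can be tuned or replaced without touching the others.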

Flowchart

Results • To our knowledge, our combined accuracy and phishing detection rate are better than those of any other detection algorithm created so far

About the Context Information • Context information is used by textAnalysis() • Context information is NOT available in the first run

Concluding it all • NLPhishDetect is a very strong phishing email filter • High true positive rate: most phishing emails are detected • Low false positive rate: few good emails are wrongly marked as phishing, i.e. good emails are reliably marked as ‘good’ • By analyzing links without visiting them, it prevents the user from being exposed to malware, trojans, etc. • In the future, we expect to • Analyze attachments • Reduce computation costs • Commercialize NLPhishDetect
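Analyzing links without visiting them can be illustrated by inspecting only the URL strings in the message body. The Python sketch below (the prototype itself is in Perl) flags two common warning signs: a link whose target is a raw IP address, and a link whose visible text shows one host while the `href` points to another. These two checks and the names `LinkCollector`, `host_of`, `is_ip`, and `suspicious_links` are assumptions for illustration; the talk does not list the checks performed by linkAnalysis().

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from the <a> tags of an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    """Return the lowercase host of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def is_ip(host):
    """Return True if the host is a literal IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def suspicious_links(html_body):
    """Return hrefs that look suspicious; no network request is ever made."""
    parser = LinkCollector()
    parser.feed(html_body)
    parser.close()
    flagged = []
    for href, text in parser.links:
        target = host_of(href)
        # Only compare hosts when the visible text itself looks like a URL.
        shown = host_of(text) if text.lower().startswith(("http://", "https://")) else ""
        if is_ip(target) or (shown and shown != target):
            flagged.append(href)
    return flagged
```

Because only strings are parsed, a malicious page is never fetched, which matches the safety argument made above.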

Final Words • Special thanks to my mentor Dr. Verma • Thanks to Dr. Huang and Dr. Kakadiaris for their dedication and involvement in the REU

Question Time !!
