Learn about PhishDef, a client-side phishing classification system that offers proactive zero-day protection with high accuracy and low latency. It uses online machine learning algorithms and is resilient to noisy training data. Explore the dataset, feature selection, and evaluation results to see how online learning outperforms batch-based models for phishing classification.
PhishDef: URL Names Say It All • Michalis Faloutsos, University of California, Riverside, USA • Anh Le, Athina Markopoulou, University of California, Irvine, USA
What is Phishing? • Social engineering and technical means to steal consumers’ personal identity data, etc. • Causes billions of dollars in losses annually Anh Le - UC Irvine - PhishDef
Antiphishing.org
Example of a Phishing Site
Current Protection • Google Safe Browsing • Microsoft SmartScreen • Third-party
Current Protection Model: Google Safe Browsing • Motivation: • Blacklist-based protection is reactive: it cannot protect against zero-day phishing sites
Outline • Phishing Background • Motivation • Our Proposal • New Protection Model • Learning Algorithms • Dataset • Feature Selection • Evaluation Results • Concluding Remarks
Our Proposed Protection Model • Main challenges: accuracy and classification latency • Which classification algorithm works best? • Which set of features works best?
Prior Work (server-side classification) • Whittaker et al. [NDSS ’10]: Google Safe Browsing • Ma et al. [SIGKDD ’09]: batch-based classification • Ma et al. [ICML ’09]: batch-based vs. online learning
Main Contributions • New protection model: client-side classification • Propose using Adaptive Regularization of Weights (AROW): high accuracy, resilient to noise • Set of lexical features: fast to extract at the client side, obfuscation resistant
Machine Learning Algorithms • Batch-based: Support Vector Machine (SVM) • Online: Perceptron • Online: Confidence-Weighted (CW) [Dredze et al., ICML 2008] • Online: Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]
Online Classification • Maintain a weight vector and use it for classification • Online Perceptron • Client side: weight vector trained beforehand; features extracted in real time
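To make the weight-vector idea concrete, here is a minimal sketch of an online perceptron over sparse features, assuming features are a dict of name to value and labels are +1 (phishing) / -1 (benign). The feature names, the label convention, and the absence of a bias term are illustrative assumptions, not PhishDef's actual encoding. Training would run beforehand; the client only needs `classify`, a single sparse dot product.

```python
from collections import defaultdict


def score(weights, features):
    """Dot product between a sparse feature dict and the weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def classify(weights, features):
    """Client-side step: +1 = phishing, -1 = benign (label convention assumed)."""
    return 1 if score(weights, features) > 0 else -1


def perceptron_train(examples, epochs=1):
    """Training step: online perceptron over (features, label) pairs."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in examples:
            # Perceptron rule: change the weights only when the example is
            # misclassified (a zero score also counts as a mistake here).
            if label * score(weights, features) <= 0:
                for name, value in features.items():
                    weights[name] += label * value
    return dict(weights)


# Toy usage with hypothetical feature names.
training = [
    ({"tld=ru": 1.0, "path_tok=taxrefund": 1.0}, 1),
    ({"tld=uk": 1.0, "path_tok=income": 1.0}, -1),
]
w = perceptron_train(training)
print(classify(w, {"tld=ru": 1.0}))  # 1
```

Because each update touches only the features present in one URL, the model can be refreshed one example at a time, which is what separates online learning from batch-based SVM training.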
Online Classification • Confidence-Weighted (CW): minimum change enough to correct the last mistake • Adaptive Regularization of Weights (AROW): minimum change, increasing confidence, penalty for mistake
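The three AROW terms (minimum change, increasing confidence, penalty for mistake) lead to a closed-form update. Below is a minimal sketch of that update with a diagonal covariance, which is a common approximation rather than necessarily the configuration PhishDef used; the regularization parameter `r=1.0` and the initial variance of 1.0 are assumed values. CW's update is not shown. Weights are kept as a mean `mu` and a per-feature variance `sigma`, both sparse dicts.

```python
def arow_predict(mu, features):
    """Sign of the mean-weight score: +1 = phishing, -1 = benign (assumed)."""
    s = sum(mu.get(f, 0.0) * x for f, x in features.items())
    return 1 if s > 0 else -1


def arow_update(mu, sigma, features, label, r=1.0):
    """One AROW step with a diagonal covariance; mutates mu and sigma in place."""
    margin = label * sum(mu.get(f, 0.0) * x for f, x in features.items())
    # x^T Sigma x: how uncertain the model is about this example.
    confidence = sum(sigma.get(f, 1.0) * x * x for f, x in features.items())
    loss = max(0.0, 1.0 - margin)  # hinge loss: the penalty for a mistake
    if loss == 0.0:
        return  # margin >= 1: already confident and correct, no change
    beta = 1.0 / (confidence + r)
    alpha = loss * beta
    for f, x in features.items():
        s = sigma.get(f, 1.0)
        # Smallest mean change (scaled by variance) toward the correct side.
        mu[f] = mu.get(f, 0.0) + alpha * label * s * x
        # Shrink the variance of features just seen: increasing confidence.
        sigma[f] = s - beta * (s * x) ** 2


mu, sigma = {}, {}
arow_update(mu, sigma, {"tld=ru": 1.0}, label=1)
print(mu, sigma)  # {'tld=ru': 0.5} {'tld=ru': 0.5}
```

Unlike the perceptron, AROW does not force the last example to be classified correctly; the hinge penalty and `r` let it absorb a mislabeled example with a bounded change, which is the intuition behind its resilience to noisy data.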
Dataset • Phishing URLs: PhishTank (4,082), MalwarePatrol (2,001) • Benign URLs: Open Directory (4,012), Yahoo Directory (4,143) • Time period: June 2010
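A quick tally of the dataset sizes listed above, assuming the four sets do not overlap (the slide does not say whether duplicates were removed):

```python
# URL counts from the Dataset slide (June 2010).
phishing = {"PhishTank": 4082, "MalwarePatrol": 2001}
benign = {"Open Directory": 4012, "Yahoo Directory": 4143}

n_phish = sum(phishing.values())   # 6,083
n_benign = sum(benign.values())    # 8,155
total = n_phish + n_benign         # 14,238
print(f"phishing={n_phish} benign={n_benign} total={total} "
      f"phishing share={n_phish / total:.1%}")
```

The classes are roughly balanced (about 43% phishing), so overall accuracy is a meaningful headline metric for this dataset.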
Feature Selection • Lexical features • External features: country, AS number, registration date, registrant, registrar, etc.
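Lexical features can be computed from the URL string alone, with no WHOIS or DNS lookup. The sketch below uses a hypothetical feature set (per-part bag of tokens, length, dot count, path depth, IP-address host, TLD), not the exact list from PhishDef; it only illustrates why such features are cheap to extract at the client side.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Hypothetical lexical feature set: bag of tokens per URL part plus counts."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path + ("?" + parts.query if parts.query else "")
    features = {}
    # Bag of tokens, split on non-alphanumeric delimiters, tagged by URL part.
    for part_name, text in (("host", host), ("path", path)):
        for token in re.split(r"[^A-Za-z0-9]+", text):
            if token:
                features[f"{part_name}_tok={token.lower()}"] = 1.0
    # Simple count and flag features, all derived from the string itself.
    features["url_length"] = float(len(url))
    features["host_dots"] = float(host.count("."))
    features["path_depth"] = float(parts.path.count("/"))
    features["has_ip_host"] = 1.0 if re.fullmatch(r"\d+(\.\d+){3}", host) else 0.0
    features["tld=" + host.rsplit(".", 1)[-1]] = 1.0
    return features


f = lexical_features(
    "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
)
print(sorted(k for k in f if k.startswith(("host_tok", "path_tok", "tld"))))
```

External features such as AS number or registrar would require a network query per URL, which is where the remote-server dependency and added latency of full features come from.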
Evaluation Results: Lexical vs. Full Features • Full features: (+) about 1% higher accuracy; (-) dependency on remote servers; (-) average latency of 1.64 s • Lexical features alone are better suited than full features for client-side phishing classification
Evaluation Results: CW vs. AROW • AROW is more resilient to noise than CW
Conclusion: PhishDef • Client-side phishing classification system • Proactive, on-the-fly classification of zero-day phishing URLs • Low client-side delay (milliseconds) and high accuracy (97%) • Resilient to noisy data • Future work: develop a Firefox add-on
Questions
Example of a Phishing Site • Phishing: http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm • Legitimate: http://www.hmrc.gov.uk/intro-income-tax.htm
Evaluation Results: Batch-Based vs. Online Learning • Online learning outperforms batch-based learning for phishing classification
Chrome 11 > Firefox 4