An Anti-Spam filter based on Adaptive Neural Networks

Alexandru Catalin Cosoi
Researcher / BitDefender AntiSpam Laboratory
acosoi@bitdefender.com

Neural Networks
• consist of a large number of interconnected processing elements, called neurons
• offer a different approach to problem solving
• neural networks and conventional algorithmic computers complement each other

Adaptive Resonance Theory
• Proposed by Carpenter and Grossberg in 1976-86
• Solves the stability-plasticity dilemma
• ART architecture models can self-organize in real time, producing stable recognition while receiving input patterns beyond those originally stored
• Contains two components: an attentional subsystem and an orienting subsystem
• The orienting subsystem works like a novelty detector
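The interplay between the two subsystems can be illustrated with a toy sketch. This is not the network used in the presented system: it is a heavily simplified ART-style clusterer for binary patterns, and the class name MiniART, the vigilance value, and the overlap-based ranking are assumptions made for illustration only.

```python
import numpy as np

class MiniART:
    """Toy ART-style clusterer for binary patterns (illustrative only)."""

    def __init__(self, vigilance=0.7):
        self.vigilance = vigilance   # how close a match must be to "resonate"
        self.prototypes = []         # one stored binary template per category

    def present(self, pattern):
        """Return the category index for `pattern`, learning as needed."""
        x = np.asarray(pattern, dtype=bool)
        # Attentional subsystem: rank stored categories by overlap with the input.
        order = sorted(range(len(self.prototypes)),
                       key=lambda j: -np.sum(x & self.prototypes[j]))
        for j in order:
            match = np.sum(x & self.prototypes[j]) / max(np.sum(x), 1)
            # Orienting subsystem: accept the category only if the match
            # passes the vigilance test; otherwise keep searching.
            if match >= self.vigilance:
                # Fast learning: the template shrinks to the shared features.
                self.prototypes[j] = x & self.prototypes[j]
                return j
        # Novel input: no category resonated, so a new one is committed.
        # Existing templates are left untouched (stability).
        self.prototypes.append(x.copy())
        return len(self.prototypes) - 1
```

A pattern that fails the vigilance test against every stored template is treated as novel and gets its own category, which is how new patterns can be learned without overwriting old ones.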

ARTMAP
• ARTMAP
  • a class of Neural Network architectures
  • performs incremental supervised learning of multi-dimensional maps
  • input vectors can be presented in arbitrary order
• Fuzzy ARTMAP
  • features are represented in fuzzy logic
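As a rough illustration of incremental supervised learning with fuzzy-valued features, here is a simplified Fuzzy ARTMAP-like sketch. It keeps only complement coding, the fuzzy AND (element-wise minimum), fast learning, and match tracking; the names MiniFuzzyARTMAP, base_vigilance, and alpha are assumptions for this example, and the real architecture has more components than shown here.

```python
import numpy as np

def complement_code(features):
    """Map features in [0, 1] to [a, 1 - a], as fuzzy ART inputs usually are."""
    a = np.clip(np.asarray(features, dtype=float), 0.0, 1.0)
    return np.concatenate([a, 1.0 - a])

class MiniFuzzyARTMAP:
    """Simplified supervised fuzzy ART classifier (illustrative only)."""

    def __init__(self, base_vigilance=0.5, alpha=0.001):
        self.base_vigilance = base_vigilance
        self.alpha = alpha      # small constant in the choice function
        self.weights = []       # one weight vector per committed category
        self.labels = []        # class label mapped to each category

    def _ranked(self, x):
        # Choice function: |x AND w| / (alpha + |w|), with AND = element-wise min.
        scores = [np.minimum(x, w).sum() / (self.alpha + w.sum())
                  for w in self.weights]
        return sorted(range(len(scores)), key=lambda j: -scores[j])

    def train(self, features, label):
        x = complement_code(features)
        rho = self.base_vigilance
        for j in self._ranked(x):
            match = np.minimum(x, self.weights[j]).sum() / x.sum()
            if match < rho:
                continue                      # fails the vigilance test
            if self.labels[j] == label:
                self.weights[j] = np.minimum(x, self.weights[j])  # fast learning
                return j
            rho = match + 1e-6                # match tracking after a wrong guess
        self.weights.append(x)                # commit a new category
        self.labels.append(label)
        return len(self.weights) - 1

    def predict(self, features):
        ranked = self._ranked(complement_code(features))
        return self.labels[ranked[0]] if ranked else None
```

Training items can be fed one at a time and in any order, for example `net.train([0.9, 0.8], "spam")` followed by `net.train([0.1, 0.2], "ham")`; each call either refines an existing category or commits a new one.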

System
• A complex system that will
  • gather the spam and ham corpus
  • study its characteristics
  • learn
  • require no human involvement

Inputs
• words like viagra, mortgage, xanax
• obfuscated words
• information extracted from headers
• other heuristics used in Anti-Spam filters
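A minimal sketch of how such inputs might be turned into a feature vector is shown below. The keyword list comes from the slide; everything else (the substitution table, the feature names, and the two header checks) is a hypothetical choice for illustration, not the heuristics actually used by the filter.

```python
import re
from email import message_from_string

KEYWORDS = ("viagra", "mortgage", "xanax")
# Hypothetical substitution table for undoing simple character swaps.
LEET = str.maketrans("013457@$!|", "oieastasil")

def normalize(word):
    """Undo simple character substitutions and strip separators inside a word."""
    return re.sub(r"[^a-z]", "", word.lower().translate(LEET))

def extract_features(raw_email):
    """Return a dict of 0.0/1.0 features for a raw RFC 822 style message."""
    msg = message_from_string(raw_email)
    body = msg.get_payload() if not msg.is_multipart() else ""
    tokens = body.split()
    plain = {t.lower().strip(".,!?") for t in tokens}
    normalized = {normalize(t) for t in tokens}

    features = {}
    for kw in KEYWORDS:
        # Keyword written in the clear.
        features[f"word:{kw}"] = float(kw in plain)
        # Keyword that only appears after de-obfuscation.
        features[f"obfuscated:{kw}"] = float(kw in normalized and kw not in plain)
    # Two example header heuristics (assumed, not taken from the slides).
    features["header:missing_message_id"] = float(msg["Message-ID"] is None)
    features["header:subject_all_caps"] = float((msg["Subject"] or "").isupper())
    return features
```

Each feature is a value in [0, 1], so the resulting vector could be fed to a fuzzy classifier directly.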

Hierarchy
• Initial implementation: single neural network
  • Increasing number of heuristics
  • Increasing number of training items
  • Train both on spam and ham
  • Improvements
• Next step: multiple neural networks (a hierarchy)
  • Run only requested heuristics
  • Perform a refined classification
  • Split email into several categories
  • Increase detection speed
  • Learn new patterns without losing detection on older spam
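The idea of running only the heuristics a given level asks for can be sketched as a small tree walk. In this sketch each node's `decide` function stands in for a trained neural network, and the node names, heuristic names, and category labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Heuristic = Callable[[str], float]

@dataclass
class Node:
    name: str
    heuristics: List[str]                      # heuristics this node requests
    decide: Callable[[Dict[str, float]], str]  # returns a child name or a final label
    children: Dict[str, "Node"] = field(default_factory=dict)

def classify(email: str, root: Node, registry: Dict[str, Heuristic]):
    """Walk the hierarchy, evaluating each heuristic at most once and only on demand."""
    cache: Dict[str, float] = {}
    node, path = root, []
    while True:
        for h in node.heuristics:
            if h not in cache:
                cache[h] = registry[h](email)   # lazy evaluation
        outcome = node.decide({h: cache[h] for h in node.heuristics})
        path.append(node.name)
        if outcome not in node.children:
            # Final label reached; also report which heuristics actually ran.
            return outcome, path, sorted(cache)
        node = node.children[outcome]

# Hypothetical two-level hierarchy.
registry = {
    "has_drug_word": lambda e: float("xanax" in e.lower()),
    "has_loan_word": lambda e: float("mortgage" in e.lower()),
    "has_url": lambda e: float("http" in e.lower()),
}
pharma = Node("pharma", ["has_url"],
              lambda f: "spam/pharma" if f["has_url"] else "ham")
root = Node("root", ["has_drug_word", "has_loan_word"],
            lambda f: "pharma" if f["has_drug_word"]
            else ("spam/finance" if f["has_loan_word"] else "ham"),
            {"pharma": pharma})
```

With this layout, a message that the root already labels as ham never triggers the `has_url` heuristic, which is where the speed gain of a hierarchy would come from.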

Correction module and noise reduction
• Performs noise reduction on the input data before entering the learning phase
• Increases discrimination rate between the input patterns
• Eliminates or modifies patterns that can cause misclassification (same pattern for multiple categories)
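One simple way to picture the elimination step is a filter that drops any pattern seen under more than one category before training. The sketch below covers only that case (plus duplicate removal); the function name reduce_noise and the exact-match comparison are assumptions, and the slides do not say how patterns are modified rather than eliminated.

```python
from collections import defaultdict

def reduce_noise(samples):
    """Drop patterns that occur with conflicting labels.

    samples: iterable of (feature_tuple, label) pairs.
    Returns (clean, dropped): clean holds one copy of each unambiguous
    pattern, dropped holds every sample whose pattern had several labels.
    """
    samples = list(samples)
    labels_by_pattern = defaultdict(set)
    for features, label in samples:
        labels_by_pattern[tuple(features)].add(label)

    clean, dropped, seen = [], [], set()
    for features, label in samples:
        key = tuple(features)
        if len(labels_by_pattern[key]) > 1:
            dropped.append((key, label))   # same pattern, multiple categories
        elif key not in seen:
            seen.add(key)                  # keep one copy of each clean pattern
            clean.append((key, label))
    return clean, dropped
```

For example, if the pattern (1, 0, 1) appears once as spam and once as ham, both samples are dropped, while a pattern that always appears as ham is kept once.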

Results

Table 3: Detection results on an increasing number of training items. Both the training and test corpora were analyzed.
• Detection results on training items
• Detection results on test items

Conclusions
• Fast learning method
• Solves the stability-plasticity dilemma (property preserved from the ART modules)
• Consistently improves the heuristic filter
• Faster
• The analysis is based on pattern recognition
• Performs a refined analysis
• High detection rates
• Advanced categorization
• Multiple spam categories
• Can also be used for parental control
• Can perform email classification (business, school, personal)

In conclusion, this system improves both speed and detection.
