This paper presents an anti-spam filter built on adaptive neural networks that classifies email using Adaptive Resonance Theory (ART). The system gathers spam and ham corpora, self-organizes in real time, and produces stable recognition. Its architecture combines multiple neural networks with heuristics to increase detection speed and to learn new patterns without losing previously acquired recognition. The result is a filter with high detection rates that sorts email into categories and can be adapted to other uses such as parental control.
An Anti-Spam filter based on Adaptive Neural Networks Alexandru Catalin Cosoi Researcher / BitDefender AntiSpam Laboratory acosoi@bitdefender.com
Neural Networks • a large number of processing elements, called neurons • a different approach in problem solving • neural networks and conventional algorithmic computers complement each other
Adaptive Resonance Theory • Proposed by Carpenter and Grossberg in 1976-86 • Solves the stability-plasticity dilemma • ART architecture models can self-organize in real time, producing stable recognition when presented with input patterns beyond those originally stored • Contains two components: an attentional subsystem and an orienting subsystem • The orienting subsystem works like a novelty detector
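The interplay between the two subsystems can be illustrated with a minimal sketch of a simplified ART1-style clusterer for binary vectors. This is not the filter described in these slides; the class name `SimpleART1`, the parameter defaults, and the fast-learning rule are assumptions chosen for illustration. The vigilance test plays the role of the novelty detector: a pattern that matches no stored category closely enough creates a new category instead of overwriting an old one.

```python
import numpy as np


class SimpleART1:
    """Simplified ART1-style clusterer for binary vectors (illustrative only)."""

    def __init__(self, vigilance=0.7, beta=1e-6):
        self.vigilance = vigilance  # rho: minimum match required for resonance
        self.beta = beta            # small constant in the choice function
        self.weights = []           # one binary prototype per committed category

    def present(self, pattern):
        """Return the index of the category that learns `pattern`."""
        x = np.asarray(pattern, dtype=bool)
        norm_x = x.sum()
        if norm_x == 0:
            raise ValueError("pattern must contain at least one active bit")
        # Attentional subsystem: rank categories by the choice function.
        overlaps = [np.logical_and(x, w).sum() for w in self.weights]
        scores = [o / (self.beta + w.sum()) for o, w in zip(overlaps, self.weights)]
        for j in np.argsort(scores)[::-1]:
            # Orienting subsystem: the vigilance test acts as the novelty detector.
            if overlaps[j] / norm_x >= self.vigilance:
                # Resonance: fast learning keeps only the bits shared with the prototype.
                self.weights[j] = np.logical_and(x, self.weights[j])
                return int(j)
        # Novel pattern: commit a new category, leaving existing ones untouched.
        self.weights.append(x.copy())
        return len(self.weights) - 1
```

Presenting `[1,1,1,0,0,0]`, `[1,1,1,1,0,0]`, `[0,0,0,1,1,1]` and then `[1,1,1,0,0,0]` again with vigilance 0.7 assigns the first two patterns to category 0, opens category 1 for the third, and still recognizes the first pattern as category 0 afterwards.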
ARTMAP • A class of neural network architectures • Performs incremental supervised learning of multi-dimensional maps • Input vectors may be presented in arbitrary order • Fuzzy ARTMAP • Input features are expressed as fuzzy values in [0, 1]
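For reference, the core operations of the standard Fuzzy ART module used inside Fuzzy ARTMAP can be written as follows. These are the commonly published equations; the slides do not state which parameter values or variants the filter uses, so the symbols below (choice parameter α > 0, learning rate β in (0, 1], vigilance ρ in [0, 1]) should be read as generic notation. Here ∧ is the component-wise minimum and |·| is the sum of components.

```latex
\begin{align*}
I &= (a,\; 1 - a), \quad a \in [0,1]^M && \text{(complement coding)} \\
T_j(I) &= \frac{|I \wedge w_j|}{\alpha + |w_j|} && \text{(choice function)} \\
\frac{|I \wedge w_J|}{|I|} &\ge \rho && \text{(vigilance test for the chosen category } J\text{)} \\
w_J^{\text{new}} &= \beta\,\bigl(I \wedge w_J^{\text{old}}\bigr) + (1 - \beta)\, w_J^{\text{old}} && \text{(learning)}
\end{align*}
```

In ARTMAP, supervision enters through match tracking: when the chosen category J predicts the wrong class, ρ is raised to just above |I ∧ w_J| / |I| and the search continues with another category.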
System • A complex system that will • gather the spam and ham corpus • study its characteristics • learn • no human involvement
Inputs • words like viagra, mortgage, xanax • obfuscated words • information extracted from headers • other heuristics used in Anti-Spam filters
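A minimal sketch of how such inputs might be turned into a binary feature vector is given below. The substitution table `LEET`, the two header checks, and the function names are hypothetical and chosen only for illustration; the slides do not list the actual heuristics.

```python
import re
from email import message_from_string

SPAM_WORDS = ("viagra", "mortgage", "xanax")  # keywords named on the slide

# Hypothetical look-alike substitutions used to undo simple obfuscation.
LEET = str.maketrans({"1": "i", "!": "i", "0": "o", "@": "a", "4": "a", "3": "e", "$": "s"})


def deobfuscate(token):
    """Map look-alike characters to letters and drop separators such as '.', '-' or '_'."""
    return re.sub(r"[^a-z]", "", token.lower().translate(LEET))


def extract_features(raw_email):
    """Build a binary feature vector: keyword hits, obfuscation hits, header checks."""
    msg = message_from_string(raw_email)
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart messages are ignored in this sketch
        body = ""
    tokens = body.split()
    plain = {t.lower().strip(".,;:!?") for t in tokens}
    cleaned = {deobfuscate(t) for t in tokens}
    features = []
    for word in SPAM_WORDS:
        features.append(int(word in plain))                          # written plainly
        features.append(int(word in cleaned and word not in plain))  # found only after clean-up
    # Hypothetical header heuristics: missing Message-ID, all-caps subject.
    features.append(int(msg["Message-ID"] is None))
    features.append(int(msg["Subject"] is not None and msg["Subject"].isupper()))
    return features
```

Each position of the returned list corresponds to one heuristic, so the vector can be fed to a network that expects binary inputs.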
Hierarchy • Initial implementation: single neural network • Increasing number of heuristics • Increasing number of training items • Train both on spam and ham • Improvements • Next step: multiple neural networks (a hierarchy) • Run only requested heuristics • Perform a refined classification • Split email into several categories • Increase detection speed • Learn new patterns without losing detection on older spam
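The idea of a hierarchy that runs only the requested heuristics can be sketched as a tree of classifiers with lazily computed features. Everything here is an assumption for illustration: the `Node` and `LazyFeatures` names, the three toy heuristics, and the hand-written decision rules that stand in for trained networks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One classifier in the hierarchy (a stand-in for a trained network)."""
    name: str
    requested: List[str]                       # heuristics this node needs
    decide: Callable[[Dict[str, float]], str]  # returns a child name or a final label
    children: Dict[str, "Node"] = field(default_factory=dict)


class LazyFeatures:
    """Computes each heuristic at most once, and only when a node asks for it."""

    def __init__(self, email, heuristics):
        self.email, self.heuristics = email, heuristics
        self.cache = {}

    def get(self, names):
        for n in names:
            if n not in self.cache:
                self.cache[n] = self.heuristics[n](self.email)
        return {n: self.cache[n] for n in names}


def classify(root, email, heuristics):
    """Walk the hierarchy; return the final label and the heuristics that ran."""
    feats = LazyFeatures(email, heuristics)
    node = root
    while True:
        outcome = node.decide(feats.get(node.requested))
        if outcome not in node.children:
            return outcome, sorted(feats.cache)
        node = node.children[outcome]


# Toy heuristics and rules, purely hypothetical.
HEURISTICS = {
    "has_drug_word": lambda e: float(any(w in e.lower() for w in ("viagra", "xanax"))),
    "has_loan_word": lambda e: float("mortgage" in e.lower()),
    "link_count": lambda e: float(e.lower().count("http://")),
}

pharma = Node("pharma", ["link_count"],
              lambda f: "spam/pharma" if f["link_count"] >= 1 else "ham")
root = Node(
    "root",
    ["has_drug_word", "has_loan_word"],
    lambda f: "pharma" if f["has_drug_word"] else ("spam/finance" if f["has_loan_word"] else "ham"),
    children={"pharma": pharma},
)
```

With these toy rules, a message that the root already labels as ham never triggers `link_count`, which is where the speed gain of a hierarchy would come from.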
Correction module and noise reduction • Performs noise reduction on the input data before entering the learning phase • Increases discrimination rate between the input patterns • Eliminates or modifies patterns that can cause misclassification (same pattern for multiple categories)
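One way such a correction step could work is sketched below, assuming training items arrive as (pattern, label) pairs. The majority-vote rule and the `min_agreement` threshold are assumptions; the slides do not say how conflicting patterns are actually resolved.

```python
from collections import Counter, defaultdict


def reduce_noise(samples, min_agreement=0.8):
    """Drop duplicate patterns and resolve patterns seen under several labels.

    samples: iterable of (pattern, label), where pattern is a sequence of feature values.
    A conflicting pattern keeps its majority label when that label covers at
    least `min_agreement` of its occurrences (modify); otherwise it is removed.
    """
    labels_by_pattern = defaultdict(Counter)
    for pattern, label in samples:
        labels_by_pattern[tuple(pattern)][label] += 1

    cleaned = []
    for pattern, counts in labels_by_pattern.items():
        label, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= min_agreement:
            cleaned.append((pattern, label))
    return cleaned
```

The output contains each surviving pattern exactly once with a single label, so no pattern is presented to the learning phase under more than one category.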
Results • Table 3: Detection results on an increasing number of training items. Both the training and the test corpus were analyzed. • [Charts not reproduced: detection results on training items; detection results on test items]
Conclusions • Fast learning method • Solves the stability-plasticity dilemma (property preserved from the ART modules) • Consistently improves the heuristic filter • Faster • The analysis is based on pattern recognition • Performs a refined analysis • High detection rates • Advanced categorization • Multiple spam categories • Can also be used for parental control • Can perform email classification (business, school, personal) In conclusion, this system improves both speed and detection.