Intrusion detection and identification based on Supelec TCPdump data and KDD1999

Sylvain GOMBAULT and Wei WANG. Département Réseaux, Sécurité et Multimédia, École Nationale Supérieure des Télécommunications de Bretagne, France.


Presentation Transcript


  1. Intrusion detection and identification based on Supelec TCPdump data and KDD1999 • Sylvain GOMBAULT and Wei WANG • Département Réseaux, Sécurité et Multimédia, École Nationale Supérieure des Télécommunications de Bretagne, France

  2. Outline • Deep analysis of kdd99 transformation and database • Intrusion detection using Supelec TCPdump data • Building multiple behavioral models for network intrusion identification (Monam 2007) • kNN based Intrusion detection and identification • PCA based intrusion detection and identification • Conclusion & future work GET/ENST Bretagne

  3. Data transformation and explicit approach • [Figure: transformation of raw traffic → classification, with an example decision tree over Protocol_type (tcp, udp) and service (domain_u, http, private, time, auth), leaves labelled normal, Probe and DOS] • Transformation function • Choice of relevant attributes • Definition of the properties to be satisfied (rich function) • Two stages after transforming the raw data • Model construction by learning from labelled data • Detection phase: data to be classified (analyzed)

  4. Transformation function • Data considered: • Network traffic • To feed the classification tool from the raw traffic: • Transformation function $T: R \rightarrow I$ • R: set of raw traffic • I: set of structured items

  5. Analysis of the kdd99 database (1) • Learning base: 4 connections, forming two pairs, have identical values for all 41 attributes but different labels • 0,icmp,ecr_i,SF,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,1,1,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,ipsweep.148774 • 0,icmp,ecr_i,SF,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,1,1,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,portsweep.345836 • 0,icmp,tim_i,SF,564,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,2,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal.143855 • 0,icmp,tim_i,SF,564,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,2,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,pod.345952
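The check behind these counts can be sketched in Python as follows. This is a minimal sketch, assuming one comma-separated record per line with the label as the last field; the helper name `label_conflicts` is hypothetical. The numeric suffix after the label's dot on the slide (e.g. `.148774`) is assumed to be a record index, so the sketch keeps only the text before the first dot.

```python
from collections import Counter, defaultdict


def label_conflicts(lines):
    """Map every attribute vector seen with more than one label to a Counter of its labels."""
    labels_by_vector = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The last field is the label; everything before it is the 41 attributes.
        attributes, label = line.rsplit(",", 1)
        # KDD99 labels end with '.', e.g. 'smurf.'; keep only the part before the first dot.
        labels_by_vector[attributes][label.split(".")[0]] += 1
    return {vec: labels for vec, labels in labels_by_vector.items() if len(labels) > 1}
```

Applied to a whole file, `len(conflicts)` gives the number of distinct attribute vectors with conflicting labels, and `sum(sum(c.values()) for c in conflicts.values())` gives the number of records involved, the two kinds of counts reported for the test base.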

  6. Analysis of the kdd99 database (2) • Test base (corrected file) • 71 distinct attribute vectors appear with different labels • 71,503 connections (22.99% of the total) share their attributes with connections that carry a different label • 3 ipsweep (Probing) attack connections have the same attributes as smurf (DoS) attack connections (56,608 connections) • 3 Probing attacks (0.07%) cannot be detected as Probing (they are classified as DoS attacks instead)

  7. Analysis of the kdd99 database (3) • Test base (corrected file): • 7,563 snmpgetattack connections (97.7% of that attack) have the same attributes as normal connections • The remaining 2.3% of the snmpgetattack connections have attributes similar to normal ones (but not identical) • 7,563 R2L attacks (46.72% of all R2L attacks) cannot be detected (they are classified as normal)
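A related sketch measures how many records of one class share their full attribute vector with another class, the kind of figure reported above for snmpgetattack versus normal. It is a hypothetical helper (`shared_vector_stats`) under the same record-format assumptions as before; it counts exact matches only, not the merely "similar" records.

```python
from collections import defaultdict


def shared_vector_stats(lines, label_a, label_b):
    """Return (count, fraction) of label_a records whose attributes also occur with label_b."""
    vectors_by_label = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        attributes, label = line.rsplit(",", 1)    # label is the last field
        vectors_by_label[label.split(".")[0]].append(attributes)
    b_vectors = set(vectors_by_label[label_b])     # every attribute vector seen with label_b
    a_vectors = vectors_by_label[label_a]
    shared = sum(1 for v in a_vectors if v in b_vectors)
    return shared, (shared / len(a_vectors) if a_vectors else 0.0)
```

For example, `shared_vector_stats(lines, "snmpgetattack", "normal")` on the corrected test file would produce the pair of figures in the first bullet.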

  8. Improvements to C4.5 (for kdd99)

  9. Supelec TCPdump data • [Figure: transformation of raw traffic by BRO → classification, with an example decision tree over Protocol_type (tcp, udp) and service (domain_u, http, private, time, auth), leaves labelled normal, Probe and DOS] • Supelec TCPdump (raw traffic) → BRO is used to construct the attributes • Transformation of the tcpdump traffic into 41 attributes

  10. Supelec TCPdump data (continued) • Transformation of the TCPdump traffic into 41 attributes • Using BRO • 4 categories of attributes: • General connection data (network and transport level) • Service, protocol type (TCP, UDP or ICMP), … • Application-layer attributes • Number of file creations, number of shells, … • Statistical attributes over the connections in the 2 seconds preceding the current connection • Statistical attributes over the last 100 connections
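The two statistical categories can be illustrated with a small sketch over already-assembled connection records. The feature names are borrowed from the KDD99 attribute list, but the definitions below are simplified assumptions, not the BRO scripts used by the authors: `count`/`srv_count` count connections to the same host/service in the last 2 seconds, and `dst_host_count`/`dst_host_srv_count` count same-host (and same-host-and-service) connections among the last 100. The `Conn` type is hypothetical, and both windows are assumed to include the current connection (consistent with the minimum value of 1 seen in the records above).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Conn:
    timestamp: float  # connection start time, in seconds
    dst_host: str
    service: str


def traffic_features(conns, window_s=2.0, last_n=100):
    """For each connection return (count, srv_count, dst_host_count, dst_host_srv_count).

    Assumes conns is sorted by timestamp.
    """
    recent = deque()                # connections started in (t - window_s, t]
    history = deque(maxlen=last_n)  # the last_n connections, current one included
    features = []
    for c in conns:
        recent.append(c)
        while recent[0].timestamp <= c.timestamp - window_s:
            recent.popleft()        # drop connections older than the time window
        history.append(c)
        count = sum(r.dst_host == c.dst_host for r in recent)
        srv_count = sum(r.service == c.service for r in recent)
        dst_host_count = sum(h.dst_host == c.dst_host for h in history)
        dst_host_srv_count = sum(
            h.dst_host == c.dst_host and h.service == c.service for h in history
        )
        features.append((count, srv_count, dst_host_count, dst_host_srv_count))
    return features
```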

  11. Learning and test data sets • Learning base (from KDD99) • ~5 million connections (the 10% subset of the KDD99 learning set, 494,021 connections, is used) • 4 attack classes + normal traffic • Probing (4), DoS (6), U2R (4), R2L (9) • Test base (from Supelec) • Normal • Files 0 to 29 of the 101 tcpdump files are used • 30 GB in size • 4,652,059 connections • TCP: 1,173,654; UDP: 3,254,160; ICMP: 224,245 • Normal data only • Attack • 10 connections • Cross-http, write-http, login-http, execute-http

  12. Results with decision trees (C4.5) • The C4.5 algorithm introduced by Quinlan, with a few modifications • Construction process • Classification process
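The slide does not say which modifications were made to C4.5. As a reference point only, the sketch below computes the standard C4.5 split criterion used in the construction process, the gain ratio (information gain divided by split information), for one categorical attribute such as protocol_type. The function names are hypothetical; continuous attributes, pruning and missing values are not handled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)       # one branch per attribute value
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional         # information gain
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0  # a one-valued attribute is useless
```

Dividing by the split information penalises attributes with many distinct values (such as service), which plain information gain tends to favour.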

  13. Intrusion detection and identification based on KDD99 data • Building a normal model from normal data for intrusion detection • Building an individual model for each attack type from the corresponding attack data for intrusion identification

  14. The general intrusion detection and identification model

  15. kNN based intrusion detection • Building the normal behavioral model • Calculate the distance between each test vector t and each vector $x_j$ ($j = 1, \dots, n$) in the training data set using the Euclidean distance: $d(t, x_j) = \sqrt{\sum_{i=1}^{m} (t_i - x_{ji})^2}$ • Sort the distances and choose the k nearest neighbors • Average the k smallest distances to obtain the anomaly index • Detection • If the anomaly index of a test vector t is above a threshold, • the test vector is classified as abnormal; • otherwise it is considered normal.
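The detection steps above can be sketched with NumPy as follows. This is a brute-force illustration, assuming `train` holds the normal training vectors as rows and the threshold is supplied by the caller; the function names are hypothetical.

```python
import numpy as np


def knn_anomaly_index(train, test, k):
    """Mean Euclidean distance from each test vector to its k nearest training vectors.

    train: (n, m) normal training vectors; test: (n_test, m) vectors to score.
    """
    diffs = test[:, None, :] - train[None, :, :]   # (n_test, n, m) pairwise differences
    dists = np.sqrt((diffs ** 2).sum(axis=2))      # (n_test, n) Euclidean distances
    nearest = np.sort(dists, axis=1)[:, :k]        # k smallest distances per test vector
    return nearest.mean(axis=1)                    # anomaly index


def knn_detect(train, test, k, threshold):
    """True marks a test vector as abnormal (anomaly index above the threshold)."""
    return knn_anomaly_index(train, test, k) > threshold
```

The broadcast builds an (n_test, n, m) array, so large test sets should be scored in batches.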

  16. kNN based intrusion identification • Define the normal data set and one data set for each individual attack type • Identification: • For each test vector t do • Calculate the distance $d(t, x)$ for every vector x in each training set; • Find the k smallest distances as the k nearest neighbors; • If more than half of the k nearest neighbors belong to one specific type then • t is identified as that type • Else if more of the k nearest neighbors belong to one type than to any other type then • t is identified as that type • Else • t is identified as a new attack • End If • End For
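A direct transcription of this procedure for one test vector might look like the sketch below. It assumes each training set is a NumPy array keyed by its type name (including "normal"), and reads the second rule as a strict plurality, so a tie between the top types falls through to "new attack"; `knn_identify` and `NEW_ATTACK` are hypothetical names.

```python
from collections import Counter

import numpy as np

NEW_ATTACK = "new attack"


def knn_identify(training_sets, t, k):
    """training_sets: dict type name -> (n_type, m) array. Returns a type name or NEW_ATTACK."""
    dists, labels = [], []
    for name, X in training_sets.items():
        d = np.sqrt(((X - t) ** 2).sum(axis=1))   # Euclidean distance to every vector of this set
        dists.extend(d.tolist())
        labels.extend([name] * len(d))
    nearest = np.argsort(dists, kind="stable")[:k]  # indices of the k nearest neighbors
    votes = Counter(labels[i] for i in nearest).most_common()
    top_name, top_count = votes[0]
    if top_count > k / 2:                           # more than half of the k neighbors
        return top_name
    if len(votes) == 1 or top_count > votes[1][1]:  # strictly more than any other type
        return top_name
    return NEW_ATTACK
```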

  17. PCA methods for intrusion detection • [Figure: original coordinate axes and the new coordinate axes] • Principal Component Analysis (PCA) • A dimension-reduction technique for data analysis and compression • A new coordinate system represents the original large data set • The axes are the eigenvectors associated with the few largest eigenvalues • This reduces dimensionality while retaining most of the valuable information in the data set • PCA has been applied in face recognition, text categorization, etc.

  18. PCA based normal model building for intrusion detection • Training data (attribute matrix) → mean vector → mean-adjusted matrix → covariance matrix → eigenvalue-eigenvector pairs → U: the k eigenvectors associated with the k largest eigenvalues
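The model-building chain on this slide can be sketched with NumPy as below. This is an illustration, not the authors' code: `fit_pca_model` is a hypothetical name, the covariance uses the 1/(n-1) normalisation (an assumption; the slide does not say), and k is chosen by the caller.

```python
import numpy as np


def fit_pca_model(X, k):
    """X: (n, m) training attribute matrix of one class. Returns (mean, U, top eigenvalues)."""
    mean = X.mean(axis=0)                     # mean vector
    A = X - mean                              # mean-adjusted matrix
    C = A.T @ A / (X.shape[0] - 1)            # (m, m) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalue-eigenvector pairs, ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    U = eigvecs[:, order]                     # (m, k) matrix of principal directions
    return mean, U, eigvals[order]
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.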

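  19. Intrusion detection based on the PCA model • Test data t and mean vector $\mu$ of the training data • Projection coefficients (principal components): $y = U^\top (t - \mu)$ • Reconstruction: $\hat{t} = U y + \mu$ • Anomaly/identification index: $\varepsilon = \lVert t - \hat{t} \rVert$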

  20. PCA based intrusion detection and intrusion identification • Intrusion detection • Given a new data vector t, if its anomaly index $\varepsilon$ is above a threshold, the test vector is considered abnormal • Otherwise, it is classified as normal • Intrusion identification • Calculate the Euclidean distance $\varepsilon_i$ between the test vector and its reconstruction from each subspace i, formed by the normal data and by each individual attack type, and take the minimum $\varepsilon_i$ as the identification index • If this minimum $\varepsilon_i$ is below the predefined threshold $\theta_i$ of the corresponding type, the vector is identified as that type • Otherwise it is identified as a new attack
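Detection and identification with reconstruction residuals can be sketched as follows. It assumes one PCA subspace per type (normal and each attack type), each fitted with the k leading eigenvectors of that type's covariance; the thresholds $\theta_i$ are passed in as a dict. All names are hypothetical and the sketch only illustrates the rules on these slides.

```python
import numpy as np


def fit_subspace(X, k):
    """PCA model of one type: mean vector and the k leading covariance eigenvectors."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return mean, U


def residual(t, model):
    """epsilon = ||t - t_hat||, where t_hat is the reconstruction of t from the subspace."""
    mean, U = model
    y = U.T @ (t - mean)          # projection coefficients (principal components)
    t_hat = U @ y + mean          # reconstruction in the original coordinates
    return float(np.linalg.norm(t - t_hat))


def detect(t, normal_model, threshold):
    """True marks t as abnormal."""
    return residual(t, normal_model) > threshold


def identify(t, models, thresholds):
    """models, thresholds: dicts keyed by type name ('normal', 'smurf', ...)."""
    eps = {name: residual(t, m) for name, m in models.items()}
    best = min(eps, key=eps.get)  # type with the minimum epsilon_i
    return best if eps[best] < thresholds[best] else "new attack"
```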

  21. Learning and test data sets for intrusion identification • Data description: • 41 attributes + class name • Text format • Data for intrusion detection (learning base of kdd99) • Learning data: 7,000 randomly selected connections • Test data: 4 attack classes + normal traffic • Normal data: 10,000 randomly selected normal connections • Attack data: all the other attack connections • 391,458 DoS attacks, 1,126 R2L attacks, 52 U2R attacks and 4,107 Probe attacks • Data for intrusion identification (learning base of kdd99) • Learning data: • 7,000 randomly selected normal network connections • The first 2,000 back, 10,000 Neptune, 200 Pod, 20,000 Smurf, 800 Teardrop, 40 Guess passwd, 900 Warezclient, 1,000 Ipsweep, 900 Portsweep, 1,200 Satan, 200 Nmap, 15 Warezmaster and 25 buffer overflow attack connections • Test data • All the other network connections of these attack types are used for identification
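The identification split (random normal sample, first N records of each attack type for learning, the rest for testing) can be sketched as below. It assumes records are `(attributes, label)` pairs in file order; `split_identification_data` and its parameters are hypothetical, and the fixed seed is only there to make the random normal sample reproducible.

```python
import random


def split_identification_data(records, first_n, normal_label="normal", n_normal=7000, seed=0):
    """Return (train, test) lists of (attributes, label) records.

    Train: a random sample of n_normal normal records plus the first first_n[label]
    records of each listed attack type. Test: the remaining records of those types.
    """
    rng = random.Random(seed)
    normal = [r for r in records if r[1] == normal_label]
    train = rng.sample(normal, min(n_normal, len(normal)))
    test = []
    seen = {}
    for rec in records:
        label = rec[1]
        if label not in first_n:          # normal and unlisted types are skipped here
            continue
        seen[label] = seen.get(label, 0) + 1
        (train if seen[label] <= first_n[label] else test).append(rec)
    return train, test
```

With the slide's figures, `first_n` would be, for example, `{"back": 2000, "neptune": 10000, "pod": 200, ...}`.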

  22. Intrusion detection: results based on PCA and kNN for kdd99 data

  23. Intrusion identification: results based on PCA and kNN for kdd99 data

  24. Comparison of the kNN and PCA methods • kNN • No need for training • Suitable for dynamic environments • Requires heavy computation in the testing stage • Testing cost grows with m (dimensionality of the vectors) and n (number of samples) • PCA • Needs considerable computation for training • Lightweight in the testing stage • Testing cost grows with p (number of different attack types) and q (number of principal components) • Suitable for detection on massive data sets
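A rough operation count per test vector makes the contrast concrete. It assumes brute-force kNN over all n training vectors and, for PCA identification, one projection and one reconstruction per subspace (normal plus p attack types); the vector dimension m in the PCA cost is an addition not stated on the slide. The numeric line uses the 7,000 learning connections and 41 attributes of the detection experiment.

```latex
\begin{align*}
\text{kNN:}\quad & n \text{ distances} \times m \text{ squared differences} = O(mn)\\
& n = 7000,\ m = 41 \;\Rightarrow\; 7000 \times 41 = 287\,000 \text{ squared differences}\\
\text{PCA:}\quad & (p+1) \text{ subspaces} \times \big(\underbrace{qm}_{U^\top (t-\mu)} + \underbrace{mq}_{U y}\big) = O(pqm)
\end{align*}
```

For detection alone, PCA uses only the normal subspace, i.e. $O(qm)$, independent of the number of training samples.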

  25. Conclusion • The KDD 99 transformation function did not extract enough information from the raw data for anomaly detection • Using the 41 attributes achieves a 72% detection rate on the Supelec normal data • kNN and PCA achieve good detection and identification results on kdd99 data • PCA can process massive data sets • The identification process needs attack data sets, which are sometimes difficult to obtain • The 41 attributes could be reduced for lightweight detection while maintaining detection accuracy • Future work: use optimization methods to select the key attributes • Early and fast detection of network attacks is important • Detecting attacks early, without waiting for the connection to finish, is our future work

  26. Thank you for your attention! • Questions?

More Related