
Information Fusion




Presentation Transcript


  1. Information Fusion Ganesh Godavari

  2. DDoS Data Set
     • The DARPA DDoS data set (2000) is available from MIT Lincoln Laboratory
     • The data set spans approximately 3 hours
     • The five phases of the attack scenario depicted [1]:
       • IPsweep of the Air Force Base (AFB) from a remote site
       • Probe of live IPs to look for the sadmind daemon running on Solaris hosts
       • Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts
       • Installation of the trojan mstream DDoS software on three hosts at the AFB
       • Launching the DDoS

  3. Attack Scenario [1]

  4. Phase 1 Attack (DDoS Data Set)

     Date        Time      Duration  SrcIP           Target IP       Analyzer        Service
     03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.5    tcpdump_inside  icmp-E-R
     03/07/2000  09:51:36  00:00:05  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:36  00:00:00  202.77.162.213  172.16.115.20   tcpdump_inside  icmp-E-R
     03/07/2000  09:51:36  00:00:00  172.16.115.20   202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:38  00:00:00  202.77.162.213  172.16.115.87   tcpdump_inside  icmp-E-R
     03/07/2000  09:51:38  00:00:00  172.16.115.87   202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:41  00:00:00  202.77.162.213  172.16.115.234  tcpdump_inside  icmp-E-R
     03/07/2000  09:51:50  00:00:00  202.77.162.213  172.16.113.50   tcpdump_inside  icmp-E-R
     03/07/2000  09:51:50  00:00:00  172.16.113.50   202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.84   tcpdump_inside  icmp-E-R
     03/07/2000  09:51:51  00:00:09  172.16.112.194  202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:51  00:00:00  202.77.162.213  172.16.113.105  tcpdump_inside  icmp-E-R
     03/07/2000  09:51:51  00:00:00  172.16.113.105  202.77.162.213  tcpdump_inside  icmp-E-Rp
     03/07/2000  09:51:52  00:00:00  202.77.162.213  172.16.113.148  tcpdump_inside  icmp-E-R
     :           :         :         :               :               :               :
     03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.194  tcpdump_inside  icmp-E-R
     03/07/2000  09:52:00  00:00:00  202.77.162.213  172.16.112.207  tcpdump_inside  icmp-E-R

     icmp-E-R  => icmp-echo-request
     icmp-E-Rp => icmp-echo-reply
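As an illustration of how one row of the Phase 1 listing could be read into named fields, here is a minimal Python sketch. The `Record` type, its field names, and `parse_record` are hypothetical helpers introduced only for this example; the slides do not describe a parser. The service abbreviations are the two given in the slide's legend.

```python
from typing import NamedTuple

# Abbreviations taken from the legend on the slide.
SERVICE_NAMES = {
    "icmp-E-R": "icmp-echo-request",
    "icmp-E-Rp": "icmp-echo-reply",
}


class Record(NamedTuple):
    """One row of the listing (field names are assumptions for this sketch)."""
    date: str
    time: str
    duration: str
    src_ip: str
    target_ip: str
    analyzer: str
    service: str


def parse_record(line: str) -> Record:
    """Split a whitespace-separated row into its seven columns."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    *head, service = fields
    # Expand the abbreviated service name if it appears in the legend.
    return Record(*head, SERVICE_NAMES.get(service, service))


rec = parse_record(
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 "
    "tcpdump_inside icmp-E-Rp"
)
print(rec.src_ip, rec.target_ip, rec.service)
```

Keeping every field as a string is deliberate here: the later steps treat each field as a vocabulary word rather than as a typed value.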

  5. Algorithm
     Step 1: Go over the data file and build the vocabulary
       • Read all the unique fields in the data files
     Step 2: Identify the frequent vocabulary in the data file
       • How should frequency be determined? How can one determine the frequency threshold?
     Step 3: Generate cluster candidates
       • Lines containing the same frequent words form a cluster
     Step 4: Identify temporal relationships between cluster candidates
       • The 24 relationships of data
     Step 5: Generate unique lines
       • Lines in the data file, based on the candidate clusters

  6. Need Suggestions
     • Is it safe to assume that a threshold parameter is provided?
     • Cluster candidate generation can generate too much data (next slide shows how)
     • The 24 relations cover everything. Which of them are we interested in?

  7. Cluster Candidate Generation
     • The data set has 8 dimensions
     • Frequent words (written as a 4-byte column number followed by the word) that occur more often than the threshold of 10:
       • 0004202.77.162.213, repeated 22 times
       • 000103/07/2000, repeated 33 times
       • 000300:00:00, repeated 31 times
       • 0007icmp-echo-request, repeated 22 times
       • 0007icmp-echo-reply, repeated 11 times
       • 0006tcpdump_inside, repeated 33 times
       • 0005202.77.162.213, repeated 11 times
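The column-tagged frequent words above can be produced by a simple counting pass. The following is a minimal Python sketch, assuming that the 4-byte prefix is the 1-based column number zero-padded to four digits and that a word is frequent when its count is strictly greater than the threshold. The function names and the four-line sample (taken from the example slide) are choices made for this illustration, not part of the original work.

```python
from collections import Counter


def tag(col: int, word: str) -> str:
    """Prefix a word with its zero-padded, 1-based column number."""
    return f"{col:04d}{word}"


def frequent_words(lines, threshold):
    """Count every column-tagged word and keep those above the threshold."""
    counts = Counter()
    for line in lines:
        for col, word in enumerate(line.split(), start=1):
            counts[tag(col, word)] += 1
    return {word: n for word, n in counts.items() if n > threshold}


SAMPLE = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp",
]

# A threshold of 1 is used only because the sample has four lines.
freq = frequent_words(SAMPLE, threshold=1)
for word, n in sorted(freq.items()):
    print(word, n)
```

Tagging by column keeps the same address apart when it appears as a source (column 4) and as a target (column 5), which is why 202.77.162.213 shows up twice in the slide's list.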

  8. Candidate Generation Example
     • Example
       03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R
       03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp
       03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R
       03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp
     • In all the data the first field is common, so should it be considered a candidate cluster?

       for each frequent-word in frequent-word-list {
           while (read a line of data != EOF) {
               if (frequent-word in line)
                   add line no. to Cluster
           } // end of while
       } // end of for

       Cluster 1 = { line 1, line 2, line 3, line 4 }
       Cluster 2 = { line 1, line 3, line 4 }
       Cluster 3 = { line 1, line 3 }
       Cluster 4 = { line 2, line 4 }
       Cluster 5 = { line 1, line 2, line 3, line 4 }
       Cluster 6 = { line 1, line 3 }
       Cluster 7 = { line 2, line 4 }
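The loop on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: words are tagged with a zero-padded 1-based column number, the frequent-word list is the one from the previous slide with the service names in their abbreviated form (as they appear in the example lines), and `cluster_candidates` is a hypothetical name.

```python
def tagged_words(line):
    """Return the set of column-tagged words in one line."""
    return {f"{col:04d}{word}" for col, word in enumerate(line.split(), start=1)}


def cluster_candidates(lines, frequent):
    """For each frequent word, collect the numbers of the lines containing it."""
    clusters = {}
    for word in frequent:
        members = []
        for line_no, line in enumerate(lines, start=1):
            if word in tagged_words(line):
                members.append(line_no)
        clusters[word] = members
    return clusters


LINES = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp",
]

FREQUENT = [
    "000103/07/2000",
    "000300:00:00",
    "0004202.77.162.213",
    "0005202.77.162.213",
    "0006tcpdump_inside",
    "0007icmp-E-R",
    "0007icmp-E-Rp",
]

clusters = cluster_candidates(LINES, FREQUENT)
for word, members in clusters.items():
    print(word, members)
```

With seven frequent words and four lines this already yields seven overlapping line sets, which illustrates the concern that candidate generation produces a lot of data.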

  9. Another Approach?
     • Reduction, but with loss of information?

       char key
       while (read a line of data != EOF) {
           key = empty
           for each frequent-word in frequent-word-list {
               if (frequent-word in line)
                   key = key + frequent-word
           } // end of for
           if (key not in Cluster)
               create a cluster for key
           add line no. to the cluster for key
       } // end of while

     • Cluster 1 = { line 1, line 3 }
     • Cluster 2 = { line 2 }
     • Cluster 3 = { line 4 }
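A minimal Python sketch of this key-based variant is given below. It assumes that the key is rebuilt for every line from the frequent words that line contains, and that lines sharing a key fall into the same cluster. A tuple is used as the key instead of a concatenated string, which is a choice made for this sketch. The example lines and the frequent-word list are the ones from the preceding slides, with abbreviated service names.

```python
from collections import defaultdict


def cluster_by_key(lines, frequent):
    """Group line numbers by the combination of frequent words each line contains."""
    frequent = set(frequent)
    clusters = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        tagged = [f"{col:04d}{word}" for col, word in enumerate(line.split(), start=1)]
        # The key is the ordered tuple of frequent words found in this line.
        key = tuple(word for word in tagged if word in frequent)
        clusters[key].append(line_no)
    return dict(clusters)


LINES = [
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.5 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:05 172.16.112.194 202.77.162.213 tcpdump_inside icmp-E-Rp",
    "03/07/2000 09:51:36 00:00:00 202.77.162.213 172.16.115.20 tcpdump_inside icmp-E-R",
    "03/07/2000 09:51:36 00:00:00 172.16.115.20 202.77.162.213 tcpdump_inside icmp-E-Rp",
]

FREQUENT = [
    "000103/07/2000",
    "000300:00:00",
    "0004202.77.162.213",
    "0005202.77.162.213",
    "0006tcpdump_inside",
    "0007icmp-E-R",
    "0007icmp-E-Rp",
]

by_key = cluster_by_key(LINES, FREQUENT)
for key, members in by_key.items():
    print(members, key)
```

Each line lands in exactly one cluster, so the output is much smaller than with one cluster per frequent word. The cost is that the non-frequent fields, such as the individual target addresses, no longer distinguish lines within a cluster.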

  10. Temporal Relations
      • Unable to find a case that the 24 temporal relationships do not cover
      • Need to identify the relationships that the decision making requires

  11. Work to be done
      • Completed the algorithm and the coding up to Step 4.

  12. References
      [1] MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/data/2000/2000_data_index.html
