This report presents a methodology for collecting and analyzing BGP routing data from 1998 and early 2003. It details the process of importing raw data into a database, the challenges encountered, and the solutions implemented to accommodate large data volumes. The original database schema and the modifications made to reduce record size are discussed, together with advice on query optimization. Key findings cover the split between advertisement and withdrawal messages and the dominance of a small number of IP addresses in the routing tables. Recommendations for database management are also provided.
CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345)
How to Proceed with BGP Analysis • Collect raw data • Import into database • Query database and analyze data
Collect Raw Data • MAE-EAST (1998.1 ~ 1998.11) • http://archive.routeviews.org/ (2003.1 ~ 2003.3)
Database Schema • Original Schema
Database Schema (cont.) • Record Size • Message: 94 bytes/record • MsgPath: 18 bytes/record • Number of Records • Message: 104,841,405 (1998.1 ~ 1998.11) • MsgPath: 251,442,478 (1998.1 ~ 1998.11)
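The record sizes and counts above imply a rough storage footprint. The sketch below simply multiplies them out; it counts raw row data only and assumes no index, page, or log overhead, so the real database will be larger. The function and dictionary names are illustrative, not taken from the project.

```python
# Record counts and per-record sizes as listed on the slide (1998 data).
TABLES = {
    "Message": {"records": 104_841_405, "bytes_per_record": 94},
    "MsgPath": {"records": 251_442_478, "bytes_per_record": 18},
}


def raw_size_bytes(records: int, bytes_per_record: int) -> int:
    """Raw row payload only; ignores indexes, page overhead, and logs."""
    return records * bytes_per_record


def total_raw_gb(tables: dict) -> float:
    """Sum the raw payload of all tables, in decimal gigabytes (1e9 bytes)."""
    total = sum(
        raw_size_bytes(t["records"], t["bytes_per_record"]) for t in tables.values()
    )
    return total / 1e9


if __name__ == "__main__":
    for name, t in TABLES.items():
        print(f"{name}: {raw_size_bytes(**t) / 1e9:.2f} GB")
    print(f"Total: {total_raw_gb(TABLES):.2f} GB")
```

With the original schema this comes to roughly 14.4 GB of raw row data for the 1998 data set, before any storage overhead.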
Database Schema (cont.) • Database space allocation: 20GB • Importing one month of raw data (about 10,000,000 messages and 20,000,000 paths) takes about 12 hours • Data volume will soon reach the allocated space limit
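The import figures above translate into an approximate throughput, which can then be used to estimate how long a larger import would take. This is a back-of-the-envelope sketch: it assumes the import rate stays constant and treats messages and paths as equally expensive rows, neither of which the slides confirm. The helper names are illustrative.

```python
# Approximate monthly import figures, as given on the slide.
MESSAGES_PER_MONTH = 10_000_000
PATHS_PER_MONTH = 20_000_000
IMPORT_HOURS = 12


def rows_per_second(rows: int, hours: float) -> float:
    """Average import throughput in rows per second."""
    return rows / (hours * 3600)


def import_hours(rows: int, rate: float) -> float:
    """Hours needed to import `rows` rows at `rate` rows per second."""
    return rows / rate / 3600


if __name__ == "__main__":
    rate = rows_per_second(MESSAGES_PER_MONTH + PATHS_PER_MONTH, IMPORT_HOURS)
    print(f"Throughput: about {rate:.0f} rows/s")
    # Row counts for the full 1998 data set (Message + MsgPath records).
    rows_1998 = 104_841_405 + 251_442_478
    print(f"Full 1998 import: about {import_hours(rows_1998, rate):.1f} hours")
```

At roughly 694 rows per second, importing all 1998 records would take on the order of 140 hours under these assumptions.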
Our Solution • Allocate larger space • Move database from SQLServer -> Sparrow • Total 70GB • Modify data schema to reduce record size
Data Schema Modification • Record Size • Message: 52 bytes/record • MsgPath: 14 bytes/record • Size Reduction • Message: 46.9% • MsgPath: 22.2% • Faster data importing
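The size reduction can be recomputed from the per-record sizes of the original schema (94 and 18 bytes) and the modified schema (52 and 14 bytes). The sketch below assumes the reduction is measured relative to the original record size; the function name is illustrative.

```python
def reduction_pct(old_bytes: int, new_bytes: int) -> float:
    """Percentage by which a record shrank, relative to its original size."""
    return (old_bytes - new_bytes) / old_bytes * 100


# Per-record sizes as listed on the slides: (original schema, modified schema).
SIZES = {"Message": (94, 52), "MsgPath": (18, 14)}

if __name__ == "__main__":
    for table, (old, new) in SIZES.items():
        pct = reduction_pct(old, new)
        print(f"{table}: {old} -> {new} bytes, {pct:.1f}% smaller")
```

Under that assumption the MsgPath figure comes out at about 22.2%, matching the slide, while the Message figure comes out at about 44.7%, slightly below the stated 46.9%. One of the Message numbers may have been rounded or measured differently.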
Current Status • Database • P3-500 with 128MB RAM, running Windows 2000 Server and SQL Server 2000 • Imported Data • 1998.1 ~ 1998.11: about 21GB in DB • 2003.3: about 34GB in DB
Current Database Issues • SQL Server Performance • A single query can take several hours to run • Space problem • 70GB is only enough for 1 ~ 2 months of 2003 data • We need a terabyte-scale database to accommodate all data from 2002 and 2003
Summary of Data • Total space used: • ~55GB (1998 and 03/2003) • Number of Messages: • ~220.5 million (1998 and 03/2003) • Number of Data Sets: • ~30,000 (1998 and 03/2003)
Summary of Data (cont.) • A small number of IP addresses dominate the routing table • 15 source IP addresses account for about 68% of the values in the PeerIp field of the Messages • 15 destination IP addresses account for about 47% of the values in the NextHop field of the Messages
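A share like "the top 15 addresses account for about 68% of PeerIp" can be computed with a grouped count. The sketch below is not the project's actual query: it uses an in-memory SQLite database as a stand-in for SQL Server 2000 (where TOP would replace LIMIT), assumes a Message table with only PeerIp and NextHop columns, and fills it with made-up documentation addresses.

```python
import sqlite3


def top_n_share(conn: sqlite3.Connection, column: str, n: int = 15) -> float:
    """Fraction of Message rows whose `column` value is among the n most frequent."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if column not in ("PeerIp", "NextHop"):
        raise ValueError(f"unexpected column: {column}")
    total = conn.execute("SELECT COUNT(*) FROM Message").fetchone()[0]
    if total == 0:
        return 0.0
    top = conn.execute(
        "SELECT SUM(cnt) FROM ("
        "  SELECT COUNT(*) AS cnt FROM Message"
        f"  GROUP BY {column} ORDER BY cnt DESC LIMIT ?)",
        (n,),
    ).fetchone()[0]
    return top / total


def demo() -> float:
    """Toy data: three peers sending 6, 3, and 1 messages respectively."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Message (PeerIp TEXT, NextHop TEXT)")
    rows = (
        [("192.0.2.1", "198.51.100.1")] * 6
        + [("192.0.2.2", "198.51.100.2")] * 3
        + [("192.0.2.3", "198.51.100.3")]
    )
    conn.executemany("INSERT INTO Message VALUES (?, ?)", rows)
    return top_n_share(conn, "PeerIp", n=2)


if __name__ == "__main__":
    print(f"Top-2 PeerIp share in toy data: {demo():.0%}")
```

On a table with hundreds of millions of rows, a grouped count like this scans the whole table unless the column is indexed, which is one reason such queries can run for hours.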
Summary of Data (cont.) • Advertisement vs. Withdrawal Messages • There are about 220 million Messages • ~31.5% of all Messages are Withdrawal Messages • ~68.5% of all Messages are Advertisement Messages
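The advertisement/withdrawal split is a single grouped count over the Message table. The sketch below is illustrative only: the MsgType column and its 'A'/'W' codes are hypothetical (the slides do not show how message type is stored), and SQLite stands in for SQL Server.

```python
import sqlite3


def message_type_shares(conn: sqlite3.Connection) -> dict:
    """Map each message type to its fraction of all Message rows."""
    rows = conn.execute(
        "SELECT MsgType, COUNT(*) FROM Message GROUP BY MsgType"
    ).fetchall()
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {msg_type: count / total for msg_type, count in rows}


def demo() -> dict:
    """Toy data: 7 advertisements ('A') and 3 withdrawals ('W')."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical single-column schema, for illustration only.
    conn.execute("CREATE TABLE Message (MsgType TEXT)")
    conn.executemany(
        "INSERT INTO Message VALUES (?)", [("A",)] * 7 + [("W",)] * 3
    )
    return message_type_shares(conn)


if __name__ == "__main__":
    for msg_type, share in sorted(demo().items()):
        print(f"{msg_type}: {share:.1%}")
```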
Some Advice • Optimize your queries • Some queries will take several hours to execute • Test on bgpbaby first • This is a smaller version of bgpdata (~1GB) • Don’t try to execute all your queries on the last day • The SQL Server database will be overwhelmed