Magellan: A Tool for Unicast Fault Isolation
Magellan is a tool for unicast fault isolation from the user's perspective. It monitors routes to the destinations a user cares about across the Internet rather than within a single AS, and it keeps a history of route changes instead of a single snapshot. It builds a routing graph, probes each link with ICMP packets, and applies automated heuristics to attribute route changes to link or router failures and to detect route oscillations, helping to explain symptoms such as unreachable sites and slow connectivity.
Presentation Transcript
Magellan: A Tool for Unicast Fault Isolation • Cengiz Alaettinoglu (Packet Design LLC) • Ramesh Govindan (Information Sciences Institute) • John Mehringer (Information Sciences Institute)
Motivation • Why can't I reach www.cnn.com? • Why is the Internet soooo slow today? • It was fine yesterday!
Goals • User's perspective • What is of interest to the user • Internet-wide routing monitoring • not just an AS • History of route changes • not just a snapshot • Fault diagnosis • link/router failure/repair
Challenges • Scaling • Directed search by correlating destinations • Shared learning • Automated heuristics for fault isolation • Route change • Location of link/router failure/repair • Oscillations • Others?
Data Collection • Select targets interesting to the user • tcpdump/libpcap • Weighting / aging (not implemented) • Initial path to targets • traceroute • Monitoring paths • Carefully constructed ICMP probes
Monitoring • Construct a routing graph • Nodes: routers • Links: (to, from, source, destination, hop, statistics...) • Probe each link • Send two ICMP Echo Request packets to destination • For ttl = hop - 1, hop, verify incident routers, to, from
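The link record and the two-probe check on this slide can be sketched as follows. This is a minimal illustration, not Magellan's code: the class name `Link`, the helpers `probe_plan` and `link_is_up`, and the example addresses are hypothetical. It also assumes that a probe sent with ttl = hop - 1 is answered by the "from" router and a probe with ttl = hop by the "to" router, which is one reading of "verify incident routers". No packets are sent; the replies are passed in as a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Link:
    """One edge of the routing graph, as seen on the path to one destination."""
    from_router: str   # router at the upstream end of the link
    to_router: str     # router at the downstream end of the link
    source: str        # monitoring host
    destination: str   # target whose path crosses this link
    hop: int           # TTL at which to_router is expected to answer (assumed >= 2)


def probe_plan(link: Link) -> List[Tuple[str, int, str]]:
    """Return (destination, ttl, expected_router) for the two probes of a link."""
    return [
        (link.destination, link.hop - 1, link.from_router),
        (link.destination, link.hop, link.to_router),
    ]


def link_is_up(link: Link, replies: Dict[int, Optional[str]]) -> bool:
    """replies maps ttl -> address of the router that answered, or None on timeout."""
    return all(replies.get(ttl) == expected
               for _, ttl, expected in probe_plan(link))
```

For example, a link at hop 3 with `from_router="10.0.0.1"` and `to_router="10.0.1.1"` counts as up only when the reply at TTL 2 comes from `10.0.0.1` and the reply at TTL 3 comes from `10.0.1.1`; a missing or unexpected reply gives a negative test result.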
Scheduling Probes • WRR schedule a probe for each link • Limits the rate of probe packets • Weights: some links are more important/interesting • Distance to link • Number of destinations using it • History of volatility • Exponentially averaged
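A weighted round-robin probe schedule with an exponentially averaged volatility score might look like the sketch below. The slide does not say which WRR variant Magellan uses or how the three weight factors are combined, so this uses the common "smooth" WRR scheme and takes the final per-link weights as input. The names `ewma` and `wrr_schedule` and the smoothing factor `alpha` are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterator


def ewma(previous: float, sample: float, alpha: float = 0.25) -> float:
    """Exponential average: recent samples count more, old ones decay.

    sample could be 1.0 when a probe saw a change and 0.0 otherwise (assumption).
    """
    return alpha * sample + (1.0 - alpha) * previous


def wrr_schedule(weights: Dict[Hashable, float]) -> Iterator[Hashable]:
    """Yield links forever, each in proportion to its weight (smooth WRR)."""
    total = sum(weights.values())
    credit = {link: 0.0 for link in weights}
    while True:
        # Every link earns credit equal to its weight each round.
        for link, weight in weights.items():
            credit[link] += weight
        # The link with the most credit is probed and pays back the total.
        chosen = max(credit, key=credit.get)
        credit[chosen] -= total
        yield chosen
```

Drawing one link from the generator per timer tick keeps the probe rate bounded regardless of how many links are monitored, while links with higher weights are still probed more often.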
Test Result • Positive • Do nothing • Negative • Determine new path • Incremental traceroute from the link upstream and downstream • Determine cause • Based on automated heuristics
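Once a new path has been traced after a negative result, the part of the route that changed can be isolated by comparing it with the old path. The sketch below is one possible reading of "determine new path", not the tool's actual procedure: it strips the hops the two paths share at the start and at the end and returns what differs in between. The function name `changed_segment` and the router labels are hypothetical.

```python
from typing import List, Tuple


def changed_segment(old: List[str], new: List[str]) -> Tuple[int, List[str], List[str]]:
    """Return (index of first differing hop, removed hops, added hops)."""
    shortest = min(len(old), len(new))

    # Hops shared at the start of both paths (upstream of the change).
    prefix = 0
    while prefix < shortest and old[prefix] == new[prefix]:
        prefix += 1

    # Hops shared at the end of both paths (downstream of the change).
    suffix = 0
    while suffix < shortest - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    return prefix, old[prefix:len(old) - suffix], new[prefix:len(new) - suffix]
```

With `old = ["s", "r1", "r2", "r3", "d"]` and `new = ["s", "r1", "r4", "r5", "r3", "d"]`, the change starts at index 2, where `r2` was replaced by `r4` and `r5`. Only the hops around that segment need to be re-probed.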
Active Fault Isolation • Link failure • Probe the link using other destinations that use it • Correlate results • Router failure • Generalize on link failure • Oscillations • History of old routes • Back and forth between a set of routes
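The three heuristics on this slide can be approximated as below. The decision rules are assumptions chosen for illustration, since the slides do not give thresholds: a link is treated as failed when probes through it fail for every destination that uses it, a router as failed when every link incident to it has failed, and a route as oscillating when it returns to a previously used route at least `min_returns` times. All function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def link_failed(results: Dict[str, bool]) -> bool:
    """results maps destination -> True if the probe through the link succeeded."""
    return bool(results) and not any(results.values())


def router_failed(router: str, link_up: Dict[Tuple[str, str], bool]) -> bool:
    """link_up maps (from_router, to_router) -> True if that link is up."""
    incident = [up for (a, b), up in link_up.items() if router in (a, b)]
    return bool(incident) and not any(incident)


def is_oscillating(history: Sequence[Tuple[str, ...]], min_returns: int = 2) -> bool:
    """history is the sequence of observed routes, oldest first."""
    seen = set()
    returns = 0
    previous = None
    for route in history:
        if route != previous:          # only route changes matter
            if route in seen:          # switched back to an earlier route
                returns += 1
            seen.add(route)
            previous = route
    return returns >= min_returns
```

If a link fails for only one of several destinations, `link_failed` returns `False`, which corresponds to a case where the cause lies elsewhere and may end up classified as unknown.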
Magellan Components • Components shown: Magellan, Perl script, Nam • Visualization (Nam) • Offline or real-time • Great for debugging/tuning
Snapshot • Link or router failure • I want the nam buttons, etc...
Effectiveness through Measurement • Picked 500 popular web sites • Yahoo, MSN, AOL, CNN, ... • www.web100.com • Monitored routes to these destinations for 7 days
Measurements • Number of Link Probes: 839694 • Probes per second: 1.39 • Total Failures: 2078 • Router Failures: 334 • Link Failures: 951 • Unknown cause: 793 • Transients • Number of Oscillations: 541
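The reported figures are internally consistent, as the short check below shows. It assumes the probe count covers the full 7-day monitoring period mentioned on the previous slide; the variable names are illustrative only.

```python
# Figures copied from the Measurements slide.
link_probes = 839694
router_failures = 334
link_failures = 951
unknown_cause = 793
total_failures = 2078

# Assumption: probes were spread over the whole 7-day monitoring period.
seconds = 7 * 24 * 60 * 60            # 604800 seconds
probes_per_second = link_probes / seconds

# Share of each cause among all failures, in percent.
shares = {
    "router": 100 * router_failures / total_failures,
    "link": 100 * link_failures / total_failures,
    "unknown": 100 * unknown_cause / total_failures,
}

print(f"{probes_per_second:.2f} probes/s")
for cause, percent in shares.items():
    print(f"{cause}: {percent:.1f}%")
```

The three failure categories add up to the stated total of 2078, and link failures account for a little under half of them.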
Future work: Distributed Magellan • Weight to probe inversely proportional to ratio of distances • Shared learning • Diagram: Magellan 1, Magellan 2
Related Work • Topology Maps • Router/AS level interconnections • Mercator, skitter, AT&T • Not all links are usable (routing policy/metrics) • Routing Topology • Effect of policy/metrics • NPD (Vern Paxson's work) • Focus is on measurement
Conclusions • Unicast fault isolation • User's perspective • Automated heuristics • History of changes • http://www.isi.edu/scan