Realistic Online Simulation of Large-Scale Grids
Xin Liu, Huaxia Xia, Yang-suk Kee, Andrew A. Chien
Department of Computer Science and Engineering and Center for Networked Systems
University of California, San Diego
September 27-28, 2004, Chateau de Faverges, Lyon, France
Motivation
• Simulation Tools Enable Systematic Design and Evaluation
• New Applications and Resources Create New Behaviors
• Wide Variety of Performance Demands (4 to 6 Orders of Magnitude)
• Tightly Coupled Communication, Computing, and Storage Resources
• Grids are Non-linear Dynamic Systems
• Sharing in Open Resource Pools is Competitive, even Malicious
• Unstable Resource Behavior, Denial of Service, and System Failure
• => Useful for Researchers, Grid Administrators, Designers, Application Designers, etc.
• Testbeds (e.g. Grid2003, EU DataGrid, Planetlab, TeraGrid, …) are great, but only a partial solution
• Limited control, configuration, observability, and repeatability
• Expensive!
Simulation Requirements
• Workload: Execute Real Software Fast, without Change
• Accurate: High Fidelity Resource Modeling
• General: Flexible Resource Modeling
• Observable: Any Scale of Behavior
• Repeatable: Diagnose “Heisenbugs”
• Scalable: Large Experiments
  • Large Numbers of Resources (Thousands to Millions of Nodes?)
  • Large Numbers of Interacting Applications
Simulation Approaches
• Grid Modeling Toolkits
  • SimGrid [Casanova, UCSD], Bricks [Takefusa, Titech&Ochanomizu], MONARC [Neumann, CalTech]
  • Simplifying Assumptions, No Direct Execution
• Detailed Network Simulation
  • Vint/NS-2 [Estrin, USC/LBL], SSF [Nicol, Dartmouth&UIUC], PDNS [Fujimoto, GaTech]
  • No Direct Execution, Not Online, Limited Scaling
• Network Emulation
  • Emulab & Netbed [Lepreau, Utah]
  • Modelnet [Vahdat, Chase, Duke&UCSD]
  • Online, Limited Application Control, Network Approximations
• Detailed Network and Grid Simulation
  • => MicroGrid
The Big Picture
• Realistic Configurations
  • Network Structure
    • Topology Generators (Brite [Lakhina2001])
    • Routing Configurators (OSPF, BGP Synthesis [LiuChien])
  • Resource Structure
    • Synthetic Resource Generators [Dinda2003, Kee&Chien 2004]
• Realistic Workloads
  • Background Traffic Generation
  • Native Middleware and Applications
• MicroGrid = Simulation Tools for Large-Scale Experiments
  • [SC2000, SC2003, JOGC2004, SC2004]
Using the MicroGrid
[Figure labels: Grid Application, Virtual Grid Configuration, MicroGrid Software]
• Find some physical resources
• Configure a Virtual Grid
• Submit a Grid Application Job to it
• Observe Execution (which occurs in virtual time)
• Deconfigure the Virtual Grid
Resource, Network, and Information Service Virtualization
[Figure labels: four Virtual Resources; MaSSF: Scalable Network Simulation]
• Grids = Resources + Networks + Services
• Virtual Resources Host Applications
• Binary Interception Captures Live Network Traffic from Applications
• MaSSF Simulates Traffic Behavior in the Virtual Network and Delivers Traffic to Applications
MicroGrid Highlights
• 2000: Binary Interception enables Transparent Virtualization [Liu&Chien SC2000]
  • “Virtual time” enables a wide range of relative performance experiments
• 2003: Scalable Packet-level Simulation provides accurate protocol behavior [Liu&Chien SC2003]
  • Profile- and Topology-driven Graph Partitioners
  • Full TCP, Router, OSPF, and BGP modeling
• 2004: MicroGrid validated on diverse benchmarks & grid applications [Xia, Liu, & Chien JOGC 2004]
• 2004: Large-Scale ISP Simulation (20,000+ routers) based on new hierarchical load balancing and partitioning [Liu&Chien SC2004]
• Source Releases: February 2003, July 2003
  • February 2004, MicroGrid Version 2.4.4
  • http://www-csag.ucsd.edu/
MaSSF: Scalable Network Simulation
• Efficient, Conservative, Parallel Discrete-Event Online Simulation
• Accepts Traffic from Live Applications
• Presents Real Performance Feedback
• Runs Slower than Real-time
  • Slows Application Execution to Preserve Interactions
  • Application, Network Protocols, and Network all operate in Virtual Time
  • Can Study Future Hardware Speeds and Ratios!
• Talk focus is Scalable Online Network Simulation, particularly Load Balance
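The virtual-time idea on this slide can be illustrated with a minimal Python sketch. It assumes a single fixed ratio between physical and virtual time; the `VirtualClock` class, its `rate` parameter, and the numbers are hypothetical illustrations, not the actual MicroGrid or MaSSF interface.

```python
class VirtualClock:
    """Maps physical (wall-clock) time to virtual time at a fixed rate.

    rate < 1 means the simulation runs slower than real time
    (assumption: one constant rate for the whole experiment).
    """

    def __init__(self, rate, physical_start=0.0):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate
        self.physical_start = physical_start

    def to_virtual(self, physical_now):
        # Virtual seconds elapsed since the start of the experiment.
        return (physical_now - self.physical_start) * self.rate

    def to_physical(self, virtual_time):
        # Physical time at which a given virtual instant is reached.
        return self.physical_start + virtual_time / self.rate


# Hypothetical example: the simulation runs 10x slower than real time.
clock = VirtualClock(rate=0.1)
print(clock.to_virtual(50.0))   # virtual seconds after 50 physical seconds
print(clock.to_physical(2.0))   # physical seconds needed for 2 virtual seconds
```

Because the application, the protocols, and the network all read the same virtual clock, their relative speeds are preserved even though the whole experiment runs slower than real time.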
Load Varies Widely
• Instantaneous Bandwidth, 1-second intervals
Network Partition Problem
[Figure labels: Traffic Information, Network Structure, Graph Preparation, Constraints, Objectives, G, Graph Partitioning Algorithms, Partitioned Network]
• Network Nodes, Varied Link Speeds and Latencies
• Speeds Bound Workload, Latencies Determine Coupling
• Actual Traffic Determines Workload
• => Choose Granularity
• => Choose Grouping
Approaches I
• METIS Graph Partitioner
  • Topology (Intelligent Design)
  • Application Placement (High BW Apps)
  • Profile (Actual Behavior)
Usage Data Improves Partition
• 24 applications; 8 simulation nodes
• Profile-based partitioning gives a 66% improvement over the others
• Doesn’t effectively optimize for concurrency in large simulations
Approaches I
• METIS Graph Partitioner
  • Topology (Intelligent Design)
  • Application Placement (High BW Apps)
  • Profile (Actual Behavior)
• Hierarchical Partitioning
  1. Grouping to Optimize MLL
  2a. Topology (Intelligent Design)
  2b. Profile (Actual Behavior)
Hierarchical Partitioning
• Minimal Link Latency (MLL)
  • Critically Affects Available Parallelism
  • METIS chooses poor Partitions for MLL
• MLL-driven Two-level Partitioning
  • Summarize Network Graph
  • => Merge nodes with link latency < threshold
  • Choose threshold by exploring exhaustively, evaluating each candidate with a metric to Optimize Parallelism and Load Balance
• Metric
  • E = (Cavg / Cmax) * ((MLL - Cn) / MLL)
  • Cavg / Cmax: Parallelism at Load Balance
  • (MLL - Cn) / MLL: MLL-Limited Synch Efficiency
  • Trades off Parallelism and Efficiency
  • Evaluates Different Partition Outputs without Running the Simulation
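The threshold search described on this slide can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `Cavg` and `Cmax` are taken to be the average and maximum per-partition load, `Cn` a per-window synchronization cost in the same time unit as MLL, and each merged group is treated directly as one partition (the slides instead run METIS on the summarized graph). All function names and the sample network are hypothetical.

```python
from collections import defaultdict


def merge_below_threshold(num_nodes, links, threshold):
    """Union-find merge: nodes joined by a link with latency < threshold share a group."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, latency in links:
        if latency < threshold:
            parent[find(u)] = find(v)
    return [find(n) for n in range(num_nodes)]


def partition_metric(loads, mll, sync_cost):
    """E = (Cavg / Cmax) * ((MLL - Cn) / MLL), with Cn passed as sync_cost."""
    c_avg = sum(loads) / len(loads)
    c_max = max(loads)
    return (c_avg / c_max) * ((mll - sync_cost) / mll)


def best_threshold(num_nodes, node_load, links, sync_cost):
    """Try every distinct link latency as a threshold and keep the highest E."""
    best = None
    for threshold in sorted({lat for _, _, lat in links}):
        group = merge_below_threshold(num_nodes, links, threshold)
        # Links that still cross group boundaries determine the MLL.
        cross = [lat for u, v, lat in links if group[u] != group[v]]
        if not cross:
            continue  # no boundary links left, so no parallelism to evaluate
        loads = defaultdict(float)
        for node, g in enumerate(group):
            loads[g] += node_load[node]
        e = partition_metric(list(loads.values()), min(cross), sync_cost)
        if best is None or e > best[1]:
            best = (threshold, e)
    return best


# Hypothetical 6-node network: (node_a, node_b, latency in ms).
links = [(0, 1, 0.1), (1, 2, 0.1), (2, 3, 5.0),
         (3, 4, 0.2), (4, 5, 0.2), (0, 5, 8.0)]
threshold, score = best_threshold(6, [1.0] * 6, links, sync_cost=0.5)
print(threshold, round(score, 3))
```

In this toy network the search settles on the threshold that merges the two low-latency clusters, leaving only the long-latency links on the partition boundary. The metric is computed from the graph and load estimates alone, which is what allows candidates to be compared without running the simulation.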
MLL Optimization is Effective
• Evaluating MLLs is Rapid, allowing exhaustive search even for large networks and profiles
• Improvement in MLL is 10-30x => much greater concurrency
• “Latency Tolerant” boundaries
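Why a larger MLL yields more concurrency can be shown with a short Python sketch. It assumes a window-based conservative scheme in which all partitions synchronize once per window and each window advances virtual time by one MLL; the function name and the numbers are hypothetical, with 20x chosen from within the 10-30x range quoted above.

```python
def sync_rounds(duration_us, mll_us):
    """Synchronization windows needed to cover duration_us of virtual time,
    assuming each window advances virtual time by exactly one MLL."""
    if mll_us <= 0:
        raise ValueError("MLL must be positive")
    return -(-duration_us // mll_us)  # integer ceiling division


# Hypothetical run: 10 s of virtual time, times in microseconds.
baseline = sync_rounds(10_000_000, 100)      # MLL = 0.1 ms
improved = sync_rounds(10_000_000, 2_000)    # MLL = 2 ms (20x larger)
print(baseline, improved, baseline // improved)
```

Under this assumption, a 20x larger MLL means 20x fewer synchronization points, so each partition does more independent work between barriers.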
Hierarchical Partition Improves Load Balance
• MLL Optimization Increases Decoupling
  • Should make load balancing more difficult?
• It Doesn’t! The METIS partitioner on the summarized graph achieves ~40% better load balance
  • METIS Metric Optimization Works Better on the Coarser Graphs
Hierarchical Partition Reduces Simulation Time
• ~50% simulation time improvement
• Greater Execution Efficiency within a Partition
• Looser Coupling (MLL) increases Parallelism
Hierarchical Partition Improves Parallel Efficiency
• Achieves 40+% Parallel Efficiency
• Practically scaling to 20,000 routers using 90 cluster nodes (TeraGrid, FWGrid)
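To put the efficiency figure in perspective, here is a small Python sketch using the conventional definition of parallel efficiency (speedup divided by the number of nodes). The slides do not state which definition was used, so this is an assumption, and the timing values are hypothetical.

```python
def parallel_efficiency(sequential_time, parallel_time, nodes):
    """Conventional definition: speedup divided by the number of nodes."""
    return sequential_time / (parallel_time * nodes)


def implied_speedup(efficiency, nodes):
    """Speedup implied by a given efficiency on a given number of nodes."""
    return efficiency * nodes


# 40% efficiency on 90 nodes corresponds to a speedup of about 36.
print(implied_speedup(0.40, 90))

# Hypothetical timings: 3600 s sequentially vs. 100 s on 90 nodes.
print(parallel_efficiency(3600.0, 100.0, 90))
```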
What can we simulate?
• Quantitatively: 20K Routers, 100 ASes, many Applications
• Examples:
  • Large-scale behavior of Peer-to-Peer Applications in Mixed Backbone, Access, and Local-area Networks (e.g. Kazaa, BitTorrent, Gnutella)
  • Adaptive Applications and Rescheduling in controlled Grid resource environments [GrADS/VGrADS]
  • Resource Selection, Competitive Resource Sharing in Large-scale Grids [VGrADS]
  • Detailed Network Behavior under Denial-of-Service Attack [Wang2003]
  • …
Tolerating Denial-of-Service with Overlays
[Figure labels: Resource Pool (IP Network), Overlay Network, Application Host, Attacker, Edge Proxy, User Proxy]
Challenges and Objectives
• Large-scale Traffic
  • Malicious attacks, high load factors
• Large Overlay and Overall Network
• Accuracy is critical!
  • Packet drop, Link congestion
• Research Questions…
  • What performance does the application achieve? (various attacks and configurations)
  • Can we improve performance by redirecting applications? Can we effectively redirect applications while under attack?
  • Can we dynamically reconfigure the proxy network to support improved application performance?
Experiment Uses Existing Software Packages
• Overlay Network
  • ForwardEngine Prototype
• Application
  • Apache server / Siege http clients
• Attackers
  • simTrinoo UDP traffic
• ~1000 routers, 64-node overlay network, 200 siege clients, 10GByte/s attacking traffic
MaSSF Models Detailed Network Configuration
• All Details: Buffer Filling, Packet Drop
Summary
• MicroGrid is an Integrated Framework for Grid Experiments
  • Real Applications, Realistic Network and Resource Models
  • Integrated Online Grid Simulation
  • Topology Generators (maBrite), Background Traffic Generators, BGP Configuration, Resource Generators
• Automatic Profile-based Load Balance!
  • Hierarchical MLL Tuning
• Large-network Simulations at detailed packet level provide new capabilities and insights
• For More Info: http://www-csag.ucsd.edu/
• MicroGrid is a part of the GrADS and VGrADS Projects
CSAG MicroGrid Contributors
• Faculty: Andrew Chien
• Postdocs: Kenjiro Taura, Hyojong Song, Yang-seok Ki
• Research Staff: Alex Olugbile
• PhD Students: Xin “Paff” Liu, Ju “Tony” Wang, Huaxia Xia, Xianan Zhang, Ranjita Bhagwan
• MS Students: Denis Jakobsen