
Realistic Online Simulation of Large-Scale Grids

Xin Liu, Huaxia Xia, Yang-suk Kee, Andrew A. Chien. Department of Computer Science and Engineering and Center for Networked Systems, University of California, San Diego. September 27-28, 2004, Chateau de Faverges, Lyon, France.


Presentation Transcript


  1. Realistic Online Simulation of Large-Scale Grids Xin Liu, Huaxia Xia, Yang-suk Kee, Andrew A. Chien Department of Computer Science and Engineering and Center for Networked Systems University of California, San Diego September 27-28, 2004 Chateau de Faverges Lyon, France

  2. Motivation • Simulation Tools enable Systematic Design and Evaluation • New Applications and Resources Create New Behaviors • Wide Variety of Performance Demands (4 to 6 Orders of Magnitude) • Tightly Couple Communication, Computing, and Storage Resources • Grids are Non-linear Dynamic Systems • Sharing in Open Resource Pools is Competitive, even Malicious • Unstable Resource Behavior, Denial of Service and System Failure • => Useful for Researchers, Grid Administrators, Designers, Application Designers, etc. • Testbeds (e.g. Grid2003, EU DataGrid, PlanetLab, TeraGrid, …) are great, but only a partial solution • Limited control, configuration, observability, and repeatability • Expensive!

  3. Simulation Requirements • Workload: Execute Real Software Fast, w/o Change • Accurate: High Fidelity Resource Modeling • General: Flexible Resource Modeling • Observable: Any Scale of Behavior • Repeatable: Diagnose “Heisenbugs” • Scalable: Large experiments • Large Numbers of Resources: (Thousands to Millions of nodes?) • Large Numbers of Interacting Applications

  4. Simulation Approaches • Grid Modeling Toolkits • SimGrid [Casanova, UCSD], Bricks [Takefusa, Titech&Ochanomizu], MONARC [Neumann, CalTech] • Simplifying Assumptions, No Direct Execution • Detailed Network Simulation • Vint/NS-2 [Estrin, USC/LBL], SSF [Nicol, Dartmouth&UIUC], PDNS [Fujimoto, GaTech] • No Direct Execution, Not Online, Limited Scaling • Network Emulation • Emulab & Netbed [Lepreau, Utah] • Modelnet [Vahdat,Chase, Duke&UCSD] • Online, Limited Application Control, Network Approximations • Detailed Network and Grid Simulation • => MicroGrid

  5. The Big Picture • Realistic Configurations • Network Structure • Topology Generators (Brite [Lakhina2001]) • Routing Configurators (OSPF, BGP Synthesis [LiuChien]) • Resource Structure • Synthetic Resource Generators [Dinda2003, Kee&Chien 2004] • Realistic Workloads • Background Traffic Generation • Native Middleware and Applications • MicroGrid = Simulation Tools for Large Scale experiments • [SC2000, SC2003, JOGC2004, SC2004]

  6. Using the MicroGrid [Diagram labels: Grid Application; Virtual Grid Configuration; MicroGrid Software] • Find some physical resources • Configure a Virtual Grid • Submit a Grid Application Job to it • Observe Execution (which occurs in virtual time) • DeConfigure the Virtual Grid

  7. Resource, Network, and Information Service Virtualization [Diagram labels: four Virtual Resource boxes; MaSSF: Scalable Network Simulation] • Grids = Resources + Networks + Services • Virtual Resources Host Applications • Binary Interception Captures Live Network Traffic from Applications • MaSSF Simulates Traffic Behavior in Virtual Network and Delivers Traffic to Applications

  8. MicroGrid Highlights • 2000: Binary Interception enables Transparent Virtualization [Liu&Chien SC2000] • “Virtual time” enables wide range of relative performance experiments • 2003: Scalable Packet-level Simulation provides accurate protocol behavior [Liu&Chien SC2003] • Profile and Topology driven Graph Partitioners • Full TCP, Router, OSPF, and BGP modeling • 2004: MicroGrid validated on diverse benchmarks & grid applications [Xia, Liu, & Chien JOGC 2004] • 2004: Large-Scale ISP Simulation (20,000+ routers) based on new hierarchical load-balance and partition [Liu&Chien SC2004] • Source Releases: February 2003, July 2003 • February 2004, MicroGrid Version 2.4.4 • http://www-csag.ucsd.edu/

  9. MaSSF: Scalable Network Simulation • Efficient Conservative, Parallel Discrete Event Online Simulation • Accepts Traffic from Live Applications • Presents Real Performance Feedback • Runs Slower than Real-time • Slows Application Execution to Preserve Interactions • Application, Network Protocols, Network all operate in Virtual Time • Can Study Future Hardware Speeds and Ratios! • Talk focus is Scalable Online Network simulation, particularly Load Balance
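A minimal sketch of the virtual-time idea on the MaSSF slide, assuming a single fixed slowdown factor shared by the application and the network. The class name `VirtualClock`, its methods, and the slowdown value are hypothetical illustrations and are not MicroGrid's or MaSSF's actual interface.

```python
import time


class VirtualClock:
    """Maps wall-clock time to virtual time with a fixed slowdown factor.

    Hypothetical illustration: a slowdown of 10 means that 10 wall-clock
    seconds pass for every 1 second of virtual time.
    """

    def __init__(self, slowdown, now=time.monotonic):
        if slowdown <= 0:
            raise ValueError("slowdown must be positive")
        self.slowdown = slowdown
        self._now = now          # injectable time source, which eases testing
        self._start = now()

    def virtual_now(self):
        """Virtual seconds elapsed since the clock was created."""
        return (self._now() - self._start) / self.slowdown

    def wall_delay(self, virtual_seconds):
        """Wall-clock seconds needed to cover a span of virtual time."""
        return virtual_seconds * self.slowdown
```

Because every component reads the same scaled clock, a simulator that runs slower than real time can still present consistent relative timings to the application. Scaling different resources by different factors is one way the "future hardware speeds and ratios" bullet could be explored, although the slides do not say how MicroGrid does it.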

  10. Load Varies Widely • Instantaneous Bandwidth, 1 second intervals

  11. Network Partition Problem [Diagram labels: Traffic Information; Network Structure; Graph Preparation; Constraints; Objectives; G; Graph Partitioning Algorithms; Partitioned Network] • Network Nodes, Varied Link Speeds and Latencies • Speeds Bound Workload, Latencies Determine Coupling • Actual Traffic Determines Workload • => Choose Granularity • => Choose Grouping
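The two quantities named on this slide, workload from traffic and coupling from latency, can be read off a candidate grouping as sketched below. The data layout (a per-node traffic dictionary, `(u, v, latency)` link tuples, a node-to-partition map) and both function names are assumptions made for illustration, not the MicroGrid data model.

```python
def partition_loads(node_traffic, assignment, num_parts):
    """Sum the traffic-derived workload of the nodes placed in each partition."""
    loads = [0.0] * num_parts
    for node, traffic in node_traffic.items():
        loads[assignment[node]] += traffic
    return loads


def min_cut_latency(links, assignment):
    """Smallest latency among links whose endpoints sit in different partitions.

    Returns None when no link crosses a partition boundary.
    """
    cut = [lat for u, v, lat in links if assignment[u] != assignment[v]]
    return min(cut) if cut else None


# Illustrative values only; none of these numbers come from the slides.
node_traffic = {"a": 30.0, "b": 10.0, "c": 25.0, "d": 15.0}
links = [("a", "b", 0.1), ("b", "c", 5.0), ("c", "d", 0.2), ("a", "d", 8.0)]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}

loads = partition_loads(node_traffic, assignment, 2)
cut_latency = min_cut_latency(links, assignment)
```

A grouping that keeps the low-latency links inside partitions leaves only high-latency links on the cut, which is the loose coupling the slide asks for.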

  12. Approaches I • METIS Graph Partitioner • Topology (Intelligent Design) • Application Placement (High BW Apps) • Profile (Actual Behavior)

  13. Usage Data Improves Partition • 24 applications; 8 simulation nodes • Profile is 66% improvement over others • Doesn’t effectively optimize for concurrency in large simulations

  14. Approaches I • METIS Graph Partitioner • Topology (Intelligent Design) • Application Placement (High BW Apps) • Profile (Actual Behavior) • Hierarchical Partitioning • 1. Grouping to Optimize MLL • 2a. Topology (Intelligent Design) • 2b. Profile (Actual Behavior)

  15. Hierarchical Partitioning • Minimal Link Latency (MLL) • Critically Affects Available Parallelism • METIS chooses poor Partitions for MLL • MLL-driven Two-level Partitioning • Summarize Network Graph • => Merge nodes with link latency < threshold • Choose threshold by exploring exhaustively, and evaluating with a metric to Optimize Parallelism and Load Balance • Metric • E = (Cavg / Cmax) * (MLL - Cn) / MLL • Trades off Parallelism and Efficiency • Evaluate Different Partition Outputs w/o Running the Simulation [Equation annotations: Parallelism at Load Balance; MLL Limited Synch Efficiency]
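The "merge nodes with link latency < threshold" step can be sketched with a union-find structure, as below. This is one plausible way to summarize a graph, not necessarily the authors' implementation; the function name `summarize` and the `(u, v, latency)` link format are assumptions.

```python
def summarize(nodes, links, threshold):
    """Merge the endpoints of every link whose latency is below `threshold`.

    Returns (group_of, mll): a node -> group-representative map and the
    minimal latency among links that still join two different groups
    (None if no such link remains).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the root, shortening the path along the way (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v, lat in links:
        if lat < threshold:
            parent[find(u)] = find(v)

    group_of = {n: find(n) for n in nodes}
    remaining = [lat for u, v, lat in links if group_of[u] != group_of[v]]
    return group_of, (min(remaining) if remaining else None)


# Illustrative values only; none of these numbers come from the slides.
nodes = ["a", "b", "c", "d"]
links = [("a", "b", 0.1), ("b", "c", 5.0), ("c", "d", 0.2), ("a", "d", 8.0)]
group_of, mll = summarize(nodes, links, threshold=1.0)
```

The merged groups would then be handed to the second-level partitioner (METIS, per the slides) as the vertices of a coarser graph. Raising the threshold raises the MLL of the summarized graph but leaves fewer, larger groups to balance.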

  16. MLL Optimization is Effective • Evaluating MLLs is Rapid, allowing exhaustive search even for large networks and profiles • Improvement in MLL is 10-30x => much greater concurrency • “Latency Tolerant” boundaries
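A sketch of the exhaustive threshold search, scoring each candidate with the metric E from the Hierarchical Partitioning slide. The slides do not define the symbols, so this assumes Cavg and Cmax are the average and maximum per-partition load and Cn is a per-window synchronization cost in the same time units as MLL. The function names and the candidate table are hypothetical.

```python
def efficiency_metric(loads, mll, sync_cost):
    """E = (Cavg / Cmax) * (MLL - Cn) / MLL, with Cn passed as `sync_cost`.

    Assumed reading: the first factor rewards load balance, the second
    rewards a large MLL relative to the synchronization cost.
    """
    c_avg = sum(loads) / len(loads)
    c_max = max(loads)
    return (c_avg / c_max) * (mll - sync_cost) / mll


def best_threshold(candidates, sync_cost):
    """Pick the threshold whose partition scores highest.

    candidates: {threshold: (per-partition loads, resulting MLL)}
    """
    return max(
        candidates,
        key=lambda t: efficiency_metric(candidates[t][0], candidates[t][1], sync_cost),
    )


# Illustrative values only; none of these numbers come from the slides.
candidates = {
    0.5: ([10.0, 10.0, 10.0, 10.0], 1.0),   # perfectly balanced, small MLL
    2.0: ([8.0, 12.0, 10.0, 10.0], 4.0),    # less balanced, larger MLL
}
chosen = best_threshold(candidates, sync_cost=0.5)
```

Each evaluation only needs the loads and the MLL of a candidate partition, so no simulation run is required, which is what makes trying every threshold affordable.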

  17. Hierarchical Partition Improves Load Balance • MLL Optimization Increases Decoupling • Should make load balancing more difficult? • It Doesn’t! METIS partitioner on the summarized graph achieves ~40% better load balance • METIS Metric Optimization Works better on the Coarser Graphs

  18. Hierarchical Partition Reduces Simulation Time • ~50% simulation time improvement • Greater Execution Efficiency within a Partition • Looser Coupling (MLL) increases Parallelism

  19. Hierarchical Partition Improves Parallel Efficiency • Achieves 40+% Parallel Efficiency • Practically scaling to 20,000 routers using 90 cluster nodes (TeraGrid, FWGrid)
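As a worked reading of these numbers, assuming the conventional definition of parallel efficiency (speedup divided by node count), which the slide does not spell out: 40% efficiency on 90 nodes corresponds to a speedup of roughly 36 over a single node. The helper names below are hypothetical.

```python
def parallel_efficiency(serial_time, parallel_time, num_nodes):
    """Speedup over one node, divided by the number of nodes used."""
    speedup = serial_time / parallel_time
    return speedup / num_nodes


def implied_speedup(efficiency, num_nodes):
    """Speedup implied by a given efficiency on a given node count."""
    return efficiency * num_nodes


# 0.40 and 90 are the figures quoted on the slide; the result is about 36.
speedup_at_90 = implied_speedup(0.40, 90)
```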

  20. What can we simulate? • Quantitatively: 20K Routers, 100 ASes, many Applications • Examples: • Large-scale behavior of Peer-to-Peer Applications in Mixed Backbone, Access, and Local-area Networks (e.g. Kazaa, BitTorrent, Gnutella) • Adaptive Applications and ReScheduling in controlled Grid resource environments [GrADS/VGrADS] • Resource Selection, Competitive Resource Sharing in Large-scale Grids [VGrADS] • Detailed Network Behavior under Denial-of-Service Attack [Wang2003] • …

  21. Tolerating Denial-of-Service with Overlays Resource Pool (IP Network) Overlay Network Application Host Attacker Edge Proxy User Proxy

  22. Challenges and Objectives • Large-scale Traffic • Malicious attacks, high load factors • Large Overlay and Overall Network • Accuracy is critical! • Packet drop, Link congestion • Research Questions… • What performance does application achieve? (various attacks and configurations) • Can we improve performance by redirecting applications? Can we effectively redirect applications while under attack? • Can we dynamically reconfigure proxy network to support improved application performance?

  23. Experiment Uses Existing Software Packages • Overlay Network • ForwardEngine Prototype • Application • Apache server / Siege http clients • Attackers • simTrinoo UDP traffic • ~1000 routers, 64-node overlay network, 200 siege clients, 10 GByte/s attacking traffic

  24. MaSSF Models Detailed Network Configuration • All Details: Buffer filling, Packet Drop

  25. CDF of User Observed Response Time
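For readers unfamiliar with the plot type, an empirical CDF of response times can be computed as sketched below. The sample values are invented for illustration and are not data from the experiment; the function name `cdf_at` is hypothetical.

```python
from bisect import bisect_right


def cdf_at(sorted_samples, x):
    """Fraction of samples that are less than or equal to x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)


# Invented response times in seconds, sorted so that bisect can be used.
response_times = sorted([0.2, 0.5, 0.5, 1.0, 3.0])
fraction_within_half_second = cdf_at(response_times, 0.5)
```

Reading such a curve at a given response time gives the share of user requests served at least that quickly, which makes it easy to compare attack and redirection configurations.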

  26. Summary • MicroGrid is an Integrated Framework for Grid Experiments • Real applications, Realistic Network and Resource Models • Integrated Online Grid Simulation • Topology Generators (maBrite), Background Traffic Generators, BGP Configuration, Resource Generators • Automatic Profile-based Load Balance! • Hierarchical MLL Tuning • Large-network Simulations at detailed packet-level provide new capabilities and insights • For More Info http://www-csag.ucsd.edu/ • MicroGrid is a part of the GrADS and VGrADS Projects

  27. CSAG MicroGrid Contributors • Faculty: Andrew Chien • Postdocs: Kenjiro Taura, Hyojong Song, Yang-seok Ki • Research Staff: Alex Olugbile • PhD Students: Xin “Paff” Liu, Ju “Tony” Wang, Huaxia Xia, Xianan Zhang, Ranjita Bhagwan • MS students: Denis Jakobsen
