Enabling Supernova Computations by Integrated Transport and Provisioning Methods Optimized for Dedicated Channels. Nagi Rao, Bill Wing, Tony Mezzacappa, Qishi Wu, Mengxia Zhu (Oak Ridge National Laboratory); Malathi Veeraraghavan (University of Virginia).

Presentation Transcript


  1. Enabling Supernova Computations by Integrated Transport and Provisioning Methods Optimized for Dedicated Channels
  Nagi Rao, Bill Wing, Tony Mezzacappa, Qishi Wu, Mengxia Zhu (Oak Ridge National Laboratory); Malathi Veeraraghavan (University of Virginia)
  DOE MICS PI Meeting: High-Performance Networking Program, September 27-29, 2005, Brookhaven National Laboratory

  2. Outline
  • Background
  • Networking for TSI
  • Cray X1 Connectivity
  • USN-CHEETAH Peering
  • Network-Supported Visualizations

  3. DOE ORNL-UVA Project: Complementary Roles
  • Project components:
    • Provisioning for UltraScience Net - GMPLS
    • File transfers for dedicated channels
    • Peering - DOE UltraScience Net and NSF CHEETAH
    • Network-optimized visualizations for TSI
    • TSI application support over UltraScience Net + CHEETAH
  • This project leverages two projects: DOE UltraScience Net and NSF CHEETAH
  [Diagram: project components (provisioning, file transfers, peering, visualization, TSI application) mapped to the ORNL and UVA teams]

  4. Terascale Supernova Initiative (TSI)
  • Science objective: understand supernova evolution
  • DOE SciDAC project: ORNL and 8 universities
    • Teams of field experts across the country collaborate on computations
    • Experts in hydrodynamics, fusion energy, and high-energy physics
  • Massive computational code
    • Terabyte/day generated currently
    • Archived at nearby HPSS
    • Visualized locally on clusters - only archival data
  • Current networking challenges
    • Limited transfer throughput: the hydro code takes 8 hours to generate a run and 14 hours to transfer it out (see the rate estimate sketched below)
    • Runaway computations: we find out only after the fact that parameters needed adjustment
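To put the transfer bottleneck in numbers, here is a back-of-the-envelope rate estimate. It assumes roughly one terabyte of output per hydro run, which is consistent with the terabyte/day figure but is not stated per run on the slide, so treat the run size as an assumption.

```python
# Rough effective-throughput estimate for the TSI hydro-code transfers.
# Assumption: one run produces ~1 TB of output (consistent with the
# "terabyte/day" figure, but not stated per run on the slide).

TB_BITS = 8e12          # 1 TB expressed in bits (decimal terabyte)

run_output_bits = 1 * TB_BITS
transfer_hours = 14

effective_rate_mbps = run_output_bits / (transfer_hours * 3600) / 1e6
print(f"Effective transfer rate: {effective_rate_mbps:.0f} Mbps")
# -> roughly 160 Mbps, far below what a dedicated 1-10 Gbps channel offers.
```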

  5. TSI Desired Capabilities
  • Data and file transfers (terabyte - petabyte)
    • Move data from computations on supercomputers
    • Supply data to visualizations on clusters and supercomputers
  • Interactive computations and visualization
    • Monitor, collaborate on, and steer computations
    • Collaborative and comparative visualizations
  [Diagram: visualization channel, visualization control channel, and steering channel to the computation or visualization site]

  6. USN-CHEETAH Peering: Data-Plane
  • Peering: data and control planes
  • Coast-to-coast dedicated channels
  • Access to ORNL supercomputers for CHEETAH users
  • Peering at ORNL:
    • Data plane: 10GigE between the SN16000 and the e300
    • Control plane: VPN tunnel
  [Diagram: data-plane peering at ORNL between UltraScience Net and CHEETAH (CDCI, SN16000, e300)]

  7. USN-CHEETAH Peering: Control-Plane
  • Wrap the USN control plane with a GMPLS RSVP-TE wrapper, so the CHEETAH GMPLS control plane can signal the USN TL1/CLI centralized control plane (an adapter sketch follows below)
  • Authenticated, encrypted tunnel between the control planes
  [Diagram: GMPLS wrapper between the CHEETAH GMPLS control plane and the USN centralized control plane; USN control host, SN16000, and NS-50 at ORNL]
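As an illustration of the wrapping idea only, here is a minimal adapter sketch in Python. All names here (`GmplsWrapper`, `CircuitRequest`, `FakeTunnel`, the command string) are hypothetical placeholders, not the project's actual interfaces; the real wrapper speaks RSVP-TE on one side and the USN TL1/CLI over the encrypted tunnel on the other.

```python
# Minimal sketch of the "wrapper" pattern: accept a GMPLS-style circuit
# request on one side and drive a centralized control plane on the
# other.  Names and message formats here are hypothetical.

from dataclasses import dataclass

@dataclass
class CircuitRequest:
    """Fields a GMPLS RSVP-TE Path message would carry (simplified)."""
    ingress: str          # e.g. peering port facing CHEETAH
    egress: str           # e.g. port toward the ORNL supercomputer
    bandwidth_gbps: float

class GmplsWrapper:
    def __init__(self, tunnel):
        # 'tunnel' stands in for the authenticated, encrypted channel
        # to the USN control host.
        self.tunnel = tunnel

    def setup(self, req: CircuitRequest) -> bool:
        # Translate the request into whatever commands the centralized
        # control plane expects (TL1/CLI in USN's case), send them over
        # the tunnel, and report success as an RSVP-TE Resv would.
        command = f"SETUP {req.ingress} {req.egress} {req.bandwidth_gbps}G"
        return self.tunnel.send(command) == "OK"

class FakeTunnel:
    """Stand-in transport so the sketch runs on its own."""
    def send(self, command: str) -> str:
        print("tunnel ->", command)
        return "OK"

if __name__ == "__main__":
    wrapper = GmplsWrapper(FakeTunnel())
    ok = wrapper.setup(CircuitRequest("cheetah-peering", "cray-x1e", 1.0))
    print("circuit established:", ok)
```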

  8. Network Connectivity of Cray X-Class Machines
  • X1(E): 1GigE now, 10GigE in the future, via the Cray Network Subsystem (CNS); Cray nodes on a crossconnect with FC disk storage
    • Cray X1(E) is an upgraded version of the X1; a 10GigE upgrade to the CNS is planned
  • Redstorm: cluster-based architecture; GigE-based crossconnect with 1/10GigE links; FC disk storage
  • Cray X2: expected to be based on a combination of the X1 and Redstorm (Cray Rainier plans)
  [Diagram: CNS, crossconnect, Cray nodes, and FC disk-storage paths for the X1(E) and Redstorm]

  9. Internal Data Paths of Supercomputers
  • We concentrate on two types of connections:
    • Ethernet/IP connections from compute/service nodes
    • Fibre Channel (FC) connections to disks
  • Analysis of internal data paths to identify potential bottlenecks:
    • X1(E): 1GigE - FC; 10GigE - FC bundling
    • X2: 1/10GigE channels; FC channels
  • Coordinate with Cray's plans
  [Diagram: disk and network paths through the CNS, crossconnect, compute nodes, and service nodes for the X1(E) and the expected X2]

  10. Experimental Results: Production 1GigE Connection, Cray X1 to NCSU
  • Tuned/ported the existing bbcp protocol (UNICOS OS):
    • optimized to achieve 250-400 Mbps from Cray X1 to NCSU
    • actual throughput varies as a function of Internet traffic
    • tuned TCP achieves ~50 Mbps; currently used in production mode by John Blondin
  • Developed a new protocol called Hurricane:
    • achieves a stable 400 Mbps using a single stream from Cray X1 to NCSU
  • These throughput levels are the highest achieved between the ORNL Cray X1 and a remote site located several hundred miles away (a bandwidth-delay-product estimate of why window tuning matters follows below).
  [Diagram: GigE from the Cray X1 through the Juniper M340 and Cisco routers, over the shared Internet connection carrying all user traffic, to the GigE Linux cluster at NCSU]
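The gap between ~50 Mbps for tuned-but-default-window TCP and the 250-400 Mbps achieved with tuning is roughly what the bandwidth-delay product predicts. The sketch below assumes an ORNL-NCSU round-trip time of about 20 ms, which is an illustrative figure, not a measured value from the slides.

```python
# Bandwidth-delay-product estimate: how large a TCP window is needed to
# sustain a target rate over a path with a given round-trip time (RTT).
# The 20 ms RTT is an assumed, illustrative value for ORNL <-> NCSU.

def window_for_rate(rate_bps: float, rtt_s: float) -> float:
    """Bytes of in-flight data (window) needed: rate * RTT / 8."""
    return rate_bps * rtt_s / 8

rtt = 0.020                              # assumed 20 ms round trip
for rate_mbps in (50, 400, 1000):
    window_kb = window_for_rate(rate_mbps * 1e6, rtt) / 1e3
    print(f"{rate_mbps:4d} Mbps needs ~{window_kb:6.0f} KB in flight")

# A default 64 KB window over a 20 ms path caps TCP near
# 64 KB * 8 / 0.020 s ~ 26 Mbps, which is why untuned TCP sits at tens
# of Mbps while tuned transfers reach hundreds of Mbps.
```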

  11. Experimental Results, Cray X1: Dedicated Connection
  • Initial testing over a dedicated channel:
    • UCNS connected to the Cray X1 via four 2Gbps FC connections
    • UCNS connected to another Linux host via a 10GigE connection
  • Transfer results:
    • 1.4Gbps with a single flow using the Hurricane protocol - the highest file transfer rate achieved over Ethernet connections from the ORNL Cray X1 to an external (albeit local) host
  [Diagram: upgrade path - Cray X1 (OS nodes, 2G FC) through the UCNS to a local host over 10GigE; upgraded Cray X1E with faster processors reaching the NCSU cluster over a 1 Gbps CHEETAH circuit spanning ~600 miles]

  12. 1Gbps Dedicated Connection: Cray X1(E) to the NCSU orbitty Cluster
  • Performance degraded: bbcp: 30-40Mbps; single TCP: 5 Mbps
  • Hurricane: 400Mbps with no jobs running, 200Mbps with jobs running
  • The performance bottleneck is identified inside the Cray X1E OS nodes
  [Diagram: path from the National Leadership Class Facility computer over UltraScienceNet and CHEETAH to NCSU]

  13. Modules of the Visualization Pipeline
  • The pipeline consists of several modules
  • Some modules are better suited to certain network nodes:
    • Visualization clusters
    • Computation clusters
    • Power walls
  • Data transfers between modules are of varied sizes and rates
  • Note: commercial tools do not support efficient decomposition

  14. Grouping Visualization Modules
  • Grouping:
    • Decompose the pipeline into modules
    • Combine the modules into groups
  • Transfers within a single node are generally faster; transfers between nodes take place over the network
  • Align the bottleneck network links with the inter-module transfers that have the least data requirements

  15. Optimal Mapping of the Visualization Pipeline: Minimization of Total Delay
  • Dynamic programming solution (a sketch follows below):
    • Combine modules into groups
    • Align the bottleneck network links with the inter-module transfers that have the least data requirements
    • Polynomial-time solvable - not NP-complete
  • Note:
    • Commercial tools (Ensight) are not readily amenable to optimal network deployment
    • This method can be implemented in tools that provide appropriate hooks
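Here is a minimal sketch of one such dynamic program, under simplifying assumptions: computation time is load divided by node power, transfer time is data size divided by link bandwidth, the nodes form a linear path from the data source to the client, the first module runs at the source, and the last module's output must reach the client. It illustrates the idea; it is not the paper's exact formulation.

```python
# Map a linear visualization pipeline onto a linear arrangement of
# network nodes so that total end-to-end delay is minimized.
# T[i][j] = minimum delay for module i to have finished with its output
# resident at node j.  Two moves fill the table: run the next module
# where the data already sits, or ship the current output one hop
# toward the client.  Groups of co-located modules emerge implicitly,
# and the heavy transfers naturally avoid the slow links.

def min_total_delay(comp, data, power, bw):
    """
    comp[i]  : compute load of module i (operations)
    data[i]  : size of module i's output (bits)
    power[j] : processing rate of node j (operations/second)
    bw[j]    : bandwidth of the link from node j to node j+1 (bits/second)
    Module 0 is pinned to node 0 (the data source); the final module's
    output must end up at the last node (the client/display).
    """
    M, N = len(comp), len(power)
    T = [[0.0] * N for _ in range(M)]

    T[0][0] = comp[0] / power[0]
    for j in range(1, N):                       # forward module 0's output
        T[0][j] = T[0][j - 1] + data[0] / bw[j - 1]

    for i in range(1, M):
        for j in range(N):
            best = T[i - 1][j] + comp[i] / power[j]        # compute here
            if j > 0:
                best = min(best,                            # or ship one hop
                           T[i][j - 1] + data[i] / bw[j - 1])
            T[i][j] = best
    return T[M - 1][N - 1]

if __name__ == "__main__":
    # Toy pipeline: filter -> isosurface -> render, on three nodes:
    # storage cluster -> compute cluster -> display client.
    comp  = [2e9, 8e9, 1e9]          # operations per module
    data  = [4e9, 1e9, 2e8]          # bits emitted by each module
    power = [1e9, 4e9, 5e8]          # operations/second per node
    bw    = [1e9, 1e8]               # bits/second per link
    print(f"minimum total delay: {min_total_delay(comp, data, power, bw):.2f} s")
```

The table has M x N entries and each is filled in constant time, which is where the polynomial (here O(MN)) bound comes from.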

  16. Optimal Mapping of the Visualization Pipeline: Maximization of Frame Rate
  • Dynamic programming solution (a minimax variant of the sketch above follows below):
    • Align the bottleneck network links with the inter-module transfers that have the least data requirements
    • Polynomial-time solvable - not NP-complete
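For a streamed sequence of frames, the frame interval is governed by the slowest pipeline stage rather than by the sum of stage times, so the same table can be filled with max(...) in place of addition. This sketch also assumes every module acts as its own pipeline stage even when several modules share a node; the paper's formulation may treat co-located modules differently.

```python
# Minimax counterpart of min_total_delay: minimize the time of the
# slowest stage (one compute step or one one-hop transfer), then report
# its reciprocal as the achievable frame rate.  Assumes each module is
# its own pipeline stage, even when modules share a node.

def max_frame_rate(comp, data, power, bw):
    M, N = len(comp), len(power)
    T = [[0.0] * N for _ in range(M)]

    T[0][0] = comp[0] / power[0]
    for j in range(1, N):
        T[0][j] = max(T[0][j - 1], data[0] / bw[j - 1])

    for i in range(1, M):
        for j in range(N):
            best = max(T[i - 1][j], comp[i] / power[j])
            if j > 0:
                best = min(best, max(T[i][j - 1], data[i] / bw[j - 1]))
            T[i][j] = best

    bottleneck = T[M - 1][N - 1]      # seconds consumed by the slowest stage
    return 1.0 / bottleneck           # frames per second
```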

  17. Computational Steering
  • Monitor output and modify parameters while the computation is running: computational monitoring and steering (a monitoring-loop sketch follows below)
  • Computation cycles and time can be saved if:
    • unproductive jobs can be terminated
    • stray parameters can be corrected in time
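To make the idea concrete, a minimal monitoring-and-steering loop might look like the sketch below. The `job` object and its methods (`is_running`, `read_latest_diagnostics`, `terminate`, `push_parameter_update`) are hypothetical placeholders for whatever hooks a simulation exposes; they are not part of the TSI codes.

```python
# Hypothetical computational-steering loop: watch a running simulation's
# diagnostics and either stop it or adjust a parameter before more
# compute hours are wasted.  All hooks here are illustrative placeholders.

import time

def steer(job, max_residual=1e3, check_interval_s=300):
    while job.is_running():
        diag = job.read_latest_diagnostics()   # e.g. residuals, timestep size

        if diag["residual"] > max_residual:
            # Runaway computation: terminate it instead of finding out
            # after the fact that the run was unproductive.
            job.terminate("residual diverged")
            return

        if diag["timestep"] < 0.1 * diag["target_timestep"]:
            # Strayed parameter: nudge it and let the run continue.
            job.push_parameter_update("cfl_factor", diag["cfl_factor"] * 0.5)

        time.sleep(check_interval_s)           # poll every few minutes
```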

  18. Experimental Results
  • Deployed on six Internet nodes located at ORNL, LSU, UT, NCSU, OSU, and GaTech
    • The UT and NCSU nodes are clusters
  • Configuration:
    • Client at ORNL
    • CM node at LSU
    • DS nodes at OSU and GaTech
    • CS nodes at UT and NCSU

  19. Estimating Network and Computing Time
  • The overall estimation error of the transport and computing times is within 5.0%, which demonstrates the accuracy of our performance models for the network and visualization components (see the error-computation sketch below).
  • We also observed that the system overhead is less than one second; this overhead consists of two components, setup time and loop time.
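For clarity on what the 5% figure measures, the sketch below shows one way estimated and measured end-to-end times could be compared, using the same load/power and size/bandwidth delay model as the mapping sketch above. The numbers and helper functions are illustrative placeholders, not values from the experiments.

```python
# Illustrative comparison of model-estimated vs. measured end-to-end time.
# The delay model mirrors the mapping sketch: per-module compute time
# plus per-link transfer time plus a fixed overhead term.

def estimate_total_time(comp, data_on_links, power, bw, overhead_s=0.0):
    compute = sum(c / p for c, p in zip(comp, power))
    transfer = sum(d / b for d, b in zip(data_on_links, bw))
    return compute + transfer + overhead_s

def relative_error(estimated_s, measured_s):
    return abs(measured_s - estimated_s) / measured_s

# Hypothetical numbers, only to show the calculation:
est = estimate_total_time(comp=[2e9, 8e9], data_on_links=[4e9],
                          power=[1e9, 4e9], bw=[1e9], overhead_s=0.8)
measured = 8.5
print(f"estimate {est:.2f} s, measured {measured:.2f} s, "
      f"error {100 * relative_error(est, measured):.1f}%")
```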

  20. Performance Comparison for Different Visualization Loops
  • Optimal visualization pipeline: GaTech - UT - ORNL
    • GaTech is the data storage node
    • UT is used as the computing node
  • The differences in these end-to-end delay measurements are mainly caused by disparities in the computing power of the nodes and the bandwidths of the network links connecting them
  • The optimal visualization loop provided substantial performance enhancements over the other pipeline configurations

  21. Comparison with ParaView
  • Tested our lightweight system against ParaView under the same configuration
  • Our system consistently achieved better performance than ParaView
  • The performance differences may have been caused by the higher processing and communication overhead incurred by ParaView

  22. Examples: Human Head Isosurface

  23. Example datasets

  24. ORNL Personnel
  • Nagi Rao, Bill Wing, Tony Mezzacappa (PIs)
  • Qishi Wu (Post-Doctoral Fellow)
  • Mengxia Zhu (PhD Student, Louisiana State University)
  PhD Thesis
  • Mengxia Zhu, Adaptive Remote Visualization System With Optimized Network Performance for Large Scale Scientific Data, Department of Computer Science, Louisiana State University, defending on October 3, 2005.
  Papers
  • X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu, "CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed," IEEE Communications Magazine, 2005.
  • N. S. V. Rao, S. M. Carter, Q. Wu, W. R. Wing, M. Zhu, A. Mezzacappa, M. Veeraraghavan, and J. M. Blondin, "Networking for Large-Scale Science: Infrastructure, Provisioning, Transport and Application Mapping," SciDAC Meeting, 2005.
  • M. Zhu, Q. Wu, N. S. V. Rao, and S. S. Iyengar, "Adaptive Visualization Pipeline Partition and Mapping on Computer Network," International Conference on Image Processing and Graphics, ICIG 2004.
  • M. Zhu, Q. Wu, N. S. V. Rao, and S. S. Iyengar, "On Optimal Mapping of Visualization Pipeline onto Linear Arrangement of Network Nodes," International Conference on Visualization and Data Analysis, 2005.

  25. Conclusions
  • Summary: developed several components to support TSI
    • USN-CHEETAH peering
    • File and data transfers
    • Visualization modules
  • Ongoing tasks:
    • USN-CHEETAH GMPLS peering
    • Work with Cray to address performance issues
    • Transitioning the visualization system to production TSI
