UltraLight Overview • Shawn McKee / University of Michigan • USATLAS Tier1 & Tier2 Network Planning Meeting • December 14, 2005 - BNL
The UltraLight Project • UltraLight is • A four-year, $2M NSF ITR funded by MPS. • Application-driven network R&D. • A collaboration of BNL, Caltech, CERN, Florida, FIU, FNAL, Internet2, Michigan, MIT, SLAC. • Significant international participation: Brazil, Japan, Korea amongst many others. • Goal: Enable the network as a managed resource. • Meta-Goal: Enable physics analysis and discoveries which could not otherwise be achieved.
UltraLight Backbone • UltraLight has a non-standard core network with dynamic links and varying bandwidth inter-connecting our nodes. • Optical Hybrid Global Network • The core of UltraLight is dynamically evolving as a function of available resources on other backbones such as NLR, HOPI, Abilene or ESnet. • The main resources for UltraLight: • LHCnet (IP, L2VPN, CCC) • Abilene (IP, L2VPN) • ESnet (IP, L2VPN) • Cisco NLR wave (Ethernet) • Cisco Layer 3 10GE Network • HOPI NLR waves (Ethernet; provisioned on demand) • UltraLight nodes: Caltech, SLAC, FNAL, UF, UM, StarLight, CENIC PoP at LA, CERN
UltraLight Layer3 Connectivity • Shown (courtesy of Dan Nae) is the current UltraLight Layer 3 connectivity as of mid-October 2005
UltraLight Sites • UltraLight currently has 10 participating core sites (shown alphabetically) • Details and diagrams for each site will be reported Tuesday during “Network” day
UltraLight Network: PHASE I • Implementation via “sharing” with HOPI/NLR • Also LA-CHI Cisco/NLR Research Wave • DOE UltraScienceNet Wave SNV-CHI (LambdaStation) • Connectivity to FLR to be determined • MIT involvement welcome, but unfunded • (Figure: Phase I plan from Oct. 2004; map labels include AMPATH, UERJ, USP)
UltraLight Network: PHASE II • Move toward multiple “lambdas” • Bring in FLR, as well as BNL (and MIT) • General comment: We are almost here! • (Figure: Phase II plan; map labels include AMPATH, UERJ, USP)
UltraLight Network: PHASE III • Move into production • Optical switching fully enabled amongst primary sites • Integrated international infrastructure • Certainly reasonable sometime in the next few years… • (Figure: Phase III plan; map labels include AMPATH, UERJ, USP)
Workplan/Phased Deployment • UltraLight envisions a 4-year program to deliver a new, high-performance, network-integrated infrastructure: • Phase I will last 12 months and focus on deploying the initial network infrastructure and bringing up first services • Phase II will last 18 months and concentrate on implementing all the needed services and extending the infrastructure to additional sites • Phase III will complete UltraLight and last 18 months. The focus will be on a transition to production in support of LHC Physics (+ eVLBI Astronomy, + <insert your e-Science here>?) • We are HERE!
UltraLight Network Engineering • GOAL: Determine an effective mix of bandwidth-management techniques for this application space, particularly: • Best-effort/“scavenger” using effective ultrascale protocols • MPLS with QoS-enabled packet switching • Dedicated paths arranged with TL1 commands, GMPLS • PLAN: Develop and test the most cost-effective integrated combination of network technologies on our unique testbed: • Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet, and our international partners • Progressively enhance Abilene with QoS support to protect production traffic • Incorporate emerging NLR and RON-based lightpath and lambda facilities • Deploy and systematically study ultrascale protocol stacks (such as FAST), addressing issues of performance & fairness • Use MPLS/QoS and other forms of BW management, and adjustments of optical paths, to optimize end-to-end performance among a set of virtualized disk servers
UltraLight: Effective Protocols • The protocols used to reliably move data are a critical component of Physics “end-to-end” use of the network • TCP is the most widely used protocol for reliable data transport, but it becomes increasingly inefficient as the bandwidth-delay product of a network grows. • UltraLight is exploring extensions to TCP (HSTCP, Westwood+, HTCP, FAST) designed to maintain fair sharing of networks while still allowing efficient, effective use of them. • Currently FAST is in our “UltraLight Kernel” (a customized 2.6.12-3 kernel), which was used at SC2005. We are planning to broadly deploy a related kernel with FAST; longer term we can then continue with access to FAST, HSTCP, Scalable TCP, BIC and others.
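As an illustration of per-transfer protocol selection (a minimal sketch, not part of the UltraLight software), the snippet below shows how a Linux application can request a specific TCP congestion control algorithm on a single socket. It assumes a kernel, such as the patched UltraLight kernel, that makes the named algorithm available; the host, port, algorithm and buffer size are placeholders.

```python
import socket

def open_tuned_connection(host, port, algorithm=b"bic", bufsize=16 * 1024 * 1024):
    """Connect with a chosen congestion control algorithm and large buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Large send/receive buffers help fill high bandwidth-delay-product paths.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    # TCP_CONGESTION is Linux-only; the algorithm must be loaded in the kernel.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm)
    sock.connect((host, port))
    return sock

# Hypothetical usage: a data mover could pick "bic", "htcp", etc. per transfer.
# conn = open_tuned_connection("venus.ultralight.org", 5001, b"htcp")
```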
UltraLight Kernel Development • Having a standard tuned kernel is very important for a number of UltraLight activities: • Breaking the 1 GB/sec disk-to-disk barrier • Exploring TCP congestion control protocols • Optimizing our capability for demos and performance • The current kernel incorporates the latest FAST and Web100 patches over a 2.6.12-3 kernel and includes the latest RAID and 10GE NIC drivers. • The UltraLight web page (http://www.ultralight.org) has a Kernel page, linked from the Workgroup->Network page, which provides the details
MPLS/QoS for UltraLight • UltraLight plans to explore the full range of end-to-end connections across the network, from best-effort, packet-switched through dedicated end-to-end light-paths. • MPLS paths with QoS attributes fill a middle ground in this network space and allow fine-grained allocation of virtual pipes, sized to the needs of the application or user. • (Figure: TeraPaths initial QoS test at BNL.) • UltraLight, in conjunction with the DoE/MICS-funded TeraPaths effort, is working toward extensible solutions for implementing such capabilities in next-generation networks • TeraPaths URL: http://www.atlasgrid.bnl.gov/terapaths/
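As a hedged illustration of how an application's traffic could be steered into such a virtual pipe, the sketch below marks a socket's packets with a DSCP codepoint that QoS-aware routers along an MPLS path may use for classification. This is an assumption for illustration, not TeraPaths or UltraLight code; whether the EF marking is honored depends on site and path policy.

```python
import socket

EF_DSCP = 46                 # Expedited Forwarding codepoint (illustrative choice)
TOS_VALUE = EF_DSCP << 2     # DSCP occupies the upper six bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Mark every packet sent on this socket; routers may map the mark to a QoS class.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
# From here the socket is used normally; the marking travels with the traffic.
```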
Optical Path Plans • Emerging “light path” technologies are becoming popular in the Grid community: • They can extend and augment existing grid computing infrastructures, currently focused on CPU/storage, to include the network as an integral Grid component. • These technologies seem to be the most effective way to offer network resource provisioning on demand between end-systems. • A major capability we are developing in UltraLight is the ability to dynamically switch optical paths across the node, bypassing electronic equipment via a fiber cross connect. • The ability to switch dynamically provides additional functionality and also models the more abstract case where switching is done between colors (ITU grid lambdas).
MonALISA to Manage LightPaths • Dedicated modules to monitor and control optical switches • Used to control • CALIENT switch @ CIT • GLIMMERGLASS switch @ CERN • ML agent system • Used to create a global path • Algorithm can be extended to include prioritisation and pre-allocation
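The sketch below is a hypothetical illustration of the kind of path-construction step such an agent system performs: choosing an end-to-end light path over currently free optical links. The switch names and link table are invented for the example and are not the real MonALISA topology or API; prioritisation and pre-allocation would enter as weights or pre-filters on the links.

```python
import heapq

def find_light_path(links, src, dst):
    """links: dict {switch: [(neighbor, cost), ...]} of currently free links."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path                      # cheapest free end-to-end path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in links.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None                                     # no free end-to-end path right now

# Toy topology using the two switches named on the slide (illustrative only).
topology = {
    "CALIENT@CIT": [("StarLight", 1)],
    "StarLight": [("GLIMMERGLASS@CERN", 1)],
    "GLIMMERGLASS@CERN": [],
}
print(find_light_path(topology, "CALIENT@CIT", "GLIMMERGLASS@CERN"))
```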
Monitoring for UltraLight • Network monitoring is essential for UltraLight. • We need to understand our network infrastructure and track its performance both historically and in real-time to enable the network as a managed, robust component of our overall infrastructure. • There are two ongoing efforts we are leveraging to help provide us with the monitoring capability required: • IEPM http://www-iepm.slac.stanford.edu/bw/ • MonALISA http://monalisa.cern.ch • We are also looking at new tools like perfSONAR which may help provide a monitoring infrastructure for UltraLight.
MonALISA UltraLight Repository • The UL repository: http://monalisa-ul.caltech.edu:8080/
End-Systems Performance • Latest disk-to-disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech; windows, 1 TB file, 24 JBOD disks • Quad Opteron AMD848 2.2 GHz processors with 3 AMD-8131 chipsets: 4 64-bit/133 MHz PCI-X slots • 3 Supermicro Marvell SATA disk controllers + 24 7200 rpm SATA disks • Local Disk IO – 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization) • 10 GE NIC – 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization) • 2*10 GE NIC (802.3ad link aggregation) – 11.1 Gbits/sec (memory-to-memory) • Need PCI-Express, TCP offload engines • Need 64-bit OS? Which architectures and hardware? • Discussions are underway with 3Ware, Myricom and Supermicro to try to prototype viable servers capable of driving 10 GE networks in the WAN.
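As a quick sanity check of the quoted figures (a back-of-the-envelope sketch, not a benchmark), the snippet below converts the 4.3 Gbit/s WAN result to MB/s and estimates how long a 1 TB file takes at that rate.

```python
# Back-of-the-envelope check of the numbers quoted above.
wan_gbps = 4.3                          # disk-to-disk rate over the WAN
mb_per_s = wan_gbps * 1000 / 8          # ~537 MB/s, matching the ~536 MB/s quoted
hours_per_tb = 1e6 / mb_per_s / 3600    # ~0.52 hours to move 1 TB at that rate
print(f"{mb_per_s:.0f} MB/s, {hours_per_tb:.2f} h per TB")
```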
UltraLight Global Services • Global Services support management and co-scheduling of multiple resource types, and provide strategic recovery mechanisms from system failures • Scheduling decisions are based on CPU, I/O and network capability and on end-to-end task performance estimates, including loading effects • Decisions are constrained by local and global policies • Implementation: autodiscovering, multithreaded services, with service engines to schedule threads, making the system scalable and robust • Global Services consist of: • Network and System Resource Monitoring, to provide pervasive end-to-end resource monitoring information to higher-level services • Network Path Discovery and Construction Services, to provide network connections appropriate (sized/tuned) to the expected use • Policy-Based Job Planning Services, balancing policy, efficient resource use and acceptable turnaround time • Task Execution Services, with job tracking user interfaces and incremental re-planning in case of partial incompletion • These types of services are required to deliver a managed network. Work along these lines is planned for OSG and future proposals to NSF and DOE.
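To make the planning step concrete, the sketch below shows one hypothetical way a job-planning service could choose among path options using monitoring estimates, policy flags and a deadline. The class, function and option names are invented for illustration and do not correspond to an actual UltraLight API; the scenario mirrors the 2008 transfer example on the next slide.

```python
from dataclasses import dataclass

@dataclass
class PathOption:
    name: str          # e.g. "best-effort FAST", "MPLS pipe", "light path"
    est_gbps: float    # throughput estimate from monitoring services
    policy_ok: bool    # local/global policy allows this user to use it
    cost: int          # relative "expense" of reserving the resource

def plan_transfer(size_tb, deadline_hours, options):
    """Pick the cheapest policy-allowed option that meets the deadline."""
    feasible = [
        o for o in options
        if o.policy_ok and (size_tb * 8e3) / (o.est_gbps * 3600) <= deadline_hours
    ]
    return min(feasible, key=lambda o: o.cost, default=None)

choice = plan_transfer(
    size_tb=1.2, deadline_hours=2.83,
    options=[PathOption("best-effort FAST", 1.2, True, cost=1),
             PathOption("3 Gbps MPLS pipe", 3.0, True, cost=2),
             PathOption("light path", 10.0, False, cost=3)],
)
print(choice.name if choice else "no feasible option; re-plan or relax deadline")
```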
UltraLight Application in 2008 • Node1> fts -vvv -in mercury.ultralight.org:/data01/big/zmumu05687.root -out venus.ultralight.org:/mstore/events/data -prio 3 -deadline +2:50 -xsum • FTS: Initiating file transfer setup… • FTS: Remote host responds ready • FTS: Contacting path discovery service • PDS: Path discovery in progress… • PDS: Path RTT 128.4 ms, best-effort path bottleneck is 10 GE • PDS: Path options found: • PDS: Lightpath option exists end-to-end • PDS: Virtual pipe option exists (partial) • PDS: High-performance protocol capable end-systems exist • FTS: Requested transfer of 1.2 TB file within 2 hours 50 minutes, priority 3 • FTS: Remote host confirms available space for DN=smckee@ultralight.org • FTS: End-host agent contacted…parameters transferred • EHA: Priority 3 request allowed for DN=smckee@ultralight.org • EHA: request scheduling details • EHA: Lightpath prior scheduling (higher/same priority) precludes use • EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes • EHA: request monitoring prediction along path • EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over next 2 hours 50 minutes
EHA: Virtual pipe (partial) expected to deliver 3 Gbps (+0/-0.3) during reservation; variance from unprotected section < 0.3 Gbps at 95% CL • EHA: Recommendation: begin transfer using FAST-UL with network identifier #5A-3C1. Connection will migrate to MPLS/QoS tunnel in 52.3 minutes. Estimated completion in 1 hour 22.78 minutes. • FTS: Initiating transfer between mercury.ultralight.org and venus.ultralight.org using #5A-3C1 • EHA: Transfer initiated…tracking at URL: fts://localhost/FTS/AE13FF132-FAFE39A-44-5A-3C1 • EHA: Reservation placed for MPLS/QoS connection along partial path: 3 Gbps beginning in 52.2 minutes; duration 60 minutes • EHA: Reservation confirmed, rescode #9FA-39AF2E, note: unprotected network section included. • <…lots of status messages…> • FTS: Transfer proceeding, average 1.1 Gbps, 431.3 GB transferred • EHA: Connecting to reservation: tunnel complete, traffic marking initiated • EHA: Virtual pipe active: current rate 2.98 Gbps, estimated completion in 34.35 minutes • FTS: Transfer complete, signaling EHA on #5A-3C1 • EHA: Transfer complete received…hold for xsum confirmation • FTS: Remote checksum processing initiated… • FTS: Checksum verified; closing connection • EHA: Connection #5A-3C1 completed…closing virtual pipe with 12.3 minutes remaining on reservation • EHA: Resources freed. Transfer details uploading to monitoring node • EHA: Request successfully completed, transferred 1.2 TB in 1 hour 41.3 minutes (transfer 1 hour 34.4 minutes)
Supercomputing 2005 • The Supercomputing conference (SC05) in Seattle, Washington held another “Bandwidth Challenge” during the week of Nov. 14-18 • A collaboration of high-energy physicists from Caltech, Michigan, Fermilab and SLAC (with help from BNL: thanks Frank and John!) won, achieving a peak network usage of 131 Gbps. • This SC2005 BWC entry from HEP was designed to preview the scale and complexity of data operations among many sites interconnected by many 10 Gbps links
BWC Take-Away Summary • Our collaboration previewed the IT challenges of next-generation science at the High Energy Physics frontier (for the LHC and other major programs): • Petabyte-scale datasets • Tens of national and transoceanic links at 10 Gbps (and up) • 100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data • The team set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission • Set up, shook down and successfully ran the system in < 1 week • Substantive take-aways from this marathon exercise: • An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days • A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions • Extensions of Xrootd, an optimized low-latency file access application for clusters, across the wide area • Understanding of the limits of 10 Gbps-capable systems under stress
UltraLight and ATLAS • UltraLight has deployed and instrumented the UltraLight network and made good progress toward defining and constructing the needed ‘managed network’ infrastructure. • The developments in UltraLight are targeted at providing needed capabilities and infrastructure for the LHC. • We have some important activities which are ready for additional effort: • Achieving 10GE disk-to-disk transfers using single servers • Evaluating TCP congestion control protocols over UL links • Deploying embryonic network services to further the UL vision • Implementing some forms of MPLS/QoS and optical path control as part of standard UltraLight operation • Enabling automated end-host tuning and negotiation • We want to extend the footprint of UltraLight to include as many interested sites as possible to help ensure its developments meet LHC needs. • Questions?