This document outlines the critical need for enhanced network infrastructure to support the scientific research community, addressing major bottlenecks such as high latency and throughput limitations. It details multiple funded projects, including NSF and DOE initiatives aimed at improving network communication for large-scale data transfers essential for climate and scientific simulations. As a Graduate Research Assistant (GRA), one will engage in extensive data analysis, protocol design, simulation, and engineering solutions to advance high-speed networking technologies that directly impact scientific outcomes.
Computer networks Malathi Veeraraghavan Univ. of Virginia mv5g@virginia.edu Fall 2013 (updated Jan. 2014) • Funded projects (GRA openings) • NSF SDCI: 2 years left • DOE HNTES: 4 years left (new grant awarded) • NSF CC-NIE (new): 3 years • NSF SCRP: 2 years left • NSF JUNO: 3 years (just starting) • Applied orientation
Outline • Big picture • Four projects • What is the problem? • Why solve it? (Motivation) • Methods used • As a GRA, what would I do? • Processes & style
Big picture • Networks to support the scientific research community • High-speed • Low-latency • Who is in the science community? • DOE Office of Science • Basic energy sciences, high-energy physics, fusion energy sciences, bio & environ. research • NSF Office of Cyberinfrastructure (OCI)
Both agencies (NSF OCI and DOE) support • Supercomputing centers • nersc.gov • olcf.gov • alcf.gov • XSEDE (NSF OCI) • High-speed networks • Backbone: ESnet, Internet2 • Campus and regional nets: DYNES
NSF Software Development for Cyberinfrastructure (SDCI) • Problem & motivation (what & why): • Climate scientists run simulations that require > 5000 cores • Intra-datacenter network identified as bottleneck (InfiniBand cluster: 72K cores) • MPI communications: need to reduce latency and variance in latency • Scientists move terabyte- to petabyte-sized files: move these fast • 100 Gbps: current state of the art in link speed, but not in throughput (software!)
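The gap between link speed and throughput can be made concrete with a little arithmetic. The sketch below computes how long a file takes to move at a given sustained throughput. The 100 Gbps link rate and the terabyte/petabyte sizes come from the slide; the 10 Gbps "achieved" figure is a hypothetical value chosen only for illustration, not a measurement from the project.

```python
def transfer_time_seconds(size_bytes: float, throughput_bps: float) -> float:
    """Time to move size_bytes at a sustained throughput in bits per second."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes * 8 / throughput_bps


TB = 10**12          # one terabyte, in bytes (decimal units)
PB = 10**15          # one petabyte, in bytes (decimal units)
LINK_RATE = 100e9    # 100 Gbps link rate, from the slide
ACHIEVED = 10e9      # ASSUMPTION: hypothetical end-to-end throughput

one_tb_at_link = transfer_time_seconds(TB, LINK_RATE)      # 80 s
one_pb_at_link = transfer_time_seconds(PB, LINK_RATE)      # 80,000 s, about 22 hours
one_pb_at_achieved = transfer_time_seconds(PB, ACHIEVED)   # 800,000 s, about 9 days

print(f"1 TB at link rate: {one_tb_at_link:.0f} s")
print(f"1 PB at link rate: {one_pb_at_link / 3600:.1f} h")
print(f"1 PB at assumed achieved rate: {one_pb_at_achieved / 86400:.1f} days")
```

Under these assumptions, a tenfold shortfall in end-to-end throughput turns a transfer of under a day into one of more than a week, which is why the software path matters as much as the link.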
DOE Hybrid Network Traffic Engineering System (HNTES) • Problem & motivation: • Find high-rate, large-sized (alpha) flows within a network and isolate • Why? • As link rates increase, spread between fastest flow and slowest flow increases • Fast flows can delay slow flows (user sees poor quality for real-time flows) • On links to providers: Service Level Agreements (SLAs) can be violated when fast flows appear
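To illustrate what "finding alpha flows" could look like, here is a minimal sketch that splits flow records into alpha flows and the rest using a size threshold and a rate threshold. The slide does not describe how HNTES actually identifies flows, so the record fields, the threshold values, and the function names below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record; field names are illustrative only."""
    src: str
    dst: str
    bytes_sent: int
    duration_s: float

    @property
    def rate_bps(self) -> float:
        # Average rate over the flow's lifetime, in bits per second.
        return self.bytes_sent * 8 / self.duration_s if self.duration_s > 0 else 0.0


# ASSUMPTION: thresholds are placeholders, not values used by HNTES.
MIN_SIZE_BYTES = 10**9   # "large-sized": at least 1 GB
MIN_RATE_BPS = 100e6     # "high-rate": at least 100 Mbps


def is_alpha(flow: FlowRecord,
             min_size: int = MIN_SIZE_BYTES,
             min_rate: float = MIN_RATE_BPS) -> bool:
    """A flow counts as alpha only if it is both large and fast."""
    return flow.bytes_sent >= min_size and flow.rate_bps >= min_rate


def split_flows(flows):
    """Return (alpha_flows, other_flows), preserving input order."""
    alpha = [f for f in flows if is_alpha(f)]
    other = [f for f in flows if not is_alpha(f)]
    return alpha, other


flows = [
    FlowRecord("dtn1.example.org", "dtn2.example.org", 50 * 10**9, 100.0),  # large and fast
    FlowRecord("host-a.example.org", "host-b.example.org", 2 * 10**6, 60.0),  # small
    FlowRecord("host-c.example.org", "host-d.example.org", 5 * 10**9, 3600.0),  # large but slow
]
alpha, other = split_flows(flows)
print([f.src for f in alpha])
```

Once separated, the alpha flows are the candidates for isolation (for example, onto a separate path or queue) so that they do not delay the slower, real-time flows.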
NSF Campus Cyberinfrastructure – Network Infrastructure & Engineering (CC-NIE) • Problem & motivation • Design protocols/apps to multicast data reliably to hundreds of receivers • Save network & computing resources when compared to unicast delivery from one sender to hundreds of receivers • Application: Weather data distribution • UCAR sends real-time weather data almost continuously to 170 institutions
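The resource saving from multicast can be sketched with simple arithmetic on the sender's outgoing link. The receiver count of 170 comes from the slide; the 20 Mbps stream rate is a hypothetical value, and the model deliberately ignores retransmissions needed for reliability, so treat it as a rough comparison only.

```python
def sender_load_bps(stream_rate_bps: float, receivers: int, multicast: bool) -> float:
    """Load on the sender's outgoing link for one data stream.

    Unicast: the sender transmits one copy per receiver.
    Multicast: the sender transmits one copy; the network replicates it.
    Retransmissions for reliable delivery are NOT modeled.
    """
    if receivers < 0:
        raise ValueError("receivers must be non-negative")
    if receivers == 0:
        return 0.0
    return stream_rate_bps if multicast else stream_rate_bps * receivers


RECEIVERS = 170       # institutions receiving the feed, from the slide
STREAM_RATE = 20e6    # ASSUMPTION: hypothetical 20 Mbps feed rate

unicast_load = sender_load_bps(STREAM_RATE, RECEIVERS, multicast=False)
multicast_load = sender_load_bps(STREAM_RATE, RECEIVERS, multicast=True)

print(f"Unicast sender load:   {unicast_load / 1e9:.1f} Gbps")
print(f"Multicast sender load: {multicast_load / 1e6:.0f} Mbps")
print(f"Reduction factor:      {unicast_load / multicast_load:.0f}x")
```

In this simplified model the sender's load drops by a factor equal to the number of receivers; a reliable multicast protocol gives some of that back through repair traffic.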
NSF Scheduled Circuit Routing Protocol (SCRP) • Problem & motivation • Scientific networking community has been building out a new type of internetwork with circuits and virtual circuits (airlines) • Why: service guarantees (think FedEx) • Contrast with Internet (roadways) • Routing problem: what should one organization’s network tell another to enable path computation for circuits?
NeTS: JUNO: Collaborative Research: ACTION: Applications Coordinating with Transport, IP, and Optical Networks • This project is a collaboration with U. Texas at Dallas and two universities in Japan • The UVA portion of the project will develop application and transport protocols for optical networks • Starting Feb. 1, 2014
Outline • Big picture • Four projects • What is the problem? • Why solve it? (Motivation) • Methods used • As a GRA, what would I do? • Processes & style
Methods used: Stats • Science before engineering: • Theodore von Karman: • “Scientists study the world as it is; engineers create the world that never has been” • Data collection & statistics • Rely on contacts at DOE labs, universities, network operators for operational data • Write R programs to analyze procured data • Use fir research cluster for parallel computing • Skills needed: stats/R language/parallel prog.
Methods used: run experiments • Run existing software used by scientists to obtain measurements • Use national supercomputers and network testbeds • NCAR Wyoming SC: MPI programs (climate) • U. Utah Emulab • ESnet 100G network testbed • U. New Mexico: PROBE • ExoGENI racks: OpenFlow switches • DYNES: 10 high-performance hosts/switches across US • Skills needed: learn/run new software programs; write shell scripts and cron jobs; use rigorous scientific methods in executing experiments
Methods used: simulations • For NSF SCRP project • Problem requires large-scale thinking • Cannot implement • Cannot collect data as system does not yet exist • Then simulate • Skills needed: C++ programming, parallel programming, prob & stats, rigorous scientific methods
Methods used: engineering • Come up with engineering solutions for problems identified from scientific discovery through analysis of operational data and experimentally collected data • Implement software • Evaluate solutions on testbeds • Two key points • Exploratory not confirmatory (watch out for bias) • Always quantify the negative!
Methods: Write papers • Conference first, then journal • Collab Web site for grad students • how to organize a paper • hierarchical • think of reviewers • know your community’s work • literature search (when?)
Outline • Big picture • Four projects • What is the problem? • Why solve it? (Motivation) • Methods used • As a GRA, what would I do? • Processes & style
Processes • Goals as a graduate student • Focus on next step • quals • proposal defense • dissertation • Want Masters en route: MCS or MS • Career goal: academics or industry • Community, community, community • Ask the process question for each step
Advising style • Close collaboration with GRA • Research grants have milestones/deliverables • Generate ideas/papers/software that others use – who is the customer? what is the product? • New ideas from GRA • Develop proposals: Security for DHS; Vehicular • Communicate – be open • Full-time access (no substitute for hard work) – two-way commitment
Summary • High-speed, low-latency networking for • Scientific applications: scientists • Network utilization: providers, campus, datacenter • Bottom-up: new optical comm. technologies • Techniques used • Obtain operational data/experimental measurements and analyze statistics – find the real problem • Develop engineering solution • Evaluate through experiments or simulations