ESnet: DOE’s Science Network. GNEW, March 2004. William E. Johnston, ESnet Manager and Senior Scientist; Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads; and the ESnet Team, Lawrence Berkeley National Laboratory.
 
                
ESnet Provides • High bandwidth backbone and connections for Office of Science Labs and programs • High bandwidth peering with the US, European, and Japanese Research and Education networks • SecureNet (DOE classified R&D) as an overlay network • Science services – Grid and collaboration services • User support: ESnet “owns” all network trouble tickets (even from end users) until they are resolved • One-stop shopping for user network problems • 7x24 coverage • Both network and science services problems
ESnet Connects DOE Facilities and Collaborators [Figure: network map. ESnet backbone: optical ring and hubs (SEA, SNV, ELP, ALB, ATL, DC, NYC, CHI). 42 end user sites: Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), other sponsored (NSF LIGO, NOAA), laboratory sponsored (6). Peering points include Abilene, Chi NAP, NY-NAP, MAE-E, MAE-W, PAIX-E, PAIX-W, Fix-W, Equinix, PNWG, Starlight, and StarTap. International (high speed) connections include GEANT (Germany, France, Italy, UK, etc.), CERN, SInet (Japan), KDDI (Japan), Japan – Russia (BINP), CA*net4, Taiwan (TANet2, ASCC), Singaren, Australia, Netherlands, and Russia. Link legend: OC192 (10 Gb/s optical), OC48 (2.5 Gb/s optical), Gigabit Ethernet (1 Gb/s), OC12 ATM (622 Mb/s), OC12, OC3 (155 Mb/s), T3 (45 Mb/s), T1-T3, T1 (1.5 Mb/s).]
ESnet is Driven by the Needs of DOE Science • August 13-15, 2002 • Organized by Office of Science: Mary Anne Scott (Chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White • Workshop Panel Chairs: Ray Bair and Deb Agarwal; Bill Johnston and Mike Wilde; Rick Stevens; Ian Foster and Dennis Gannon; Linda Winkler and Brian Tierney; Sandy Merola and Charlie Catlett • Focused on science requirements that drive • Advanced Network Infrastructure • Middleware Research • Network Research • Network Governance Model • Available at www.es.net/#research
Eight Major DOE Science Areas Analyzed at the August ’02 Workshop Driven by
Evolving Qualitative Requirements for Network Infrastructure [Figure: four panels showing how storage (S), compute (C), instrument (I), and cache & compute (C&C) elements are interconnected over time. Time horizons: 1-3 yrs, 2-4 yrs, 3-5 yrs, 4-7 yrs. Panel labels: guaranteed bandwidth paths; 1-40 Gb/s, end-to-end; 100-200 Gb/s, end-to-end.]
New Strategic Directions to Address Needs of DOE Science • June 3-5, 2003 • Organized by the ESSC • Workshop Chair: Roy Whitney, JLAB • Report Editors: Roy Whitney, JLAB; Larry Price, ANL • Workshop Panel Chairs: Wu-chun Feng, LANL; William Johnston, LBNL; Nagi Rao, ORNL; David Schissel, GA; Vicky White, FNAL; Dean Williams, LLNL • Focused on what was needed to achieve the science driven network requirements of the previous workshop • Both workshop reports are available at www.es.net/#research
ESnet Strategic Directions • Developing a 5 yr. strategic plan for how to provide the required capabilities identified by the workshops • Between DOE Labs and their major collaborators in the University community we must address • Scalable bandwidth • Reliability • Quality of Service • Must address an appropriate set of Grid and human collaboration supporting middleware services
While ESnet Has One Backbone Provider, There Are Many Local Loop Providers to Get to the Sites [Figure: map of ESnet sites and hubs (SEA, SNV, ELP, ALB, CHI, NYC, DC, ATL) with each local loop marked by provider. Legend: Qwest Owned; Qwest Contracted; SBC (PacBell) Contracted/Owned; FTS2000 Contracted/Owned; SPRINT Contracted/Owned; Level3; Touch America (bankrupt); MCI Contracted/Owned; Site Contracted/Owned.]
ESnet Logical Infrastructure Connects the DOE Community With its Collaborators • ESnet provides complete access to the Internet by managing the full complement of Global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international networks
ESnet Traffic Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually.
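To make the growth figures concrete, the following is a minimal sketch of how an annual growth factor compounds. It assumes, purely for illustration, that a single factor holds constant over a five-year window; the function name `cumulative_growth` is hypothetical and not part of any ESnet tooling.

```python
def cumulative_growth(annual_factor: float, years: int) -> float:
    """Return the total traffic multiplier after compounding
    `annual_factor` once per year for `years` years."""
    return annual_factor ** years


if __name__ == "__main__":
    # Compare the two annual factors quoted on the slide over five years
    # (the constant-rate five-year window is an illustrative assumption).
    for factor in (1.7, 2.0):
        total = cumulative_growth(factor, 5)
        print(f"{factor}x per year for 5 years -> about {total:.1f}x overall")
```

Under this assumption, sustained 1.7x annual growth gives roughly a 14x increase in five years, while 2.0x annual growth gives 32x, which is why a small change in the annual factor matters for capacity planning.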
Who Generates Traffic, and Where Does it Go? ESnet Inter-Sector Traffic Summary, Jan 2003 [Figure: traffic flows between DOE sites, ESnet, and the commercial, R&E, and international sectors via the peering points, plus traffic between sites. ESnet ingress traffic is shown in green and ESnet egress traffic in blue; each percentage is a share of total ingress or egress traffic. Percentage labels: 72%, 21%, 14%, 17%, ~25%, 10%, 53%, 9%, 4%.] • DOE is a net supplier of data because DOE facilities are used by university and commercial researchers as well as by DOE researchers • DOE collaborator traffic, including data • ESnet Appropriate Use Policy (AUP): all ESnet traffic must originate and/or terminate on an ESnet site (no transit traffic is allowed) • E.g. a commercial site cannot exchange traffic with an international site across ESnet • This is effected via routing restrictions
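The no-transit rule in the Appropriate Use Policy can be expressed as a simple predicate. The sketch below is only an illustration of the rule's logic: the slide says the policy is enforced through routing restrictions, not through code like this, and the function name, the site labels, and the small `ESNET_SITES` set are assumptions made for the example.

```python
# Illustrative subset of ESnet sites; the real set is the full ESnet site list.
ESNET_SITES = frozenset({"LBNL", "FNAL", "ORNL"})


def aup_allows(src: str, dst: str, esnet_sites: frozenset = ESNET_SITES) -> bool:
    """Return True if a flow satisfies the no-transit rule:
    it must originate and/or terminate on an ESnet site."""
    return src in esnet_sites or dst in esnet_sites


if __name__ == "__main__":
    # A DOE lab talking to an international collaborator is allowed.
    print(aup_allows("LBNL", "international-site"))
    # A commercial site reaching an international site would be transit traffic.
    print(aup_allows("commercial-site", "international-site"))
```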
ESnet Site Architecture [Figure: the backbone (optical fiber ring) connects hubs at New York (AOA), Chicago (CHI), Washington, DC (DC), Atlanta (ATL), Sunnyvale (SNV), and El Paso (ELP). Labeled elements: hubs (backbone routers and local loop connection points); local loop (hub to local site); ESnet border router; DMZ; site gateway router; site LAN; site. The figure marks the division between ESnet responsibility and site responsibility.] • The hubs have lots of connections (42 in all)
SecureNet • SecureNet connects 10 NNSA (Defense Programs) Labs • Essentially a VPN with special encrypters • The NNSA sites exchange encrypted ATM traffic • The data is unclassified when ESnet gets it because it is encrypted before it leaves the NNSA sites with an NSA certified encrypter • Runs over the ESnet core backbone as a layer 2 overlay – that is, the SecureNet encrypted ATM is transported over ESnet’s Packet-Over-SONET infrastructure by encapsulating the ATM in MPLS using Juniper CCC
SecureNet – Mid 2003 [Figure: primary and backup SecureNet paths across the ESnet hubs (AOA-HUB, CHI-HUB, SNV-HUB, DC-HUB, ATL-HUB, ELP-HUB) connecting GTN, LLNL, SNLL, ORNL, KCP, DOE-AL, Pantex, LANL, SNLA, and SRS.] • SecureNet encapsulates payload-encrypted ATM in MPLS using the Juniper router Circuit Cross Connect (CCC) feature.
[Figure: IPv6-ESnet Backbone. Labels: 6BONE, Abilene, TWC, ESnet, LBNL, SLAC, ANL, FNAL, BNL, StarLight, PAIX, Distributed 6TAP; locations LBL, Chicago, Sunnyvale, New York, DC, Albuquerque, Atlanta, El Paso; 7206 routers; peer counts of 9, 18, and 7; legend: IPv6 only, IPv4/IPv6, IPv4 only.] • IPv6 is the next generation Internet protocol, and ESnet is working on addressing deployment issues • one big improvement is that while IPv4 has 32 bit – about 4x10^9 – addresses (which we are running short of), IPv6 has 128 bit – about 3x10^38 – addresses (which we are not ever likely to run short of) • another big improvement is native support for encryption of data
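The address-space comparison can be checked directly. This is a small sketch using Python's standard `ipaddress` module; IPv4 addresses are 32 bits wide and IPv6 addresses are 128 bits wide, and the helper name `address_space` is made up for this example.

```python
import ipaddress


def address_space(all_addresses_prefix: str) -> int:
    """Return the number of addresses covered by a prefix,
    e.g. "0.0.0.0/0" for all of IPv4 or "::/0" for all of IPv6."""
    return ipaddress.ip_network(all_addresses_prefix).num_addresses


if __name__ == "__main__":
    ipv4_total = address_space("0.0.0.0/0")  # 2**32 addresses
    ipv6_total = address_space("::/0")       # 2**128 addresses
    print(f"IPv4: {float(ipv4_total):.1e} addresses")
    print(f"IPv6: {float(ipv6_total):.1e} addresses")
    print(f"IPv6 is 2**{128 - 32} times larger")
```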
Operating Science Mission Critical Infrastructure • ESnet is a visible and critical piece of DOE science infrastructure • if ESnet fails, tens of thousands of DOE and University users know it within minutes if not seconds • Requires high reliability and high operational security in the ESnet operational services – the systems that are integral to the operation and management of the network • Secure and redundant mail and Web systems are central to the operation and security of ESnet • trouble tickets are by email • engineering communication by email • engineering database interface is via Web • Secure network access to Hub equipment • Backup secure telephony access to Hub equipment • 24x7 help desk (joint with NERSC) • 24x7 on-call network engineer
Disaster Recovery and Stability • The network operational services must be kept available even if, e.g., the West coast is disabled by a massive earthquake, etc. • ESnet engineers in four locations across the country • Full and partial engineering databases and network operational service replicas in three locations • Telephone modem backup access to all hub equipment • All core network hubs are located in commercial telecommunication facilities with high physical security and backup power
Disaster Recovery and Stability [Figure: map of hubs (SEA, SNV, ELP, ALB, CHI, NYC, DC, ATL) and sites (LBNL, SDSC, TWC, AMES, BNL, PPPL) showing where engineers and operational services are located. Labels: Remote Engineer; Remote Engineer with partial duplicate infrastructure; DNS; Engineers with Eng Srvr, Load Srvr, and Config Srvr; Duplicate Infrastructure (currently deploying full replication of the NOC databases and servers and Science Services databases).] • Engineers, 24x7 NOC, generator backed power • Spectrum (net mgmt system) • DNS (name – IP address translation) • Eng database • Load database • Config database • Public and private Web • E-mail (server and archive) • PKI cert. repository and revocation lists • collaboratory authorization service • ESnet backbone operated without interruption through • the N. Calif. power blackout of 2000 • the 9/11 attacks • the Aug. 2003 NE States power blackout
Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack • A Phased Security Architecture is being implemented to protect the network and the sites • The phased response ranges from blocking certain site traffic to a complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks • Separates ESnet core routing functionality from external Internet connections by means of a “peering” router that can have a policy different from the core routers • Provides a rate limited path to the external Internet that will ensure site-to-site communication during an external denial of service attack • Provides “lifeline” connectivity for downloading of patches, exchange of e-mail, and viewing web pages (i.e., e-mail, DNS, HTTP, HTTPS, SSH, etc.) with the external Internet prior to full isolation of the network
Phased Response to Cyberattack [Figure: attack traffic arriving through an ESnet peering router and crossing ESnet routers and a border router toward a Lab gateway router (LBNL shown as the example Lab), with an X at each point where traffic is filtered.] • Lab first response – filter incoming traffic at their ESnet gateway router • ESnet first response – filters to assist a site • ESnet second response – filter traffic from outside of ESnet • ESnet third response – shut down the main peering path and provide only a limited bandwidth path for specific “lifeline” services • The Sapphire/Slammer worm infection created almost a Gb/s traffic spike on the ESnet backbone until filters were put in place (both into and out of sites) to damp it out.
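The escalation logic of the phased response can be sketched as a small state model. This is an illustration only, not ESnet's implementation, which the slides describe as router filters and a rate limited peering path. The `Phase` names, the `external_traffic_allowed` function, and the service labels are hypothetical; the lifeline set lists only the services named in the slides, which end with "etc.".

```python
from enum import IntEnum


class Phase(IntEnum):
    """Escalating ESnet responses, in the order the slides describe them."""
    NORMAL = 0
    ASSIST_SITE = 1       # first response: filters to assist a site
    FILTER_EXTERNAL = 2   # second response: filter traffic from outside ESnet
    LIFELINE_ONLY = 3     # third response: main peering path shut down


# Services named in the slides as "lifeline" examples (illustrative, not exhaustive).
LIFELINE_SERVICES = frozenset({"e-mail", "dns", "http", "https", "ssh"})


def external_traffic_allowed(phase: Phase, service: str,
                             filtered: frozenset = frozenset()) -> bool:
    """Decide whether traffic for `service` may cross to or from the
    external Internet in the given phase. `filtered` holds the services
    blocked by the filters installed in phases 1 and 2."""
    if phase == Phase.LIFELINE_ONLY:
        return service in LIFELINE_SERVICES
    if phase in (Phase.ASSIST_SITE, Phase.FILTER_EXTERNAL):
        return service not in filtered
    return True


if __name__ == "__main__":
    worm = frozenset({"worm-port"})  # hypothetical label for the attacked service
    print(external_traffic_allowed(Phase.FILTER_EXTERNAL, "worm-port", worm))
    print(external_traffic_allowed(Phase.LIFELINE_ONLY, "ssh"))
```

Site-to-site traffic inside ESnet is deliberately outside this function: the point of the architecture is that it keeps flowing in every phase.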
Future Directions – the 5 yr Network Strategy • Elements • University connectivity • Scalable and reliable site connectivity • Provisioned circuits for hi-impact science bandwidth • Close collaboration with the network R&D community • Services supporting science (Grid middleware, collaboration services, etc.)
5 yr Strategy – Near Term Goal 1 • Connectivity between any DOE Lab and any Major University should be as good as ESnet connectivity between DOE Labs and Abilene connectivity between Universities • Partnership with I2/Abilene • Multiple high-speed peering points • Routing tailored to take advantage of this • Latency and bandwidth from DOE Lab to University should be comparable to intra ESnet or intra Abilene • Continuous monitoring infrastructure to verify
5 yr Strategy – Near Term Goal 2 • Connectivity between ESnet and R&D nets – a critical issue from Roadmap • UltraScienceNet and NLR for starters • Reliable, high bandwidth cross-connects • IWire ring between Qwest – ESnet Chicago hub and Starlight • This is also critical for DOE lab connectivity to the DOE funded LHCNet 10 Gb/s link to CERN • Both LHC tier 1 sites in the US – Atlas and CMS – are at DOE Labs • ESnet ring between Qwest – ESnet Sunnyvale hub and the Level 3 Sunnyvale hub that houses the West Coast POP for NLR and UltraScienceNet
5 yr Strategy – Near-Medium Term Goal • Scalable and reliable site connectivity • Fiber / lambda ring based Metropolitan Area Networks • Preliminary engineering study completed for San Francisco Bay Area and Chicago area • Proposal submitted • At least one of these is very likely to be funded this year • Hi-impact science bandwidth – provisioned circuits
ESnet Future Architecture • Migrate site local loops to ring structured Metropolitan Area Network and regional nets in some areas • Goal is local rings, like the backbone, that provide multiple paths • Dynamic provisioning of private “circuits” in the MAN and through the backbone to provide “high impact science” connections • This should allow high bandwidth circuits to go around site firewalls to connect specific systems. The circuits are secure and end-to-end, so if the sites trust each other, they should allow direct connections if they have compatible security policies. E.g. HPSS <-> HPSS • Partnership with DOE UltraNet, Internet 2 HOPI, and National Lambda Rail
ESnet Future Architecture [Figure: sites attached to Metropolitan Area Network rings that connect to the core ring. Labels: one optical fiber pair; DWDM providing point-to-point, unprotected circuits; optical channel (λ) management equipment; Layer 2 management equipment (e.g. 10 Gigabit Ethernet switch); Layer 3 (IP) management equipment (router); production IP; provisioned circuits initially via MPLS paths, eventually via lambda paths; provisioned circuits carried over lambdas; provisioned circuits carried as tunnels through the ESnet IP backbone.]
ESnet MAN Architecture - Example [Figure: MAN ring in the Chicago area. Labels: ANL and FNAL, each with site equipment, a site gateway router, and a site LAN; Qwest hub; ESnet core; T320; monitors; StarLight; vendor neutral facility; CERN (DOE funded link); other international peerings; ESnet production IP service; ESnet managed λ / circuit services; ESnet managed λ / circuit services tunneled through the IP backbone via MPLS.] • Current DMZs are back-hauled to the core router • Implemented via 2 VLANs – one in each direction around the ring • ESnet management and monitoring – partly to compensate for no site router • Ethernet switch • DMZ VLANs • Management of provisioned circuits
Future ESnet Architecture [Figure: a private “circuit” from one Lab to another across the ESnet backbone (New York (AOA), Washington, Atlanta (ATL), El Paso (ELP)). Labels at each end: specific host, instrument, etc.; site LAN; site gateway router; DMZ; ESnet border; circuit cross connect; MAN optical fiber ring. The two ends are joined by a common security policy.]
Long-Term ESnet Connectivity Goal • MANs for scalable bandwidth and redundant site access to backbone • Connecting MANs with two backbones to protect against hub failure (for example, NLR is shown as the second backbone in the figure) [Figure: major DOE Office of Science sites connected by MANs and local loops to two backbones, Qwest and NLR, with high-speed cross connects with Internet2/Abilene and links to Japan, Europe, and CERN/Europe.]
Long-Term ESnet Bandwidth Goal • Harvey Newman: “And what about increasing the bandwidth in the backbone?” • Answer: technology progress • By 2008 (the next generation ESnet backbone) DWDM technology will be 40 Gb/s per lambda • And the backbone will be multiple lambdas • Issues • End-to-End, end-to-end, and end-to-end
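As a rough worked example of "multiple lambdas", the sketch below computes how many 40 Gb/s lambdas would be needed to reach the 100-200 Gb/s end-to-end range from the workshop requirements. It assumes that lambdas aggregate perfectly with no overhead, which is a simplification, and the function name `lambdas_needed` is hypothetical.

```python
import math


def lambdas_needed(target_gbps: float, per_lambda_gbps: float = 40.0) -> int:
    """Return the minimum number of lambdas whose combined capacity
    meets `target_gbps`, assuming ideal aggregation with no overhead."""
    if target_gbps <= 0 or per_lambda_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(target_gbps / per_lambda_gbps)


if __name__ == "__main__":
    # Lower and upper ends of the 100-200 Gb/s requirement range.
    for target in (100, 200):
        print(f"{target} Gb/s needs {lambdas_needed(target)} lambdas at 40 Gb/s each")
```

Raw backbone capacity is only part of the answer; as the slide stresses, the remaining issues are end-to-end.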
Science Services Strategy • ESnet is in a natural position to be the provider of choice for a number of middleware services that support collaboration, collaboratories, Grids, etc. • The characteristics of ESnet that make it a natural middleware provider are that ESnet • is the only computing related organization that serves all of the Office of Science • is trusted and well respected in the OSC community • has the 7x24 infrastructure required to support critical services, and is a long-term stable organization • The characteristics of the services for which ESnet is the natural provider are those that • require long-term persistence of the service or the data associated with the service • require high availability and a high degree of integrity on the part of the provider • are situated at the root of a hierarchy so that the service scales in the number of people that it serves by adding nodes that are managed by local organizations (so that ESnet does not have a large and constantly growing direct user base)
Science Services Strategy • The DOE Grids CA, which provides X.509 identity certificates to support Grid authentication, is an example of this model • the service requires a highly trusted provider and a high degree of availability • provides a centralized agent for negotiating trust relationships with, e.g., European CAs • it scales by adding site based or Virtual Organization based Registration Agents that interact directly with the users
Science Services: Public Key Infrastructure • Public Key Infrastructure supports cross-site, cross-organization, and international trust relationships that permit sharing computing and data resources and other Grid services • Digital identity certificates for people, hosts, and services – essential core service for Grid middleware • provides formal and verified trust management – an essential service for widely distributed heterogeneous collaboration, e.g. in the International High Energy Physics community • DOE Grids CA • Have recently added a second CA with a policy that permits bulk issuing of certificates with central private key management • Important for secondary issuers • NERSC will auto issue certs when accounts are set up – this constitutes an acceptable identity verification • May also be needed for security domain gateways such as Kerberos – X.509, e.g. KX509
Science Services: Public Key Infrastructure • Policy Management Authority – negotiates and manages the formal trust instrument (Certificate Policy - CP) • Sets and interprets procedures that are carried out by ESnet • Currently facing an important oversight situation involving potential compromise of user X.509 cert private keys • Boys-from-Brazil style exploit => keyboard sniffer on several systems that housed Grid certs • Is there sufficient forensic information to say that the private keys were not compromised? • Is any amount of forensic information sufficient to guarantee this, or should the certs be revoked? • Policy refinement by experience • Registration Agents (RAs) validate users against the CP and authorize the CA to issue digital identity certs • This service was the basis of the first routine sharing of HEP computing resources between US and Europe
Science Services: Public Key Infrastructure • The rapidly expanding customer base of this service will soon make it ESnet’s largest collaboration service by customer count
Voice, Video, and Data Collaboration Service • The other highly successful ESnet Science Service is the audio, video, and data teleconferencing service to support human collaboration • Seamless voice, video, and data teleconferencing is important for geographically dispersed collaborators • ESnet currently provides voice conferencing, videoconferencing (H.320/ISDN scheduled, H.323/IP ad-hoc), and data collaboration services to more than a thousand DOE researchers worldwide
Voice, Video, and Data Collaboration Service • Heavily used services, averaging around • 4600 port hours per month for H.320 videoconferences • 2000 port hours per month for audio conferences • 1100 port hours per month for H.323 • approximately 200 port hours per month for data conferencing • Web-based registration and scheduling for all of these services • authorizes users efficiently • lets them schedule meetings • Such an automated approach is essential for a scalable service – ESnet staff could never handle all of the reservations manually
Science Services Strategy • The Roadmap Workshop identified twelve high priority middleware services, and several of these fit the criteria for ESnet support. These include, for example • long-term PKI key and proxy credential management (e.g. an adaptation of the NSF’s MyProxy service) • directory services that virtual organizations (VOs) can use to manage organization membership, member attributes, and privileges • perhaps some form of authorization service • in the future, some knowledge management services that have the characteristics of an ESnet service are also likely to be important • ESnet is seeking the additional funding necessary to develop, deploy, and support these types of middleware services.
Conclusions • ESnet is an infrastructure that is critical to DOE’s science mission and that serves all of DOE • Focused on the Office of Science Labs • ESnet is evolving its architecture and services strategy to meet the stated requirements for bandwidth, reliability, QoS, and Grid and collaboration supporting services