
CSG Research Computing
Jim Pepin, USC CTO/Director HPCC


Presentation Transcript


  1. CSG Research Computing, Jim Pepin, USC CTO/Director HPCC

  2. HPCC
  • Provide common facilities and services for a large cross section of the university that requires leading-edge computational and networking resources.
  • Leverage USC central resources with externally funded projects.

  3. Overview
  • Sponsored by ISD (Information Services Division of USC) and ISI (Information Sciences Institute)
  • User community
    • ISI
    • LAS
    • Engineering
    • School of Medicine
    • IMSC
    • ICT
    • Others

  4. Current Resources
  • High Performance Computing Resources
    • Linux Cluster (~1,000 nodes / 2,000 CPUs, 2 Gb/s Myrinet)
      • 20 TB shared disk, 18-40 GB local disk per node.
      • Ranks in the top 10 for academic clusters.
      • Myrinet switch is 768 nodes.
      • Adding nodes funded by USC research groups.
    • Sun Core Servers (E15k shared memory)
      • 72 processors, 288 GB memory, 30 TB shared disk.
  • Mass Storage Facilities (Unitree)
    • 18,000-tape capacity.

  5. Funding Sources
  • ISD (University) Resources
    • $1.5M M/S and equipment budget
      • Software/maintenance: $0.4M
      • Generic capital: $1.0M
      • Other: $0.1M
    • 3 FTEs direct support
    • 2 FTEs system staff offset
  • Los Nettos/LAAP
    • $2.0M
  • Condo arrangements
    • $50K-$250K one-off capital purchases

  6. Cluster Power Usage Math
  • 42 nodes/cabinet.
  • 200 watts/node.
  • 8.4 kW/cabinet.
  • 1,000 nodes = 24 cabinets.
  • 1 control cabinet per 8 cabinets of compute servers.
  • 8 control cabinets.
  • 32 cabinets per 1,000 nodes.
  • 268 kW per 1,000 nodes.
  • 100 tons of A/C per 1,000 nodes.
  • Roughly 400 kW total power use for 1,000 nodes.
  • 1,500-2,000 sq ft of space.
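
  A short program can double-check the arithmetic above. This is only an illustrative sketch that plugs in the slide's own figures (42 nodes and 8.4 kW per cabinet, 24 compute plus 8 control cabinets per 1,000 nodes); it is not a sizing tool.

    #include <stdio.h>

    /* Re-derive the slide's cluster power figures from its inputs.
     * Illustrative only; all constants come from the slide. */
    int main(void)
    {
        const int    nodes             = 1000;
        const int    nodes_per_cabinet = 42;
        const double watts_per_node    = 200.0;

        /* 42 nodes x 200 W = 8.4 kW per compute cabinet */
        double kw_per_cabinet = nodes_per_cabinet * watts_per_node / 1000.0;

        /* 1000 / 42 rounds up to 24 compute cabinets */
        int compute_cabinets = (nodes + nodes_per_cabinet - 1) / nodes_per_cabinet;

        /* the slide budgets 8 control cabinets alongside the compute cabinets */
        int control_cabinets = 8;
        int total_cabinets   = compute_cabinets + control_cabinets;   /* 32 */

        /* 32 cabinets x 8.4 kW = roughly 268 kW of compute load per 1,000 nodes;
         * with ~100 tons of A/C the total draw is roughly 400 kW */
        double compute_kw = total_cabinets * kw_per_cabinet;

        printf("kW per cabinet : %.1f\n", kw_per_cabinet);
        printf("cabinets       : %d compute + %d control = %d\n",
               compute_cabinets, control_cabinets, total_cabinets);
        printf("compute load   : %.1f kW per %d nodes\n", compute_kw, nodes);
        return 0;
    }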

  7. Current Software
  • Cluster software from IBM (xCAT) is the core of the facility.
    • Stable production environment.
    • MPI is the basic message-passing layer (see the sketch after this slide).
  • Globus/NMI work is proceeding with Carl's help in funding plus ISD resources.
    • Leverages the campus need for a global directory.
    • More later.
  • Solaris and Unitree are the core of mass storage support.
    • We need to look at other mass storage opportunities.
  • Issues
    • We need to be able to support faculty/researchers with tools and consulting to help them effectively use large-scale resources.
    • Many packages exist on HPCC resources, but there is no local support to help use them.
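
  Since MPI is the cluster's basic message-passing layer, a minimal example of the kind of program users build against it might look like the following. This is an illustrative sketch assuming a standard MPI installation; it is not HPCC-supplied code.

    #include <stdio.h>
    #include <mpi.h>

    /* Minimal MPI program: every rank reports in, then rank 0 collects
     * the sum of all rank numbers with a collective reduction. */
    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("hello from rank %d of %d\n", rank, size);

        int local = rank, total = 0;
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks 0..%d = %d\n", size - 1, total);

        MPI_Finalize();
        return 0;
    }

  A program like this would typically be compiled with mpicc and launched across nodes through the batch system (PBS/Maui on the cluster).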

  8. "Middleware"
  • Globus as base, with NMI architecture for campus.
    • GT2 moving to GT3.
    • SCEC/ISI.
  • Condor as lightweight job manager in user rooms.
  • PBS/Maui on the cluster and the computation side of the E15k.
  • Issues
    • Kx509 bridge from Kerberos.
      • USC PKI-lite CA is the base.
      • Only hosts and services.
      • NMI-based.
    • Pubcookie (Kerberos back end)
      • Uses host certs from the PKI-lite CA.
    • Shib for some prototype library apps (scholar's portal).
    • Campus GDS/PR using NMI schemes (eduPerson, etc.).

  9. HPCC Governance
  • HPCC faculty advisory group
    • Meets 4-5 times a year.
    • Provides guidance to the DCIO and CTO.
    • "Final" decisions are made in ISD (CIO/DCIO).
    • Usual mode is agreement.
  • Time allocation
    • No recharge.
    • Large projects are reviewed by the faculty allocation group.
    • Some projects exceed 500K node-hours.
    • Condo users get dedicated nodes and cost sharing.
  • Research leverage
    • Condo
    • Cost sharing
    • External funding
    • Grid construction
    • Next-generation network

  10. CTO/HPCC Projects
  • Advanced Networking Projects
    • Calren-2
      • 2x Gb service today.
      • 10 Gb service in the next 2 years.
    • Fiber/wavelength services (CENIC/National Lambda Rail)
      • Online for the West Coast.
      • Look at L2 possibilities to build shared 'spaces'.
      • Look to leverage for projects like the Optiputer ITR.
    • 1 Wilshire colo facilities
      • See if we can use that space to facilitate the ETF proposal.
    • Optiputer ITR as a way to help network expansion.

  11. CTO/HPCC Projects
  • Leverage HPCC efforts at ISI with ISD resources.
  • Clusters
    • Expand the cluster to ~2,000 centrally owned nodes.
    • Expand the cluster for other groups (condo model).
  • Mass Storage
    • Look into large-scale storage for groups like the VHF project and other high-end storage needs (fractional petabytes).
  • Globus/NMI
    • Provide campus leadership for global directory services and identity management (authentication and authorization).
  • Networking Research

  12. CTO/HPCC Projects
  • Fiber is a major part of the HPCC's ability to serve large-scale computational needs. The following slides show what we have today and how it can be used.

  13. Fiber Facilities
  • Lease dark fiber.
    • Started with dark fiber 3 years ago.
    • Pioneer in this area.
    • DWP (Department of Water and Power).
    • USC franchise-area fiber for campus access.
  • Leverage new players (NLR/Cenic).
  • Use for USC, LAAP, and Los Nettos projects.
  • Built out today using low-cost CWDM and 15540s.
    • 10 Gbps Ethernet backbone in place Fall '02.
  • Built out fiber to Caltech/JPL/VHF (Shoah) and other Los Nettos sites.

  14. Fiber Facilities
  • Lease more dark fiber.
    • Harvey Mudd.
    • Build a second path to USC for disaster recovery.
  • Install DWDM gear from the CENIC deal with Cisco.
    • 1 Gb wavelengths in the first phase (Fall '04).
    • 10 Gb wavelengths in Summer '04.
  • Use to enable projects like Optiputer and ETF.
  • Experiment with optical switching hardware as a 'fiber patch panel' for development of shared 'computer centers'.

  15. Original USC Fiber Backbone (diagram): 4-strand SM DWP fiber, the original external fiber plant, connecting 1 Wilshire, the Downtown Clinic, HSC, UPC, ISI, and ICT.

  16. Today's Fiber and Gigaman Circuits (diagram): fiber and Gigaman links among Caltech, JPL, HMC, 818, VHF, 1 Wilshire, HSC, Tustin, UPC, ISI, and ICT.

  17. Colo Facilities
  • Acquired space in 1 Wilshire (original site).
    • 3 years ago.
    • DWP fiber is the core.
  • Use to connect to exchanges and other ISPs.
  • Extend potentially to other '1 Wilshire' buildings.
    • Use new campus Level 3 fiber as the means.
  • House routers and L2 equipment.
  • Provide space on the USC campus for partners.
  • Enables the Pacific Wave exchange point.

  18. Exchange Point/Research (diagram): Foundry BigIron switches with 802.1q VLANs at 1 Wilshire and 818 7th, 10 Gb links to ISI, HSC, and UPC, and Gb and 100 Mb ports at each site.

  19. Experimental Networking
  • Networking research community
  • California Institutes for Science and Innovation (CITRIS, CalIT2, Nano Systems, BioMedical)
  • San Diego Super Computer Center
  • CACR
  • ISI
  • Teragrid/Distributed Terascale Facility
  • UCSB/Dan Blumenthal optical labs

  20. Future Resource Goals
  • High Performance Computing Resources
    • Linux Cluster (2,048 nodes / 4,096 CPUs, 2 Gb/s Myrinet)
      • 60 TB shared disk, 36-72 GB local disk per node.
      • Rank in the top 5 for academic clusters.
      • Start 64-bit nodes in Summer '04.
      • Switch fabric will expand past 1,024 nodes with the ability to condo other users.
      • Plan to add more nodes funded by USC research groups (condo); the goal would be 3,000+ nodes total.
    • Sun Core Servers (E15k shared memory)
      • 72 processors, 288 GB memory, 300 TB disk.
      • Use this system for high-end data users (large-scale databases) and video users.
  • Mass Storage Facilities (Unitree today)
    • 18,000-tape capacity.
    • A petabyte online as the goal in 3 years.

  21. 3 Year Strategy
  • Next step after the 32-bit Pentium.
    • Need to determine what will replace the Xeons. One answer is Opteron or IA64, but we need to start to develop clusters in this space and benchmark.
    • Much of the code will need reworking at the user level (see the sketch after this slide).
  • Find ways to cost-share with local cluster purchasers. "Condo" housing of medium to large clusters will be important.
  • Build "Grid-U".
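
  As a hypothetical illustration of that user-level rework: 32-bit x86 code frequently assumes that int, long, and pointers are all 4 bytes. On Opteron or IA64 under the common LP64 model, long and pointers widen to 8 bytes, so casts like the one commented out below silently truncate addresses.

    #include <stdio.h>
    #include <stdint.h>

    /* Sketch of a common 32-bit assumption that breaks on 64-bit targets. */
    int main(void)
    {
        int  x = 42;
        int *p = &x;

        /* 32-bit habit: stashing a pointer in an int.  On IA32 both are
         * 4 bytes, so it "works"; on an LP64 target the cast truncates.
         *
         *     int bad = (int)p;
         */
        intptr_t ok = (intptr_t)p;   /* an integer type wide enough for a pointer */

        printf("sizeof(int)=%zu  sizeof(long)=%zu  sizeof(void *)=%zu\n",
               sizeof(int), sizeof(long), sizeof(void *));
        printf("pointer stored portably: %ld\n", (long)ok);
        return 0;
    }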

  22. 3 Year Strategy
  • As clusters expand into the 2-4K node space, power and A/C become significant issues (along with floor space).
  • We need to develop several major partners to allow HPCC to be the central piece of joint proposals from USC for such initiatives as ETF and future cyberinfrastructure proposals.
    • An example is a shared submission for a Major Research Instrumentation grant.

  23. 3 Year Strategy
  • Networking Futures
    • Expand the exchange point (R/E, Pacific Wave).
      • 10 Gb at all sites.
    • Layer 1 facilities (Optiputer-type connections).
    • Re-design/RFP for the campus network this month.
      • Design the network with 'enclaves' for research or academic support.
      • Much higher internal bandwidth (10 Gb core-to-core, at least 1 Gb to all buildings, 10 Gb to major research centers).
    • How to provide comprehensive security without unacceptable friction?
