
Wide-Area File Systems in the TeraGrid


Presentation Transcript


  1. Wide-Area File Systems in the TeraGrid
  Chris Jordan, Steve Simms, Patricia Kovatch, Phil Andrews

  2. What are WAN File Systems?
  • A single “file system” entity that spans multiple systems distributed over a wide area network
  • Often, but not always, spans administrative domains
  • Makes data available for computation, analysis, and visualization across widely distributed systems
  • Key usability aspect: there is nothing special about a WAN-FS from the user perspective – no special clients, no special namespace, etc. (illustrated below)
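
A minimal sketch of that transparency, assuming a hypothetical WAN file system mounted at /gpfs-wan: ordinary POSIX I/O is all a user ever sees.

```python
# Ordinary file I/O against a (hypothetical) WAN file system mount point.
# Nothing here is WAN-specific: the same code works on any local path,
# which is exactly the usability point made above.
from pathlib import Path

data = Path("/gpfs-wan/projects/demo/input.dat")    # hypothetical mount path
result = Path("/gpfs-wan/projects/demo/result.txt")

text = data.read_text()                    # reads traverse the WAN transparently
result.write_text(f"{len(text)} bytes\n")  # so do writes
```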

  3. A Long History in TeraGrid
  • First demonstration by SDSC at SC 2002
  • Numerous demonstrations at Supercomputing
  • Several production file systems, past and present
  • Many TeraGrid research projects have used the production WAN file systems
  • Many TeraGrid research projects have used experimental WAN file systems
  • Continuing research, development, and production projects from 2002–2010

  4. WAN File System Challenges
  • Security
    • Identity mapping across administrative domains
    • Control of mount access and root identity
  • Performance
    • Long network latencies impose a delay on every operation (see the worked example after this slide)
    • Appropriate node/disk/network/OS configuration on both client and server
  • Reliability
    • Network problems can occur anywhere
    • Numerous distributed clients can inject problems
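
To make the latency point concrete, here is a back-of-the-envelope sketch; the round-trip times and operation count are illustrative assumptions, not measured TeraGrid figures.

```python
# Why WAN latency hurts: every synchronous metadata operation pays at
# least one network round trip. All numbers are assumptions for
# illustration only.
rtt_local_ms = 0.2    # round trip within a machine room (assumed)
rtt_wan_ms = 60.0     # cross-country round trip (assumed)
n_ops = 10_000        # e.g., a naive sequential stat() over 10k files

for label, rtt_ms in [("local", rtt_local_ms), ("WAN", rtt_wan_ms)]:
    total_s = n_ops * rtt_ms / 1000
    print(f"{label}: {total_s:.0f} s for {n_ops} sequential metadata ops")

# local: 2 s vs. WAN: 600 s -- hence the emphasis on client-side caching
# and careful node/network configuration on both ends of the link.
```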

  5. Milestones
  • Series of early SC demos by SDSC, IBM, and others
  • GPFS-WAN demonstration and production
  • GPFS-WAN Version 2.0
  • Lustre-WAN demos at SC (Indiana, others)
  • Indiana’s Data Capacitor
  • Lustre-WAN 2.0 R&D – “J-WAN”
  • TG-Wide Lustre-WAN (current deployment)

  6. Early Tests and Demonstrations
  • SC 2002 – SAM-FS over the wide area using FC over IP
    • Phil Andrews, Tom Sherwin, Bryan Banister (SDSC)
  • SC 2003 – early GPFS-WAN demo with IBM
    • Andrews, Banister, Patricia Kovatch (SDSC)
  • SC 2004 – 28 Gbps over the TG backbone using GPFS
    • Andrews, Kovatch, Banister, Chris Jordan (SDSC)
  • SC 2005 – intercontinental GPFS demonstrated with DEISA sites
    • Jordan, Kovatch, Andrews, and many DEISA admins
  • SC 2007 – pNFS clients demonstrated with GPFS

  7. GPFS-WAN 1.0
  • First production WAN file system in TeraGrid
  • Evolution of the SC04 demo system
  • 68 IA64 “DTF Phase one” server nodes
  • 0.5 PB of IBM DS4100 SATA disks, mirrored RAID
  • ~250 TB usable storage, ~8 GB/sec peak I/O
  • Still the fastest WAN-FS ever deployed in TeraGrid (30 Gb/s) – the network got slower afterward
  • Used a GSI “grid-mapfile” for identity mapping (example below)
  • Used RSA keys with out-of-band exchange for system/cluster authentication
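
For reference, a GSI grid-mapfile is a plain-text file mapping X.509 certificate distinguished names to local accounts; the entries below use invented names and organizations purely for illustration.

```
"/C=US/O=Example Grid/OU=SDSC/CN=Jane Doe" jdoe
"/C=US/O=Example Grid/OU=NCSA/CN=Rahul Patel" rpatel
```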

  8. Use of GPFS-WAN 1.0
  • Entered production in October 2005
  • Accessible on almost all TeraGrid resources (SDSC, NCSA, ANL, NCAR)
  • Required a major testing and debugging effort (~1 year from the SC 2004 demo)
  • BIRN, SCEC, and NVO were major early users
  • Lots of multi-site use in a homogeneous computing environment (IA64/IA32)
  • BIRN workflow – compute on multiple resources, visualize at Johns Hopkins

  9. Transatlantic File Systems
  • DEISA is built around multiple WAN-FS instances
  • SC’05 GPFS demo between TeraGrid and DEISA
  • Special 10 Gb network link between GEANT and TG
  • GPFS-WAN mounted at multiple sites in Germany
  • Multiple DEISA file systems mounted on the SDSC IA64 cluster (Italy, France, Germany)
  • Achieved 1 Gbps performance in both directions

  10. SC ’07 Demo
  • Export GPFS-WAN via pNFS
  • 6 pNFS servers at SDSC
  • 2 pNFS clients at SC
  • Other clients at NCSA, ORNL
  • Saturated 10/20 Gb/s link from SDSC
  [Diagram: pNFS clients at SC, NCSA, and ORNL reach the GPFS-WAN servers at SDSC via pNFS servers over the TeraGrid network]

  11. GPFS-WAN 2.0
  • In production late 2007
  • Replaced all Intel hardware with IBM p575s
  • Replaced all IBM disks with DDN arrays
  • Essentially everything redundant
  • Capacity expanded to ~1 PB raw
  • Added use of storage pools and ILM features
  • Remains in production 3 years later
  • However, licensing and attrition have slowly reduced the number of systems capable of using GPFS-WAN…

  12. Meanwhile, in flyover country…
  • Indiana’s Data Capacitor:
    • NSF MRI grant
    • Lustre-based WAN file system
    • Identity mapping based on a custom “uid table” (sketched below)
    • System/cluster authentication using firewall rules
  • DC-WAN production began late 2008
  • Now mounted on BigRed, Mercury, Cobalt, Pople, Lonestar, QueenBee…
  • Steve Simms initially did most of the work himself
  • Now, a whole support team
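
A minimal sketch of the uid-table idea, assuming a simple (site, remote UID) to local UID lookup; the actual Data Capacitor mechanism is site-specific and not reproduced here, and the site names and numbers are hypothetical.

```python
# Hypothetical cross-site UID remapping table: each client site has its
# own UID space, so the server translates identities before enforcing
# file permissions.
NOBODY = 65534  # unmapped identities fall through to "nobody"

UID_TABLE = {
    # (client site, remote uid) -> local uid on the file system servers
    ("bigred", 5021): 70021,
    ("mercury", 1337): 70022,
}

def map_uid(site: str, remote_uid: int) -> int:
    """Translate a client's UID into the server's UID space, defaulting
    to an unprivileged identity rather than trusting unknown clients."""
    return UID_TABLE.get((site, remote_uid), NOBODY)
```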

  13. IU’s Data Capacitor WAN
  • Purchased by Indiana University
  • Announced production at LUG 2008
  • Allocated on a project-by-project basis
  • 1 pair of Dell PowerEdge 2950s for MDS
  • 2 pairs of Dell PowerEdge 2950s for OSS
    • 2 x 3.0 GHz dual-core Xeon
    • Myrinet 10G Ethernet
    • Dual-port QLogic 2432 HBA (4 x FC)
    • 2.6 kernel (RHEL 5)
  • DDN S2A9550 controller
    • Over 2.4 GB/sec measured throughput
    • 360 terabytes of spinning SATA disk
  • Currently running Lustre 1.8.1.1

  14. 2007 Bandwidth Challenge: Five Applications Simultaneously
  • Acquisition and Visualization
    • Live Instrument Data – Chemistry
    • Rare Archival Material – Humanities
  • Acquisition, Analysis, and Visualization
    • Trace Data – Computer Science
  • Simulation Data
    • Life Science
    • High Energy Physics

  15. Challenge Results

  16. DC-WAN Applications
  • Wide range of applications and domains
  • Several projects spanning both TeraGrid and non-TeraGrid resources
  • Utilized as a simple “bridge” to bring data into TG
  • Has also been used for a transatlantic mount to Germany
  • The diverse range of systems mounting DC-WAN lends itself to use in workflows

  17. Lustre-WAN 2.0 at PSC
  • J-WAN – Josephine Palencio
  • Supports use of Kerberos for identity mapping and user authentication (see the example below)
  • Potentially very convenient for management of user identities and authorization
  • Kerberos is well accepted and widely used
  • Many other valuable features of Lustre 2.0
  • Successful tests with storage at PSC and SDSC, client mounts at several TeraGrid sites
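
As a rough illustration of what a Kerberos-secured Lustre deployment involves: the Lustre manual describes selecting a GSS/Kerberos RPC "flavor" per file system. The file system name below is hypothetical, and exact syntax varies by Lustre version.

```
# Require Kerberos with privacy (krb5p) for client-server RPC traffic on
# a hypothetical file system "jwan"; weaker flavors such as krb5n or
# krb5i trade protection for performance.
lctl conf_param jwan.srpc.flavor.default=krb5p
```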

  18. Lustre-WAN 2.0 History
  • Systems have been operational for over 2 years
  • Successful tests have been done with distributed storage at PSC and SDSC
  • Work is ongoing to improve and harden Kerberos and other features of Lustre 2.0
  • Still pre-release, but expected to appear late this year

  19. TG-Wide Lustre-WAN
  • Lustre 1.8 now supports distributed storage
  • Storage nodes can be co-located with compute and vis resources for local access to data
  • 6 sites installing storage, 1 PB total usable
  • Will use Indiana’s UID-mapping mechanism
  • Almost all TeraGrid resources are now compatible
  • A single namespace and access mechanism will make data on Lustre-WAN nearly ubiquitous in TeraGrid (a sample mount is shown below)
  • Planned for production October 1, 2010
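
For a sense of what a single access mechanism means in practice, every participating site would mount the same namespace with the standard Lustre client mount; the MGS hostname and file system name below are hypothetical.

```
# Standard Lustre client mount (hypothetical MGS host and fsname):
# each site issues the same command and sees the same namespace.
mount -t lustre mgs.teragrid.example.org@tcp0:/tgwide /mnt/tg-lustre
```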

  20. Still much to be done
  • Caching of remote data on local/faster storage
  • Improved hierarchical data management
  • Integration with archive systems
  • Support for pNFS/NFSv4
  • Various forms of redundancy/reliability/availability
  • Improved identity mapping and management

  21. What have we learned?
  • The most important lesson: users love widely mounted file systems
  • Site-wide file systems are becoming the norm
  • WAN file systems are already widely used (TG, DEISA)
  • Additional resources add to the value of WAN file systems in a non-linear fashion
  • The most important missing features are automated data management and a link to archives
