
Xrootd Present & Future The Drama Continues

This presentation discusses the state of performance, application design, performance measurements, server scaling, OS and filesystem impact, clustering, future leaf node SRM, alternative root node SRM, SRM integration status, and the next big thing in high-performance data access servers.



Presentation Transcript


  1. Xrootd Present & Future: The Drama Continues Andrew Hanushevsky Stanford Linear Accelerator Center Stanford University HEPiX 13-October-05 http://xrootd.slac.stanford.edu

  2. Outline • The state of performance • Single server • Clustered servers • The SRM Debate • The Next Big Thing • Conclusion 2: http://xrootd.slac.stanford.edu

  3. Application Design Point • Complex embarrassingly parallel analysis • Determine particle decay products • 1000’s of parallel clients hitting the same data • Small block sparse random access • Median size < 3K • Uniform seek across whole file (mean 650MB) • Only about 22% of the file read (mean 140MB) 3: http://xrootd.slac.stanford.edu
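
A rough sketch of what the quoted access pattern looks like in practice: uniform seeks over the whole file, reads with a median of about 3 KB, stopping once roughly 22% of the file has been touched. The file path and the exact read-size range are assumptions for illustration, not details of BetaMiniApp.

```python
# Illustrative only: a synthetic reproduction of the access pattern described
# above (small sparse random reads, uniform seeks, ~22% of the file touched).
# The file path and read-size range are assumptions, not BetaMiniApp internals.
import os
import random

def sparse_random_read(path="/data/test/events.root", fraction=0.22):
    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        while total < fraction * size:
            f.seek(random.randrange(size))                        # uniform seek across the whole file
            total += len(f.read(random.randint(1024, 5 * 1024)))  # median ~3 KB read
    return total

if __name__ == "__main__":
    print("bytes read:", sparse_random_read())
```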

  4. Performance Measurements • Goals • Very low latency • Handle many parallel clients • Test setup • Sun V20z 1.86 GHz dual Opteron, 2GB RAM • 1Gb on-board Broadcom NIC (same subnet) • Solaris 10 x86 • Linux RHEL3 2.4.21-2.7.8.ELsmp • Client running BetaMiniApp with analysis removed 4: http://xrootd.slac.stanford.edu

  5. Latency Per Request (xrootd) 5: http://xrootd.slac.stanford.edu

  6. Capacity vs Load (xrootd) 6: http://xrootd.slac.stanford.edu

  7. xrootd Server Scaling • Linear scaling relative to load • Allows deterministic sizing of server • Disk • NIC • CPU • Memory • Performance tied directly to hardware cost • Competitive to best-in-class commercial file servers 7: http://xrootd.slac.stanford.edu
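
Because throughput scales linearly with load, server capacity can be estimated with simple arithmetic against each hardware component. A back-of-the-envelope sketch; the per-client figures are hypothetical placeholders, not measurements from the talk.

```python
# Back-of-the-envelope sizing under the linear-scaling assumption: each
# resource caps the client count independently, so capacity is the minimum.
# Per-client figures are hypothetical placeholders, not numbers from the talk.
NIC_MBPS = 1000            # 1 Gb on-board NIC (test setup above)
SERVER_RAM_MB = 2048       # 2 GB RAM (test setup above)
PER_CLIENT_MBPS = 4        # assumed average data rate per analysis client
PER_CLIENT_MEM_MB = 0.5    # assumed server-side memory per client connection

clients_by_nic = NIC_MBPS // PER_CLIENT_MBPS
clients_by_ram = int(SERVER_RAM_MB / PER_CLIENT_MEM_MB)
print("NIC-limited clients:", clients_by_nic)
print("RAM-limited clients:", clients_by_ram)
print("deterministic capacity estimate:", min(clients_by_nic, clients_by_ram))
```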

  8. OS Impact on Performance 8: http://xrootd.slac.stanford.edu

  9. Device & Filesystem Impact (chart) • I/O-limited vs. CPU-limited regions • UFS good on small reads • VXFS good on big reads • 1 event ≈ 2K 9: http://xrootd.slac.stanford.edu

  10. Overhead Distribution 10: http://xrootd.slac.stanford.edu

  11. Network Overhead Dominates 11: http://xrootd.slac.stanford.edu

  12. Xrootd Clustering (SLAC) • Diagram: client machines contact the redirectors (kanolb-a, bbr-olb03, bbr-olb04), which route them to the data servers kan01, kan02, kan03, kan04, … kanxx (hidden details omitted) 12: http://xrootd.slac.stanford.edu

  13. Clustering Performance • Design can scale to at least 256,000 servers • SLAC runs a 1,000 node test server cluster • BNL runs a 350 node production server cluster • Self-regulating (via minimal spanning tree algorithm) • 280 nodes self-cluster in about 7 seconds • 890 nodes self-cluster in about 56 seconds • Client overhead is extremely low • Overhead added to meta-data requests (e.g., open) • ~200us * log64(number of servers) / 2 • Zero overhead for I/O 13: http://xrootd.slac.stanford.edu
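
The quoted meta-data overhead of roughly 200 us * log64(number of servers) / 2 can be worked out directly; a small sketch evaluating it for a few cluster sizes, including the 256,000-server design limit.

```python
# Worked example of the quoted redirection overhead: ~200 us * log64(N) / 2,
# added only to meta-data requests such as open(); plain I/O pays nothing.
import math

def open_overhead_us(n_servers, per_level_us=200.0):
    return per_level_us * math.log(n_servers, 64) / 2

for n in (64, 280, 890, 4096, 256000):
    print(f"{n:>7} servers -> ~{open_overhead_us(n):.0f} us per open()")
```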

  14. Current MSS Support • Lightweight agnostic interfaces provided • oss.mssgwcmd command • Invoked for each create, dirlist, mv, rm, stat • oss.stagecmd |command • Long running command, request stream protocol • Used to populate disk cache (i.e., “stage-in”) • Diagram: xrootd (oss layer) drives the MSS via mssgwcmd and stagecmd 15: http://xrootd.slac.stanford.edu
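
A minimal sketch of what a long-running stage-in helper hooked in via oss.stagecmd could look like, assuming a simple one-request-per-line text stream on stdin and a hypothetical mss_fetch() copy step; the actual xrootd request-stream protocol and reply format are not spelled out on the slide.

```python
#!/usr/bin/env python3
# Minimal sketch of a long-running stage-in helper of the kind hooked in with
# the oss.stagecmd directive (e.g. "oss.stagecmd | /opt/xrootd/stage_helper.py").
# ASSUMPTIONS: one whitespace-separated request per line, last field is the
# file name, and mss_fetch() is a hypothetical copy out of the MSS; the real
# xrootd request-stream protocol is not described on the slide.
import os
import shutil
import sys

MSS_ROOT = "/mss"        # hypothetical MSS mount point
CACHE_ROOT = "/xrootd"   # hypothetical disk cache exported by xrootd

def mss_fetch(lfn):
    """Populate the disk cache ("stage-in") from mass storage."""
    dest = CACHE_ROOT + lfn
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copyfile(MSS_ROOT + lfn, dest)

for line in sys.stdin:                 # request stream: runs until xrootd exits
    if not line.strip():
        continue
    lfn = line.split()[-1]
    try:
        mss_fetch(lfn)
        print("OK", lfn, flush=True)
    except OSError as err:
        print("FAIL", lfn, err, flush=True)
```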

  15. Future Leaf Node SRM • MSS interface ideal spot for SRM hook • Use existing hooks or new long running hook • mssgwcmd & stagecmd • oss.srm |command • Processes external disk cache management requests • Should scale quite well • Diagram: Grid SRM sits in front of xrootd (oss layer), which drives the MSS 16: http://xrootd.slac.stanford.edu
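
The proposed oss.srm hook would follow the same long-running, request-stream pattern, but driven by external SRM disk-cache-management requests. A purely illustrative sketch; the request verbs and replies below are hypothetical, since the proposal is not specified here.

```python
#!/usr/bin/env python3
# Purely illustrative: the proposed oss.srm hook would reuse the same
# long-running request-stream pattern, but serve external SRM cache-management
# requests. The "stage"/"evict" verbs and replies below are hypothetical.
import os
import sys

CACHE_ROOT = "/xrootd"   # hypothetical disk cache exported by xrootd

for line in sys.stdin:
    fields = line.split()
    if len(fields) != 2:
        continue
    verb, lfn = fields
    try:
        if verb == "evict":            # SRM asks us to free cache space
            os.remove(CACHE_ROOT + lfn)
        # "stage" could reuse the mss_fetch() helper sketched above
        print("OK", verb, lfn, flush=True)
    except OSError as err:
        print("FAIL", verb, lfn, err, flush=True)
```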

  16. BNL/LBL Proposal • Diagram labels: GRID Replica Services, BNL Replica Registration Service & DataMover (rc, dm), LBL SRM components (srm, drm, das), generic standard clients, xrootd 17: http://xrootd.slac.stanford.edu

  17. Alternative Root Node SRM • Team olbd with SRM • File management & discovery • Tight management control • Several issues need to be considered • Introduces many new failure modes • Will not generally scale • Diagram: Grid SRM talks to the olbd root node, which fronts the MSS 18: http://xrootd.slac.stanford.edu

  18. SRM Integration Status • Unfortunately, SRM interface in flux • Heavy vs light protocol • Working with LBL team • Working towards OSG sanctioned future proposal • Trying to use the Fermilab SRM • Artem Trunov at IN2P3 exploring issues 19: http://xrootd.slac.stanford.edu

  19. The Next Big Thing • High performance data access servers plus efficient large scale clustering • Allows novel, cost-effective, super-fast massive storage optimized for sparse random access • Imagine 30TB of DRAM at commodity prices 20: http://xrootd.slac.stanford.edu

  20. Device Speed Delivery 21: http://xrootd.slac.stanford.edu

  21. Memory Access Characteristics (chart) • Server: zsuntwo • CPU: Sparc • NIC: 100Mb • OS: Solaris 10 • UFS: Standard 22: http://xrootd.slac.stanford.edu

  22. The Peta-Cache • Cost-effective memory access impacts science • Nature of all random access analysis • Not restricted to just High Energy Physics • Enables faster and more detailed analysis • Opens new analytical frontiers • Have a 64-node test cluster • V20z each with 16GB RAM • 1TB “toy” machine 23: http://xrootd.slac.stanford.edu
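
The capacity arithmetic behind the test cluster, and what the earlier "imagine 30TB of DRAM" figure would take at the same 16GB-per-node density, as a quick sketch.

```python
# Quick capacity arithmetic for the memory-resident ("Peta-Cache") idea.
NODE_RAM_GB = 16      # V20z nodes with 16 GB RAM each
TEST_NODES = 64       # current test cluster

print("test cluster DRAM:", TEST_NODES * NODE_RAM_GB / 1024, "TB")   # the 1 TB "toy" machine

TARGET_TB = 30        # "Imagine 30 TB of DRAM" from slide 19
print("nodes for", TARGET_TB, "TB:", TARGET_TB * 1024 // NODE_RAM_GB)
```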

  23. Conclusion • High performance data access systems achievable • The devil is in the details • Must understand processing domain and deployment infrastructure • Comprehensive repeatable measurement strategy • High performance and clustering are synergetic • Allows unique performance, usability, scalability, and recoverability characteristics • Such systems produce novel software architectures • Challenges • Creating application algorithms that can make use of such systems • Opportunities • Fast low cost access to huge amounts of data to speed discovery 24: http://xrootd.slac.stanford.edu

  24. Acknowledgements • Fabrizio Furano, INFN Padova • Client-side design & development • Bill Weeks • Performance measurement guru • 100’s of measurements repeated 100’s of times • US Department of Energy • Contract DE-AC02-76SF00515 with Stanford University • And our next mystery guest! 25: http://xrootd.slac.stanford.edu
