
Hiding Periodic I/O Costs in Parallel Applications






  1. Hiding Periodic I/O Costs in Parallel Applications Xiaosong Ma Department of Computer Science University of Illinois at Urbana-Champaign Spring 2003

  2. Roadmap • Introduction • Active buffering: hiding recurrent output cost • Ongoing work: hiding recurrent input cost • Conclusions

  3. Introduction • Fast-growing technology propels high-performance applications • Scientific computation • Parallel data mining • Web data processing • Games, movie graphics • Individual components’ growth uncoordinated • Manual performance tuning needed

  4. We Need Adaptive Optimization • Flexible and automatic performance optimization desired • Efficient high-level buffering and prefetching for parallel I/O in scientific simulations

  5. Scientific Simulations • Important • Detail and flexibility • Save money and lives • Challenging • Multi-disciplinary • High performance crucial

  6. Parallel I/O in Scientific Simulations • Write-intensive • Collective and periodic • “Poor stepchild” • Bottleneck-prone • Existing collective I/O focused on data transfer [Timeline: computation phases alternating with periodic I/O phases]

  7. My Contributions • Idea: I/O optimizations in larger scope • Parallelism between I/O and other tasks • Individual simulation’s I/O need • I/O related self-configuration • Approach: hide the I/O cost • Results • Publications, technology transfer, software

  8. Roadmap • Introduction • Active buffering: hiding recurrent output cost • Ongoing work: hiding recurrent input cost • Conclusions

  9. Latency Hierarchy on Parallel Platforms • Along the path of data transfer: local memory access → inter-processor communication → disk I/O → wide-area transfer • Smaller throughput at each step • Lower parallelism and less scalable

  10. Basic Idea of Active Buffering • Purpose: maximize overlap between computation and I/O • Approach: buffer data as early as possible
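The core idea on this slide can be sketched in a few lines (a hypothetical stand-in, not the Panda implementation): the compute loop hands each snapshot to an in-memory queue, and a background thread drains the queue to disk while computation continues, so the slow write is hidden behind the next computation phase.

```python
import os
import queue
import threading

def run_simulation(num_steps, snapshot_bytes, out_path):
    """Buffer each snapshot in memory and write it out in the background,
    so disk I/O overlaps with the next computation phase."""
    buffers = queue.Queue()

    def writer():
        # Background I/O: drain buffered snapshots to disk.
        with open(out_path, "wb") as f:
            while True:
                block = buffers.get()
                if block is None:        # sentinel: simulation finished
                    break
                f.write(block)

    t = threading.Thread(target=writer)
    t.start()
    for step in range(num_steps):
        # Stand-in for a computation phase producing one snapshot.
        data = bytes([step % 256]) * snapshot_bytes
        buffers.put(data)                # cheap memory copy; I/O is hidden
    buffers.put(None)
    t.join()
    return os.path.getsize(out_path)
```

From the compute loop's point of view, each snapshot costs only a memory copy; the disk write happens concurrently.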

  11. Challenges • Accommodate multiple I/O architectures • No assumption on buffer space • Adaptive • Buffer availability • User request patterns

  12. Roadmap • Introduction • Active buffering: hiding recurrent output cost • With client-server I/O architecture [IPDPS ’02] • With server-less architecture • Ongoing work: hiding recurrent input cost • Related work and future work • Conclusions

  13. Client-Server I/O Architecture [Diagram: compute processors → I/O servers → file system]

  14. Client State Machine [Diagram: on entering the collective write routine the client prepares, then buffers data while buffer space is available; once out of buffer space (overflow) it sends blocks until all data are sent; it exits when buffering completes with no overflow or all overflow data have been sent]
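The client-side behavior on this slide can be sketched as follows (function and callback names are illustrative, not the library's API): buffer blocks while space remains, and fall back to sending overflow blocks synchronously.

```python
def client_collective_write(blocks, buffer_capacity, send):
    """Sketch of the client state machine: buffer while space remains;
    once the buffer would overflow, send the block instead."""
    buffered, used = [], 0
    for block in blocks:
        if used + len(block) <= buffer_capacity:
            buffered.append(block)       # state: buffer data (cheap)
            used += len(block)
        else:
            send(block)                  # state: send a block (overflow)
    return buffered                      # drained later in the background
```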

  15. Server State Machine [Diagram: at initialization the server allocates buffers, then idle-listens; with data to receive and enough buffer space it receives a block; out of buffer space it requests a fetch; with data to write it writes a block; on a fetch request it fetches and writes blocks; while all buffers are busy it busy-listens; it exits once all data are received and written and the exit message arrives]
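A simplified, single-threaded sketch of the server loop (message format and names assumed for illustration): receive blocks while buffer space suffices, spill to disk when full, and drain remaining buffers on exit.

```python
def server_loop(messages, buffer_capacity, disk):
    """Sketch of the server state machine over a message stream."""
    buffered, used = [], 0
    for kind, payload in messages:       # idle-listen: wait for a message
        if kind == "data":
            if used + len(payload) <= buffer_capacity:
                buffered.append(payload) # receive a block into the buffer
                used += len(payload)
            else:
                disk.append(payload)     # out of buffer space: write a block
        elif kind == "exit":
            break
    for block in buffered:               # drain remaining buffers to disk
        disk.append(block)
    return disk
```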

  16. Maximize Apparent Throughput • Ideal apparent throughput per server: T_ideal = D_total / (D_c-buffered / T_mem-copy + D_c-overflow / T_msg-passing + D_s-overflow / T_write) • More expensive data transfer only becomes visible when overflow happens • Efficiently masks the difference in write speeds
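The formula on this slide (reassembled from the fragments shown; symbol names are an assumption) divides the total data by the apparent time each portion costs the compute processors: buffered data at memory-copy speed, client overflow at message-passing speed, server overflow at write speed.

```python
def ideal_apparent_throughput(d_buffered, d_c_overflow, d_s_overflow,
                              t_memcpy, t_msg, t_write):
    """T_ideal = D_total / sum of apparent times per data portion."""
    apparent_time = (d_buffered / t_memcpy       # buffered: memory copy
                     + d_c_overflow / t_msg      # client overflow: messaging
                     + d_s_overflow / t_write)   # server overflow: disk write
    return (d_buffered + d_c_overflow + d_s_overflow) / apparent_time
```

With no overflow, the apparent throughput equals the memory-copy speed; overflow pulls it down toward the slower transfer speeds.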

  17. Write Throughput without Overflow • Panda Parallel I/O library • SGI Origin 2000, SHMEM • Per client: 16MB output data per snapshot, 64MB buffer • Two servers, each with 256MB buffer

  18. Write Throughput with Overflow • Panda Parallel I/O library • SGI Origin 2000, SHMEM, MPI • Per client: 96MB output data per snapshot, 64MB buffer • Two servers, each with 256MB buffer

  19. Give Feedback to Application • “Softer” I/O requirements • Parallel I/O libraries have been passive • Active buffering allows I/O libraries to take a more active role • Find optimal output frequency automatically
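One plausible heuristic for the "find optimal output frequency" point (entirely a sketch; the slide does not give the actual policy): pick the smallest snapshot interval, in timesteps, such that the background write of one snapshot finishes within the computation between snapshots, so buffering never becomes visible.

```python
def max_snapshot_frequency(compute_time_per_step, snapshot_bytes,
                           write_bandwidth, buffer_bytes):
    """Hypothetical heuristic: smallest snapshot interval (in steps) whose
    computation time fully hides one snapshot's background write."""
    if snapshot_bytes > buffer_bytes:
        return None                      # snapshot cannot be buffered at all
    drain_time = snapshot_bytes / write_bandwidth
    steps = max(1, -(-drain_time // compute_time_per_step))  # ceiling division
    return int(steps)
```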

  20. Server-side Active Buffering [Diagram: the same server state machine as slide 15]

  21. Performance with Real Applications • Application overview – GENX • Large-scale, multi-component, detailed rocket simulation • Developed at Center for Simulation of Advanced Rockets (CSAR), UIUC • Multi-disciplinary, complex, and evolving • Providing parallel I/O support for GENX • Identification of parallel I/O requirements [PDSECA ’03] • Motivation and test case for active buffering

  22. Overall Performance of GEN1 • SDSC IBM SP (Blue Horizon) • 64 clients, 2 I/O servers with AB • 160MB output data per snapshot (in HDF4)

  23. Aggregate Write Throughput in GEN2 • LLNL IBM SP (ASCI Frost) • 1 I/O server per 16-way SMP node • Write in HDF4

  24. Scientific Data Migration • Output data need to be moved over the Internet • Online migration • Extend active buffering to migration • Local storage becomes another layer in the buffer hierarchy [Timeline: computation phases alternating with periodic I/O phases]

  25. I/O Architecture with Data Migration [Diagram: compute processors → I/O servers → file system, then across the Internet to a workstation running a visualization tool]

  26. Active Buffering for Data Migration • Avoid unnecessary local I/O • Hybrid migration approach: memory-to-memory transfer or disk staging • Combined with data compression [ICS ’02] • Self-configuration for online visualization
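The hybrid approach can be sketched as a simple decision (the threshold and names are assumptions, not the published policy): migrate straight from memory while the snapshot still fits in buffers, and stage to local disk only when it does not.

```python
def plan_migration(snapshot_bytes, free_buffer_bytes):
    """Sketch of the hybrid migration choice."""
    if snapshot_bytes <= free_buffer_bytes:
        return "memory-to-memory"   # avoid unnecessary local I/O
    return "disk-staging"           # buffers overflow: stage locally first
```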

  27. Roadmap • Introduction • Active buffering: hiding recurrent output cost • With client-server I/O architecture • With server-less architecture [IPDPS ’03] • Ongoing work: hiding recurrent input cost • Conclusions

  28. Server-less I/O Architecture [Diagram: compute processors, each with its own I/O thread, accessing the file system directly]

  29. Making ABT Transparent and Portable • Unchanged interfaces • High-level and file-system independent [Diagram: ABT as an ADIO layer above HFS, NFS, NTFS, PFS, PVFS, UFS, XFS] • Design and evaluation [IPDPS ’03] • Ongoing transfer to ROMIO
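The "unchanged interfaces" point can be illustrated with a tiny wrapper (names are illustrative; this is not ROMIO's actual ADIO interface): callers keep the ordinary blocking `write()` signature, while an I/O thread performs the real writes behind it.

```python
import queue
import threading

class BufferedWriter:
    """Sketch of ABT: same write() interface, background I/O thread."""
    def __init__(self, f):
        self._f = f
        self._q = queue.Queue()
        self._t = threading.Thread(target=self._drain)
        self._t.start()

    def _drain(self):
        while True:
            block = self._q.get()
            if block is None:           # sentinel from close()
                break
            self._f.write(block)        # real file-system write, off-thread

    def write(self, data):              # unchanged, non-blocking interface
        self._q.put(bytes(data))

    def close(self):
        self._q.put(None)
        self._t.join()
```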

  30. Active Buffering vs. Asynchronous I/O

  31. Roadmap • Introduction • Active buffering: hiding recurrent output cost • Ongoing work: hiding recurrent input cost • Conclusions

  32. I/O in Visualization • Periodic reads • Dual modes of operation • Interactive • Batch-mode • Harder to overlap reads with computation [Timeline: computation phases alternating with periodic I/O phases]

  33. Efficient I/O Through Data Management • In-memory database of datasets • Manage buffers or values • Hub for I/O optimization • Prefetching for batch mode • Caching for interactive mode • User-supplied read routine
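The in-memory database of datasets described above might look like the following sketch (the interface is assumed for illustration): an LRU cache serves repeated interactive reads, and a schedule-driven `prefetch` warms the cache for batch mode using the user-supplied read routine.

```python
from collections import OrderedDict

class DatasetManager:
    """Sketch of the dataset hub: LRU caching plus batch prefetching."""
    def __init__(self, read_fn, capacity):
        self._read = read_fn             # user-supplied read routine
        self._cache = OrderedDict()
        self._capacity = capacity

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)        # cache hit: keep it hot
        else:
            self._cache[name] = self._read(name)
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[name]

    def prefetch(self, schedule):
        for name in schedule:            # batch mode: timesteps known ahead
            self.get(name)
```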

  34. Related Work • Overlapping I/O with computation • Replacing synchronous calls with async calls [Agrawal et al. ICS ’96] • Threads [Dickens et al. IPPS ’99, More et al. IPPS ’97] • Automatic performance optimization • Optimization with performance models [Chen et al. TSE ’00] • Graybox optimization [Arpaci-Dusseau et al. SOSP ’01]

  35. Roadmap • Introduction • Active buffering: hiding recurrent output cost • Ongoing work: hiding recurrent input cost • Conclusions

  36. Conclusions • If we can’t shrink it, hide it! • Performance optimization can be done • more actively • at a higher level • in a larger scope • Make I/O part of data management

  37. References • [IPDPS ’03] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Improving MPI-IO Output Performance with Active Buffering Plus Threads, 2003 International Parallel and Distributed Processing Symposium • [PDSECA ’03] Xiaosong Ma, Xiangmin Jiao, Michael Campbell and Marianne Winslett, Flexible and Efficient Parallel I/O for Large-Scale Multi-component Simulations, The 4th Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications • [ICS ’02] Jonghyun Lee, Xiaosong Ma, Marianne Winslett and Shengke Yu, Active Buffering Plus Compressed Migration: An Integrated Solution to Parallel Simulations’ Data Transport Needs, The 16th ACM International Conference on Supercomputing • [IPDPS ’02] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Faster Collective Output through Active Buffering, 2002 International Parallel and Distributed Processing Symposium
