190 likes | 323 Vues
This presentation discusses the current limitations of HDF5 in multi-threaded applications, particularly focusing on its single global semaphore which restricts concurrency. It proposes strategies for improving performance, including replacing the global semaphore with individual semaphores for data structures to enhance concurrency and reduce latency. Recommendations for future directions highlight the importance of leveraging asynchronous I/O and optimizing data handling processes. The session will emphasize the necessary expertise, cost considerations, and potential paths forward for better multi-threading support within the HDF5 library.
E N D
Multi-threading in HDF5:Paths Forward Current implementation - Future directions HDF5 Workshop at PSI
Outline • Introduction • Current implementation • Paths forward: • Improve concurrency • Reduce latency • Conclusions and Recommendations HDF5 Workshop at PSI
Introduction HDF5 Workshop at PSI
Introduction • HDF5 design principles • Flexibility • Adaptability to new computational environments • Current challenges: • Multi-threaded applications run on multi-core systems • HDF5 thread-safe library cannot support concurrency built into such applications HDF5 Workshop at PSI
Current implementation HDF5 Workshop at PSI
Current Implementation • HDF5 uses single global semaphore • Controls modification of memory and file data structures: • One thread at a time enters the library • An application thread enters HDF5 API routine, acquires semaphore • Other threads are blocked until the thread completes API call and releases semaphore • No simultaneous modifications of data structures that can cause file corruption • No race conditions when several threads try to modify a memory data structure HDF5 Workshop at PSI
Current Implementation • Pros: • Current implementation provides thread-safety needed to avoid corruption of data structures • Cons: • No concurrent use of HDF5 library by multi-threaded applications HDF5 Workshop at PSI
Paths forward HDF5 Workshop at PSI
Improving Concurrency • Replace single global semaphore with semaphores that guard individual data structures • Pros: • Greater level of concurrency • No corruption of internal data structures • Each thread waits only when it needs to modify a data structure locked by another thread • Reduces waiting time for a resource to become available HDF5 Workshop at PSI
Improving Concurrency • Cons: • Replacing the global semaphore with individual semaphores, locks, etc. requires careful analysis of HDF5 data structures and their interactions • 300K lines of C code in library will require 4-6 FTE years of knowledgeable staff • Significant future maintenance effort • Testing challenges HDF5 Workshop at PSI
Reducing Latency • Reduce waiting time for each thread to acquire global semaphore • Reduce time by removing known HDF5 bottlenecks: • I/O performance • “Compute bound” (CB) operations • Datatype conversions • Compression and other filters • General overhead • E.g., structures for storing and accessing chunked datasets and metadata HDF5 Workshop at PSI
Reducing Latency: I/O Performance • Support asynchronous I/O (AIO) access to data in HDF5 file • AIO initiated within the library in response to an API call • Completes in the backgroundafter API call has returned • Global semaphore is released when API call returns – less waiting time HDF5 Workshop at PSI
Reducing Latency: CB operations • Use multiple threads within HDF5 library to • Perform datatype conversion • Perform compression on one chunk • Multiple threads work on one chunk • Perform compression on many chunks • Each thread works on a chunk HDF5 Workshop at PSI
Reducing Latency: General Optimizations • Traditional optimizations • Some examples: • Algorithm improvements for handling • Chunk cache • Hyperslabselections • Memory usage • Data structure improvements • Chunk indices with O(1) lookup speed • Advanced B-tree implementations HDF5 Workshop at PSI
Reducing Latency • Pros: • Smaller development effort, ~ 1.5 FTE years • Localized changes to the library • Easier to maintain • Incremental improvements • Cons: • Still uses global semaphore HDF5 Workshop at PSI
Recommendations HDF5 Workshop at PSI
Decision • Reduce Latency • Decision factors: • Available expertise • Cost • Already funded features: • AIO • Using multiple threads to compress a chunk • Future maintainability HDF5 Workshop at PSI
Other considerations • Approaches are not mutually exclusive • Both can be implemented in the future if funding is available HDF5 Workshop at PSI
Thank You! Questions? HDF5 Workshop at PSI