
Improving I/O with Compiler-Supported Parallelism


Presentation Transcript


Anna Youssefi, Ken Kennedy

Why Should We Care About I/O?

Disk access speeds are much slower than processor and memory access speeds. Disk I/O may be a major bottleneck in applications such as:
• scientific codes related to image processing
• multimedia applications
• out-of-core computations
Computational optimizations alone may not provide any significant improvements to these programs.

Why Should Compilers Be Involved?

Compilers have knowledge of both the application and the computer architecture or operating system. Compilers can reduce the burden on the programmer and increase code portability by requiring little to no change in the user-level program to achieve good performance on different architectures. Compilers can also automatically translate programs written in high-level languages, which may lack robust I/O or operating-system interfaces, into higher-performance languages that provide more control over low-level system activities.

Human Neuroimaging Lab

http://www.hnl.bcm.tmc.edu/
The Human Neuroimaging Laboratory at the Baylor College of Medicine conducts research in the physiology and functional anatomy of the human brain using fMRI technology.

fMRI Technology

Functional Magnetic Resonance Imaging (fMRI) is a technique for determining which parts of the brain are activated when a person responds to stimuli. A high-resolution brain scan is followed by a series of low-resolution scans taken at regular time slices. Brain activity is identified by increased blood flow to specific regions of the brain.

Motivating Application

The HNL wants to optimize a preprocessing application that normalizes brain images of human subjects to a canonical brain in order to make the images comparable and enable data analysis. The program uses calls to the SPM (Statistical Parametric Mapping) library.

Transformation: Loop Distribution & Parallelization

Hand transformation on an I/O-intensive loop in the HNL preprocessing application. The original loop reads a different input file and writes a portion of a single output file on each iteration. The loop is distributed into two separate loops: the first runs in parallel on four different processors; the second runs sequentially across all processors. Standard compiler transformations were implemented by hand to parallelize the loop; dependence analysis can be used to automate the transformation. The loop structure on a single processor and on Processors 1-4 after distribution is shown in the figure at the end of this transcript, followed by an illustrative code sketch.

Performance Results

Performance of the transformed loop was constrained by shortcomings of the MPI (Message Passing Interface) implementation we used. This implementation relies on file I/O to share data and results in excessive communication times, as demonstrated by the broadcast overhead. Even with these performance constraints, we achieved a 30-40% improvement in running time. We expect to achieve even better results from using a different MPI implementation.

Conclusion and Future Work

Through parallelization, we achieved a minimum of 30% improvement in the running time of an I/O-intensive loop. Standard compiler transformations can be extended to reveal the parallelism in such loops. We plan to implement compiler strategies to automate these transformations. We also plan to implement compiler support for other application-level I/O transformations, such as converting synchronous to asynchronous I/O, prefetching, and overlapping I/O with computation; a sketch of such an overlap follows below.
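The asynchronous-I/O and prefetching transformations mentioned above are future work and are not part of the results presented here; the following is only a minimal sketch of what overlapping I/O with computation can look like at the application level. It assumes C with POSIX AIO, a hypothetical input_%03d.img file-naming scheme, a fixed image size, and a placeholder process_image() routine, none of which come from the original application.

    /* Sketch only: overlap of I/O with computation via double buffering and
     * POSIX AIO.  File naming, image size, and process_image() are assumptions. */
    #include <aio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define N        192               /* number of input files, as in the loop  */
    #define IMG_SIZE (1 << 20)         /* assumed size of one input image        */

    static char bufs[2][IMG_SIZE];     /* double buffer: compute on one half     */
                                       /* while the other is being filled        */

    static void process_image(char *img) { (void)img; /* placeholder computation */ }

    /* Issue an asynchronous read of (hypothetical) input file i into buf. */
    static void prefetch(int i, char *buf, struct aiocb *cb)
    {
        char name[64];
        snprintf(name, sizeof name, "input_%03d.img", i);
        memset(cb, 0, sizeof *cb);
        cb->aio_fildes = open(name, O_RDONLY);
        cb->aio_buf    = buf;
        cb->aio_nbytes = IMG_SIZE;
        aio_read(cb);                  /* returns immediately; the read proceeds */
                                       /* in the background (errors not checked) */
    }

    int main(void)
    {
        struct aiocb cb[2];
        prefetch(0, bufs[0], &cb[0]);  /* prime the pipeline with the first file */

        for (int i = 0; i < N; i++) {
            int cur = i % 2, nxt = (i + 1) % 2;

            if (i + 1 < N)
                prefetch(i + 1, bufs[nxt], &cb[nxt]);   /* start the next read   */

            /* Wait only for the current image, then compute on it while the
             * prefetch of the next image continues in the background. */
            const struct aiocb *list[1] = { &cb[cur] };
            aio_suspend(list, 1, NULL);
            aio_return(&cb[cur]);
            close(cb[cur].aio_fildes);

            process_image(bufs[cur]);
        }
        return 0;
    }

On Linux this may require linking with -lrt; the same double-buffering pattern applies with any asynchronous or threaded read-ahead mechanism.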
Figure (from the "Transformation: Loop Distribution & Parallelization" slide): loop structure before and after distribution.

    Single processor (original loop):
        for i = 1 to 192
            READ; PROCESS; WRITE

    Processors 1-4 (after distribution, one 48-iteration block each):
        Loop 1, run in parallel on all four processors:
            for i = 1 to 48
                READ; PROCESS
        Loop 2, run sequentially across the processors:
            for i = 1 to 48
                WRITE
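The following is a minimal sketch of the distributed loop in the figure, assuming C with MPI and four processes; it is not the authors' implementation. read_input(), process_image(), and write_portion() are hypothetical placeholders for the SPM-based READ / PROCESS / WRITE steps, and the buffer size is illustrative.

    /* Sketch only: Loop 1 is the parallel read/process loop; Loop 2 writes each
     * rank's portion in rank order so the single output file is produced
     * sequentially, as described on the slide. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N      192                 /* iterations of the original loop        */
    #define NPROCS 4                   /* processors used in the experiment      */
    #define CHUNK  (N / NPROCS)        /* 48 iterations per processor            */
    #define ITEM   1024                /* illustrative size of one result        */

    /* Hypothetical placeholders for the SPM-based READ / PROCESS / WRITE steps. */
    static void read_input(int i, double *buf)    { (void)i; (void)buf; }
    static void process_image(double *buf)        { (void)buf; }
    static void write_portion(int i, double *buf) { (void)i; (void)buf; }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int lo = rank * CHUNK;         /* this rank's block of the index space   */
        double *buf = malloc((size_t)CHUNK * ITEM * sizeof *buf);

        /* Loop 1 (parallel): each process reads and processes its own block. */
        for (int i = lo; i < lo + CHUNK; i++) {
            read_input(i, buf + (size_t)(i - lo) * ITEM);
            process_image(buf + (size_t)(i - lo) * ITEM);
        }

        /* Loop 2 (sequential across processes): ranks take turns writing their
         * portions so the shared output file is written in the original order. */
        for (int r = 0; r < NPROCS; r++) {
            if (rank == r)
                for (int i = lo; i < lo + CHUNK; i++)
                    write_portion(i, buf + (size_t)(i - lo) * ITEM);
            MPI_Barrier(MPI_COMM_WORLD);   /* hand the file off to the next rank */
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

Because each iteration reads a different input file and writes a distinct portion of the output file, there are no cross-iteration dependences, which is what lets the read/process loop be distributed from the write loop and run in parallel.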
