Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors

Yuanrui Zhang, Mahmut Kandemir (Computer Science and Engineering, The Pennsylvania State University); Nikos Pitsianis, Xiaobai Sun (Computer Science, Duke University).


Presentation Transcript


  1. Automated Parallelization of Non-Uniform Convolutions on Chip Multiprocessors
     Yuanrui Zhang, Mahmut Kandemir (Computer Science and Engineering, The Pennsylvania State University)
     Nikos Pitsianis, Xiaobai Sun (Computer Science, Duke University)

  2. Non-Uniform FFT Local Convolution
     [Figure labels: Local Support, sources, targets]
     • While FFTs are well implemented by FFTW, we concentrate on accelerating the parallel convolution step on multicores.
     • Geometric tiling can enhance data locality and reuse for the non-uniform local convolution, a matrix-vector product with an irregular and sparse matrix.
     • Multicores have different on-chip memory hierarchies and characteristics.
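The idea on this slide can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption made for illustration and does not come from the slides: the truncated Gaussian kernel, the integer target grid, and the names `kernel`, `spread`, `local_convolution`, and `tiled_local_convolution`. The sketch computes the same sparse product twice, once in source order and once after binning the sources into geometric tiles so that sources touching the same targets are processed together.

```python
import math
import random


def kernel(d, width):
    """Truncated Gaussian (assumed kernel): zero outside the local support."""
    return math.exp(-(d / width) ** 2) if abs(d) <= width else 0.0


def spread(out, x, q, width):
    """Add one source's contribution to every grid target within its support."""
    lo = max(0, math.ceil(x - width))
    hi = min(len(out) - 1, math.floor(x + width))
    for t in range(lo, hi + 1):
        out[t] += q * kernel(t - x, width)


def local_convolution(sources, weights, n_targets, width):
    """Process sources in their given (arbitrary) order."""
    out = [0.0] * n_targets
    for x, q in zip(sources, weights):
        spread(out, x, q, width)
    return out


def tiled_local_convolution(sources, weights, n_targets, width, tile_size):
    """Bin sources by geometric tile, then process one tile at a time."""
    n_tiles = math.ceil(n_targets / tile_size)
    tiles = [[] for _ in range(n_tiles)]
    for x, q in zip(sources, weights):
        idx = min(n_tiles - 1, int(x // tile_size))
        tiles[idx].append((x, q))
    out = [0.0] * n_targets
    for tile in tiles:
        # All sources in a tile write to the same small window of targets.
        for x, q in tile:
            spread(out, x, q, width)
    return out


if __name__ == "__main__":
    random.seed(0)
    src = [random.uniform(0, 64) for _ in range(200)]
    wts = [random.uniform(-1, 1) for _ in range(200)]
    direct = local_convolution(src, wts, 64, 2.0)
    tiled = tiled_local_convolution(src, wts, 64, 2.0, 8)
    # Only the summation order differs, so results agree up to rounding.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(direct, tiled)))
```

The tiled variant changes only the order of accumulation. In a compiled implementation that reordering is what would keep each tile's window of targets resident in cache; the Python version only demonstrates that the reordering preserves the result.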

  3. Architecture Aware Hierarchical Geometric Tiling and Parallel Scheduling
     • Hierarchical tiling according to memory hierarchy and sizes
     • Neighborhood tile traversing order
     • Block distribution
     • Try to take care of data sharing and locality at all levels of cache
     [Figure: numbered diagram labeled "On-chip memory abstraction"]
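The three ingredients listed on this slide can be sketched as follows. This is a hedged illustration, not the authors' scheduler: the slides do not state which traversal order is used, so a serpentine (boustrophedon) order is assumed here, and the function names `serpentine`, `hierarchical_order`, and `block_distribute`, the two-level square layout, and the tile counts are all hypothetical.

```python
def serpentine(rows, cols):
    """Visit a rows x cols grid so consecutive cells are edge neighbors."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order


def hierarchical_order(outer, inner):
    """Two-level tiling: an outer x outer grid of large tiles (for example,
    sized for a shared cache), each holding an inner x inner grid of small
    tiles (for example, sized for a private cache). Returns small-tile
    coordinates, finishing one large tile before moving to the next."""
    order = []
    for big_r, big_c in serpentine(outer, outer):
        for r, c in serpentine(inner, inner):
            order.append((big_r * inner + r, big_c * inner + c))
    return order


def block_distribute(order, n_cores):
    """Give each core one contiguous block of the traversal order."""
    base, extra = divmod(len(order), n_cores)
    blocks, start = [], 0
    for core in range(n_cores):
        size = base + (1 if core < extra else 0)
        blocks.append(order[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    tiles = hierarchical_order(outer=2, inner=2)
    for core, block in enumerate(block_distribute(tiles, n_cores=4)):
        print(core, block)
```

With 2 x 2 large tiles of 2 x 2 small tiles and four cores, each core receives exactly one large tile, so the small tiles a core visits are adjacent to each other. Other tile counts give contiguous blocks that may straddle large-tile boundaries.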

  4. Preliminary Evaluation
