
Sparse Matrix-Dense Vector Multiply on G80: Probing the CUDA Parameter Space



Presentation Transcript


  1. Sparse Matrix-Dense Vector Multiply on G80: Probing the CUDA Parameter Space
     Comp 790 GPGP Project
     Stephen Olivier

  2. Currently…
     • Have a working “naïve” implementation in which each thread computes one dot product (similar to Sashi’s implementation)
     • 1.26 GFLOPs, 7.56 GB/s for n=32k, nz/row=20
     • In the midst of implementing a version that stores the input vector in texture memory, which is cached
     • Also developing an analytic model to express the parameterization of work and data partitioning to suit the G80

  3. Pertinent Constraints
     • Available parallelism
     • Potential reuse
     • Capacity constraints of the various memories
     • Multithreading constraints
     • Thread/block/grid layout
     • Data distribution and blocking for the memory hierarchy
     • Amount of sequential work done for latency hiding

  4. Resulting Analytic Model
     • Model will approximate ideal parameters based on problem size, e.g. number of rows and (average) number of nonzeros per row
     • Plan to verify the model by testing it against a wide range of parameter combinations on some key sample problems
     • Can implement the model as an “autotuner” for G80 SpMV in the spirit of ATLAS or FFTW
     • Can integrate directly into code for G80 iterative methods, e.g. conjugate gradient
