
Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units

by Arjun Radhakrishnan, supervised by Prof. Michael Inggs. Outline: Graphics Processing Units (GPUs), Pulsars, Pulsar De-dispersion, Motivation, Implementation, Results, Conclusion & Future Work.


Presentation Transcript


  1. Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units, by Arjun Radhakrishnan, supervised by Prof. Michael Inggs

  2. Outline • Graphics Processing Units (GPUs) • Pulsars • Pulsar De-dispersion • Motivation • Implementation • Results • Conclusion & Future Work

  3. Graphics Processing Units • GPUs are massively parallel processors found on consumer graphics cards • Generally used to render 3D objects on screen and to calculate the colour of each pixel to display • Are mass-market products, driven by the video game industry • Performance tracks Moore's Law because the majority of on-chip space is devoted to compute units, whereas CPUs devote much of it to cache (Image source: [7])

  4. Why Use GPUs? Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]

  5. Pulsars • Highly magnetised, rapidly rotating neutron stars formed after a supernova • Pulsars emit beams of electromagnetic radiation from their magnetic poles • As the star rotates, the beams sweep around in a circular path (the “lighthouse effect”) • A periodic pulse is observed each time a beam sweeps past Earth Figure 2: Pulsar Model [3]

  6. Pulsar Dispersion • Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM) • Lower frequency components of the pulse are delayed more than higher frequencies

  7. Pulsar De-dispersion • Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM) • Lower frequency components of the pulse are delayed more than higher frequencies • Correct for the dispersion by shifting the received signal in time so that each frequency's delay is cancelled Figure 3: Pulsar De-dispersion [4]
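To give a feel for the size of the delay, here is a minimal sketch using the standard cold-plasma dispersion law, in which the delay scales as DM / f². The constant 4.148808 × 10³ s MHz² pc⁻¹ cm³ is the usual textbook value; the function name and the choice of band edges (1400 and 1500 MHz, matching the 100 MHz band centred on 1450 MHz mentioned later in the talk) are assumptions made for illustration, not taken from the slides.

```python
# Dispersion constant of the cold-plasma dispersion law, in s MHz^2 pc^-1 cm^3.
DISPERSION_CONSTANT = 4.148808e3


def dispersion_delay(dm, f_low_mhz, f_high_mhz):
    """Arrival-time lag (seconds) of f_low_mhz relative to f_high_mhz.

    dm is the dispersion measure in pc cm^-3; frequencies are in MHz.
    """
    return DISPERSION_CONSTANT * dm * (f_low_mhz ** -2 - f_high_mhz ** -2)


# Illustrative values: DM of 50 pc cm^-3 across a 1400-1500 MHz band.
delay = dispersion_delay(50.0, 1400.0, 1500.0)
print(f"Delay across the band: {delay * 1e3:.2f} ms")
```

With these numbers the bottom of the band arrives roughly 13.6 ms after the top, which is far longer than the sample interval of a 100 MHz-wide signal.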

  8. Coherent De-dispersion • Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium (ISM) • Preserves the amplitude and phase information of the received signal • Convolve the voltage signal with the inverse transfer function of the ISM • The transfer function depends on the Dispersion Measure (DM) of the signal, which is obtained from models of the galactic electron density • In practice we use the Fast Fourier Transform (FFT) to turn the convolution into a multiplication in the frequency domain, and then apply an inverse FFT
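The FFT, multiply, inverse-FFT recipe above can be sketched in a few lines. This is a CPU-only illustration in plain Python, not the GPU implementation described in the talk. It assumes a complex baseband signal centred on `f0_mhz`, a commonly quoted form of the ISM transfer function, H(f0 + f) = exp(i 2π D DM f² / (f0² (f0 + f))), and one particular sign convention for the phase. All function names and the narrow 1 MHz test band are hypothetical. Because |H| = 1, the inverse filter is simply the complex conjugate.

```python
import cmath
import math

D_CONST = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3


def fft(x, inverse=False):
    """Unnormalised recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spectrum):
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def ism_transfer_function(n, f0_mhz, bw_mhz, dm):
    """Sample H on the n FFT bins of a complex baseband signal (assumed form)."""
    h = []
    for k in range(n):
        kk = k if k < n // 2 else k - n   # FFT bin -> signed bin index
        f = kk * bw_mhz / n               # offset from band centre, MHz
        # Factor 1e6 converts s * MHz into cycles.
        phase = 2.0 * math.pi * 1e6 * D_CONST * dm * f * f / (f0_mhz ** 2 * (f0_mhz + f))
        h.append(cmath.exp(1j * phase))
    return h


def apply_filter(signal, h):
    """Convolution done as FFT -> per-bin multiply -> inverse FFT."""
    spectrum = fft(signal)
    return ifft([s * hk for s, hk in zip(spectrum, h)])


def dedisperse(voltage, f0_mhz, bw_mhz, dm):
    h = ism_transfer_function(len(voltage), f0_mhz, bw_mhz, dm)
    return apply_filter(voltage, [hk.conjugate() for hk in h])


# Round trip: smear an impulse with H, then recover it with the inverse filter.
n, f0, bw, dm = 1024, 1450.0, 1.0, 50.0
impulse = [0j] * n
impulse[512] = 1 + 0j
smeared = apply_filter(impulse, ism_transfer_function(n, f0, bw, dm))
recovered = dedisperse(smeared, f0, bw, dm)
print(max(abs(v) for v in smeared), abs(recovered[512]))
```

The per-bin multiply is independent for every frequency bin, which is what makes this step a natural fit for a massively parallel processor.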

  9. Motivation • Why study pulsars? • A major SKA science driver: detection of gravitational waves and tests of strong-field relativity; analysing black holes • GPU acceleration for MeerKAT • Large frequency range (Low: 0.5 – 2.5 GHz, High: 8 – 14.5 GHz) • High bandwidth per polarisation (4 GHz final) • Large number of channels (16384) • >10 GB of data per second • Even more important for SKA since precision will be a high priority and data storage is not feasible

  10. Implementation Considerations • Both CPU and GPU were tested with single-precision floating point • A bottleneck for GPU computing is the time taken to send data to the GPU from main memory, so transfers should be minimised as much as possible • Use asynchronous data transfers to hide the latency • Re-calculate data on the GPU rather than copying it across • Use shared memory on the GPU for calculations and store to global memory at the end • The source data file is fake dual-polarisation data generated with a DM of 50 pc/cm³ and a 100 MHz bandwidth centred on 1450 MHz
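Why asynchronous transfers help can be seen with a simple timing model. The sketch below is not a measurement and does not use any GPU API. It assumes the data is processed in equal chunks, that copy-in, kernel and copy-out can each run concurrently with the other two (an idealised three-stage pipeline), and it uses made-up per-chunk timings. Both function names are hypothetical.

```python
def serial_time(n_chunks, t_copy_in, t_kernel, t_copy_out):
    """Total time when every chunk is copied, processed and copied back in turn."""
    return n_chunks * (t_copy_in + t_kernel + t_copy_out)


def overlapped_time(n_chunks, t_copy_in, t_kernel, t_copy_out):
    """Total time for an ideal pipeline in which the three stages overlap.

    The first chunk pays the full latency; after that, one chunk completes
    every `bottleneck` time units, where the bottleneck is the slowest stage.
    """
    bottleneck = max(t_copy_in, t_kernel, t_copy_out)
    return t_copy_in + t_kernel + t_copy_out + (n_chunks - 1) * bottleneck


# Hypothetical per-chunk timings in milliseconds (illustrative only).
chunks, copy_in, kernel, copy_out = 100, 2, 3, 2
print(serial_time(chunks, copy_in, kernel, copy_out))      # 700
print(overlapped_time(chunks, copy_in, kernel, copy_out))  # 304
```

In this model the transfer cost is almost entirely hidden whenever the kernel is the slowest stage; if a copy is the slowest stage, the transfer itself sets the throughput, which is why reducing the amount of data moved still matters.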

  11. Basic Program Flow • HOST: Read in data → Allocate memory on GPU → Copy to GPU memory → Initiate GPU kernel • DEVICE: Begin de-dispersion → Parallel FFTs → Per-frequency multiplication V(f0) · H⁻¹(f0), V(f1) · H⁻¹(f1), ..., V(fn) · H⁻¹(fn) → Inverse FFTs → Combine into output array → Send data back to host • HOST: Receive de-dispersed signal → Free memory Figure 4: Program flow

  12. Results Figure 5: Left: Overall speedup (5x) Right: Kernel Speedup (12x)

  13. Results • A single GPU was able to coherently de-disperse 50 MHz of bandwidth • Two GPUs were used for the full 100 MHz • Scaling across multiple GPUs was linear • Using larger transfer functions was found to increase performance, since memory access overhead was lower

  14. Conclusion • GPUs are significantly faster than CPUs for de-dispersion • Enabled real-time coherent de-dispersion for the dataset used • Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present • Faster memory access would greatly improve overall speedup • Currently testing with real undetected pulsar data

  15. Questions? Thank You!

  16. References • [1] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005 • [2] NVIDIA CUDA Programming Guide • [3] D. Manchester, “CSIRO ATNF Pulsar Education Page” • [4] Jim Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97 • [5] John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html • [6] Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx • [7] VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872
