1 / 7

Data Oriented Design

Data Oriented Design. Generic Programming Johannes de Fine Licht University of Copenhagen (DK) Thematic CERN School of Computing 2014. The problem at hand. Navigation in detector geometry. Geant4: single particles navigated in sequence.

ianthe
Télécharger la présentation

Data Oriented Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Oriented Design Generic Programming Johannes de Fine Licht University of Copenhagen (DK) Thematic CERN School of Computing 2014

  2. The problem at hand Navigation in detector geometry Geant4: single particles navigated in sequence. GeantV: vectors of particles exploiting SIMD operations. Borrowed from F. Carminati, “Towards a high performance detector geometry library on CPU and GPU”, Annual Concurrency Forum 2014 2

  3. We want to exploit CPU vector instructions (SSE, AVX) and accelerators such as GPUs and the Xeon Phi Writing implementations for each backend would be an enormous effort, and is hopeless in terms of maintainability Our goal was to write generic kernels, that could exploit various backends (platforms, libraries, hardware) Generic programming

  4. Vc for wrapping intrinsics Vc is a project developed in Germany by Matthias Kretz The library wraps vector instruction intrinsics in easy-to-use classes, resulting in portableexplicitly vectorized code http://code.compeng.uni-frankfurt.de/projects/vc/ #include <Vc/Vc> using namespace Vc; double_v add(double_v const &a, double_v const &b) { // Performs 2 (SSE), 4 (AVX) or 8 (MIC) additions return a + b; }

  5. Abstracted kernels Write algorithms in a way where we can plug in the backend Compiles to CPU vector instruction intrinsics through Vc Compiles to CUDA kernel code executable on GPU template <class Double_t, class Bool_t> Bool_t BoxContains(const double dimensions[3], const Double_t point[3]) { Bool_t contains[3]; for (int i = 0; i < 3; ++i) { contains[i] = abs(point[i]) < dimensions[i]; } return contains[0] && contains[1] && contains[2]; } 5

  6. Maximizing hardware occupancy • Single code base means we can develop algorithms and intrinsics separately • With a good scheduler, we can dispatch work to the optimal processor at runtime • We even gain from other effects; with AVX we exceed 4x speedup at high input size… (Generic algorithm implemented by Georgios Bitzes.) 6

  7. Acknowledgements Geometry team John Apostolakis (CERN/SFT) Marilena Bandiermonte (CERN/SFT) Georgios Bitzes (CERN/openlab) Gabriele Cosmo (CERN/SFT) Laurent Duhem (Intel) Andrei Gheata (CERN/SFT) Guilherme Lima (Fermilab) Tatiana Nikitina (CERN/SFT) Sandro Wenzel (CERN/SFT) GeantV team at CERN (Federico Carminati) and Fermilab (Philippe Canal) tCSC 2014, Andrzej Nowak 7

More Related