
Event Reconstruction in STS


Presentation Transcript


  1. Event Reconstruction in STS. I. Kisel, GSI. CBM-RF-JINR Meeting, Dubna, May 21, 2009.

  2. Many-core HPC
  CPU Intel: XX-cores; Gaming STI: Cell; GP CPU Intel: Larrabee (?); GP GPU Nvidia: Tesla; CPU/GPU AMD: Fusion (?); FPGA Xilinx: Virtex (?)
  • On-line event selection
  • Mathematical and computational optimization
  • Optimization of the detector (?)
  OpenCL?
  • Heterogeneous systems of many cores
  • Uniform approach to all CPU/GPU families
  • Similar programming languages (CUDA, Ct, OpenCL)
  • Parallelization of the algorithm (vectors, multi-threads, many-cores)
  Ivan Kisel, GSI

  3. Current and Expected Eras of Intel Processor Architectures
  From S. Borkar et al. (Intel Corp.), "Platform 2015: Intel Platform Evolution for the Next Decade", 2005.
  [Figure: the three dimensions of growth: cores, HW threads, SIMD width.]
  • Future programming is three-dimensional
  • The amount of data is doubling every 18-24 months
  • Massive data streams
  • The RMS (Recognition, Mining, Synthesis) workload in real time
  • Supercomputer-level performance in ordinary servers and PCs
  • Applications like real-time decision-making analysis
  Ivan Kisel, GSI

  4. Cores and HW Threads
  [Figure: evolution of CPU architecture: one process per CPU (19XX); two threads per process per CPU (2000); multi-core CPUs with HW threads (2009); a many-core laptop CPU (2015).]
  Cores and HW threads are seen by the operating system as CPUs:
  > cat /proc/cpuinfo
  At most half of the HW threads are executed at any given time.
  Ivan Kisel, GSI
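
  As a minimal illustration (my sketch, not from the original slides; it uses C++11's std::thread), the count that /proc/cpuinfo reports can also be queried portably:

      #include <iostream>
      #include <thread>

      int main() {
          // Number of logical processors (cores x HW threads) the OS exposes;
          // on Linux this matches the number of entries in /proc/cpuinfo.
          // May return 0 if the value cannot be determined.
          unsigned n = std::thread::hardware_concurrency();
          std::cout << "Logical CPUs visible to the OS: " << n << "\n";
          return 0;
      }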

  5. SIMD Width
  SIMD = Single Instruction, Multiple Data. SIMD uses vector registers and exploits data-level parallelism.
  [Figure: scalar registers (one double D or one single S) vs. packed vector registers holding 2 doubles or 4-16 singles.]
  Faster or slower than scalar double precision (64 bits)?
  • Vector (SIMD) double precision (128 bits): 2 or 1/2
  • Vector (SIMD) single precision (128 bits): 4 or 1/4
  • Intel AVX (2010), vector single precision (256 bits): 8 or 1/8
  • Intel LRB (2010), vector single precision (512 bits): 16 or 1/16
  Ivan Kisel, GSI
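
  A minimal sketch of 128-bit single-precision SIMD (my illustration, not from the slides), where one SSE instruction performs four additions at once:

      #include <xmmintrin.h>  // SSE intrinsics: 128-bit packed single precision
      #include <cstdio>

      int main() {
          alignas(16) float a[4] = {1.f, 2.f, 3.f, 4.f};
          alignas(16) float b[4] = {10.f, 20.f, 30.f, 40.f};
          alignas(16) float c[4];

          __m128 va = _mm_load_ps(a);      // load 4 packed floats
          __m128 vb = _mm_load_ps(b);
          __m128 vc = _mm_add_ps(va, vb);  // one instruction, four additions
          _mm_store_ps(c, vc);

          std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
          return 0;
      }

  With AVX the same pattern widens to 8 single-precision lanes, and to 16 on LRB-class 512-bit units, matching the table above.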

  6. SIMD KF Track Fit on Intel Multicore Systems: Scalability
  [Plot: fit time (ms/track, log scale from 0.01 to 10) vs. number of threads, scalar and SIMD single/double precision, 2 to 32 threads, on 2xCell SPE (16), Woodcrest (2), Clovertown (4) and Dunnington (6).]
  Real-time performance on different Intel CPU platforms: speed-up of 3.7 on the Xeon 5140 (Woodcrest).
  Real-time performance on different CPU architectures: speed-up of 100 with 32 threads.
  H. Bjerke, S. Gorbunov, I. Kisel, V. Lindenstruth, P. Post, R. Ratering
  Ivan Kisel, GSI
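
  A sketch of the track-level parallelism behind such a thread scan (Track and fitTrack are hypothetical placeholders, not the actual KF code): tracks are independent, so they can simply be split into contiguous slices, one per thread:

      #include <algorithm>
      #include <cstddef>
      #include <thread>
      #include <vector>

      struct Track { float params[5]; };  // placeholder for the real track state

      // Placeholder for the per-track Kalman-filter fit; each call is independent.
      void fitTrack(Track& t) { t.params[0] += 1.f; }

      void fitAll(std::vector<Track>& tracks, unsigned nThreads) {
          std::vector<std::thread> pool;
          const std::size_t chunk = (tracks.size() + nThreads - 1) / nThreads;
          for (unsigned i = 0; i < nThreads; ++i) {
              const std::size_t begin = std::min(tracks.size(), i * chunk);
              const std::size_t end = std::min(tracks.size(), begin + chunk);
              // Each thread fits its own slice; no shared state between threads.
              pool.emplace_back([&tracks, begin, end] {
                  for (std::size_t j = begin; j < end; ++j) fitTrack(tracks[j]);
              });
          }
          for (auto& t : pool) t.join();
      }

      int main() {
          std::vector<Track> tracks(10000);
          fitAll(tracks, 4);
          return 0;
      }

  Thread count and SIMD width multiply, which is why the measured speed-ups combine both dimensions.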

  7. Intel Larrabee: 32 Cores
  LRB vs. GPU: Larrabee will differ from other discrete GPUs currently on the market, such as the GeForce 200 series and the Radeon 4000 series, in three major ways:
  • it will use the x86 instruction set with Larrabee-specific extensions;
  • it will feature cache coherency across all its cores;
  • it will include very little specialized graphics hardware.
  LRB vs. CPU: the x86 processor cores in Larrabee will differ in several ways from the cores in current Intel CPUs such as the Core 2 Duo:
  • LRB's 32 x86 cores will be based on the much simpler Pentium design;
  • each core supports 4-way simultaneous multithreading, with 4 copies of each processor register;
  • each core contains a 512-bit vector processing unit, able to process 16 single-precision floating-point numbers at a time;
  • LRB includes explicit cache control instructions;
  • LRB has a 1024-bit (512-bit each way) ring bus for communication between cores and to memory;
  • LRB includes one fixed-function graphics hardware unit.
  L. Seiler et al., "Larrabee: A Many-Core x86 Architecture for Visual Computing", ACM Transactions on Graphics, Vol. 27, No. 3, Article 18, August 2008.
  Ivan Kisel, GSI

  8. General Purpose Graphics Processing Units (GPGPU)
  • Substantial evolution of graphics hardware over the past years
  • Remarkable programmability and flexibility
  • Reasonably cheap
  • New branch of research: GPGPU
  Ivan Kisel, GSI

  9. NVIDIA Hardware
  • Streaming multiprocessors
  • No-overhead thread switching
  • FPUs instead of cache/control
  • Complex memory hierarchy
  • SIMT: Single Instruction, Multiple Threads
  GT200: 30 multiprocessors; 8 SP FPUs per MP (240 SP units in total); 30 DP units; 16 000 registers per MP; 16 kB shared memory per MP; >= 1 GB main memory; 1.4 GHz clock; 933 GFlops SP.
  S. Kalcher, M. Bach
  Ivan Kisel, GSI

  10. SIMD/SIMT Kalman Filter on the CSC-Scout Cluster
  Cluster: 18x (2x (Quad-Xeon, 3.0 GHz, 2x6 MB L2), 16 GB) + 27x Tesla S1070 (4x (GT200, 4 GB))
  [Chart: performance comparison: GPU 9100 vs. CPU 1600.]
  M. Bach, S. Gorbunov, S. Kalcher, U. Kebschull, I. Kisel, V. Lindenstruth
  Ivan Kisel, GSI

  11. CPU/GPU Programming Frameworks
  • Cg, OpenGL Shading Language, Direct X: designed for writing shaders; require the problem to be expressed graphically
  • AMD Brook: pure stream computing; not hardware-specific
  • AMD CAL (Compute Abstraction Layer): generic use of the hardware at assembler level
  • NVIDIA CUDA (Compute Unified Device Architecture): defines the hardware platform; generic programming; extension to the C language; explicit memory management; programming at thread level
  • Intel Ct (C for throughput): extension to the C language; Intel CPU/GPU-specific; SIMD exploitation for automatic parallelism
  • OpenCL (Open Computing Language): open standard for generic programming; extension to the C language; supposed to work on any hardware; specific hardware capabilities used via extensions (a minimal host-side sketch follows this slide)
  Ivan Kisel, GSI
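
  To make the OpenCL point concrete, a minimal host-side sketch (my illustration, using only standard OpenCL 1.x API calls; link with -lOpenCL): the vendor-neutral entry point is to enumerate whatever platforms and devices happen to be installed, whether CPU or GPU:

      #include <CL/cl.h>
      #include <cstdio>
      #include <vector>

      int main() {
          // Ask how many OpenCL platforms (vendor runtimes) are installed.
          cl_uint nPlatforms = 0;
          if (clGetPlatformIDs(0, nullptr, &nPlatforms) != CL_SUCCESS || nPlatforms == 0) {
              std::printf("No OpenCL platforms found.\n");
              return 1;
          }
          std::vector<cl_platform_id> platforms(nPlatforms);
          clGetPlatformIDs(nPlatforms, platforms.data(), nullptr);

          // The same code runs unchanged whatever hardware is behind each platform.
          for (cl_platform_id p : platforms) {
              char name[256] = {0};
              clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
              cl_uint nDevices = 0;
              clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &nDevices);
              std::printf("Platform: %s, devices: %u\n", name, nDevices);
          }
          return 0;
      }

  Hardware-specific capabilities would appear here only as optional extensions queried at run time.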

  12. Cellular Automaton Track Finder
  [Figures labeled 10, 200 and 500.]
  Ivan Kisel, GSI

  13. L1 CA Track Finder: Efficiency
  • Fluctuating magnetic field?
  • Too large STS acceptance?
  • Too large distance between STS stations?
  I. Rostovtseva
  Ivan Kisel, GSI

  14. L1 CA Track Finder: Changes I. Kulakov Ivan Kisel, GSI

  15. L1 CA Track Finder: Timing
  old: old version (from CBMRoot DEC08); new: new parallelized version.
  Statistics: 100 central events. Processor: Pentium D, 3.0 GHz, 2 MB.
  I. Kulakov
  Ivan Kisel, GSI

  16. On-line = Off-line Reconstruction?
  • Off-line and on-line reconstruction will, and should, be parallelized
  • Both versions will run on similar many-core systems, or even on the same PC farm
  • Both versions will (probably) use the same parallel language(s), such as OpenCL
  • Can we use the same code, with some physics cuts applied when running on-line, like L1 CA?
  • If the final code is fast, can we think about global on-line event reconstruction and selection?
  Ivan Kisel, GSI

  17. Summary
  • Think parallel! Parallel programming is the key to the full potential of tera-scale platforms
  • Data parallelism vs. parallelism of the algorithm
  • Stream processing: no branches; avoid direct access to main memory; no maps, no look-up tables
  • Use the SIMD unit in the near future (many-cores, TF/s, ...)
  • Use single-precision floating point where possible; use double precision in critical parts if necessary (see the sketch after this slide)
  • Keep the code portable across heterogeneous systems (Intel, AMD, Cell, GPGPU, ...)
  • New parallel languages are appearing: OpenCL, Ct, CUDA
  • A GPGPU is a personal supercomputer with 1 TFlops for 300 EUR!
  • Should we start buying them for testing the algorithms now?
  Ivan Kisel, GSI
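
  One common pattern behind the precision advice (my illustration, not from the slides): keep bulk data in single precision, which is fast and SIMD-friendly, but accumulate in double precision where single-precision rounding error would otherwise grow:

      #include <cstdio>
      #include <vector>

      int main() {
          // Bulk data stays in single precision (half the memory, twice the SIMD width).
          std::vector<float> values(1 << 20, 0.1f);

          // The accumulator is a critical part: summing a million floats in single
          // precision loses digits, so use double precision just here.
          double sum = 0.0;
          for (float v : values) sum += v;

          std::printf("sum = %.6f\n", sum);
          return 0;
      }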

  18. Back-up Slides (1-5) Back-up Ivan Kisel, GSI

  19. Back-up Slides (1/5) Back-up Ivan Kisel, GSI

  20. Back-up Slides (2/5) Back-up Ivan Kisel, GSI

  21. Back-up Slides (3/5) Back-up Ivan Kisel, GSI

  22. Back-up Slides (4/5) SIMD is out of consideration (I.K.) Back-up Ivan Kisel, GSI

  23. Back-up Slides (5/5) Back-up Ivan Kisel, GSI

  24. Tracking Workshop
  You are cordially invited to the Tracking Workshop, 15-17 June 2009, at GSI.
  Ivan Kisel, GSI
