
Enhancing Gravitational N-body Simulations: Key Design Goals & Performance Metrics

This text explores the major design goals of gravitational N-body simulations, emphasizing efficiency, versatility, and scalability. Key objectives include the capability to implement various numerical methods and configurable control parameters. The discussion highlights the computational hardware capabilities required, ranging from desktop to server configurations, along with performance metrics like average and peak floating-point operations per second (FLOPs). The text also reviews parallelization techniques and memory requirements, essential for simulating large particle systems in astrophysics.

pierce


Presentation Transcript


  1. Gravitational N-body Simulation
     Major Design Goals
     - Efficiency
     - Versatility (ability to use different numerical methods)
     - Scalability
     Lesser Design Goals
     - Flexibility (control parameters must be configurable)
     - Persistence (pause and continue)
     - Visualization

  2. Hardware
     6 GFlops average desktop
     256 GFlops top-line server
     Single Computer Configuration
     - 1-4 CPUs
     - 1-4 cores
     - 3-4 GHz CPUs
     - 2-4 32-bit FP IPC
     - 1-2 64-bit FP IPC
     - Windows
     Cluster Configurations
     http://gears.aset.psu.edu/hpc/systems/
     - LION-XO (80x2xOpteron/8GB + 40x4xOpteron/16GB; 2.4 GHz)
     - 1.6 TFlops (32-bit); 800 GFlops (64-bit); single-core assumed
     - Gigabit Ethernet
     - GNU/Linux
     - Single or dual core CPUs? CPU Model?
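The headline figures on this slide can be reproduced by multiplying CPU count, cores per CPU, clock rate, and floating-point operations per cycle. The sketch below is only a consistency check: the function name `peak_gflops` is hypothetical, and the particular combinations chosen (low end of every range for the desktop, high end for the server, 2 and 1 flops per cycle for the 32-bit and 64-bit cluster figures) are assumptions that happen to match the quoted numbers, not values stated by the author.

```python
def peak_gflops(cpus, cores_per_cpu, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops, assuming every core retires
    flops_per_cycle floating-point operations on every clock."""
    return cpus * cores_per_cpu * clock_ghz * flops_per_cycle

# Assumed low end of each range on the slide: 1 CPU, 1 core, 3 GHz, 2 FP IPC.
desktop = peak_gflops(1, 1, 3.0, 2)

# Assumed high end of each range: 4 CPUs, 4 cores, 4 GHz, 4 FP IPC.
server = peak_gflops(4, 4, 4.0, 4)

# LION-XO: 80 dual-CPU nodes plus 40 quad-CPU nodes, single-core CPUs assumed.
lion_cpus = 80 * 2 + 40 * 4
lion_32bit = peak_gflops(lion_cpus, 1, 2.4, 2)  # assumed 2 flops/cycle
lion_64bit = peak_gflops(lion_cpus, 1, 2.4, 1)  # assumed 1 flop/cycle

print(desktop, server, lion_32bit, lion_64bit)
```

Under these assumptions the desktop and server come out at 6 and 256 GFlops, and the cluster at roughly 1.5 TFlops (32-bit) and 0.77 TFlops (64-bit), which the slide rounds to 1.6 TFlops and 800 GFlops.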

  3. Algorithms
     Direct Methods: O(N^2)
       + very simple
       + scalable
       - inefficient (~30,000 particles max @ 256 GFlops)
     Treecode / Multipole: O(N log N)
       - more difficult to implement
       - scalability harder to achieve
       + efficient (10^6 to 10^10 particles)
     Field Methods: O(N log N) or O(N)
       Involves solving Poisson's equation
       Area of active research
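To make the O(N^2) cost of the direct method concrete, here is a minimal sketch of direct summation in plain Python. It is an illustration, not the author's implementation: the function name `direct_accelerations`, the gravitational constant `G=1.0`, and the softening length `eps` (used to avoid division by zero at small separations) are all assumptions.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct summation: each particle sums the pull of every other
    particle, so the work grows as N * (N - 1)."""
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Separation vector from particle i to particle j.
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps  # softened distance^2
            s = G * mass[j] / (r2 * math.sqrt(r2))        # G * m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc

# Two unit masses one unit apart attract each other with equal and opposite pull.
pair = direct_accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], eps=0.0)
```

The nested loop is what limits this approach: at N = 30,000 it evaluates roughly 9x10^8 pair interactions per step.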

  4. Levels of Parallelization
     1) SIMD: up to 4 threads
        - 4x32-bit flops/cycle
        - 2x64-bit flops/cycle
     2) SMP/MPU: up to 4 threads
        - 1-4 cores
        - 1-4 CPUs
     3) Cluster: up to N nodes
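At the SMP and cluster levels, one simple way to spread the work is to give each thread or node a contiguous block of particles. The slides do not say which decomposition was used, so the sketch below is only an assumption: a static, near-equal split, with the hypothetical helper name `partition`.

```python
def partition(n_particles, n_workers):
    """Split indices 0..n_particles-1 into n_workers contiguous ranges
    whose sizes differ by at most one. Returns (start, stop) pairs."""
    base, extra = divmod(n_particles, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional particle each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

blocks = partition(10, 4)
```

Each worker would then compute forces only for the particles in its own range. A static split like this is easiest to balance for the direct method, where every particle costs the same amount of work.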

  5. Memory Requirements
     Position: x, y, z
     Velocity: vx, vy, vz
     6x4 = 24 bytes (32-bit fp)
     6x8 = 48 bytes (64-bit fp)
     ~42 points per KB (32-bit); ~2,700 points per 64 KB
     ~21 points per KB (64-bit); ~1,350 points per 64 KB
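The per-particle sizes follow directly from six floating-point values per particle. The sketch below checks them with Python's `struct` module and counts how many particles fit in a buffer of a given size. The helper name `particles_that_fit` is hypothetical, and the 64 KB buffer is chosen because it is the L1 cache size quoted in the slides.

```python
import struct

# Six values per particle: x, y, z, vx, vy, vz ("<" means no padding bytes).
BYTES_32 = struct.calcsize("<6f")  # six 32-bit floats
BYTES_64 = struct.calcsize("<6d")  # six 64-bit floats

def particles_that_fit(buffer_bytes, bytes_per_particle):
    """Number of whole particles that fit in a buffer of the given size."""
    return buffer_bytes // bytes_per_particle

KB = 1024
per_kb_32 = particles_that_fit(KB, BYTES_32)
per_kb_64 = particles_that_fit(KB, BYTES_64)
per_64kb_32 = particles_that_fit(64 * KB, BYTES_32)
per_64kb_64 = particles_that_fit(64 * KB, BYTES_64)

print(BYTES_32, BYTES_64, per_kb_32, per_kb_64, per_64kb_32, per_64kb_64)
```

This count covers positions and velocities only. Masses, accelerations, or tree nodes would add to the footprint.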

  6. Levels of Memory
     1) L1 cache: 64 KB
        - CPU clock-speed
        - no latency
     2) L2 cache: 1 MB
        - CPU clock-speed
        - low latency
     3) RAM: GBs
        - reduced speed (up to 12-24 GB/s)
        - huge latency
     4) Network (weakest link)
        - 1 Gbit/sec
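To see why the network is called the weakest link, compare how long it takes to move the same amount of particle data through RAM and over Gigabit Ethernet. This is a rough sketch that assumes the quoted peak bandwidths are fully achieved and ignores latency and protocol overhead. The helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(n_bytes, bandwidth_bytes_per_s):
    """Idealised transfer time: size divided by bandwidth, no latency."""
    return n_bytes / bandwidth_bytes_per_s

GB = 1e9  # bytes

ram_slow = transfer_seconds(1 * GB, 12 * GB)  # RAM at 12 GB/s
ram_fast = transfer_seconds(1 * GB, 24 * GB)  # RAM at 24 GB/s
network = transfer_seconds(1 * GB, 1e9 / 8)   # 1 Gbit/s = 0.125 GB/s

print(ram_slow, ram_fast, network)
```

Under these assumptions, moving 1 GB takes under a tenth of a second through RAM but about 8 seconds over the network, a gap of roughly two orders of magnitude.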

  7. 10^9 Particles Require…
     Memory: 24 GB (32-bit)
     Instructions per iteration: log2(10^9) x 10^9 x const ~ 3x10^12 ops (3 TFlop)
     Time: ~12 sec @ 256 GFlops
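The estimate on this slide can be checked with a few lines of arithmetic. The slide does not give the value of the constant. The sketch below assumes `const = 100` operations per particle per tree level, chosen only because it reproduces the quoted ~3x10^12 figure. The function name `treecode_ops` is also hypothetical.

```python
import math

def treecode_ops(n, const=100.0):
    """O(N log N) cost model: const * N * log2(N) floating-point operations.
    const=100 is an assumed value that matches the slide's total."""
    return const * n * math.log2(n)

n = 10**9
ops = treecode_ops(n)              # operations per iteration
seconds = ops / 256e9              # at a sustained 256 GFlops
memory_gb = n * 24 / 1e9           # 24 bytes per particle (32-bit fp)

print(f"{ops:.2e} ops, {seconds:.1f} s, {memory_gb:.0f} GB")
```

The 12-second figure assumes the machine sustains its full 256 GFlops peak and that memory traffic is not the bottleneck, so it is best read as a lower bound.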
