This text explores the major design goals of gravitational N-body simulations, emphasizing efficiency, versatility, and scalability. Key objectives include the capability to implement various numerical methods and configurable control parameters. The discussion highlights the computational hardware capabilities required, ranging from desktop to server configurations, along with performance metrics like average and peak floating-point operations per second (FLOPs). The text also reviews parallelization techniques and memory requirements, essential for simulating large particle systems in astrophysics.
Gravitational N-body Simulation

Major Design Goals
- Efficiency
- Versatility (ability to use different numerical methods)
- Scalability

Lesser Design Goals
- Flexibility (control parameters must be configurable)
- Persistence (pause and continue)
- Visualization

Hardware
- 6 GFlops: average desktop
- 256 GFlops: top-line server

Single Computer Configuration
- 1-4 CPUs
- 1-4 cores
- 3-4 GHz CPUs
- 2-4 32-bit FP IPC
- 1-2 64-bit FP IPC
- Windows

Cluster Configurations
- http://gears.aset.psu.edu/hpc/systems/
- LION-XO (80x2xOpteron/8GB + 40x4xOpteron/16GB; 2.4 GHz)
  - 1.6 TFlops (32-bit); 800 GFlops (64-bit); single-core assumed
  - Gigabit Ethernet
- GNU/Linux
- Single or dual core CPUs? CPU model?

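The throughput figures on this slide can be reproduced with a simple product of CPU count, cores per CPU, clock rate, and floating-point operations per cycle. The sketch below is only an illustration: the function name `peak_gflops` is hypothetical, and the choice of which end of each range yields the 6 GFlops and 256 GFlops figures is an assumption inferred from the numbers, not something the slide states.

```python
def peak_gflops(cpus, cores_per_cpu, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops, assuming every core retires
    flops_per_cycle floating-point operations on every clock cycle."""
    return cpus * cores_per_cpu * clock_ghz * flops_per_cycle

# Assumed low end of each range: 1 CPU, 1 core, 3 GHz, 2 32-bit flops/cycle.
desktop = peak_gflops(1, 1, 3.0, 2)

# Assumed high end of each range: 4 CPUs, 4 cores, 4 GHz, 4 32-bit flops/cycle.
server = peak_gflops(4, 4, 4.0, 4)

# LION-XO: 80 dual-CPU nodes plus 40 quad-CPU nodes, single-core assumed.
lion_xo_cpus = 80 * 2 + 40 * 4
cluster_32bit = peak_gflops(lion_xo_cpus, 1, 2.4, 2)  # roughly 1.5 TFlops
cluster_64bit = peak_gflops(lion_xo_cpus, 1, 2.4, 1)  # roughly 770 GFlops

if __name__ == "__main__":
    print(f"desktop: {desktop:.0f} GFlops, server: {server:.0f} GFlops")
    print(f"LION-XO: {cluster_32bit:.0f} GFlops (32-bit), "
          f"{cluster_64bit:.0f} GFlops (64-bit)")
```

Under these assumptions the cluster estimates land slightly below the slide's rounded 1.6 TFlops and 800 GFlops. These are theoretical peaks; sustained performance is normally lower.
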
Algorithms

Direct Methods: O(N^2)
+ very simple
+ scalable
- inefficient (~30,000 particles max @ 256 GFlops)

Treecode / Multipole: O(N log N)
- more difficult to implement
- scalability harder to achieve
+ efficient (10^6-10^10 particles)

Field Methods: O(N log N) or O(N)
- involves solving Poisson’s equation
- area of active research

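To make the O(N^2) cost of the direct method concrete, here is a minimal pure-Python sketch of direct summation. It is not the presenter's implementation: the function name `direct_accelerations`, the default G = 1, and the softening length `eps` are all assumptions introduced for illustration.

```python
import math

def direct_accelerations(pos, mass, G=1.0, eps=0.0):
    """Gravitational acceleration on every particle by direct summation.

    pos  : list of [x, y, z] positions
    mass : list of particle masses
    eps  : softening length (assumed here; avoids division by zero
           when two particles come very close)
    The double loop visits N*(N-1)/2 pairs, hence the O(N^2) cost.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                # Each pair is evaluated once and applied to both
                # particles with opposite sign (Newton's third law).
                acc[i][k] += G * mass[j] * d[k] * inv_r3
                acc[j][k] -= G * mass[i] * d[k] * inv_r3
    return acc

if __name__ == "__main__":
    # Two unit masses one unit apart attract each other along x.
    print(direct_accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0]))
```

Doubling the particle count quadruples the number of pairs, which is why the slide caps direct methods at tens of thousands of particles.
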
Levels of Parallelization
1) SIMD: up to 4 threads
   - 4x32-bit flops/cycle
   - 2x64-bit flops/cycle
2) SMP/MPU: up to 4 threads
   - 1-4 cores
   - 1-4 CPUs
3) Cluster: up to N nodes

Memory Requirements
- Position: x, y, z
- Velocity: vx, vy, vz
- 6x4 = 24 bytes per particle (32-bit fp)
- 6x8 = 48 bytes per particle (64-bit fp)
- about 42 points per KB (32-bit)
- about 21 points per KB (64-bit)

Levels of Memory
1) L1 cache: 64 KB
   - CPU clock-speed
   - no latency
2) L2 cache: 1 MB
   - CPU clock-speed
   - low latency
3) RAM: GBs
   - reduced speed (up to 12-24 GB/s)
   - huge latency
4) Network (weakest link)
   - 1 Gbit/sec

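Combining the per-particle sizes from the Memory Requirements slide with the cache capacities listed above gives a rough idea of how many particles fit at each level. This is a sketch under stated assumptions: the helper names are hypothetical, 1 KB is taken as 1024 bytes, and only the six position and velocity components are counted (no mass, index, or padding).

```python
KB = 1024          # assumed binary kilobyte
MB = 1024 * KB

def bytes_per_particle(bytes_per_float, components=6):
    """Storage for x, y, z, vx, vy, vz at the given precision."""
    return components * bytes_per_float

def particles_that_fit(capacity_bytes, bytes_per_float):
    """Whole particles that fit in a memory level of the given size."""
    return capacity_bytes // bytes_per_particle(bytes_per_float)

if __name__ == "__main__":
    for name, capacity in (("1 KB", KB), ("L1 (64 KB)", 64 * KB), ("L2 (1 MB)", MB)):
        print(f"{name:>10}: {particles_that_fit(capacity, 4):>6} particles (32-bit), "
              f"{particles_that_fit(capacity, 8):>6} particles (64-bit)")
```

Only a few thousand particles fit in L1 and a few tens of thousands in L2, so any large simulation streams most of its data from RAM or over the network.
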
10^9 Particles Require…
- Memory: 24 GB (32-bit)
- Operations per iteration: log2(10^9) x 10^9 x const ~ 3x10^12 floating-point operations
- Time: ~12 sec @ 256 GFlops
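The estimate on this slide can be checked with a few lines of arithmetic. In the sketch below, the function name `treecode_estimate` is hypothetical and `const=100` is an assumption: the slide does not state the constant, but a value near 100 operations per particle per tree level is what the ~3x10^12 total implies.

```python
import math

def treecode_estimate(n, const=100.0, sustained_flops=256e9, bytes_per_particle=24):
    """Rough per-iteration cost of an O(N log N) method.

    const              : assumed operations per particle per tree level
    sustained_flops    : machine throughput in flops per second
    bytes_per_particle : 24 for six 32-bit floats
    """
    ops = const * n * math.log2(n)
    return {
        "memory_gb": n * bytes_per_particle / 1e9,
        "ops": ops,
        "seconds": ops / sustained_flops,
    }

if __name__ == "__main__":
    est = treecode_estimate(10**9)
    print(f"memory:  {est['memory_gb']:.0f} GB")
    print(f"ops:     {est['ops']:.2e} per iteration")
    print(f"time:    {est['seconds']:.1f} s per iteration at 256 GFlops")
```

The time figure assumes the machine sustains its full 256 GFlops peak, which the memory and network limits discussed earlier make unlikely in practice.
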