
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines



Presentation Transcript


  1. An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow

  2. Overview • Introduction and Motivation • Background and Previous Work • Hardware Compute Engines • Results and Performance • Conclusions and Future Work

  3. 1. Introduction and Motivation

  4. What is Molecular Dynamics (MD) simulation? • Biomolecular simulations • Structure and behavior of biological systems • Uses classical mechanics to model a molecular system • Newtonian equations of motion (F = ma) • Compute forces and integrate the accelerations through time to move the atoms (a minimal sketch follows below) • A large-scale MD system can take years to simulate
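A minimal sketch of the force-and-integrate loop described above, using velocity Verlet for a single particle. The 1D harmonic force, unit mass, and time step are illustrative assumptions, not details taken from the thesis:

```python
# Velocity Verlet for one particle with an example harmonic force F = -k * x.
# All parameters here (k, mass, dt, number of steps) are assumed for illustration.

def force(x, k=1.0):
    """Example force: a harmonic spring, F = -k * x."""
    return -k * x

def velocity_verlet(x, v, dt, steps, m=1.0):
    """Advance position x and velocity v through `steps` time steps of size dt."""
    f = force(x)
    for _ in range(steps):
        x += v * dt + 0.5 * (f / m) * dt * dt   # position update
        f_new = force(x)                        # recompute force at new position
        v += 0.5 * (f + f_new) / m * dt         # velocity update with averaged force
        f = f_new
    return x, v

if __name__ == "__main__":
    x, v = velocity_verlet(x=1.0, v=0.0, dt=0.01, steps=1000)
    print(f"x = {x:.4f}, v = {v:.4f}")  # close to cos(10) and -sin(10) for k = m = 1
```

An MD engine does the same thing for every atom, with the forces coming from the non-bonded terms (Lennard-Jones and electrostatics) discussed in the following slides.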

  5. Why is this an interesting computational problem?

  6. Motivation • Special-purpose computers for MD simulation have become an interesting application • FPGA technology • Reconfigurable • Low cost for system prototyping • Short turnaround time and development cycle • Latest technology • Design portability

  7. Objectives • Implement the compute engines on an FPGA • Calculate the non-bonded interactions in an MD simulation (Lennard-Jones and Ewald Direct Space) • Explore the hardware resources • Study the trade-off between hardware resources and computational precision • Analyze the hardware pipeline performance • Serve as components of a larger project in the future

  8. 2. Background and Previous Work

  9. Lennard-Jones Potential • Attraction due to instantaneous dipoles of molecules • Pair-wise non-bonded interactions, O(N²) • Short-range force • Use a cut-off radius to reduce computations • Complexity reduced to close to O(N) (standard form of the potential shown below)
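The equation itself is not reproduced in the transcript; the standard 12-6 Lennard-Jones pair potential, truncated at the cut-off radius, is:

```latex
V_{\mathrm{LJ}}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right], \qquad r < r_{\mathrm{cut}}
```

where \(\varepsilon\) is the well depth, \(\sigma\) the distance at which the potential crosses zero, and \(r_{\mathrm{cut}}\) the cut-off radius beyond which the pair is skipped.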

  10. Lennard-Jones Potential of Argon gas

  11. Electrostatic Potential • Attraction and repulsion due to the electrostatic charges of particles (long-range force) • Reformulated using the Ewald Summation • Decomposed into Direct Space and Reciprocal Space terms (see below) • Direct Space computation is similar to Lennard-Jones • Direct Space complexity close to O(N)
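For reference (the slide's equations are not in the transcript), the Ewald summation splits the total electrostatic energy into three parts:

```latex
E_{\mathrm{elec}} = E_{\mathrm{dir}} + E_{\mathrm{rec}} + E_{\mathrm{self}}
```

The direct-space term is screened and therefore effectively short-ranged, which is why its computation resembles the Lennard-Jones case; the reciprocal-space and self-interaction terms are handled separately.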

  12. Ewald Summation - Direct Space
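In its standard form (with the Coulomb prefactor absorbed into the units), the direct-space term is:

```latex
E_{\mathrm{dir}} = \frac{1}{2}\sum_{i \ne j} q_i\, q_j\, \frac{\operatorname{erfc}(\beta r_{ij})}{r_{ij}}, \qquad r_{ij} < r_{\mathrm{cut}}
```

where \(\beta\) is the Ewald splitting parameter and \(\operatorname{erfc}\) is the complementary error function; the erfc screening makes a cut-off radius applicable, just as for Lennard-Jones.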

  13. Previous Hardware Developments

  14. Recent work - FPGA-based MD simulators • Transmogrifier-3 FPGA system, University of Toronto (2003): estimated speedup of over 20 times over software, assuming better hardware resources; fixed-point arithmetic, function table lookup, and interpolation • Xilinx Virtex-II Pro XC2VP70 FPGA, Boston University (2005): achieved a speedup of over 88 times over software; fixed-point arithmetic, function table lookup, and interpolation

  15. MD Simulation software - NAMD • Parallel runtime system (Charm++/Converse) • Highly scalable • Largest system simulated has over 300,000 atoms on 1000 processors • Spatial decomposition • Double precision floating point

  16. NAMD - Spatial Decomposition

  17. 3. Hardware Compute Engines

  18. Purpose and Design Approach • Implement the functionality of the software compute object • Calculate the non-bonded interactions given the particle information • Fixed-point arithmetic, function table lookup, and interpolation • Pipelined architecture

  19. Compute Engine Block Diagram

  20. Function Lookup Table • The value to be looked up is a function of |r|² (the square of the separation distance between a pair of atoms) • Block floating point lookup • Partition the function domain into regions of different precision (a software sketch of this scheme follows below)
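A minimal software sketch of this kind of scheme, assuming the blocks are chosen per power-of-two range of r² and using linear interpolation within a block; the table size, number of blocks, and the example function erfc(r)/r are illustrative assumptions, not the values used in the thesis:

```python
import math

TABLE_BITS = 8    # 256 entries per block (assumed size)
NUM_BLOCKS = 16   # one block per binary range of r^2 (assumed)

def example_func(r2):
    """Example pair function of r^2: erfc(r) / r, standing in for the real kernel."""
    r = math.sqrt(r2)
    return math.erfc(r) / r

def build_tables(func, r2_max):
    """Precompute one table per power-of-two block of r^2 in (0, r2_max]."""
    tables = []
    hi = r2_max
    for _ in range(NUM_BLOCKS):
        lo = hi / 2.0
        step = (hi - lo) / (1 << TABLE_BITS)
        # one extra entry so interpolation at the top edge of a block works
        tables.append((lo, step, [func(lo + i * step)
                                  for i in range((1 << TABLE_BITS) + 1)]))
        hi = lo
    return tables  # tables[0] covers the largest r^2 block

def lookup(tables, r2, r2_max):
    """Pick the block from the binary range of r^2, then interpolate linearly."""
    k, hi = 0, r2_max
    while r2 <= hi / 2.0 and k < NUM_BLOCKS - 1:
        hi /= 2.0
        k += 1
    lo, step, tab = tables[k]
    x = (r2 - lo) / step
    i = max(0, min(int(x), (1 << TABLE_BITS) - 1))
    frac = x - i
    return tab[i] + frac * (tab[i + 1] - tab[i])

if __name__ == "__main__":
    r2_max = 144.0  # e.g. a 12 Angstrom cut-off, squared (assumed)
    tables = build_tables(example_func, r2_max)
    for r2 in (0.5, 3.7, 80.0):
        print(f"r2={r2:6.2f}  approx={lookup(tables, r2, r2_max):.6e}  "
              f"exact={example_func(r2):.6e}")
```

Because each block has its own range and step, small values of r² get proportionally finer resolution, which is the point of the block floating point partitioning.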

  21. Function Lookup Table

  22. Hardware Testing Configuration

  23. 4. Results and Performance

  24. Simulation Overview • Software model • Different coordinate precisions and lookup table sizes • Measure the error relative to computation in double precision (a sketch of this kind of experiment follows below)
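A sketch of the kind of experiment described here: quantize the coordinates to a fixed-point format, evaluate a pair quantity, and compare against the double-precision result. The reading of {7.41} as 7 integer and 41 fractional bits, the coordinate range, and the test function are assumptions for illustration:

```python
import math
import random

FRAC_BITS = 41  # fractional bits, assuming {7.41} means 7 integer + 41 fractional bits

def quantize(x, frac_bits=FRAC_BITS):
    """Round x onto a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def pair_energy(r2):
    """Stand-in pair term, erfc(r) / r, evaluated in double precision."""
    r = math.sqrt(r2)
    return math.erfc(r) / r

random.seed(0)
errors = []
for _ in range(10000):
    x, y, z = (random.uniform(0.5, 10.0) for _ in range(3))
    r2_ref = x * x + y * y + z * z                      # double-precision reference
    r2_fix = sum(quantize(c) ** 2 for c in (x, y, z))   # quantized coordinates
    errors.append(pair_energy(r2_fix) - pair_energy(r2_ref))

rms = math.sqrt(sum(e * e for e in errors) / len(errors))
print(f"RMS error vs double precision: {rms:.3e}")
```

Sweeping the number of fractional bits (and, in the real design, the lookup table size and interpolation order) and recording the resulting error is the kind of study the slide describes.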

  25. Total Energy Fluctuation

  26. Average Total Energy

  27. Operating Frequency

  28. Latency and Throughput

  29. Hardware Improvement • Operating frequency: place-and-route constraints; more pipeline stages • Throughput: more hardware resources; avoid sharing of multipliers

  30. Compared with previous work • Pipelined adders and multipliers • Block floating point memory lookup • Support for different types of atoms

  31. 5. Conclusions and Future Work

  32. Hardware Precision • A combination of fixed-point arithmetic, function table lookup, and interpolation can achieve high precision • RMS energy fluctuation and average energy similar to the double-precision reference • Coordinate precision of {7.41} • Table lookup size of 1K • Block floating point memory • Data precision maximized • Supports different types of functions

  33. Hardware Performance • Compute engines operating frequency: • Ewald Direct Space 82.2 MHz • Lennard-Jones 80.0 MHz • Achieving 100 MHz is feasible with newer FPGAs

  34. Future Work • Study different types of MD systems • Simulate computation error with different table lookup sizes and interpolation orders • Hardware usage: storing data in block RAMs instead of external ZBT memory
