Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design

Zafer Gürdal* & Mark T. Jones**, Virginia Tech. * Depts. of Aerospace and Ocean Eng., & Engineering Science and Mechanics. ** The Bradley Department of Electrical and Computer Engineering.

Presentation Transcript


  1. Software and Hardware Implementation of Cellular Automata for Structural Analysis and Design Zafer Gürdal* & Mark T. Jones** Virginia Tech * Depts. of Aerospace and Ocean Eng., & Engineering Science and Mechanics ** The Bradley Department of Electrical and Computer Engineering 06/17/03 National Institute of Aerospace, Hampton VA Support • NASA LaRC, NRA 98, Innovative Algorithms for Aerospace Engineering Analysis and Optimization, PM: Jarek Sobieski • NASA LaRC, Mechanics and Durability Branch, PM: Damodar Ambur • Virginia Tech, ASPIRES Program

  2. Outline • Introduction • Evolutionary Design • Elements of Cellular Automata • CA applied to Engineering Design • Truss Domain • Composite Laminate Design • Hardware Implementation • Configurable Computing – FPGAs • CA Implementation Results • Multigrid Acceleration

  3. Evolutionary Design • Mimic natural evolution of biological systems for structural design • Evolutionary design often relies on local optimality/decision making of independent parts • Examples: reaction wood, bone growth • Cellular Automata: decomposition of a seemingly complex macro behavior into basic small local problems

  4. Evolutionary Design of Structures (classification diagram) • Species: Genetic Algorithms • Individual designs: ESO, MMD, CA • Local rules for design, global analysis: ESO, MMD • Local evolution of analysis and design: Cellular Automata

  5. Cellular Automata • Wiener (1946), Ulam (1952), von Neumann (1966) • Automata Networks • Cell Dynamic Scheme • Idealizations of complex natural systems • Flock behavior • Diffusion of gaseous systems • Solidification and crystal growth • Hydrodynamic flow and turbulence • General characteristics • Locality • Vast Parallelism • Simplicity

  6. Elements of Cellular Automata • Cell Definitions • Lattice Configurations • Neighborhoods • Boundaries • Update rules • Iteration Schemes

  7. Elements of Cellular Automata • Definition for state of a cell and update rule (equation annotations: time step, cell ID, center cell, neighborhood cells) • Two-dimensional lattice configurations: rectangular, triangular, hexagonal

  8. Neighborhood Definition • Rectangular neighborhoods • von Neumann: N, E, S, W • Moore: N, NE, E, SE, S, SW, W, NW • MvonN: the Moore cells plus NN, EE, SS, WW • Boundaries • Periodic • Location Specific
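The neighborhood labels and the periodic boundary option on this slide can be sketched in Python as follows. The offset tables, the `neighbors` and `step` helpers, the averaging rule, and the choice of a synchronous update are illustrative assumptions; none of them is taken from the presentation.

```python
# Offsets are (row, col) relative to the center cell.
# Assumption: rows grow toward S and columns grow toward E.
VON_NEUMANN = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
MOORE = dict(VON_NEUMANN, NE=(-1, 1), SE=(1, 1), SW=(1, -1), NW=(-1, -1))
MVONN = dict(MOORE, NN=(-2, 0), EE=(0, 2), SS=(2, 0), WW=(0, -2))


def neighbors(grid, row, col, offsets):
    """Return {label: state} around (row, col), wrapping periodically."""
    n_rows, n_cols = len(grid), len(grid[0])
    return {
        label: grid[(row + dr) % n_rows][(col + dc) % n_cols]
        for label, (dr, dc) in offsets.items()
    }


def step(grid, offsets, rule):
    """One synchronous sweep: every new state is computed from the old grid."""
    return [
        [rule(grid[r][c], neighbors(grid, r, c, offsets))
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


def average_rule(center, nbrs):
    """Hypothetical update rule: replace the cell by its neighborhood mean."""
    return sum(nbrs.values()) / len(nbrs)
```

A location-specific boundary would replace the modulo wrap with a lookup of prescribed states for cells outside the lattice.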

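  9. Update Rule – 2D Truss Domain Analysis • Figures: ground structure; single cell C with neighbors N, NE, E, SE, S, SW, W, NW, center displacements uC, vC and neighbor displacements such as uSE, vSE • Displacement update: [equation not captured in the transcript]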

  10. Sample Truss Analysis Results • Linear Analysis • Nonlinear Analysis • Figure legend: undeformed, CA analysis, FEM analysis; applied force or displacement

  11. Linear vs. Nonlinear Analysis • Plot: total reaction vs. number of iterations for linear and nonlinear analysis • Annotated iteration counts: 2985 and 1641

  12. Sizing/Design Rules • Local Optimization Formulation • Sequential Move and Size • Fully Stressed Design • Figures: geometry and basic ground structure (CDF = 1; loads 75 kN and 100 kN; dimensions 40 m and 60 m); dense truss solution (CDF = 40)
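Fully stressed design is commonly implemented as a stress-ratio resizing of each member. The sketch below shows that textbook rule for a single truss member; the function name, the minimum-area floor, and the numerical values in the usage note are hypothetical and are not taken from the slides.

```python
def fsd_resize(area, force, sigma_allow, area_min=1e-6):
    """Stress-ratio update: scale the area so |stress| approaches sigma_allow.

    area        current cross-sectional area (m^2)
    force       axial member force (N), tension or compression
    sigma_allow allowable stress magnitude (Pa)
    area_min    lower bound that keeps the member in the ground structure
    """
    stress = force / area
    return max(area_min, area * abs(stress) / sigma_allow)
```

For a member carrying an assumed 100 kN with an assumed allowable stress of 250 MPa, `fsd_resize(2.0e-3, 100e3, 250e6)` shrinks the area from 2.0e-3 m^2 to about 4.0e-4 m^2. In a statically indeterminate truss the member forces change after resizing, so the rule is applied repeatedly between analysis updates.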

  13. Design of Fiber Reinforced Panels • Figure: panel domain Ω with boundary ∂Ω, axes X, Y and x, y, and field a(x,y) • Minimum Compliance Design with respect to the fiber angle distribution over (x,y) • Minimum Strain Energy Density (Pedersen 1990): principal strain direction
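The slide ties the minimum strain energy density criterion to the principal strain direction. As a minimal sketch, that direction can be computed from the in-plane strain components with the standard Mohr's-circle relation; the function name and the use of engineering shear strain are assumptions of this example.

```python
import math


def principal_strain_angle(eps_x, eps_y, gamma_xy):
    """Angle in radians, measured from the x axis, of the major principal strain.

    eps_x, eps_y  normal strains
    gamma_xy      engineering shear strain
    """
    # atan2 selects the quadrant so the result is the major direction.
    return 0.5 * math.atan2(gamma_xy, eps_x - eps_y)
```

For pure positive shear the function returns 45 degrees (pi/4), and for uniaxial stretching along x it returns 0.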

  14. Panel with a Circular Hole in Shear • Quarter Panel Model • Optimality Criteria (OC) Design

  15. Panel with a Circular Hole in Shear • Pattern Matching + OC Design • Pattern Matching + Discrete Design

  16. Panel with a Circular Hole in Shear • Topology + Orientation Design • Topology + Discrete Fiber Orientation

  17. Hardware Integration • Current parallel architectures are limited • Specialized CA machines mimicking CA domains • Domain Modeled === Hardware Domain

  18. Configurable Computing and Field Programmable Gate Arrays (FPGAs)

  19. Definitions and Potential • Configurable computers are a relatively new class of computer architecture in which hardware circuits are (re-)configured for a specific algorithm • Offer “ASIC-like” speeds without the cost of designing and fabricating a chip • ASIC cost can run into many millions • General-purpose CPUs are slow • Configurable computers are often built using FPGAs because of their widespread availability (>>$1B market)

  20. Field Programmable Gate Array (FPGA) Layout • An FPGA consists of a large array of Configurable Logic Blocks (CLBs), typically 1,000 to 8,000 CLBs per chip • Each CLB contains registers and LUTs, where each LUT can implement a 4-input logic operation • By programming the CLBs and interconnections, large circuits can be represented in the FPGA • One Xilinx XC2V4000 FPGA can represent a circuit of up to 1M gates
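A 4-input look-up table (LUT) can be modeled in software as a 16-bit truth table indexed by the four input bits. The sketch below is an illustration of that idea only; the helper names and the bit ordering (input `a` as the least significant index bit) are assumptions and do not describe any vendor's configuration format.

```python
def make_lut4(func):
    """Build the 16-entry truth table (as an int) for a 4-input Boolean function."""
    table = 0
    for index in range(16):
        a, b, c, d = ((index >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            table |= 1 << index
    return table


def lut4_eval(table, a, b, c, d):
    """Look up the output bit for inputs a, b, c, d (each 0 or 1)."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


# Example: a 4-input parity (XOR) function.
PARITY = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Any 4-input logic operation fits in one such table, which is why programming the tables and the interconnect is enough to represent large circuits.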

  21. DINI DN3000k10 Board • DINI DN3000k10 is an FPGA-based PCI card • Contains five Xilinx XC2V4000 FPGAs connected by a 226-bit-wide bus • One of the FPGAs has a separate connection for communicating with a PC via the PCI bus • FPGAs can be configured through the PCI bus or configurations can be stored on board

  22. Algorithms for FPGAs • Target FPGA strengths: parallel, pipelined, customized • Goal is to have every part of the chip actively computing at the highest possible clock speed • Do: re-think the algorithm to • Expose the natural parallelism • Pipeline time-consuming operations • Examine the precision that is really necessary • Do not: Implement algorithms as you would in software on a traditional computer

  23. Multiplier Options • Table: usage (% CLBs)* for each multiplier option • *Percentage of CLBs used in an XC2V4000; the XC2V4000 contains 5760 CLBs

  24. Application Performance • HokieGene – Genome Matching Project (2003) • Matching engine executes on one FPGA (XC2V1000) • Performs 200 billion cell updates per second • 1,200 billion operations per second (1.2 TOPS) • BYU - Network Intrusion Detection Systems (2002) • Hardware implementation uses one FPGA (XC2V1000) • Outperformed software version running on P3 – 750MHz: • Up to 400 times more throughput than software version • Up to 1000 times less latency than software version • Xilinx – High Performance DES Encryption (2000) • Implemented on one small FPGA (XCV150) • Maximum throughput 10.75 GB/sec • Outperformed best ASIC implementation • University of Texas at Austin – Target Recognition System (2000) • System built using one FPGA (ORCA 40k) and Myrinet interfacing • Capable of processing 900 templates per second • 2,800 billion operations per second (2.8 TOPS)

  25. Iterative Methods for Linear Systems • Consider Jacobi’s method • $D x_{i+1} = (D - A) x_i + b$ • In software, we would select either single or double precision floating point • On a configurable computer we can select any format in which to store/compute values • Choose the desired precision of the solution • Reconstruct the method for fast computation
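A minimal software sketch of Jacobi's method for a small dense system, written in pure Python with double precision floats. The function name, the stopping test on the largest update, and the example matrix are assumptions for illustration.

```python
def jacobi(A, b, x0=None, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Jacobi iteration; returns (x, iterations used).

    D is the diagonal of A, so D x_new = (D - A) x_old + b becomes
    x_new[i] = (b[i] - sum_{j != i} A[i][j] * x_old[j]) / A[i][i].
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for iteration in range(1, max_iter + 1):
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, iteration
        x = x_new
    return x, max_iter
```

For the diagonally dominant example `A = [[4.0, 1.0], [1.0, 3.0]]`, `b = [6.0, 7.0]`, the iteration approaches the solution `[1.0, 2.0]`. Each component update depends only on the previous iterate, which is the property that makes the method attractive for parallel hardware.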

  26. Iterative Methods Continued • Re-cast as iterative improvement scheme • $r_i = b - A x_i$ → compute in $n$ bits • $\Delta x_i = A^{-1} r_i$ → compute in $k$ bits • $x_{i+1} = x_i + \Delta x_i$ → compute in $n$ bits • Use Jacobi to solve for $\Delta x_i$ in compact, fast $k$-bit hardware (cost ~ bits$^2$) • Thm: Convergence rate is independent of $k$ • Thm: Optimal choice of $k \sim n / (\#\,\text{iterations})^{1/3}$
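The iterative improvement scheme on this slide can be emulated in software. In the sketch below the residual and the solution update use Python double precision (standing in for the n-bit format), while the correction is computed by a few Jacobi sweeps whose results are rounded to a small number of significant decimal digits (standing in for the k-bit hardware). The rounding model, the fixed sweep count, and all names are assumptions of this example, not the authors' circuit.

```python
import math


def round_sig(value, digits):
    """Round to `digits` significant decimal digits (emulates a short word)."""
    if value == 0.0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


def low_precision_jacobi(A, r, digits, sweeps):
    """Approximate the correction A dx = r with rounded Jacobi sweeps."""
    n = len(r)
    dx = [0.0] * n
    for _ in range(sweeps):
        dx = [
            round_sig(
                (r[i] - sum(A[i][j] * dx[j] for j in range(n) if j != i))
                / A[i][i],
                digits,
            )
            for i in range(n)
        ]
    return dx


def refine(A, b, digits=3, sweeps=5, tol=1e-12, max_outer=200):
    """Iterative improvement; returns (x, number of corrections applied)."""
    n = len(b)
    x = [0.0] * n
    for outer in range(max_outer):
        # Residual in full precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) < tol:
            return x, outer
        # Correction in reduced precision, accumulated in full precision.
        dx = low_precision_jacobi(A, r, digits, sweeps)
        x = [x[i] + dx[i] for i in range(n)]
    return x, max_outer
```

With `A = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]`, three-digit corrections still drive the full-precision residual below the tolerance, because each outer step only needs the correction to be accurate relative to the current residual.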

  27. Convergence • Solution error vs. number of iterations • k = 3, 6, 9 decimal digits • No difference in convergence rate

  28. Performance Advantage • Execution Cost (number of bit operations) vs. the size of the matrix • Compares cost of normal vs. modified algorithm • Convergence for each algorithm is identical

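  29. Euler Beam Formulation • Cell neighborhood and control volume (figure): cells L, C, R at spacing h along x, with forces FL, FC, FR, moments ML, MC, MR, degrees of freedom (wL, θL), (wC, θC), (wR, θR), applied load F, and design variable d(x) • Cell Equilibrium

  30. Cellular Automata Model: Multiple Cells per Processing Element

  31. Beam Design (flowchart) • Equilibrium update: residual, error, correction; repeated until the equilibrium update has converged • Design update; repeated until the design update has converged • End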

  32. Algorithm Strategy • The limited precision algorithm illustrated for Jacobi’s method earlier is applied to CA • Much smaller, faster circuits for applying CA rule updates in k-bit operations • Built-in 18x18 multipliers compute residual • Built-in high-speed memories provide • Storage for intermediate and permanent quantities • Many customizable word-lengths • Extremely high memory bandwidth

  33. Processing Element

  34. FPGA Performance • Chart: cell updates per second (millions)

  35. CA Performance

  36. Multigrid Acceleration • Lattice hierarchy: lattice h, lattice 2h, lattice 4h, lattice 8h • Cycle diagrams: V-cycle and W-cycle over h, 2h, 4h, 8h • Legend: equilibrium update to convergence; restriction (on r); prolongation (on e); equilibrium updated α times

  37. Prolongation • Figure: lattice h and lattice 2h

  38. Prolongation/Restriction • Correction prolongation: prolongation operator between lattice 2h and lattice h • Residual restriction
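The operator definitions from this slide are not preserved in the transcript. As a stand-in, the sketch below uses the standard textbook pair for a one-dimensional lattice with an odd number of nodes: linear interpolation for prolongation and full weighting for restriction. These operators, the function names, and the injection of end values are assumptions; a beam formulation with both deflection and rotation unknowns may well use different operators.

```python
def prolong(coarse):
    """Linear interpolation from lattice 2h to lattice h (2n - 1 nodes)."""
    fine = [0.0] * (2 * len(coarse) - 1)
    for i, value in enumerate(coarse):
        fine[2 * i] = value  # coincident nodes are copied
    for i in range(len(coarse) - 1):
        fine[2 * i + 1] = 0.5 * (coarse[i] + coarse[i + 1])  # midpoints
    return fine


def restrict(fine):
    """Full weighting from lattice h to lattice 2h; end values are injected."""
    if len(fine) % 2 == 0:
        raise ValueError("expected an odd number of fine-lattice nodes")
    n_coarse = (len(fine) + 1) // 2
    coarse = [0.0] * n_coarse
    coarse[0], coarse[-1] = fine[0], fine[-1]
    for i in range(1, n_coarse - 1):
        coarse[i] = (0.25 * fine[2 * i - 1]
                     + 0.5 * fine[2 * i]
                     + 0.25 * fine[2 * i + 1])
    return coarse
```

In a correction scheme, `restrict` would be applied to the residual and `prolong` to the coarse-lattice error estimate before it is added to the fine-lattice solution.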

  39. Nested Iteration for MG-accelerated CA • Plots of d(x) for designs with 3, 5, 17, 65, and 257 cells

  40. CA Design Performance with Full MG • Plot: total number of cell updates (10^0 to 10^8) vs. number of cells (1 to 1000)

  41. Concluding Remarks • Summary • CA paradigm has been demonstrated for various structural systems • CA paradigm matches well with Configurable Computing acceleration • Full Multigrid acceleration for CA improves design convergence • Future Work • Expand the design capabilities in terms of structural details and the types of field problems that can be solved • Tools that will enable engineers to effortlessly use configurable computers for CA applications • Continue to investigate algorithms to improve CA performance
