Computational Science and Engineering Tsinghua University April 2008 Bebo White firstname.lastname@example.org
“The first great scientific breakthrough of the 21st century – the decoding of the human genome announced in February 2001 – was a triumph of large-scale computational science. When the Department of Energy (DOE) and the National Institutes of Health (NIH) launched the Human Genome Project in 1990, the most powerful computers were 100,000 times slower than today’s high-end machines; private citizens using networks could send data at only 9600 baud; and many geneticists performed their calculations by hand….it was expected to take decades.” ---Report to the President, June 2005, “Computational Science: Ensuring America’s Competitiveness” This validates an additional way of “doing science”
How To “Do Science” The four methods of “doing modern science”
Scientific Method Process (1/2) Research Design Interpret Conduct Collect
Scientific Method Process (2/2) • First described ~400 years ago • Is not constant – evolves as a result of technology • Peer review is a result of print • Repeatability of experiments is a result of peer review and collaboration (societies, not just letters) • Statistical sampling is due to advancements in mathematics • Etc. • The impact of computing is only now being realized
“The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be solvable.” --Paul Dirac, Royal Academy, London, 1929 “It is nice to know the computer understands the problem, but I would like to understand it too.” --Eugene Wigner (when confronted with the computer generated results of a quantum mechanics calculation)
What is Computational Science? (1/5) • Computational science is the integration of computing technology into scientific research • It is the application of computer simulation and other computational methods to the solution of scientific problems and the understanding of scientific phenomenon • Computing becomes a “full partner” in scientific discovery • It is not to be confused with computer science which is the study of topics related to computers and information processing
What is Computational Science? (3/5) Computational science seeks to gain an understanding of scientific processes through the use of mathematical methods on computers Computational Science
What is Computational Science? (4/5) • Used to: • Perform experiments that might be too dangerous to perform in a lab • Perform experiments that happen too quickly or too slowly • Perform experiments that might be too expensive • Perform experiments that are only solvable using computational approaches • Visualize phenomenon in the past, present, or future • Perform “what-if” experiments • Data mine through huge datasets • Etc., etc.
What is Computational Science? (5/5) “Computational Science was built on the vision that computers would represent a virtual laboratory where one could explore new concepts from simulations and comparison of these with experimental data.” ---Geoffrey Fox, Indiana University Analyze - Predict
Data Assimilation Information Simulation Information Technology Datamining Model Reasoning Ideas Computational Science
Computational Science Investigations A Computational science investigation should include • An application - a scientific problem of interest and the components of that problem that we wish to study and/or include. • Algorithm- the numerical/mathematical representation of that problem, including any numerical methods or recipes used to solve the algorithm. • Architecture – a computing platform and software tool(s) used to compute a solution set for the algorithm.
Computational Science Process Simplify Represent Interpret Translate Simulate
The Modeling Process • Modeling is the application of methods to analyze complex real-world problems in order to make predictions about what might happen with various actions • A system exhibits probabilistic or stochastic behavior if an element of chance exists. Otherwise, it exhibits deterministic behavior. A probabilistic or stochastic model exhibits random effects, while a deterministic model does not. • A static model does not consider time, while a dynamic model changes with time. • In a continuous model, time changes continuously, while in a discrete model time changes in incremental steps. (Ref: Shiflet & Shiflet)
Major Approaches to Computational Science Problems • System dynamics models provide global views of major systems that change with time (e.g., equation-based physics problems) • Cellular automaton simulations (finite element) provide local views of individuals affecting individuals. The world under consideration consists of a rectangular grid of cells, and each cell has a state that can change with time according to rules (e.g., visualization of lattice gauge QCD)
Real World Problem Identify Real-World Problem: • Perform background research, focus focus on a workable problem • Conduct investigations (Labs) if if appropriate • Select computational tool • Understand current activity and predict future behavior
Working Model Simplify Working Model: Identify and select factors to describe important aspects of Real World Problem; determine those factors that can be neglected. • State simplifying assumptions • Determine governing principles, physical laws • Identify model variables and inter-relationships
Mathematical Model Represent Mathematical Model: Express the Working Model in mathematical terms; write down mathematical equations or an algorithm whose solution describes the Working Model. In general, the success of a mathematical model depends on how easy it is to use and how accurately it predicts.
Computational Model TranslateComputational Model: • Change Mathematical Model into a form suitable for computational solution. • Computational models include tool-specifics.
Results/Conclusions Simulate Results/Conclusions: Run “Computational Model” to obtain Results; draw Conclusions. • Verify your computer program; use check cases; explore ranges of validity. • Graphs, charts, and other visualization tools are useful in summarizing results and drawing conclusions.
Real World Problem Interpret Conclusions: Compare with Real World Problem behavior. • If model results do not “agree” with physical reality or experimental data, reexamine the Working Model (relax assumptions) and repeat modeling steps. • Often, the modeling process proceeds through several “cycles” until model is “acceptable”
Example – Electron-Gamma Showers (EGS) • To simulate the interaction of particle beams of varying energies on fixed targets of various materials and geometries • To study the resulting particle showers • Simulations based upon known laws of physics and observed interactions (cross sections) between particles • Allows “what-ifs” not possible or feasible in the laboratory
EGS Applications • Materials physics • Radiation/health physics • Radiation medicine • Education • Etc.
Finite Element Method (FEM) • Many problems in engineering and applied science are governed by differential or integral equations • The solutions to these equations would provide an exact, closed-form solution to the particular problem being studied • However, complexities in the geometry, properties and in the boundary conditions that are seen in most real-world problems usually means that an exact solution cannot be obtained or obtained in a reasonable amount of time
Finite Element Method (2/2) • In the FEM, a complex region defining a continuum is discretized into simple geometric shapes called elements • The properties and the governing relationships are assumed over these elements and expressed mathematically in terms of unknown values at specific points in the elements called nodes • An assembly process is used to link the individual elements to the given system. When the effects of loads and boundary conditions are considered, a set of linear or nonlinear algebraic equations is usually obtained • Solution of these equations gives the approximate behavior of the continuum or system
Example – Lattice QCD Simulation (1/3) • In quantum theories such as QCD, particles are represented by fields • To simulate the quark and gluon activities inside matter on a computer, physicists calculate the evolution of the fields on a four-dimensional lattice representing space and time • A typical lattice simulation that approximates a volume containing a proton might use a grid of 24x24x24 points in space evaluated over a sequence of 48 points in time • The values at the intersections of the lattice approximate the local strength of quark fields • The links between the points simulate the rubber bands–the strength of the gluon fields that carry energy and other properties of the strong force through space and time, manipulating the quark fields.
Example – Lattice QCD Simulation (2/3) • At each step in time, the computer recalculates the field strengths at each point and link in space • The algorithm for a single point takes into account the changing fields at the eight nearest-neighbor points, representing the exchange of gluons in three directions of space–up and down; left and right; front and back–and the change of the fields over time–past and future.
Advanced Test Reactor Simulation at INL (Idaho National Laboratory)
Simulation vs. CGI? • http://www.youtube.com/watch?v=_FIKonHQF8Y
Topics in Computational Science and Engineering • High Performance Computing • Data Mining • Simulation • Scientific Visualization • Programming (Traditional and Symbolic Manipulation Tools) • Collaboration systems/E-Science • Analysis Packages • Display and text processing systems
Data Mining • Modern science is driven by data analysis like never before. We have an ability to collect and process data that is increasing exponentially! • “…the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.” • The extraction of useful patterns from data sources, e.g., databases, texts, web, image.
Sequential pattern mining: A sequential rule: A B, says that event A will be immediately followed by event B with a certain confidence • Deviation/anomaly/exception detection: discovering the most significant changes in data • Data visualization: using graphical methods to show patterns in data • High performance computing • Bioinformatics
Why Data Mining • Rapid computerization produces huge amounts of data • How to make best use of data? • A growing realization: knowledge discovered from data can be used for competitive advantage and to increase intelligence
Purposes of Data Mining • Locating phenomenon from spatially, temporally, or logically related factors, each of which is defined at different levels of abstraction • Content based searching and browsing • Feature extraction • Reduction in data volume • Scientific analysis • Searching for anomalies
Data Mining Fields • Data mining is an emerging multi-disciplinary field: Statistics Machine learning Databases Visualization Data warehousing High-performance computing ...
Typical Data Mining Tasks • Classification: • mining patterns that can classify future data into known classes • Association rule mining: • mining any rule of the form X Y, where X and Y are sets of data items • Clustering: • identifying a set of similar groups in the data
Data Mining Define problem Data collection Data preparation Data modelling Interpretation/ Evaluation Implement/ Deploy model
Machine Learning • “…the study of computer algorithms capable of learning to improve their performance on a task on the basis of their own experience.” • Often this is “learning from data”. • A sub-discipline of artificial intelligence, with large overlaps into statistics, pattern recognition, visualization, robotics, control, …
Data Mining Define problem Machine Learning Data collection Data preparation Data modelling Interpretation/ Evaluation Implement/ Deploy model
Patterns (1/2) • Patterns are the relationships and summaries derived through a data mining exercise • Patterns must be: • valid • novel • potentially useful • understandable
Patterns (2/2) • Patterns are used for • prediction or classification • describing the existing data • segmenting the data (e.g., the market) • profiling the data (e.g., your customers) • Detection (e.g., intrusion, fault, anomaly)