Management of Computational Chemistry Electronic Structure Data in the U.S.

David A. Dixon, Mingyang Chen, Amanda Stott, Shenggang Li. Department of Chemistry, The University of Alabama, Tuscaloosa AL 35487-0336. Robert Ramsay Chair Fund.


Presentation Transcript


  1. Management of Computational Chemistry Electronic Structure Data in the U.S. David A. Dixon, Mingyang Chen, Amanda Stott, Shenggang Li Department of Chemistry, The University of Alabama, Tuscaloosa AL 35487-0336 Robert Ramsay Chair Fund

  2. SEURAT Computational Chemistry (Commercial) • Pros: • GUI (visualization) • Support for multiple file formats • User-defined meta-data • Cons: • Inefficient with large numbers of files (files must be imported manually, one by one) • Commercial http://www.synapticscience.com/seurat/feature/

  3. ChemDataBase (Not in U.S.) • Pros: • GUI • Cons: • Parsers rely on CDK (Chemistry Development Kit) • No emphasis on database table building Huarong Sun, Ruisheng Zhang et al., 2008 International Multi-symposiums on Computer and Computational Sciences

  4. NIST Computational Chemistry Comparison and Benchmark DataBase (CCCBDB) • Provides a benchmark set of molecules for the evaluation of ab initio computational methods. • Over 250,000 calculations at different levels of theory. • Thermochemical values: • 1. Enthalpies of formation. • 2. Entropies, heat corrections. • 3. Supporting data, such as geometries and vibrational frequencies. • 4. Additional computed properties, such as atomic charges, electric dipole moments, and HOMO-LUMO gaps. http://cccbdb.nist.gov/

  5. Carnegie-Mellon Quantum Chemistry Archive (CMQCA) • CMQCA is a collection of compressed results from the GAUSSIAN 80 program; later versions of GAUSSIAN generate archive outputs that follow similar standards. • Accessed from a 300-baud terminal connected to the CMU VAX 11/780 over a telephone line. • 2000 Hartree-Fock structures with the STO-3G, 3-21G, and 6-31G* basis sets are listed in the first version of CMQCA. • Self-reproducibility is required for data archiving. • CMQCA format: input data followed by the numerical results (energies, dipoles, frequencies, etc.). • Final geometries are stored for local-minimum and saddle-point searches. • Prototype for the meta-data format in RCCC

  6. Ampac / Agui from Semichem • Ampac for fast semi-empirical calculations • Fast and reliable • Many methods: AM1, MNDO, MINDO3, PM3, MNDO/d, RM1, PM6, SAM1, MNDOC • Geometry optimization, frequencies, transition states, IRC, solvation, etc. • Agui for molecular visualization • Supports most features of Gaussian 09, including periodic systems, ONIOM, etc. • Supports many file formats, including Mol, Mol2, SDF, PDB, CIF • Supports many platforms: Windows, Linux, Mac OS X, etc. • Figure panels: Manage Molecular Orbitals; 3D Reaction Surface Plot; Surface Adsorption

  7. Extensible Computational Chemistry Environment (Ecce) • A domain-encompassing problem-solving environment for computational chemistry, including a sophisticated graphical user interface, scientific visualization tools, and an underlying data management framework. • The resulting environment lets research scientists use complex computational modeling software and high-performance computers transparently from their desktop workstations. MS3 EMSL Molecular Science Software

  8. Science Drivers: Science across Scales in Space & Time • Catalysis: Computational catalysis – transition metal oxides, homogeneous catalysts, metal clusters, site isolated catalysts • Nanoscience: TiO2 clusters for sensors and photocatalysts; Shape memory alloys (Nitinol) (NASA) • Energy: H2 storage in chemical systems – organic & inorganic • Energy: Advanced Fuel Cycle Initiative – Metal oxide clusters in solution for new fuels and environmental cleanup • Energy: New sources of energy (solar) • Geochemistry: Geological CO2 sequestration • The Environment: Atmosphere, Clean Water, Subsurface & Cleanup • Biochemistry: Peptide and amino acid negative ion chemistry • Computational main group chemistry – fluorine chemistry, acids and bases, other elements • Computational thermodynamics and kinetics – high accuracy, solvation effects. • Chemical End Station: RC3 & software development

  9. Overview of UA Computing Resources • Supercomputers: Chinook (EMSL, PNNL), 18,480 CPUs; Colonel/Hope/Pople (UA), 348 CPUs; UAHPC (UA), 260 CPUs; DMC/Altix (ASC), 1,484 CPUs • Storage Server (Samba, NFS, WWW, SQL, etc.): Dell PowerEdge 2950 & PowerVault MD1000; Intel Xeon, 8 cores; Memory: 16 GB; Storage: 13 TB; Data backup, parsing, & database building • Clients: Office Clients (Windows / Linux) and Home Clients; Data access (console/web); Data backup • Network Traffic Gateway

  10. Computing Hardware Resources

  11. Computing Software Resources • Other computational chemistry programs • For quantum chemistry: ACES3, CFour, Columbus, Dalton, GAMESS, Molcas, MPQC, PSI3, etc. • For molecular dynamics: CPMD, Espresso, NAMD, Tinker, ZORI, etc. • Software for program development • Intel C/C++/Fortran compilers, MKL/IPP/TBB libraries; • PGI C/C++/Fortran compilers, ACML libraries

  12. Build a Robust Computational Database • Machine-built database • Database grows as new computational data are generated • Expandable support for new computational output formats • Go open source

  13. Computational Chemistry Data Management System: RCCC = RC3 (Regional Computational Chemistry Collaboratory) • Manage and mine the vast amounts of chemistry-specific data that petascale computation will generate. • Perspectives of a user, a group, or a project. • Resides on the remote supercomputer and on a local server computer at each registered site. • On the supercomputer, data are automatically parsed by a program that extracts essential information either during or after job execution. • Data are packaged and stored, with the relevant metadata registered in a database. • The local server automatically mirrors relevant data. • The database exposes a standard directory hierarchy and file system so that standard tools can be used to manipulate data. • The main objective is to perform day-to-day data backup, collect calculation meta-data, and organize both for research use, so that users can more easily access, present, and reuse their computational results.
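The mirroring step described on this slide can be sketched as a comparison of file listings. This is only a minimal illustration, not RC3's actual logic: the function name `files_to_mirror`, the `(size, mtime)` tuple layout, and the rule "copy when missing, different in size, or older locally" are all assumptions. In a real deployment the remote listing could come from an SFTP directory listing (for example Paramiko's `SFTPClient.listdir_attr`, whose entries carry `st_size` and `st_mtime`), but no network access is used here.

```python
from typing import Dict, List, Tuple

# (size in bytes, modification time as a Unix timestamp); hypothetical layout.
FileInfo = Tuple[int, float]


def files_to_mirror(remote: Dict[str, FileInfo],
                    local: Dict[str, FileInfo]) -> List[str]:
    """Return remote paths that are missing or stale on the local server.

    A file is selected when it does not exist locally, when its size
    differs, or when the local copy is older than the remote one.
    """
    pending = []
    for path, (size, mtime) in remote.items():
        known = local.get(path)
        if known is None or known[0] != size or known[1] < mtime:
            pending.append(path)
    return sorted(pending)


if __name__ == "__main__":
    # Placeholder listings, invented for illustration only.
    remote_listing = {"a.out": (100, 10.0), "b.out": (200, 20.0), "c.out": (50, 5.0)}
    local_listing = {"a.out": (100, 10.0), "b.out": (150, 20.0)}
    print(files_to_mirror(remote_listing, local_listing))
```

Keeping the selection logic separate from the transfer itself means it can be run on a schedule and checked without touching a remote machine.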

  14. RC3 Architecture

  15. Implementation Details • File Mirroring: synchronize new files from computing servers to the storage server • account management • scheduled mirroring • SSH via Paramiko (Python module) • Data Extraction: collect meta-data from transferred files • user, group, filename, location, software version, title, keywords, coordinates, frequencies, energies, dipole, etc. • users can easily define and extend extraction rules • parsing while transferring • on-demand scan • Database: • MySQL • Python interface • Encryption: • Crypto (Python module) • Geometry visualization: • Jmol • User Interface: • Command line (Bash shell) • Web page (Jmol integrated) • Implementation: • Python 2.x
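The idea of user-definable extraction rules can be sketched as a table of regular expressions applied line by line. The slides do not show RC3's rule format, so everything named here (`RULES`, `extract_metadata`, the property names) is hypothetical, and the sample text is only modeled on Gaussian-style output lines. The sketch is written for Python 3, whereas the slides state Python 2.x.

```python
import re

# Hypothetical rule table: property name -> (regex with one capture group, converter).
RULES = {
    "energy": (re.compile(r"SCF Done:\s+E\(\S+\)\s*=\s*(-?\d+\.\d+)"), float),
    "gibbs_correction": (
        re.compile(r"Thermal correction to Gibbs Free Energy=\s*(-?\d+\.\d+)"),
        float,
    ),
}


def extract_metadata(text, rules=RULES):
    """Scan text line by line and keep the last match found for each rule."""
    found = {}
    for line in text.splitlines():
        for name, (pattern, convert) in rules.items():
            match = pattern.search(line)
            if match:
                found[name] = convert(match.group(1))
    return found


# Illustrative sample only; the numbers are placeholders, not real results.
SAMPLE = """\
 NAtoms=    3 NActive=    3
 SCF Done:  E(RHF) =  -76.0098000000     A.U. after   10 cycles
 SCF Done:  E(RHF) =  -76.0107465155     A.U. after    8 cycles
 Thermal correction to Gibbs Free Energy=         0.003609
"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE))
    # Extending the rules: add one entry without touching the parser.
    extended = dict(RULES, n_atoms=(re.compile(r"NAtoms=\s*(\d+)"), int))
    print(extract_metadata(SAMPLE, extended))
```

Because each rule is a plain dictionary entry, supporting another property or another program's output means adding entries rather than changing the scanning loop. Keeping the last match per rule picks up the final energy of a multi-step job.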

  16. Parsing in RC3

  17. Progress and Status • RC3 demo tested in our group for more than 1 year • 1 group, 36 users, 94 accounts • Most of our goals achieved • So far, 1.5 terabytes of files (1.6 million files) backed up and organized in tree-structured directories • 144,000 data entries generated in the database • Currently supports NWChem, Molpro, Gaussian, etc. • Parses various properties

  18. Account Manager Snapshot
      Enter menu[ [USER]:mchen10 ] now..
      ==================================================================
      [ MENU TITLE ]: [USER]:mchen10
      ------------------------------------------------------------------
      [ 1 ] --- LIST ALL ACCOUNTS
      [ 3 ] --- ENTER ACCOUNT MENU
      [ 5 ] --- ADD AN ACCOUNT
      [ 6 ] --- DEL AN ACCOUNT
      [ 8 ] --- SWITCH TO ANOTHER USER
      [ 9 ] --- CHANGE THE PASSWORD FOR THE CURRENT USER
      ------------------------------------------------------------------
      [ HELP ]: Choose an entry or: (q)quit; (m)print menu.
      ==================================================================
      [ [USER]:mchen10 ]Choose an entry(q to quit, m to print the menu)
      [ [USER]:mchen10 ]Your Input:

  19. Parsing flags

  20. How to search for all calculated files whose molecules contain C, H, N, O, and Ru, print the calculation settings and specific properties (e.g., energy, Gibbs free energy correction), and sort the results by energy? • MySQL query: mysql> select Formula, Basis, CalcBy, Energy, Gibbs, Thermal, Enthalpy, FinTime from RC3_MAIN where Formula like "C%H%N%O%Ru%" and JobStatus like "CalcDone%" order by Energy; • rc3query (a bash wrapper): $ rc3query 'C*H*N*O*Ru*'
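The query on this slide can be tried out with a small stand-in database. The sketch below uses Python's built-in `sqlite3` in place of MySQL so that it runs anywhere. The table and column names come from the slide, but the column types, the demo rows, and the helper names (`glob_to_like`, `query_by_formula`, `build_demo_db`) are assumptions. In particular, the slides do not show how `rc3query` is implemented; translating `*` to `%` is only a guess based on the two commands shown side by side.

```python
import sqlite3


def glob_to_like(pattern):
    """Translate shell-style wildcards (* and ?) into SQL LIKE wildcards."""
    return pattern.replace("*", "%").replace("?", "_")


def query_by_formula(conn, formula_glob):
    """Return finished calculations whose formula matches, lowest energy first."""
    sql = (
        "SELECT Formula, Basis, CalcBy, Energy, Gibbs FROM RC3_MAIN "
        "WHERE Formula LIKE ? AND JobStatus LIKE 'CalcDone%' "
        "ORDER BY Energy"
    )
    return conn.execute(sql, (glob_to_like(formula_glob),)).fetchall()


def build_demo_db():
    """Create an in-memory table with invented placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RC3_MAIN (Formula TEXT, Basis TEXT, CalcBy TEXT, "
        "Energy REAL, Gibbs REAL, JobStatus TEXT)"
    )
    rows = [
        # All values below are placeholders, not computed results.
        ("C10H8N2O2Ru", "basis-A", "prog-A", -100.5, 0.10, "CalcDone"),
        ("C12H10N2O4Ru", "basis-A", "prog-B", -200.25, 0.20, "CalcDone"),
        ("C2H6O", "basis-A", "prog-A", -154.0, 0.05, "CalcDone"),
        ("C10H8N2O2Ru", "basis-B", "prog-A", None, None, "Running"),
    ]
    conn.executemany("INSERT INTO RC3_MAIN VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for row in query_by_formula(build_demo_db(), "C*H*N*O*Ru*"):
        print(row)
```

Passing the pattern as a bound parameter rather than formatting it into the SQL string keeps user input out of the query text. Note that `LIKE` in SQLite is case-insensitive for ASCII by default, while in MySQL case sensitivity depends on the column collation, so results may differ for formulas that vary only in letter case.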

  21. Future Work • Publish as an open-source package • Improve parsing engines and write configurations for more computational codes • Web GUI • Better documentation

  22. Benefits of Problem Solving Environments (PSEs) to Scientists • Integrates the key activities of scientific research, from problem definition and research design to experiment execution and analysis • Allows scientists to efficiently execute their computational models over a distributed network • Integrates the scientist’s processes, data, and resources into a common working environment • Guides scientists in the research and experimentation process • Allows scientists to share their knowledge and expertise in their specific domains • Reduces barriers to collaboration among scientists who are geographically dispersed

  23. Problem-solving Tools • Record Management: • Experimental Data Management • Collaborative Record (e.g., lab notebook) • Data & Processing History • Experiment Reproduction • Research Support: • Literature Searches • Data Repositories • Professional Forums • Research Design (e.g., problem definition, hypotheses, research approach) • Workflow Management: • Interactive steering • Process automation • Decision support • Knowledge/expertise transfer • Experimental design • Collaboration models • Computation Management: • Partitioning & assignments • Status and monitoring • Software history • Data validation

  24. Architecture of a Collaborative PSE Scientific Domains (Combustion, Climate, Manufacturing, Chemistry, Engineering) Problem Solving Computation Management Records Management Decision Support Work Flow Distributed OS Support Distributed Data Management Distributed Messaging Collaborative Technologies Resource Management Registry Service Security Model Execution Remote Access Computational Grid

  25. Basis Set Tool Calculation Launcher Calculation Editor Calculation Manager Calculation Viewer 3-D Builder Data Model and Library Interface Molecular Dynamics Data Model Ecce Data Model Electronic Single, Experiment Structure Remote Chemistry Data Model Job and Job Data Management Model Multiple Computational Tasks Data Model EMSL Data Model
