The MoSGrid Portal – A workflow-enabled Grid Portal for Molecular Simulations Sandra Gesing Center for Bioinformatics, University of Tübingen sandra.gesing@uni-tuebingen.de 28.04.2010
Outline • Motivation • MoSGrid (Molecular Simulation Grid) • The MoSGrid portal • Domain specific workflows • MSML (Molecular Simulation Markup Language) • Future work • MoSGrid Portal
Motivation • Numerous applications for molecular simulations and docking, e.g. • Materials science • Structural biology • Drug design • Sophisticated tools and algorithms support scientists • High-performance computing facilities are available
Motivation • Drawbacks of using molecular simulations and docking • Usability of tools is limited • Complexity of methods • Lack of graphical user interfaces • Complexity of infrastructures • Many end users lack computer science background • ⇒ Need for self-explanatory and intuitive user interfaces • ⇒ A portal for molecular simulations and docking
Portals • Single point of entry • Possibility to customize views and tools • Store user preferences • No installation of software on the user’s side • No firewall issues
Unifying Diversity • [Figure: excerpt of a raw nucleotide sequence listing, positions 12181 onward] • Slide copied from: Stuart Owen, "Workflows with Taverna"
MoSGrid • Molecular Simulation Grid (D-Grid project) • Goal • Providing users with Grid access to molecular simulation tools and docking tools via a workflow-enabled portal • Implementation of high-performance computing • Workflows • Annotations of results • Data mining • Use of the D-Grid infrastructure
MoSGrid Partners • Universität zu Köln • Eberhard-Karls-Universität Tübingen • Universität Paderborn • Konrad-Zuse-Zentrum für Informationstechnik Berlin • Technische Universität Dresden • Technische Universität Dortmund • Bayer Technology Services GmbH, Leverkusen • Origines GmbH, Martinsried • GETLIG&TAR, Falkensee • BioSolveIT, Sankt Augustin • COSMOlogic GmbH & Co. KG, Leverkusen
MoSGrid in a Nutshell • Portal: WS-PGRADE (workflow, structure, recipe, result) • High-level middleware service level: gUSE • Grid resources: UNICORE 6 • Cloud file system: XtreemFS (result)
Credential Management • User management based on Liferay features • Community management • Organization management • X.509 user certificates • SAML (Security Assertion Markup Language) • Minimize credential data transfers • Set of maximum hops for trust delegation • Usable for single sign-on infrastructures (e.g., Shibboleth)
WS-PGRADE
gUSE Architecture • gUSE: grid User Support Environment • User interface: WS-PGRADE • High-level middleware service layer (gUSE): Workflow storage, Application repository, Information system, Workflow engine, Submitters, Logging • Grid resources middleware layer: UNICORE 6
gUSE Submitter • Interface GridService • actionJobSubmit • actionJobAbort • actionJobOutput • actionJobStatus • actionJobResource • Diagram: Workflow engine (JOB1, JOB2, JOB3, JOB4, ..., JOBn), Submitter, GridService
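The five actions listed on this slide suggest a small, uniform interface that every submitter implements. The following is only a minimal Python sketch of that idea, not the gUSE code: the class names `Submitter` and `InMemorySubmitter`, the snake_case method names, the string status values, and the `run_all` helper are all hypothetical and chosen purely for illustration.

```python
from abc import ABC, abstractmethod


class Submitter(ABC):
    """Hypothetical rendering of the five submitter actions named on the slide."""

    @abstractmethod
    def action_job_submit(self, job_id, description):
        """Hand a job over to the back end."""

    @abstractmethod
    def action_job_abort(self, job_id):
        """Cancel a job."""

    @abstractmethod
    def action_job_output(self, job_id):
        """Return the output of a job."""

    @abstractmethod
    def action_job_status(self, job_id):
        """Return the current status of a job."""

    @abstractmethod
    def action_job_resource(self, job_id):
        """Return the resource a job was sent to."""


class InMemorySubmitter(Submitter):
    """Toy back end (an assumption for this sketch): jobs live in a dict."""

    def __init__(self, resource_name="local-demo"):
        self.resource_name = resource_name
        self._jobs = {}

    def action_job_submit(self, job_id, description):
        self._jobs[job_id] = {
            "description": description,
            "status": "submitted",
            "output": "",
        }

    def action_job_abort(self, job_id):
        self._jobs[job_id]["status"] = "aborted"

    def action_job_output(self, job_id):
        return self._jobs[job_id]["output"]

    def action_job_status(self, job_id):
        return self._jobs[job_id]["status"]

    def action_job_resource(self, job_id):
        # Every job of this toy back end runs on the same named resource.
        return self.resource_name

    def run_all(self):
        """Illustrative helper: 'execute' every job that was not aborted."""
        for job in self._jobs.values():
            if job["status"] == "submitted":
                job["output"] = "ran: " + job["description"]
                job["status"] = "finished"


submitter = InMemorySubmitter()
submitter.action_job_submit("JOB1", "echo hello")
submitter.action_job_submit("JOB2", "echo world")
submitter.action_job_abort("JOB2")
submitter.run_all()
```

A workflow engine written against `Submitter` would not need to know which middleware sits behind it, which is presumably the point of defining one interface per back end.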
gUSE Submitter for UNICORE • gUSE: Workflow engine (JOB1, JOB2, JOB3, JOB4, ..., JOBn), actionJobSubmit, UNICORE submitter (UCC lib) • UNICORE 6: UNICORE Atomic Services, Uspace, Resources • Steps: 1 - Security • 2 - Registry • 3 - Submit job • 4 - Upload data • 5 - Start job
ASM (Application Specific Module) • Library for managing WS-PGRADE workflows • Listing of users and workflows in the local repository • Import of workflows in the user space • Upload/download of input and output files • Setting the parameters of a job in a workflow • Submission of workflows • Monitoring of workflows • Deletion of workflows • Usable in portlets and Java tools ⇒ Implicit use of gUSE submitter
Distributed Data Management • XtreemFS is an object-based grid and cloud file system • Ability to minimize data transfer • Low latency, local availability through replication • Grid Security Infrastructure (GSI) support
Distributed Data Management • XtreemFS integration • Portlet • UNICORE • GSI support • Data flow • WS-PGRADE • XtreemFS • Frontend nodes • Compute nodes • UNICORE mediates data transfers • Diagram: XtreemFS, UNICORE TSI
Domain Molecular Dynamics • Study and simulation of molecular motion • Provide a molecular dynamics service on multiple levels • Direct upload of job descriptions • Workflows and standard recipes for repeating tasks • Analysis of relevant properties
Equilibration of Proteins • Proteins from databases (e.g., the Protein Data Bank, PDB) do not necessarily represent a near-native conformation/configuration • For all kinds of production runs, minimization and equilibration are indispensable prerequisites • Eases the work of experienced users • Lowers the hurdle for novice users
Use Case: Gromacs_EQ • structure (pdb/gro) → pdb2gmx → structure (pdb), topology (top/itp) • editconf, genbox → box (pdb), solvated (pdb), adj. top. (top/itp) • EM.mdp (mdp) → grompp → topol.tpr, mdout.mdp
mdrun → ener.edr, traj.trr, traj.xtc, state.cpt, md.log, SYSTEM_EM.pdb • g_energy, xmgrace → Analysis.jpg • FULL.mdp (mdp) → grompp → topol.tpr, mdout.mdp • mdrun → ener.edr, traj.trr, traj.xtc, state.cpt, md.log, SYSTEM_EQ.pdb • g_energy, xmgrace → Analysis.jpg
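The Gromacs_EQ use case is essentially a dependency graph of tool invocations. As a rough sketch, the graph can be written down in Python and ordered with the standard library's `graphlib` (Python 3.9+). The step labels with `_em` and `_eq` suffixes are invented here to tell the two passes apart, and the edge from the minimization run to the second `grompp` is an assumption read off the diagram (SYSTEM_EM.pdb feeding the equilibration). No tool is actually executed.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step labels are hypothetical; only the tool names come from the slide.
GROMACS_EQ = {
    "pdb2gmx": set(),
    "editconf": {"pdb2gmx"},
    "genbox": {"editconf"},
    "grompp_em": {"genbox"},
    "mdrun_em": {"grompp_em"},
    "g_energy_em": {"mdrun_em"},
    "xmgrace_em": {"g_energy_em"},
    "grompp_eq": {"mdrun_em"},  # assumption: EQ starts from the EM result
    "mdrun_eq": {"grompp_eq"},
    "g_energy_eq": {"mdrun_eq"},
    "xmgrace_eq": {"g_energy_eq"},
}


def execution_order(workflow):
    """Return one valid sequential order of all steps."""
    return list(TopologicalSorter(workflow).static_order())


def parallel_batches(workflow):
    """Group steps into batches whose members could run at the same time."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


order = execution_order(GROMACS_EQ)
batches = parallel_batches(GROMACS_EQ)
```

Under these assumed edges, the analysis of the minimization (`g_energy_em`, `xmgrace_em`) can run alongside the equilibration steps, which is the kind of parallelism a workflow engine can exploit.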
MD Portlet
Domain Quantum Chemistry • Study and simulation of the electronic behavior of molecules in relation to their chemical reactivity • Survey of the MoSGrid community • First implementation for Gaussian • Then support for • Turbomole • GAMESS-US • Further relevant QC applications
Domain Quantum Chemistry: Gaussian Jobs • Single input file • Defines molecular geometry and task • Result • Unstructured output • Platform-dependent checkpoint file • Integrated multi-step job option • Not usable for generalized workflows
Domain Quantum Chemistry: First prototype • Workflow controlled by portlet • Three phases • Pre-processing • Job execution • Post-processing
Domain Quantum Chemistry: Workflows • Assisted job creation • Guiding GUI • Most common options available • Pre-created job description • Upload of Gaussian job description file • Monitoring of jobs • Post-processing and presentation of results
Domain Quantum Chemistry: Preprocessing • Portlet (GUI) supports common options • Automatic generation of job description • Submission of job
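Automatic generation of a job description from a handful of GUI options could look roughly like the sketch below. It is not the portlet's code. The function name `build_gaussian_input` and the water example are assumptions made for illustration; the layout (optional `%chk` line, route line starting with `#`, blank line, title, blank line, charge and multiplicity, Cartesian coordinates, closing blank line) follows the usual Gaussian input convention.

```python
def build_gaussian_input(title, route, charge, multiplicity, atoms, checkpoint=None):
    """Assemble the text of a single Gaussian input file.

    atoms is a list of (element, x, y, z) tuples with coordinates in Angstrom.
    """
    lines = []
    if checkpoint:
        lines.append(f"%chk={checkpoint}")  # optional checkpoint file
    lines.append(f"# {route}")              # route section: method, basis set, task
    lines.append("")
    lines.append(title)
    lines.append("")
    lines.append(f"{charge} {multiplicity}")
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                        # geometry block ends with a blank line
    return "\n".join(lines) + "\n"


# Example values are illustrative only.
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]
job_text = build_gaussian_input(
    title="water optimization",
    route="HF/STO-3G Opt",
    charge=0,
    multiplicity=1,
    atoms=water,
    checkpoint="water.chk",
)
```

Keeping the generator a pure function that returns text makes it easy to preview the job description in the GUI before it is submitted.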
Domain Quantum Chemistry: Post-processing • Parsing of result file • Python scripts executed by portlet • Relevant information about molecular properties • Data in CSV format saved and accessible
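As an illustration of this post-processing step, the following sketch pulls SCF energies out of the text of a Gaussian log and serializes them as CSV. The slide does not say which properties the MoSGrid scripts extract, so the choice of the `SCF Done` line, the regular expression, the column names, and the sample log text are all assumptions.

```python
import csv
import io
import re

# Matches lines such as:
#  SCF Done:  E(RHF) =  -74.9607232149     A.U. after    9 cycles
SCF_PATTERN = re.compile(
    r"SCF Done:\s+E\((?P<method>[^)]+)\)\s+=\s+(?P<energy>-?\d+\.\d+)"
    r"\s+A\.U\. after\s+(?P<cycles>\d+) cycles"
)

FIELDS = ["method", "energy_hartree", "cycles"]


def extract_scf_energies(log_text):
    """Return one row per 'SCF Done' line found in the log text."""
    rows = []
    for match in SCF_PATTERN.finditer(log_text):
        rows.append({
            "method": match["method"],
            "energy_hartree": float(match["energy"]),
            "cycles": int(match["cycles"]),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up log excerpt for demonstration purposes.
sample_log = """
 Some unrelated output
 SCF Done:  E(RHF) =  -74.9607232149     A.U. after    9 cycles
 More unrelated output
 SCF Done:  E(RHF) =  -74.9659011923     A.U. after    7 cycles
"""
rows = extract_scf_energies(sample_log)
csv_text = rows_to_csv(rows)
```

Because the log output is unstructured, a pattern-based extractor like this is brittle by nature; a structured format such as MSML would remove the need for it.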
Domain Docking • CADDSuite (Computer-aided Drug Design)
Domain Docking • Galaxy available for local resources in Tübingen
MolDB • Stores molecules in binary format, which allows for fast export • Automatically creates and stores canonical SMILES, fingerprints, and functional group counts for imported molecules • Automatically saves and restores docking and rescoring results • DB can be filtered by any stored molecule property before exporting molecules • Current speed for import/export: ~100 compounds/sec.
MSML: Molecular Simulation Markup Language • Based on CML (Chemical Markup Language) • Common interpretation by humans and computers • Follows the minimum information principle • Description: http://xml-cml.org/convention/dictionary • XSL transformation • Used for validation purposes: validator.xml-cml.org
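Since MSML builds on CML, a molecule description might resemble the plain CML fragment produced by the sketch below. This shows only generic CML elements (`molecule`, `atomArray`, `atom` with `elementType` and `x3`/`y3`/`z3`); it does not reproduce the MSML conventions or dictionaries, which are not given on the slide. The function name `molecule_to_cml` and the water coordinates are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def molecule_to_cml(molecule_id, atoms):
    """Serialize (element, x, y, z) tuples as a minimal CML molecule."""
    ET.register_namespace("cml", CML_NS)
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": molecule_id})
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for index, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(
            atom_array,
            f"{{{CML_NS}}}atom",
            {
                "id": f"a{index}",
                "elementType": element,
                "x3": f"{x:.4f}",
                "y3": f"{y:.4f}",
                "z3": f"{z:.4f}",
            },
        )
    return ET.tostring(molecule, encoding="unicode")


# Example molecule; the coordinates are illustrative only.
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]
cml_text = molecule_to_cml("water", water)
parsed = ET.fromstring(cml_text)
```

A document of this kind stays readable for humans while remaining machine-checkable, which matches the "common interpretation by humans and computers" goal stated above.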
Future Work • WS-PGRADE • Integration of the UNICORE IDB to offer drop-down boxes of available tools • MD and QC portlets • Adaptation to the gUSE workflow engine via the ASM libraries • CADDSuite • Export of workflows from Galaxy to WS-PGRADE • MSML • Further development
Involved Projects • SHIWA (SHaring Interoperable Workflows for Large Scale Scientific Simulations on Available DCIs) • EU project • Duration: 01.07.2010 – 30.06.2012 • Tübingen participates via Galaxy workflow export • CompChem Virtual Organization • EGEE project • Available resources
Future Projects • SCI-BUS (SCIentific gateway Based User Support) • EU project • Duration: 01.10.2011 – 30.09.2014 • Pan-European resources • Tübingen participates by extending the MoSGrid portal with an interactive molecule editor and a semantic search
Acknowledgements • Oliver Kohlbacher • Ákos Balaskó • Georg Birkenheuer • Sebastian Breuers • Richard Grunzke • Sonja Herres-Pawlis • Valentina Huber • Miklos Kozlovszky • Jens Krüger • István Márton • Patrick Schäfer • Bernd Schuller • Johannes Schuster • Anna Szikszay Fabri • Klaus-Dieter Warzecha • Martin Wewior