
IU OREChem Summary Slides

The ORE-CHEM pipeline developed by Indiana University supports research on drug-like molecules by converting chemistry data between formats and submitting large batches of jobs to the TeraGrid through Swarm. Key steps include converting PubChem XML to CML, converting CML to RDF, and loading the resulting ORE-CHEM data into a public, searchable RDF triple store. Swarm-Grid manages tens of thousands of high-throughput jobs, ranking resources before dispatch and handling both fatal and recoverable faults.
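As a rough illustration of what a searchable RDF triple store enables, the sketch below issues a SPARQL query over HTTP. The endpoint URL and the cml:formula predicate are assumptions made for illustration; the slides do not specify the actual store's address or vocabulary.

```python
# Minimal sketch of querying a SPARQL endpoint over HTTP.
# The endpoint URL and predicate are hypothetical; the real ORE-CHEM
# triple store schema is not described in these slides.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/orechem/sparql"  # hypothetical endpoint

QUERY = """
PREFIX cml: <http://www.xml-cml.org/schema#>
SELECT ?molecule ?formula
WHERE { ?molecule cml:formula ?formula . }
LIMIT 10
"""

def run_query(endpoint: str, query: str) -> dict:
    """POST a SPARQL query and return the JSON result bindings."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    results = run_query(ENDPOINT, QUERY)
    for binding in results["results"]["bindings"]:
        print(binding["molecule"]["value"], binding["formula"]["value"])
```

Any SPARQL endpoint that accepts form-encoded POST queries and returns JSON results could be queried this way.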


Presentation Transcript


  1. IU OREChem Summary Slides Marlon Pierce, Geoffrey Fox, Sashikiran Challa

  2. IU’s ORE-CHEM Pipeline
  • Harvest NIH PubChem for 3D structures
  • Convert PubChem XML to CML
  • Convert CML to Gaussian input
  • Submit jobs to the TeraGrid with Swarm
  • Convert Gaussian output to CML
  • Convert CML to RDF → ORE-CHEM
  • Insert RDF into the RDF triple store
  The goal is to create a public, searchable triple store populated with ORE-CHEM data on drug-like molecules. Conversions are done with the JUMBO/CML tools from Peter Murray-Rust’s group at Cambridge. Swarm is a Web service capable of managing tens of thousands of jobs on the TeraGrid. We are developing a Dryad version of the pipeline.
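A minimal sketch of the front half of the pipeline, assuming placeholder command-line converters (pubchem2cml, cml2gaussian) standing in for the JUMBO/CML tools; the slides do not show the tools' actual invocation, and the directory names are illustrative.

```python
# Sketch of a conversion driver for the pipeline's front end,
# using placeholder commands; the actual JUMBO/CML tool invocations
# are not given in the slides.
import subprocess
from pathlib import Path

PUBCHEM_DIR = Path("pubchem_xml")    # harvested PubChem XML records
CML_DIR = Path("cml")                # intermediate CML files
GAUSSIAN_DIR = Path("gaussian_in")   # Gaussian input decks for TeraGrid jobs

def convert(cmd: list[str]) -> None:
    """Run one conversion step and fail loudly if it breaks."""
    subprocess.run(cmd, check=True)

def main() -> None:
    CML_DIR.mkdir(exist_ok=True)
    GAUSSIAN_DIR.mkdir(exist_ok=True)
    for xml_file in sorted(PUBCHEM_DIR.glob("*.xml")):
        cml_file = CML_DIR / (xml_file.stem + ".cml")
        gau_file = GAUSSIAN_DIR / (xml_file.stem + ".gjf")
        # Placeholder commands standing in for the JUMBO/CML converters.
        convert(["pubchem2cml", str(xml_file), str(cml_file)])
        convert(["cml2gaussian", str(cml_file), str(gau_file)])

if __name__ == "__main__":
    main()
```

The back half of the pipeline (Gaussian output → CML → RDF → triple store) would follow the same per-file pattern once jobs return from the TeraGrid.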

  3. Swarm-Grid
  • Swarm manages high-throughput workloads on traditional Grid HPC clusters, alongside parallel jobs (e.g., MPI jobs) and long-running jobs
  • Resource Ranking Manager: prioritizes resources using QBETS and INCA
  • Fault Manager: handles fatal faults and recoverable faults
  [Architecture diagram: a standard Web service interface fronts the Request Manager, Resource Ranking Manager (backed by the QBETS Web service hosted by UCSB), Data Model Manager, and Fault Manager; per-user job boards and a job queue in a local RDBMS feed the Job Distributor, which uses a MyProxy server and the Resource Connector (Condor Grid/Vanilla universes via Birdbath) to reach Grid HPC clusters and Condor pools hosted by the TeraGrid project.]
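The slide distinguishes fatal from recoverable faults; the sketch below shows one way a client might act on that distinction with a retry loop. The exception classes and the submit_job stub are hypothetical, not Swarm's actual API, where this logic lives in the service-side Fault Manager.

```python
# Sketch of the fatal-vs-recoverable fault split described for Swarm's
# Fault Manager; all names here are hypothetical, not Swarm's API.
import time

class RecoverableFault(Exception):
    """Transient failure (e.g., a busy queue) worth retrying."""

class FatalFault(Exception):
    """Permanent failure (e.g., a malformed job) not worth retrying."""

def submit_job(job_id: str) -> None:
    """Placeholder submission call that simulates a transient failure."""
    raise RecoverableFault(f"resource temporarily unavailable for {job_id}")

def submit_with_retries(job_id: str, max_retries: int = 3) -> bool:
    """Retry recoverable faults with a short backoff; give up on fatal faults."""
    for attempt in range(1, max_retries + 1):
        try:
            submit_job(job_id)
            return True
        except RecoverableFault:
            time.sleep(0.5 * attempt)  # simple linear backoff
        except FatalFault:
            return False
    return False

if __name__ == "__main__":
    print("submitted:", submit_with_retries("job-0001"))
```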
