A workflow that uses the fast supercomputer Big Red to mine chemical compounds from research paper texts and visualize them as 3D graphics: chemical data are extracted from PubMed abstracts, converted between formats, and rendered as images for convenient processing.
CICC Chemical Compound Mining Workflows
Jungkee (Jake) Kim
Community Grids Laboratory
A Workflow for Big Red Demo I
• "Big Red" is one of the fastest supercomputers
• Mining chemical compounds found in research paper texts and showing them in 3D graphics
Workflow diagram: PubMed Abstracts (text files) → OSCAR3 (XML files) → SMILES Extraction (SMILES) → Converting the format (SDF files) → Molecular & Quantum Mechanics (SDF files) → Converting to pictures (POV, JPG files) → Generating HTML script
CICC Project Meeting
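The Demo I diagram is a linear chain in which each stage reads the file format the previous stage wrote. A minimal Python sketch of that chain, assuming hypothetical format labels ("text", "xml", "smiles", "sdf", "jpg", "html") rather than any real tool interface, checks that the stages fit together:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str  # format label the stage reads (hypothetical)
    produces: str  # format label the stage writes (hypothetical)


# Stage names follow the Demo I diagram; the format labels are illustrative.
PIPELINE = [
    Stage("OSCAR3", "text", "xml"),
    Stage("SMILES extraction", "xml", "smiles"),
    Stage("Converting the format", "smiles", "sdf"),
    Stage("Molecular mechanics", "sdf", "sdf"),
    # POV files are an intermediate step on the way to JPEG images.
    Stage("Converting to pictures", "sdf", "jpg"),
    Stage("Generating HTML script", "jpg", "html"),
]


def check_chain(stages, source_format="text"):
    """Check that each stage consumes what the previous one produced.

    Returns the final output format, or raises ValueError on a mismatch.
    """
    current = source_format
    for stage in stages:
        if stage.consumes != current:
            raise ValueError(
                f"{stage.name} expects {stage.consumes!r}, got {current!r}"
            )
        current = stage.produces
    return current


if __name__ == "__main__":
    print(check_chain(PIPELINE))  # html
```

Modelling stages as data rather than hard-coded calls makes it easy to slot in the Quantum Mechanics step that the demo omitted.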
A Workflow for Big Red Demo II
[Figure: final HTML pages]
A Workflow for Big Red Demo III
• PubMed abstracts
  • 555,007 PubMed abstracts from 2005-2006 (partial set; R. Guha)
  • 1,000 abstracts distributed to each node (simple parallelism)
  • 511 nodes × 1,000 input abstracts used for the demo
• OSCAR3
  • A Cambridge tool that extracts chemical information from text and produces an XML instance highlighting that information
  • A revised version was used for convenient batch processing (the tool had some incompatibilities with the Big Red architecture)
• SMILES extraction
  • Extracting SMILES elements from OSCAR3's XML output files
  • Building a unique SMILES list within each batch
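The batching and SMILES steps above can be sketched in Python. This is a minimal illustration, not the demo's code: the helper names are hypothetical, and it assumes OSCAR3 output marks each chemical name with an element carrying a SMILES attribute (e.g. <ne SMILES="CCO">ethanol</ne>), which should be checked against real OSCAR3 files.

```python
import math
import xml.etree.ElementTree as ET

BATCH_SIZE = 1000  # abstracts per node, as on the slide


def make_batches(abstracts, batch_size=BATCH_SIZE):
    """Split a list of abstracts into consecutive batches, one per node."""
    return [abstracts[i:i + batch_size] for i in range(0, len(abstracts), batch_size)]


def nodes_needed(n_abstracts, batch_size=BATCH_SIZE):
    """Number of nodes required if each node handles one full batch."""
    return math.ceil(n_abstracts / batch_size)


def unique_smiles_in_batch(xml_docs):
    """Collect distinct SMILES strings across one batch of XML documents.

    Assumption: any element with a SMILES attribute is a chemical annotation.
    A dict is used as an insertion-ordered set, so first-seen order is kept.
    """
    seen = {}
    for xml_text in xml_docs:
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            smiles = elem.get("SMILES")
            if smiles:
                seen.setdefault(smiles, None)
    return list(seen)


if __name__ == "__main__":
    print(nodes_needed(555_007))  # 556
    docs = [
        '<P><ne SMILES="CCO">ethanol</ne> and <ne SMILES="O">water</ne></P>',
        '<P><ne SMILES="CCO">EtOH</ne></P>',
    ]
    print(unique_smiles_in_batch(docs))  # ['CCO', 'O']
```

Deduplicating per batch keeps each node independent, so no cross-node communication is needed, which matches the "simple parallelism" described on the slide; a global unique list would require one extra merge step afterwards.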
A Workflow for Big Red Demo IV
• Generating 3D formats (K. Gilbert)
  • Converting from SMILES to SDF format
  • Molecular Mechanics program: "mengine" (MM engine)
  • No Quantum Mechanics (QM) in the demo
• Converting 3D formats to pictures (J. N. Huffman)
  • Persistence of Vision Raytracer (POV-Ray): converting SDF to POV
  • Another program converts the POV files to JPEG format
• Generating HTML script
  • Showing the graphic files in an HTML page
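The final "Generating HTML script" step might look like the following minimal Python sketch. It assumes each rendered JPEG is paired with the SMILES string it came from; the function name, file names, and page layout are hypothetical, not the demo's actual script.

```python
import html


def build_gallery(entries, title="Chemical compounds"):
    """Return an HTML page with one rendered image per compound.

    entries: iterable of (image_path, smiles) pairs (hypothetical layout).
    All text is escaped so unusual characters cannot break the markup.
    """
    rows = []
    for image_path, smiles in entries:
        rows.append(
            '<figure><img src="{src}" alt="{alt}">'
            "<figcaption>{cap}</figcaption></figure>".format(
                src=html.escape(str(image_path), quote=True),
                alt=html.escape(smiles, quote=True),
                cap=html.escape(smiles),
            )
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(rows)
        + "\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical image files produced by the POV-to-JPEG step.
    page = build_gallery([("mol_0001.jpg", "CCO"), ("mol_0002.jpg", "c1ccccc1")])
    print(page)
```

Using the SMILES string as both alt text and caption keeps each picture traceable to the compound it depicts.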
Bigger Picture for the Workflow
[Diagram components: NIH PubMed Database; OSCAR Text Analysis; Cluster Grouping; Toxicity Filtering; Docking; Initial 3D Structure Calculation; High Throughput Screening (HTS); Data Organization and Flagging; Molecular Mechanics Calculations; NIH PubChem Database; Quantum Mechanics Calculations; IU's Varuna Database; POV-Ray Parallel Rendering; with a region labeled "Big Red Demo"]