NCBI PubChem Workshop

Presentation Transcript


  1. NCBI PubChem Workshop
     TOPICS:
     • PubChem: What it is and How to Search it
     • Searching with Structures: Structural Similarity & Structure Clustering Analysis
     • BioAssay Database Information: Activity Analysis
     • Structure–Activity Relationships: SAR Analysis
     PubChem Webpage: http://pubchem.ncbi.nlm.nih.gov/
     PubChem Workshop Webpage: http://www.ncbi.nlm.nih.gov/Class/PubChem/course.htm
     September 21, 2007

  2. PubChem Mission
     The NIH Molecular Libraries Roadmap: high-throughput and large-scale screening efforts with small molecules to study the functions of genes, cells and biochemical pathways.
     This is based on:
     • Accessible Human Genome Project Data
     • Combinatorial Library Research & Development (methods, compounds and databases)
     • High-Throughput Screening Methodologies (robotic technology and informatics)
     Goal: To develop new ways to explore the functions of genes and signaling pathways in health and disease and facilitate the development of new drugs.

  3. PubChem Mission
     The Molecular Libraries Roadmap has three components:
     • Molecular Libraries Small Molecule Repository & Screening Center Network: The MLSMR was developed to be a central molecule repository containing substances that will be analyzed via high-throughput screening assays by a consortium of small-molecule screening centers (MLSCN).
     • Technology Development: With new assay R&D applied by the MLSCN, the ultimate goal is to catalog a comprehensive set of small molecule modulators of the genes and functions of humans and other organisms.
     • Cheminformatics: NCBI’s PubChem Database was developed as a new and comprehensive database of chemical structures and their biological activities, with added links to other NCBI biological databases and the PubMed scientific literature database.

  4. The National Center for Biotechnology Information
     Bethesda
     Created in 1988 as a part of the National Library of Medicine at NIH
     • Establish public databases
     • Research in computational biology
     • Develop software tools for sequence analysis
     • Disseminate biomedical information

  5. http://www.ncbi.nlm.nih.gov
     • Text: Entrez
     • Sequence: BLAST
     • Protein Structure: VAST
     • Small Molecule Structure: PubChem Structure Search

  6. http://pubchem.ncbi.nlm.nih.gov
     http://pubchem.ncbi.nlm.nih.gov/search/
     Data Analysis Tools: Differential display of data via structure clustering, structure-activity heat maps and customizable result retrieval tables.
     http://pubchem.ncbi.nlm.nih.gov/assay/assaycluster.cgi

  7. PubChem Databases
     Derivative Database: information is provided, updated and “owned” by NCBI.
     PubChem Compound
     • Composed of discrete compounds with known chemical structure.
     • Summary reports about the known chemical compounds described in PubChem Substance.
     • Addition of Automated Links, which are combined from information provided on PubChem Substance & BioAssay records.
     Primary Databases: information is provided, updated and “owned” by Submitters.
     PubChem Substance
     • Composed of Substances, which may be of known or unknown composition and may contain a discrete compound or mixtures of compounds.
     • Submitters add “Hard” links to PubChem BioAssay records and outside sources.
     • Examples: Hydrolyzed feathers, Diphenhydramine Citrate, Aspartame
     PubChem BioAssay
     • Composed of Experimental data with Background, Protocols and Results for bioactivity screens of chemical substances described in PubChem Substance.
     • Submitters add “Hard” links to PubChem Substance records and outside sources.
     • Examples: Cell-line Growth Effectors, HIV Infectivity, Pyruvate Kinase, Estrogen Receptor Binding

  8. PubChem Overview
     PubChem Substance: 19,599,664 substances
     • Molecular Libraries Screening Center Network & NIH Substance Repository
     • Theoretical Properties: ZINC, ChemDB, NIST
     • Biological Properties: ChemBank
     • Protein 3D Structures: MMDB
     • Toxicology: ChemIDplus
     • Imaging Agents: MICAD, MOLI
     • Journal Publishers: Thomson Pharma
     • Metabolic Pathways: BIND, BioCyc, (KEGG)
     • Substance Vendors: Sigma-Aldrich
     • Chemical Reactions: Boston University Center for Chemical Methodology and Library Development (CMLD-BU)
     PubChem BioAssay: 604 assays of 1,709,112 substances
     • Molecular Libraries Screening Center Network: NIH Chemical Genomics Center; Columbia; DTP/NCI; Emory; Scripps; Southern Research Institute; Structural Genomics Consortium; University of New Mexico; Penn Center for Molecular Discovery; University of Pittsburgh; San Diego Center for Chemical Genomics; Vanderbilt Screening Center for GPCRs, Ion Channels and Transporters
     PubChem Compound: 10,938,543 compounds
     • 545,627 have been assayed

  9. Structure of BioAssay Records

  10. Structure of a Substance Record

  11. NCBI’s PubChem Compounds
      What we do:
      Standardization of Structures
      • Chemical Data Verification: Atom description (label, element); Functional group clean-up; Atom valence verification to prevent nonsense structures
      • “Normalize” and “Standardize”: Valence-Bond canonicalize (for Tautomer invariance); Aromaticity detection and self-consistency; Stereochemistry detection; Explicit hydrogen assignment
      • Structural Representation: 2D Coordinate generation; Images created
      Calculation and Addition of Information
      • Nomenclature: IUPAC; SMILES & SMARTS; InChI
      • Structural Information: Calculate & store “Fingerprints”; Calculate & link to similar structures (90% level)
      • Physical Properties: Molecular Formula; Molecular Weight; Number of H-bond donor/acceptor sites; XLogP value; Lipinski value (bioavailability); Number of Rotatable bonds
      • Links to NCBI Database Records: Structures (MMDB records); Protein sequences (from Structure links); Genes (from Protein links)
      • Links to MeSH Terms through IUPAC name: MeSH is NLM’s controlled vocabulary used for indexing articles for MEDLINE/PubMed. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts.
      Then we “Index” the records!
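The "Fingerprints" and "similar structures (90% level)" items can be illustrated with a small sketch. It assumes the 90% level means a Tanimoto coefficient of at least 0.90 between bit fingerprints; the slide does not name the measure, and the toy fingerprints and molecule names below are hypothetical, not PubChem data.

```python
from __future__ import annotations


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def similar_pairs(fingerprints: dict[str, set[int]], threshold: float = 0.90):
    """Return every pair of names whose fingerprints reach the similarity threshold."""
    names = sorted(fingerprints)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold
    ]


# Hypothetical toy fingerprints (real fingerprints have many more bits).
toy = {
    "mol_A": set(range(0, 20)),   # 20 bits on
    "mol_B": set(range(0, 19)),   # 19 bits on, all shared with mol_A
    "mol_C": set(range(10, 30)),  # 20 bits on, 10 shared with mol_A
}

if __name__ == "__main__":
    print(similar_pairs(toy))
```

Only mol_A and mol_B reach the 0.90 threshold here; precomputing such pairs is one way the "similar structures" links could be stored, though the slides do not describe the actual procedure.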

  12. Structural, Stereochemical & Nomenclature Standardization
      Substance as Deposited; Validated & Standardized Compound

  13. Structural, Stereochemical & Nomenclature Standardization
      Compound
      • CID: 5757
        IUPAC: (8S,9S,13S,14S,17S)-13-methyl-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthrene-3,17-diol
        SMILES: C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O)CCC4=C3C=CC(=C4)O
      • CID: 11616093
        IUPAC: (8R,9S,13S,14S,17S)-2,4,16,16,17-pentadeuterio-13-methyl-6,7,8,9,11,12,14,15-octahydrocyclopenta[a]phenanthrene-3,17-diol
        SMILES: [2H]C1=CC2=C(CC[C@@H]3[C@@H]2CC[C@]4([C@H]3CC([C@]4([2H])O)([2H])[2H])C)C(=C1O)[2H]
      • CID: 450
        IUPAC: 13-methyl-6,7,8,9,11,12,14,15,16,17-decahydrocyclopenta[a]phenanthrene-3,17-diol
        SMILES: CC12CCC3C(C1CCC2O)CCC4=C3C=CC(=C4)O
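The relationship between CID 5757 and CID 450 on this slide can be checked with a short sketch: removing the stereo markers from the CID 5757 SMILES yields the CID 450 SMILES. The helper below is a hypothetical string-level illustration only. It handles bracketed carbon stereocenters such as [C@H] or [C@@], ignores isotopes, charges, other elements and double-bond stereo, and is not how PubChem standardizes structures.

```python
import re

# Matches bracketed carbon stereocenters such as [C@], [C@@], [C@H], [C@@H].
_CARBON_STEREO = re.compile(r"\[C@{1,2}H?\]")


def strip_carbon_stereo(smiles: str) -> str:
    """Drop tetrahedral stereo markers on carbon, leaving the flat connectivity."""
    return _CARBON_STEREO.sub("C", smiles)


# SMILES strings copied from the slide.
cid_5757 = "C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O)CCC4=C3C=CC(=C4)O"
cid_450 = "CC12CCC3C(C1CCC2O)CCC4=C3C=CC(=C4)O"

if __name__ == "__main__":
    print(strip_carbon_stereo(cid_5757) == cid_450)
```

A general tool would parse the SMILES with a cheminformatics library rather than a regular expression; the regular expression is enough for this one pair.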

  14. NCBI’s PubChem Compounds: What we do (repeat of slide 11)

  15. Examples of Linking in Entrez PubMed Protein Sequences PubChem Substance PubChem BioAssay 3-D Structure PubChem Compound Term frequency statistics Structure Similarity Activity Profile Structure Similarity Structure Similarity (VAST) Structure Sequence Similarity (BLASTp)

  16. Structure of Substance & Compound Records
      Sources: Emory U MLSC; NCGC; DTP/NCI; MLSMR; SMID; PDSP; NIAID; EPA DSSTox; ChEBI; MMDB; ChemIDplus; ChemBank; ZINC; BIND; LipidMAPS; Prous Science Drugs of the Future; Nature Chemical Biology; CambridgeSoft Corporation; KEGG; Thomson Pharma; DiscoveryGate
      Synonyms: Syndiol; Vagifem; Altrad; Evorel; Dihydromenformon; cis-Estradiol; Dihydrofolliculin; Dermestril; Ovocycline; Climaderm; Compudose; Estraderm; Estrogel; Menorest; Nordicol; Ovocylin; Oestergon; Oestradiol; Ovahormon; Ovasterol; Ovastevol; Ovocyclin; Perlatanol; Profoliol; Trocosone; Aerodiol; Corpagen; Gynergon; Primofol; Bardiol; Femogen; Lamdiol; Estrifam; Estroclim; Estrogen; Extrasorb; Femanest; Femestrol; Fempatch; Gelestra; Ginedisc; Gynestrel; Microdiol; Oestrogel; Progynon-DH; Estreva; Femtran; Zesteen; ……. estrolevex; D-Estradiol; 17beta-Oestradiol; Polyestradiol; SK-Estrogens; Theelin, dihydro-; Amnestrogen; Combipatch; Oestradiolum; Oestrogynal; Ovociclina; Activella; Estraderm TTS; Estradot; Estrasorb; Tradelia; Climara; Divigel; Estring; Oesclim; Vivelle; Zerella; Zumenon; Encore; Menest; Systen; Progynon DH; Alora; Dihydroxyoestrin; estradiol-17beta; D-Oestradiol

  17. Direct Links to Additional Information
      Emory U MLSC; NCGC; DTP/NCI; MLSMR; SMID; PDSP; NIAID; EPA DSSTox; ChEBI; MMDB; ChemIDplus; ChemBank; ZINC; BIND; LipidMAPS; Prous Science Drugs of the Future; Nature Chemical Biology; CambridgeSoft Corporation; KEGG; Thomson Pharma; DiscoveryGate

  18. Entrez Indexing & Searching
      Searching Hints: Field Delimiters; Limits; Preview/Index; History
      Record Structures
      Downloading data
      Wildcards: estrogen[synonym] vs. estrogen*[synonym]
      Ranges: 120:250[MW]
      Complex queries: chemidplus[SourceName] AND 300:500[MW] NOT (ca[Element] OR fe[Element])
      • term1 term2
      • term1[limit] OP term2[limit] OP …
      where:
      • limit = Entrez indexing field in square brackets
      • OP = AND, OR, NOT (Boolean operators must be in all CAPS!)
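The query syntax on this slide can be assembled programmatically. The sketch below rebuilds the slide's complex query from its parts; the helper names (field, value_range, any_of, combine) are hypothetical conveniences, not part of any NCBI library.

```python
from __future__ import annotations

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}  # Entrez expects these in all caps


def field(term: str, name: str) -> str:
    """Limit a term to one Entrez indexing field, e.g. estrogen[synonym]."""
    return f"{term}[{name}]"


def value_range(low: int, high: int, name: str) -> str:
    """Build a range clause such as 120:250[MW]."""
    return f"{low}:{high}[{name}]"


def any_of(*clauses: str) -> str:
    """Group alternatives in parentheses, joined with OR."""
    return "(" + " OR ".join(clauses) + ")"


def combine(first: str, *rest: tuple[str, str]) -> str:
    """Chain clauses as: first OP clause OP clause ..."""
    parts = [first]
    for operator, clause in rest:
        if operator not in BOOLEAN_OPERATORS:
            raise ValueError(f"Boolean operator must be AND, OR or NOT: {operator!r}")
        parts += [operator, clause]
    return " ".join(parts)


query = combine(
    field("chemidplus", "SourceName"),
    ("AND", value_range(300, 500, "MW")),
    ("NOT", any_of(field("ca", "Element"), field("fe", "Element"))),
)

if __name__ == "__main__":
    print(query)
```

Rejecting lowercase operators in combine mirrors the slide's warning that Boolean operators must be in all caps.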

  19. Setting Limits Compound & Substance BioAssay

  20. Preview/Index: Finding & Choosing Terms

  21. History: Customized, Stored Databases
      Entrez keeps track of all of your searches.
      You can “concatenate” searches with ANDs, ORs, NOTs.
      Or you can purposefully store searches for later use, for example:
      • A Compound or Substance
      • A set of Compounds or Substances:
          BioAssay tab link (those in the search that were assayed)
          MeSH term structures link (those with an assigned function)
          Similar Structures link
      • A BioAssay
      • A set of BioAssays:
          similar BioAssays
          BioAssays that tested the same compound
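Combining stored searches with AND, OR and NOT behaves like set arithmetic on the record IDs each search returned. A minimal sketch, assuming two stored searches whose ID sets are made up for illustration (they are not real search results):

```python
# Hypothetical stored searches, keyed by their History number.
history = {
    "#1": frozenset({450, 5757, 11616093}),
    "#2": frozenset({2244, 5757}),
}

both = history["#1"] & history["#2"]          # "#1 AND #2": records in both searches
either = history["#1"] | history["#2"]        # "#1 OR #2": records in at least one
only_first = history["#1"] - history["#2"]    # "#1 NOT #2": in the first, not the second

if __name__ == "__main__":
    print(sorted(both), sorted(either), sorted(only_first))
```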

  22. Entrez DocSum Pages

  23. Direct Links to Additional Information

  24. New Link to MeSH/PubMed Searches

  25. Downloading BioAssay Records Downloading the data

  26. Downloading Substance & Compound Records
      Emory U MLSC; NCGC; DTP/NCI; MLSMR; SMID; PDSP; NIAID; EPA DSSTox; ChEBI; MMDB; ChemIDplus; ChemBank; ZINC; BIND; LipidMAPS; Prous Science Drugs of the Future; Nature Chemical Biology; CambridgeSoft Corporation; KEGG; Thomson Pharma; DiscoveryGate
      Downloading the data:

  27. Downloading Bulk Data
      Searching Hints
      Record Structures
      Downloading data from Entrez DocSum Pages: Estradiol[synonym]

  28. Accessing the Data in Bulk

  29. Help for Programmers
      • NCBI Toolbox: In-house source code useful for incorporating NCBI-like functionality into your own programs.
          Three main parts: Data Model, Data Encoding and Programming Libraries.
          Examples: BLAST, Cn3D, Sequin, data format conversion scripts
          http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi
      • E-Utilities: Guidelines for Entrez “URL calls” used to access data.
          Designed for use in scripts.
          Examples: ESearch, EPost, ESummary, EFetch and ELink
          http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
      Caution: Overuse may result in blocked IPs!
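An E-Utilities "URL call" can be sketched as follows. The base URL and the database name pccompound come from the current E-Utilities documentation rather than from this slide, so treat them as assumptions; the half-second pause between requests is a conservative choice made for this sketch to respect the overuse caution.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Assumption: current E-Utilities base URL (the slide links to an older help page).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def fetch_all(urls, delay_seconds: float = 0.5):
    """Fetch each URL in turn, pausing between requests to avoid overuse."""
    results = []
    for url in urls:
        with urllib.request.urlopen(url) as response:
            results.append(response.read())
        time.sleep(delay_seconds)
    return results


# Assumption: "pccompound" is the Entrez name of the PubChem Compound database.
url = esearch_url("pccompound", "estradiol[synonym]")

if __name__ == "__main__":
    print(url)  # building the URL needs no network access; fetch_all does
```

Keeping URL construction separate from fetching makes the query easy to inspect before any request is sent.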

  30. Essentials
      • Guided Example
      • Practice Problems
      http://www.ncbi.nlm.nih.gov/Class/PubChem/course.html

  31. Intermission
      TOPICS:
      • PubChem: What it is and How to Search it
      • Searching with Structures: Structural Similarity & Structure Clustering Analysis
      • BioAssay Database Information: Activity Analysis
      • Structure–Activity Relationships: SAR Analysis
