
Commercial Vendors & Databases

Commercial Vendors & Databases. Gary Wiggins I590 Spring 2006. Factors in the Current Environment. Interdisciplinary science Consolidation of the Scientific-Technical-Medical (STM) publishing world Different cultures in the chemistry publishing environment compared to that in biology


Presentation Transcript


  1. Commercial Vendors & Databases Gary Wiggins I590 Spring 2006

  2. Factors in the Current Environment • Interdisciplinary science • Consolidation of the Scientific-Technical-Medical (STM) publishing world • Different cultures in the chemistry publishing environment compared to that in biology • Move to open access journals • Influence of the Web

  3. Huge Size of the Chemical Literature • ~ 50 million chemical substances • ~ 6 million reagents • ~ 7 million published reactions • ~16,000 protein crystal structures • ~250,000 small molecule x-ray structures --Robert Glen and Susan Aldridge (2002)

  4. Vendors and Publishers • Partnership between commercial vendors and abstracting/indexing services (and to some extent with journal publishers) • Most activity in online searching started in the 1970s • Comparatively little change in the vendors’ search systems until relatively recently • Aggregation of databases • Cross-file searching • Command-driven access

  5. Commercial Database Vendors • STN International (http://info.cas.org/stn.html) • SciFinder and SciFinder Scholar (http://www.cas.org/) • ISI Thomson (http://www.isinet.com) • QuestelOrbit (http://www.questel.orbit.com/index.htm) • Merged Markush Service • Dialog (http://www.dialog.com/) • MDL (http://www.mdl.com/) • Scopus (http://www.scopus.com/scopus/home.url) • US National Library of Medicine (http://www.nlm.nih.gov/) • Ovid Technologies (http://www.ovid.com/) • CSA (Cambridge Scientific Abstracts) (http://www.csa.com/) • Chemical Information System (http://www.nisc.com/cis/qcis1.asp) • knovel (http://www.knovel.com/) • Technical Database Services (http://www.tdsonline.com/) • Google Scholar (http://scholar.google.com/)

  6. STN International • Partnership among Chemical Abstracts Service, FIZ Chemie, and the Japan Science and Technology Corporation • Has over 200 STM databases • STN Database Summary Sheets: http://info.cas.org/ONLINE/DBSS/dbsslist.html • Includes some databases also available free through other venues (e.g., Medline, GenBank)

  7. Features in Commercial Systems • Concept of the Basic Index • Default field; in bibliographic databases often limited to keywords from titles, abstracts, and index terms • Special Boolean operators (proximity, adjacency, etc.) • Truncation (wild cards and left-hand or right-hand truncation) • Controlled vocabulary tools (MeSH, CAS’s Index Guide, CA Lexicon) • Classification of the documents • PACS (Physics and Astronomy Classification Scheme) • CA Sections/Subsections • Structure searching (usually range from exact to full substructure search) • Numeric and other data that is searchable • Data analysis tools • Current awareness options
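
As a rough illustration of the truncation feature listed above, the following Python sketch matches a truncated search term against the words of a basic index. It assumes `?` as the truncation symbol (the actual symbols differ between vendors), and the function names and the sample index are hypothetical, not taken from any vendor system.

```python
def matches_truncated(term: str, word: str) -> bool:
    """Return True if `word` satisfies `term`, where a leading and/or
    trailing '?' stands for any run of characters (assumed symbol)."""
    term, word = term.lower(), word.lower()
    left = term.startswith("?")    # left-hand truncation, e.g. ?mycin
    right = term.endswith("?")     # right-hand truncation, e.g. chromatogr?
    stem = term.strip("?")
    if left and right:
        return stem in word
    if right:
        return word.startswith(stem)
    if left:
        return word.endswith(stem)
    return word == stem            # no truncation: exact match only


def search_basic_index(term: str, basic_index: set[str]) -> list[str]:
    """Return the index words that match the (possibly truncated) term."""
    return sorted(w for w in basic_index if matches_truncated(term, w))
```

For example, `search_basic_index("chromatogr?", {"chromatography", "chromatographic", "chromium"})` keeps the two chromatography words and drops "chromium". Real systems usually offer finer-grained symbols as well (for instance, exactly one character), which this sketch leaves out.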

  8. Vocabulary Control in Life Science Databases • “The entities we deal with, such as genes, sequences, and chemical data, and manipulate and analyze in the context of bioinformatics and biomedical research have not always been properly defined. There are no control vocabularies, no standards for much of the data, and no unified way to refer to them.” --Pablo Tamayo, senior computational biologist and manager, cancer genomics informatics, MIT Broad Institute, quoted in Drug Discovery & Development August 2004, 7(8), 52.

  9. Command Language Systems • Allow field-directed searches • Incorporate sophisticated Boolean relationships • AND, OR, NOT • Adjacency, Proximity, Logical linking to the same field or sub-field of a record • Numbers of intervening words can be specified • User must learn the commands
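
The adjacency and proximity operators described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the syntax or implementation of any vendor's command language; the function names `tokenize` and `within` and their parameters are assumptions made for the example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(text: str, a: str, b: str,
           max_intervening: int = 0, ordered: bool = False) -> bool:
    """Return True if words a and b occur with at most `max_intervening`
    words between them. With ordered=True, a must come before b.
    max_intervening=0 corresponds to simple adjacency."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    for i in pos_a:
        for j in pos_b:
            gap = (j - i) if ordered else abs(j - i)
            if 1 <= gap <= max_intervening + 1:
                return True
    return False
```

With the title "The boiling-point curve for mixtures of ethyl alcohol and water", `within(title, "ethyl", "alcohol")` succeeds as an adjacency match, while "alcohol" and "water" only match once one intervening word is allowed. Boolean AND, OR, and NOT then combine such tests, e.g. `within(title, "ethyl", "alcohol") and not within(title, "methyl", "alcohol")`.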

  10. Main Life Sciences Databases • BIOSIS • Medline and PubMed • Many public domain databases, including: • Protein Data Bank • GenBank • Many others

  11. Main Chemical Databases • Chemical Abstracts • Beilstein/Gmelin • Cambridge Structural Database • Protein Data Bank • Many other relevant databases, including now some large public databases, e.g. • PubChem

  12. CAS DBs: CA File • CA File, a bibliographic database covering journal articles (from ~8000 journals), technical reports, conference proceedings, dissertations, patents and other literature • 1907 to the present (and now even earlier!); full indexing has been added for all records retrospectively • Linked through the Registry Number to compound data • CAplus File, includes CA File data plus e-journals, some preprints, and all articles from ~1500 key chemical journals within one week of receipt

  13. Old References Recently Added to CA Database • The boiling-point curve for mixtures of ethyl alcohol and water. Noyes, William A.; Warfel, R. R. Rose Polytechnic Institute, Terre Haute. Journal of the American Chemical Society (1901), 23(7), 463-8. CODEN: JACSAT ISSN: 0002-7863. Journal written in English. CAN 0:1311 AN 1906:1311 CAPLUS (Copyright 2004 ACS on SciFinder (R)) • Abstract: In the determination with small amounts of alcohol, the readings of the thermometer were taken when the vapors first entered the condenser, as after boiling for a few minutes a relatively large proportion of the alcohol present would be found in the upper layers and in the condenser. The thermometer under these conditions registered about 0.3 higher. An examination of the table and curve revealed that the minimum boiling point is for alcohol of 96% by weight. The curve was steeper on the side toward absolute alcohol. Alcohol of 90.7% had the same boiling point as absolute alcohol.

  14. Relative Contributions of Literature Types to CA • Used with the permission of Chemical Abstracts Service (CAS), a division of the American Chemical Society, from: http://www.cas.org/casdb.html

  15. Growth of Articles in CA

  16. Special Fields in the CA File • In addition to the standard bibliographic citation data, have: • Controlled Terms (CT) • Classification Codes (CC: the 80 section codes into which the content of the paper CA is divided: http://www.cas.org/PRINTED/sects.html) • Document type (DT) • Language Code (LA) • Role (RL)

  17. CAS Roles • Used in conjunction with chemical substance searches • Nine super roles: ANST, BIOL, CMBI, FORM, OCCU, PREP, PROC, RACT, USES • Over 60 more specific role descriptors, e.g., with PREP: • BMF Bioindustrial manufacture • BPN Biosynthetic preparation • BYP Byproduct • CPN Combinatorial preparation • IMF Industrial manufacture • PNU Preparation, unclassified • PUR Purification or recovery • SPN Synthetic preparation • Also two roles not up-posted to super roles: PRP (Properties) and MSC (Miscellaneous)

  18. CAS DBs: Registry File • “Authority” file that lets indexers and searchers definitively identify a substance as new or find a previous entry • Contains all types of chemical substances, including biomolecules • Best file for chemical names • Many physical properties being added • Linked to CA and other files through the Registry Number (RN)

  19. CAS Registry Number • Serves as the accession number in the Registry File • RN has no meaning • Example: Isatin is 91-56-5
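
Although the Registry Number carries no chemical meaning, its final digit is a check digit that guards against transcription errors: the digits before it are weighted 1, 2, 3, ... from right to left, summed, and the sum modulo 10 must equal the check digit. A minimal Python sketch of that validation follows; the function name `is_valid_cas_rn` is chosen here for illustration.

```python
import re


def is_valid_cas_rn(rn: str) -> bool:
    """Check the format (2-7 digits, 2 digits, 1 check digit) and the
    check digit of a CAS Registry Number such as '91-56-5'."""
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not m:
        return False
    body = m.group(1) + m.group(2)   # all digits except the check digit
    check = int(m.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check
```

For isatin, 91-56-5, the weighted sum is 6×1 + 5×2 + 1×3 + 9×4 = 55, and 55 mod 10 = 5, which matches the final digit. A valid check digit shows only that the number is well formed, not that it has actually been assigned to a substance.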

  20. Registry File Contents • Includes synonyms, molecular formulas, alloy composition tables, classes for polymers, nucleic acid and protein sequences, ring analysis data, and structure diagrams • Also: experimental and calculated property data from various sources as well as super roles and document type information from CAplus

  21. Registry File Contents • 83,654,922 substances have a RN in the Registry File as of 10/16/2004 • All substances in CAS files plus others • Many physical constants now added to the records, most of them calculated • Lipinski Rule of Five values • BP, MP, Density, Optical Rotatory Power, Refractive Index • Data for 3D visualization
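
For readers unfamiliar with the Lipinski Rule of Five values mentioned above, the sketch below shows how the four commonly cited criteria are applied to a set of molecular descriptors: molecular weight at most 500, log P at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The `Descriptors` class and the function name are hypothetical; in practice the descriptor values would come from the calculated properties in a database record.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float       # molecular weight in g/mol
    log_p: float            # octanol-water partition coefficient (log P)
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five criteria that d violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight")
    if d.log_p > 5:
        violations.append("log_p")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations
```

A compound is often considered acceptable if it violates no more than one criterion, so a caller might test `len(rule_of_five_violations(d)) <= 1`. Any descriptor values passed in when trying this out should be treated as illustrative unless taken from a real record.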

  22. Size of the Registry File

  23. PubChem: A Threat to CAS? • PubChem, part of the NIH Roadmap plan under the Molecular Libraries and Imaging Initiative • Several million compounds already in the database • To be linked to assay data from High Throughput Screening analyses • http://pubchem.ncbi.nlm.nih.gov/

  24. CAS DBs: CASReact • Derived from journal and patent documents from 1840 to date • Contains both single-step and multistep reactions • Structure searchable • Contains yield data, reaction conditions, etc.

  25. CAS Databases: Other • CHEMCATS--information about commercially available chemicals and their worldwide suppliers • CHEMLIST--contains chemical substances on national inventories • MARPAT--more than 500,000 Markush structure records for patents found in the CA File with patent publication year 1988 to the present • TOXCENTER--covers the pharmacological, biochemical, physiological, and toxicological effects of drugs and other chemicals

  26. User-Oriented Software • Front-end systems to mask command language • STN’s SciFinder (and SciFinder Scholar) • STN on the Web, STNEasy, STN Express • CrossFire Commander and MDL DiscoveryGate • Questel-ORBIT’s QWeb and Imagination

  27. SciFinder and SciFinder Scholar • Includes access to the CA, Registry, CHEMCATS, CHEMLIST files, plus Medline (1957-) • Easy structure searching capabilities • Integrated with ChemPort for easy access to the primary literature • Download page for SFS: • http://www.libraries.iub.edu/index.php?pageId=2114

  28. SciFinder Scholar Under the Hood • A. Ben Wagner’s look at what really underlies the apparent simplicity of SFS searches • http://ublib.buffalo.edu/libraries/e-resources/SciFinder/SciFinder200dpi.pdf

  29. Beilstein Database • Covers organic chemistry back to 1771 • Includes many physical properties • Includes reaction information • Structure searchable • Available on the CrossFire Commander system for academic institutions

  30. Gmelin Database • Covers inorganic and organometallic chemistry back to 1771 • Includes many physical and chemical properties • Not searchable for reactions • Accessible through the CrossFire Commander system for academic institutions

  31. MDL’s CrossFire Commander • Download page for Commander at IU: • http://www.libraries.iub.edu/index.php?pageId=2114 • May soon be replaced by DiscoveryGate

  32. DiscoveryGate for Academics • CrossFire Beilstein • CrossFire Gmelin • MDL® Available Chemicals Directory • MDL® Screening Compounds Directory • MDL® Reference Library of Synthetic Methodology • MDL® Solid-Phase Organic Reactions • ORGSYN (Organic Syntheses) Database • Encyclopedia of Reagents for Organic Synthesis • Comprehensive Organic Functional Group Transformations • Comprehensive Asymmetric Catalysis • MDL® Comprehensive Medicinal Chemistry • MDL® Drug Data Report • MDL® Metabolite Database • MDL® Toxicity Database • ChemInform Reaction Library • Current Synthetic Methodology • Derwent Journal of Synthetic Methods • National Cancer Institute Database • http://www.mdl.com/solutions/solutions_for/academics/dg_academics.jsp

  33. Reaction Databases • CASReact • SPRESI • http://www.spresi.com/ • Organic Syntheses • Free version: http://chemfinder.cambridgesoft.com/reactions/orgsyn.asp • ISI’s Index Chemicus • e-EROS (Encyclopedia of Reagents for Organic Synthesis) • MDL’s Integrated Major Reference Works • Reactions indexed with InfoChem’s Reaction Classification Code, based on the degree of specificity around the reacting center: • http://www.infochem.de/eng/index.htm

  34. Cross-Product Approaches • MDL/InfoChem’s Integrated Major Reference Works • Thieme’s Science of Synthesis (successor to Houben–Weyl) • Springer’s Comprehensive Asymmetric Synthesis and their Glycoscience • Elsevier Science’s Comprehensive Organic Functional Group Transformations • Wiley’s Encyclopedia of Reagents for Organic Synthesis • Links to primary journal literature.

  35. Physical Property Databases • Beilstein & Gmelin • CRC Handbook (CHEMnetBASE) • Ei ChemVillage • knovel • Perry’s Chemical Engineers’ Handbook • Lange’s Handbook of Chemistry • Landolt-Börnstein

  36. Spectral Databases • Bio-Rad • Aldrich • NIST Chemical WebBook • Some high-quality free databases on the Web, e.g., • SDBS, Spectral Database for Organic Compounds • http://www.aist.go.jp/RIODB/SDBS/menu-e.html

  37. SDBS IR Spectrum for Traumatic Acid

  38. CCDC

  39. Isatin on the CSD

  40. Cambridge Structural Database • Bibliographic, chemical and crystallographic information for: • organic molecules • metal-organic compounds • 3D structures have been determined using: • X-ray diffraction • neutron diffraction • Each CSD record includes: • 3D atomic coordinate data for at least all non-H atoms

  41. CSD components • ConQuest: search and information retrieval • Mercury: structure visualization • Vista: numerical analysis • PreQuest: database creation

  42. Accessing the CSD at IUB • Download the Citrix Metaframe client at: • http://www.citrix.com/site/SS/downloads/downloads.asp?dID=2755 • Connect to IUB via VPN and link to: • http://bl-libg-wind.ads.iu.edu/ICAS/conquest.ica • For IUPUI, ask Kelsey Forsythe

  43. Other Structural Databases • Protein Data Bank for polypeptides and polysaccharides having more than 24 units http://www.rcsb.org/pdb/ • Nucleic Acids Database for oligonucleotides http://ndbserver.rutgers.edu/ • Inorganic Crystal Structure Database http://www.fiz-informationsdienste.de/en/DB/icsd/ • CRYSTMET® for metals and alloys http://www.tothcanada.com/

  44. Materials Chemistry Databases • TDS specializes in chemical engineering data. Includes: • American Institute of Chemical Engineers’ DIPPR Pure Component Data • 29 fixed-value properties and 13 temperature-dependent properties for about 1600 industrial chemicals

  45. Patent Databases • Derwent World Patents Index • USPATFULL • PCTFULL (WIPO/PCT Patents Full Text) • INPADOC (INternational PAtent DOcumentation Center) • IFIPAT • CA and CAplus • MDL Patent Chemistry Database

  46. Chemical Information System • 34 environmental databases • Originally developed by the US National Institutes of Health and the Environmental Protection Agency • Covers over 515,000 compounds • Toxicological and/or carcinogenic research data • Information on handling hazardous materials • Chemical/physical property information • Regulations • Safety and health effects information • Pharmaceutical data

  47. Hybrid Links to the Web • STN’s eScience • http://www.escience.org/ • Elsevier Science’s Scirus • http://www.scirus.com/srsapp/ • Elsevier Science’s Scopus (includes Scirus) • http://www.info.scopus.com • 14,000 titles going back to the mid 1960s • More than 500,000 records link to the Beilstein database on either CrossFire or DiscoveryGate

  48. Traumatic Acid: SFS → eScience

  49. Electronic Journals • Coverage in some cases back to the 17th century • Most major publishers’ backfiles are now online • CrossRef • DOI • SFX

  50. Shift from Ownership to Licensing of Journals • IUB Chemistry Library e-journals • http://www.indiana.edu/~libchem/ejournals.html • Shift away from ownership • Archival issues • Publisher archives (usually 2-3 locations) • LOCKSS and other proposals • Libraries often have no archival rights
