Tools for exploring the biomedical information landscape. Les Grivell, EMBO Electronic Information Programme. EAHIL 2004, Santander. Electronic information programme: online research information environment for the life sciences.


Presentation Transcript


  1. Tools for exploring the biomedical information landscape • Les Grivell, EMBO Electronic Information Programme • EAHIL 2004, Santander

  2. Electronic information programme • Online research information environment for the life sciences • A next generation information service for the life sciences • Communities@embo • Life Sciences Mobility Portal

  3. But first, let me take you back – not to Altamira, but to the … early days of scientific publishing (pre-impact factor)

  4. When libraries were comfortable places that had everything you needed …

  5. and it was possible to keep track of the literature …. (more or less) …

  6. Where are we now? – Publishing is big business • STM publishing is a multi-billion EUR activity (in the UK alone, GBP 22 billion in 2000) • An estimated 164,000 scientific periodicals worldwide; around 16% of these are online

  7. – Core science; core journals • PubMed lists some 4600 journals in biomedical disciplines • As of 19 Sept 2004, 4429 of these are online • The PubMed database provides access to circa 15 million abstracts (but if you can’t be found, you won’t be read …) • The Science Citation Index lists 5876 journals with impact factors ranging from 54.45 down to 0.00 (you’ve been found, but are you worth reading? …)

  8. Another information explosion: genomics [Figure: sequence entries in the EMBL DNA database; y-axis: base pairs (billions), 0 to 35; x-axis: year, 1980 to 2005; annotation: "Morowitz"]

  9. Raw sequences are not the only form of digital information

  10. The nice thing about biological information resources is that there are so many ….. • Hundreds of different databases, many in flat-file format • A variety of user interfaces • General lack of interoperability

  11. Wouldn’t it be nice to … find all published literature references for a large set of gene symbols and explore their relationships? [Diagram labels: Micro-array chip; Co-regulated genes; Database lookup; Find literature; Discover relationships]

  12. This is not really such a novel idea ….

  13. "I don’t want there to be endless searching in the library! It is at the expense of nerves, and these should not be wasted on such stupidities…" Fritz Saxl (1890–1948) • ‘Ich will nicht, dass in der Bibliothek ewig gesucht wird! Dieses Suchen kostet Nerven und die dürfen nicht verschwendet werden an solche Dummheiten...’ Aby Warburg (1866–1929)

  14. Saxl & Warburg: Mnemosyne Atlas

  15. Some text search engines • Bibliographic databases (e.g. Biosis) • Full text / web-pages

  16. PubMed • Text-based! • Searches only title, authors, abstract • Boolean keyword search (AND / OR) • Search language is English • No ranking on relevance to query! • No direct linkage to other datasets • All documents stored and indexed in one location
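To make the limitations on this slide concrete, here is a minimal sketch of an unranked Boolean keyword search over title, authors and abstract. It is not PubMed's implementation; the records, field names and the `boolean_search` function are invented for illustration.

```python
import re

# Invented placeholder records (not real PubMed entries).
RECORDS = [
    {"id": 1, "title": "Frataxin depletion in yeast",
     "authors": "A. Example", "abstract": "Iron accumulates in mitochondria."},
    {"id": 2, "title": "Microarray analysis of co-regulated genes",
     "authors": "B. Example", "abstract": "Expression profiles in yeast."},
    {"id": 3, "title": "Image databases",
     "authors": "C. Example", "abstract": "Linking images to literature."},
]


def tokens(record):
    """Lower-cased word set drawn only from title, authors and abstract."""
    text = " ".join(record[field] for field in ("title", "authors", "abstract"))
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_search(records, terms, mode="AND"):
    """Return matching record ids in storage order; no relevance ranking."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    wanted = {term.lower() for term in terms}
    hits = []
    for record in records:
        found = tokens(record)
        matched = wanted <= found if mode == "AND" else bool(wanted & found)
        if matched:
            hits.append(record["id"])
    return hits


print(boolean_search(RECORDS, ["yeast", "frataxin"], "AND"))
print(boolean_search(RECORDS, ["yeast", "images"], "OR"))
```

Every hit counts equally here, which is the "no ranking on relevance" point: a record that mentions a term once is indistinguishable from one that is entirely about it.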

  17. Main features: a discovery tool • Ability to interconnect literature articles with different types of molecular data, including images • Ability to search through and retrieve journal articles and other full text documents, even when in different physical locations • Ability to support multi-lingual documents and queries • Services free to the academic community • Features implemented via conceptual fingerprinting

  18. Conceptual fingerprints • Full text document → index and link index terms to (multi-lingual) thesauri → fingerprint database • 1 conceptual fingerprint (CFP) = 400 bytes • Abstraction: 250,000 pages/PC/day • Matching: 500,000 CFPs in 40 milliseconds
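The slide does not describe how fingerprints are computed. As a rough sketch only, assume a fingerprint is a normalised vector of thesaurus concept weights and that matching is a cosine similarity. The thesaurus entries, concept IDs and function names below are hypothetical; only the 400-byte and 500,000-CFP figures come from the slide.

```python
import math
import re
from collections import Counter

# Hypothetical multilingual thesaurus: surface term -> language-neutral concept ID.
THESAURUS = {
    "gene": "C1", "gène": "C1", "gen": "C1",
    "protein": "C2", "protéine": "C2", "eiweiss": "C2",
    "disease": "C3", "maladie": "C3", "krankheit": "C3",
}

CFP_BYTES = 400  # size of one conceptual fingerprint, as stated on the slide


def fingerprint(text):
    """Map words to concept IDs and return a unit-length concept-weight vector."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(THESAURUS[w] for w in words if w in THESAURUS)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {cid: c / norm for cid, c in counts.items()} if norm else {}


def similarity(fp_a, fp_b):
    """Cosine similarity of two unit-length fingerprints (assumed matching rule)."""
    return sum(weight * fp_b.get(cid, 0.0) for cid, weight in fp_a.items())


english = fingerprint("The gene encodes a protein")
french = fingerprint("Le gène code une protéine")
print(similarity(english, french))

# Storage implied by the slide's figures: 500,000 CFPs at 400 bytes each.
print(500_000 * CFP_BYTES / 1_000_000, "MB")
```

Because both sentences reduce to the same concept IDs, the English query matches the French text even though they share no words. This is one way the multilingual search described above could work. At 400 bytes each, 500,000 fingerprints come to about 200 MB, small enough to hold in memory.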

  19. prototypes • Initial prototypes in September 2002 and July 2003 • Current prototype online since 1st March 2004 • Next launch due mid-October 2004

  20. E-BioSci • Content selection: abstracts + full text • Choose search focus • Full text query in English, French or German; the query is fingerprinted for search

  21. … and now a word about … • 8 partners (DE, ES, FR, UK) (Platform) • 13 partners (ES, FR, IT, NL, UK) (Research project)

  22. Oriel’s aims

  23. www.bioimage.org (Dr David Shotton, Univ. Oxford) • Wouldn’t it be nice to be able to navigate from an image to literature and molecular databases?

  24. Gene symbol identification in text • Text containing symbols

  25. Improved literature – molecular dataset linkage • Gene symbols: PEO1, GUCY2C, TYRO3, CD44 • Text: "Twinkle, twinkle, little star, / How I wonder what you are. / Up above the world so high, / Like a diamond in the sky. / Twinkle, twinkle, little star, / How I wonder what you are."

  26. Problems in gene symbol recognition • Many gene symbols are indistinguishable from everyday words or abbreviations • Synonyms • Homonyms • Homonym synonyms (ELK1 = SAP1; CAR1 = SAP1; BD-2 = SAP1; RIP1_SAPOF = SAP1)
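The synonym and homonym problem can be sketched with a small dictionary lookup. The alias table below reads the slide's SAP1 example as "SAP1 is an alternative name for each of the four entries". The test sentence and the function names are made up for illustration, and this is not the recognition method used in the project.

```python
import re

# Alias table based on the slide's example: four entries share the alias SAP1.
ALIASES = {
    "ELK1": {"SAP1"},
    "CAR1": {"SAP1"},
    "BD-2": {"SAP1"},
    "RIP1_SAPOF": {"SAP1"},
}


def build_index(aliases):
    """Invert the table: every name (primary or alias) -> set of primary entries."""
    index = {}
    for primary, alternatives in aliases.items():
        for name in {primary} | alternatives:
            index.setdefault(name.upper(), set()).add(primary)
    return index


def find_symbols(text, index):
    """Return (token, candidate entries) for each token that is a known name.

    Matching is case-insensitive, which is exactly what makes ordinary
    words that happen to be gene symbols show up as false positives.
    """
    hits = []
    for token in re.findall(r"[A-Za-z0-9_-]+", text):
        candidates = index.get(token.upper())
        if candidates:
            hits.append((token, sorted(candidates)))
    return hits


INDEX = build_index(ALIASES)
# Made-up sentence, used only to exercise the lookup.
for token, candidates in find_symbols("SAP1 binds DNA; ELK1 too", INDEX):
    status = "ambiguous" if len(candidates) > 1 else "unique"
    print(token, candidates, status)
```

A token with more than one candidate cannot be resolved by lookup alone; it needs context from the surrounding text, such as the species or co-mentioned proteins.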

  27. Word-“processing” [Diagram labels: gene; FRDA; protein; frataxin; disease; depletion; Yah1p; required; activates]

  28. Natural language processing

  29. Protein interaction networks [Diagram labels: ataxia; Yfh1; Ssc1; Isu1; Oct1; relations: requires, regulates, interacts, activates]

  30. Hoffmann & Valencia (Madrid)

  31. Some web-addresses • http://www.e-biosci.org • http://www.oriel.org • http://www.bioimage.org • http://www.pdg.cnb.uam.es/UniPub/iHOP/
