Joint EBI-Wellcome Trust

Joint EBI-Wellcome Trust. Summer School 14-18 June 2010. Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective.

Presentation Transcript


  1. Joint EBI-Wellcome Trust Summer School 14-18 June 2010

  2. Concepts, historical milestones & the central place of bioinformatics in modern biology: a European perspective. Teresa K. Attwood, University of Manchester

  3. Concepts, historical milestones & the central place of bioinformatics in modern biology: a personal perspective from a European

  5. Overview • Where the concept of bioinformatics originated • Some key milestones & key people • Its place in ‘the new biology’

  6. Disclaimer • Bear in mind that this is a personal view • That it’s hard • to step out of a situation & look back in • & remain objective • to separate the European & American histories • Observers from different perspectives will see & tell the story differently! • So this is just my perspective… • & it’s bound up with sequences & dbs

  7. Origin of bioinformatics • The origins of bioinformatics are rooted in sequence analysis • And driven by the desire to • collect sequences • annotate them • & analyse them • systematically (i.e., using computers)! The concept ‘bioinformatics’ was barely known before 1990…

  8. Key milestones (timeline figure; labels: ARPAnet, insulin, ribonuclease, Dayhoff Atlas)

  9. Margaret Dayhoff (1925-1983) • Pioneered development of computer methods to compare protein sequences • & to derive evolutionary histories from alignments • Particularly interested in deducing evolutionary connections from sequence evidence

  10. Margaret Dayhoff • Collected all the known protein sequences • made them available to the scientific community • In 1965, she compiled a book • the 1st Atlas of Protein Sequence and Structure

  11. Margaret Dayhoff

  12. Key milestones (timeline figure; labels: ARPAnet, Internet, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PDB; data points: 7 structures, 65 sequences)

  13. Data overload in the USA

  15. Data overload in Europe • The data overload problem had also been noticed in Europe • The solution was to create the 1st nucleotide sequence database • this was the EMBL databank • this preceded the 1st release of GenBank by ~6 months

  16. Key milestones (timeline figure; labels: ARPAnet, Internet, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PDB, PIR-PSD, EMBL, GenBank; data points: 7 structures; 65, 568 & 859 sequences)

  17. Enter Amos Bairoch • A crazy postgrad student in Switzerland • interested in space exploration & the search for ET life • His project was to develop software to analyse protein & nucleotide sequences • PC/Gene

  18. Amos Bairoch • He published his 1st paper in 1982 • A letter to the Biochemical Journal (BJ) suggesting the use of checksums to “facilitate the detection of typographical & keyboard errors” • a true computer nerd!
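The slide does not say which checksum the 1982 letter proposed. As an illustration only, here is a minimal sketch of the general idea, assuming a position-weighted checksum in the style later used by the GCG sequence-analysis package; the function name `gcg_style_checksum` is hypothetical and this is not presented as Bairoch's scheme. Each residue's character code is multiplied by a weight that depends on its position, so a retyped sequence with a substituted or transposed residue usually yields a different number than the stored one.

```python
def gcg_style_checksum(sequence: str) -> int:
    """Position-weighted sequence checksum (GCG-style; illustrative assumption).

    Weights cycle through 1..57 by position, so unlike a plain sum of
    character codes, the result depends on residue order.
    """
    total = 0
    for i, residue in enumerate(sequence.upper()):
        # Case-insensitive: 'q' and 'Q' contribute the same code.
        total += ((i % 57) + 1) * ord(residue)
    return total % 10000  # keep the checksum to at most 4 digits


# Hypothetical example: a stored checksum catches a keyboard transposition.
stored = gcg_style_checksum("MKTAYIAKQR")
retyped = gcg_style_checksum("MKTAYIAKRQ")  # last two residues swapped
print(stored == retyped)  # False: the typing error is detected
```

The positional weighting is the key design choice: a simple sum of residue codes would be identical for any reordering of the same residues and so could never detect a transposition. Because the weights repeat every 57 positions and the total is reduced modulo 10000, some errors can still slip through, so a matching checksum is strong evidence, not proof, that a sequence was typed correctly.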

  19. Amos Bairoch • Why did he do this? • In the process of developing PC/Gene, he typed in >1,000 protein sequences • some from the literature, most from the Atlas • by 1981, this was a large book & several supplements, & listed 1,660 proteins • it was not then available electronically

  20. Amos Bairoch • In 1983, he acquired a computer tape of the EMBL databank • this was version 2, with 811 sequences • In 1984, he received the 1st available computer tape copy of the Atlas • (which quickly became the PIR-PSD) • but he was deeply unhappy with the PIR format

  21. Amos Bairoch • So he decided to convert the PIR database into the semi-structured format of EMBL • part manually & part automatically • the result was PIR+ • it was distributed as part of PC/Gene (now commercial) • In summer 1986, he decided to release the database independently of PC/Gene • so that it would be available to all, free of charge

  22. Amos Bairoch • The new database was called Swiss-Prot • The 1st release was made on 21 July 1986 • the exact number of entries is unknown, as he can’t find the original floppy disks!

  23. Key milestones (timeline figure; labels: ARPAnet, Internet, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS; data points: 7 structures; 65, 568, 859 & ~3,900 sequences; 30 & 58 entries)

  24. Global data overload • The number of sequences was growing • The number of structures was growing • So was the number of protein family signatures • Two extraordinary developments had yet to take place • what were they?

  25. Key milestones (timeline figure; labels: ARPAnet, Internet, www, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS, FlyBase; data points: 7 structures; 65, 568, 859 & ~3,900 sequences; 30 & 58 entries)

  26. Key milestones (timeline figure; labels: ARPAnet, Internet, www, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, HT DNA sequencing, H. influenzae genome, M. jannaschii genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS, FlyBase, Pfam, TrEMBL, InterPro; data points: 7 structures; 65, 568, 859, ~3,900 & 70,000 sequences; 30, 58 & 2,423 entries)

  27. Original InterPro partners (figure; labels: PROSITE, Profiles, PRINTS, Pfam, ProDom, InterPro)

  28. What is InterPro? “InterPro is an integrated documentation resource for protein families, domains & sites. By uniting databases that use different methodologies & a varying degree of biological information, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

  29. The vision? • Naïvely, we wanted to make life easier! • We aimed to • simplify & rationalise protein family analysis • centralise & streamline the annotation process • & reduce manual annotation burdens • &, in the wake of all the genome projects, to facilitate automatic functional annotation of uncharacterised proteins In fact (& now with 11 partners) we made life a lot harder! But that’s another story…

  31. Key milestones (timeline figure; labels: ARPAnet, Internet, www, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, HT DNA sequencing, H. influenzae genome, M. jannaschii genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS, FlyBase, Pfam, TrEMBL, InterPro, UniProt; data points: 7 structures; 65, 568, 859, ~3,900 & 70,000 sequences; 30, 58 & 2,423 entries)

  32. Key milestones (timeline figure; labels: ARPAnet, Internet, www, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, HT DNA sequencing, H. influenzae genome, M. jannaschii genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS, FlyBase, Pfam, TrEMBL, InterPro, UniProt, ENA; data points: 7 structures; 65, 568, 859, ~3,900, 70,000, 517,100, 10,867,798 & 185,231,366 sequences; 30, 58 & 2,423 entries)

  33. Key milestones (timeline figure; labels as in slide 32: ARPAnet, Internet, www, email, insulin, ribonuclease, DNA sequencing, Dayhoff Atlas, Auto DNA sequencing, Auto protein sequencers, HT DNA sequencing, H. influenzae genome, M. jannaschii genome, S. cerevisiae genome, C. elegans genome, D. melanogaster genome, H. sapiens genome, PDB, PIR, EMBL, GenBank, DDBJ, Swiss-Prot, PROSITE, PRINTS, FlyBase, Pfam, TrEMBL, InterPro, UniProt, ENA; data points: 7 structures; 65, 568, 859, ~3,900, 70,000, 517,100, 10,867,798 & 185,231,366 sequences; 30, 58 & 2,423 entries; added annotations: “billions more” & “hundreds more” (twice))

  34. The central place of bioinformatics in modern biology • Hopefully, this potted history speaks for itself • In the last 30 years, bioinformatics has given us • the first ‘complete’ catalogues of DNA & protein sequences • including genomes & proteomes of organisms across the Tree of Life • software to analyse biological data on an unprecedented scale • & hence tools to help understand • more about evolutionary processes in general • our place on the Tree of Life in particular • &, ultimately, more about health & disease • It isn’t a panacea, but its contribution has been huge

  35. Recommended reading • A. B. Richon. A short history of bioinformatics (http://www.netsci.org/Science/Bioinform/feature06.html) • A. Bairoch (2000) Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times. Bioinformatics, 16(1), 48-64. • M. Ashburner (2006) Won for all: How the Drosophila genome was sequenced. Cold Spring Harbor Laboratory Press. • B. J. Strasser (2008) GenBank: Natural history in the 21st century? Science, 322, 537-538.