
Introduction to Bioinformatics

Presentation Transcript


  1. Introduction to Bioinformatics

  2. The Swiss Institute of Bioinformatics • Collaborative structure Lausanne - Geneva • Groups at ISREC, Ludwig Institute, CHUV, Unil, HUG, UniGe, and recently UniBas • Several roles: research, services, teaching • DEA (master's degree) in Bioinformatics: 1 year full time • EMBnet courses: two 1-week sessions per year, to be extended • Undergraduate courses at the Universities of Geneva, Fribourg and Lausanne

  3. Projects at SIB • Databases • SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL • TrEST, TrGEN (predicted proteins), tromer (transcriptome) • Software • Melanie, Deep View, proteomics tools, ESTScan, pftools, Java applets • Services • Web servers ExPASy, EMBnet • Teaching and helpdesk • Research • Mostly sequence and expression analysis, 3D structure, and proteomics

  4. EMBnet organisation • Started as a European network in 1988, now spread world-wide • 29 country nodes, 9 special nodes • Role • Training, education • Software development (EMBOSS, SRS) • Computing resources (databases, websites, services) • Helpdesk and technical support • Publications

  5. Swiss node http://www.ch.embnet.org

  6. Other important sites • ExPASy - Expert Protein Analysis System • www.expasy.org • EBI - European Bioinformatics Institute • www.ebi.ac.uk • NCBI - National Center for Biotechnology Information • www.ncbi.nlm.nih.gov • Sanger - The Sanger Institute • www.sanger.ac.uk

  7. Bioinformatics: definition • Every application of computer science to biology • Sequence analysis, image analysis, sample management, population modelling, … • Analysis of data coming from large-scale biological projects • Genomes, transcriptomes, proteomes, metabolomes, etc.

  8. The new biology • Traditional biology • Small team working on a specialized topic • Well-defined experiments to answer precise questions • New « high-throughput » biology • Large international teams, with cutting-edge technology defining the project • Raw results are released to the scientific community without any underlying hypothesis

  9. Examples of « high-throughput » • Complete genome sequencing • Large-scale sampling of the transcriptome (EST) • Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE) • Large-scale sampling of the proteome • Protein-protein interaction analysis by large-scale two-hybrid screens (yeast, worm) • Large-scale 3D structure production (yeast) • Metabolism modelling • Simulations • Biodiversity

  10. Role of bioinformatics • Control and management of the data • Analysis of primary data, e.g. • Base calling from chromatograms • Mass spectra analysis • DNA microarray image analysis • Statistics • Database storage and access • Analysis of results in a biological context

  11. First information: a sequence ? • Nucleotide • RNA (or cDNA) • Genomic (intron-exon) • Complete or incomplete? • mRNA with 5’ and 3’ UTR regions • Entire chromosome • Protein • Pre/Pro or functional protein? • Function prediction • Post-translational modifications? • Holy Grail: 3D structure?

  12. Genomes in numbers • Sizes: • virus: 10^3 to 10^5 nt • bacteria: 10^5 to 10^7 nt • yeast: 1.35 x 10^7 nt • mammals: 10^8 to 10^10 nt • plants: 10^10 to 10^11 nt • Gene number: • virus: 3 to 100 • bacteria: ~ 1000 • yeast: ~ 7000 • mammals: ~ 30’000 • plants: 30’000-50’000?

  13. Sequencing projects • « small » genomes (<10^7 nt): bacteria, viruses • Many already sequenced (industry excluded) • More than 90 microbial genomes already in the public domain • More to come! (one new every two weeks…) • « large » genomes (10^7-10^10 nt): eukaryotes • 12 finished (S. cerevisiae, S. pombe, E. cuniculi, C. elegans, D. melanogaster, A. gambiae, D. rerio, F. rubripes, A. thaliana, O. sativa, M. musculus, Homo sapiens) • Many more to come: rat, pig, cow, maize (and other plants), insects, fishes, many pathogenic parasites (Plasmodium…) • EST sequencing • Partial mRNA sequences • ~12 x 10^6 sequences in the public domain

  14. Human genome [Figure: chromosome diagram labelled centromere, telomere, exons of a gene, locus control region, regulatory elements, repetitive sequences] • Size: 3 x 10^9 nt for a haploid genome • Highly repetitive sequences 25%, moderately repetitive sequences 25-30% • Size of a gene: from 900 to >2’000’000 bases (introns included) • Proportion of the genome coding for proteins: 5-7% • Number of chromosomes: 22 autosomes, 1 sex chromosome • Size of a chromosome: 5 x 10^7 to 5 x 10^8 bases
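The figures on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: it takes the haploid genome size and the 5-7% coding proportion from this slide, and the gene count of about 30’000 from the mammal figure in slide 12. The function and variable names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the slide figures (illustrative only).
HAPLOID_GENOME_NT = 3 * 10**9   # haploid human genome size, from the slide
GENE_COUNT = 30_000             # approximate mammalian gene number, from slide 12
CODING_PERCENT = (5, 7)         # proportion of the genome coding for proteins


def coding_nt(genome_nt, percent):
    """Number of nucleotides covered by a given percentage of the genome."""
    return genome_nt * percent // 100


low, high = (coding_nt(HAPLOID_GENOME_NT, p) for p in CODING_PERCENT)
nt_per_gene = HAPLOID_GENOME_NT // GENE_COUNT

print(f"protein-coding sequence: {low:,} to {high:,} nt")
print(f"average genome span per gene: {nt_per_gene:,} nt")
```

With these inputs, the average span per gene is far larger than the coding sequence per gene, which is consistent with the slide's point that most of the genome does not code for proteins.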

  15. How to sequence the human genome? • Consortium « international » approach: • Generate genetic maps (meiotic recombination) and pseudogenetic maps (chromosome hybrids) for indicator sequences • Generate a physical map based on large clones (BAC or PAC) • Sequence enough large clones to cover the genome • « commercial » approach (Celera): • Generate random libraries of fixed length genomic clones (2kb and 10kb) • Sequence both ends of enough clones to obtain a 10x coverage • Use computer techniques to reconstitute the chromosomal sequences, check with the public project physical map
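To give a feeling for the scale of the shotgun approach, the sketch below estimates how many reads and clones a 10x coverage would require, using coverage = (number of reads x read length) / genome size. The read length of 500 nt is an assumption for illustration and does not come from the slides; the Poisson estimate of the unsequenced fraction is the usual Lander-Waterman approximation, not necessarily the model used by either project.

```python
import math


def reads_for_coverage(genome_size, coverage, read_length):
    """Smallest number of reads N with N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)


def expected_uncovered_fraction(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


GENOME_SIZE = 3 * 10**9   # haploid human genome, from the slides
COVERAGE = 10             # target coverage, from the slides
READ_LENGTH = 500         # ASSUMPTION: illustrative read length, not in the slides

reads = reads_for_coverage(GENOME_SIZE, COVERAGE, READ_LENGTH)
clones = math.ceil(reads / 2)   # both ends of each clone are sequenced

print(f"reads needed:  {reads:,}")
print(f"clones needed: {clones:,}")
print(f"expected uncovered fraction: {expected_uncovered_fraction(COVERAGE):.2e}")
```

Under these assumptions the project needs tens of millions of reads, which is why the reassembly step has to be done by computer.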

  16. Sequencing progression

  17. Interpretation of the human draft • Still many gaps and unordered small pieces (except for chr 6, 7, 20, 21, 22, Y) • Even a genomic sequence does not tell you where the genes are encoded. The genome is far from being « decoded » • One must combine genome and transcriptome to have a better idea

  18. The transcriptome • The set of all functional RNAs (tRNA, rRNA, mRNA etc…) that can potentially be transcribed from the genome • The documentation of the localization (cell type) and conditions under which these RNAs are expressed • The documentation of the biological function(s) of each RNA species

  19. Public draft transcriptome • Information about the expression specificity and the function of mRNAs • « full » cDNA sequences of known function • « full » cDNA sequences, but « anonymous » (e.g. KIAA or DKFZ collections) • EST sequences • cDNA libraries derived from many different tissues • Rapid random sequencing of the ends of all clones • ORESTES sequences • Growing set of expression data (microarrays, SAGE etc…) • Increasing evidence for multiple alternative splicing and polyadenylation

  20. Example mapping of ESTs and mRNAs [Figure: tracks for mRNAs, ESTs and computer prediction]

  21. The proteome • Set of proteins present in a particular cell type under particular conditions • Set of proteins potentially expressed from the genome • Information about the specific expression and function of the proteins

  22. Information on the proteome • Separation of a complex mixture of proteins • 2D PAGE (IEF + SDS PAGE) • Capillary chromatography • Individual characterisation of proteins • Tryptic peptides signature (MS) • Sequencing by chemistry or MS/MS • All post-translational modifications (PTMs) !
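The tryptic peptide signature relies on the fact that trypsin cuts a protein at predictable positions, so the expected peptides can be computed from the sequence. The sketch below is a simplified in silico digestion, assuming the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). It ignores missed cleavages and modifications, and the function name and test sequence are made up for illustration.

```python
def tryptic_digest(protein):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Keep the C-terminal peptide when the sequence does not end in K/R.
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence: the R followed by P is not cleaved.
print(tryptic_digest("AKRPGKLR"))
```

In a real identification, the masses of the predicted peptides would then be compared with the masses measured by MS.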

  23. Three-dimensional structures • Methods to determine structures • X-ray crystallography • NMR • Data format • Atom coordinates (except H) in a Cartesian space • Databases • For proteins and nucleic acids (RCSB, formerly PDB) • Independent databases for sugars and small organic molecules
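As an illustration of the coordinate data format, the sketch below reads one atom record in the fixed-column PDB file format, where the x, y and z coordinates occupy columns 31-38, 39-46 and 47-54. The sample line is invented for the example and shortened to the fields used here; the function name is hypothetical.

```python
# One shortened, invented ATOM record in fixed-column PDB layout.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614"


def parse_atom_record(line):
    """Extract atom name, residue name and Cartesian coordinates from a PDB line.

    Returns None for lines that are not ATOM or HETATM records.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "atom": line[12:16].strip(),     # columns 13-16: atom name
        "residue": line[17:20].strip(),  # columns 18-20: residue name
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
    }


print(parse_atom_record(SAMPLE))
```

Because the format is column-based rather than whitespace-separated, slicing by position is safer than splitting on spaces, since adjacent fields can touch.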

  24. Visualisation of the structures • Secondary structure elements • Alpha helices, beta sheets, other • Software • Various representations (atoms, bonds, secondary…) • Wide choice of commercial and free software (e.g., DeepView)

  25. Sequence information, and so what ? • How to store and organise ? • Databases (next lecture) • How to access, search, compare ? • Pairwise alignments, BLAST (tomorrow) • EST clustering, Multiple Alignments (Wednesday) • Patterns, PSI-BLAST, Profiles and HMMs (Thursday) • Gene prediction (Thursday) • Your problems? • Friday
