Introduction in Bioinformatics

Introduction in Bioinformatics. Dr. Chris Evelo, Department of Bioinformatics – BiGCaT, Maastricht University. A translational product path: Small Molecules. Drug Design. Choose a protein target? But which one? Cells are protein factories.


Presentation Transcript


  1. Introduction in Bioinformatics. Dr. Chris Evelo, Department of Bioinformatics – BiGCaT, Maastricht University

  2. A translational product path: Small Molecules

  3. Drug Design

  4. Choose a protein target? But which one?

  5. Cells are protein factories. Differences in protein production (= gene expression regulation) determine the cell type, its function, its health.

  6. Figure 3-15. The transfer of information from DNA to protein. The transfer proceeds by means of an RNA intermediate called messenger RNA (mRNA). In procaryotic cells the process is simpler than in eucaryotic cells. In eucaryotes the coding regions of the DNA (in the exons, shown in color) are separated by noncoding regions (the introns). As indicated, these introns must be removed by an enzymatically catalyzed RNA-splicing reaction to form the mRNA. Alberts et al. Molecular Biology of the Cell, 3rd edn.
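The flow described in this caption (remove introns, form the mRNA, translate it) can be illustrated with a minimal Python sketch. The toy gene, its exon coordinates, and the function names `splice`, `transcribe` and `translate` are assumptions made up for illustration; they are not from the slides. The sketch works on the coding strand, so transcription reduces to replacing T with U, and it ignores capping, the poly-A tail and alternative splicing.

```python
# Standard genetic code, built from the conventional T/C/A/G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(gene, exons):
    """Join the exon slices (0-based, end-exclusive); introns are dropped."""
    return "".join(gene[start:end] for start, end in exons)


def transcribe(coding_dna):
    """Coding-strand DNA to mRNA: replace T with U."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: exon 1, one intron, exon 2.
gene = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]

mrna = transcribe(splice(gene, exons))
print(mrna)             # AUGGCCAAAUGA
print(translate(mrna))  # MAK
```

In a procaryotic gene the exon list would simply be one interval covering the whole coding region, which is the "simpler" case the caption mentions.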

  7. Gene expression regulation

  8. Step 1: transcriptional control. Binding of transcription factors determines expression

  9. Signalling: receptors set the transcription factors to work

  10. Two steps… • Find the regulated proteins • Find out how they are regulated

  11. Find the regulated proteins? Different conditions show different levels of gene expression for specific genes

  12. What about the human genome? • Copied chromosomal sequences to hard discs. So now you can read it (although I still prefer a good novel) • If you are good at it (and care to read it 6 times over) you can even predict genes • But even if you are among the best you can’t predict protein structure or function

  13. And this week... Tweets for #cgc2011 • Ion Torrent did EHEC in a day, soon can do a human genome in 2 hr (incl. sample prep), for $5000. • Illumina: $4000 for a full human genome. • Noblegen: 96 human genomes in 17 hours. • Complete Genomics: 55x coverage (covering 98% of genome >10x) for $5000. They did 1500 so far.

  14. Here is the challenge Take a 5 minute break… Think of something useful to do with a human genome. Describe what other info you need to make it work.

  15. About proteins and mRNA • Biochemists and physiologists spent over a century describing proteins, their function, structure and sequence (see: UniProt) • Molecular biologists spent decades, found huge amounts of expressed mRNA sequences (ESTs), tried to relate them to function, and failed • Cluttering up the databases with things like “EST found in very rare tumor so and so” (could be myoglobin mRNA) (see: GenBank, EMBL)

  16. UniProt, a combined database • SwissProt (EU) and PIR (US): highly expert curated • trEMBL (translated EMBL): automatically translated from RNA

  17. UniGene, a historic database. Clusters of mRNA (ESTs). Basis for transcript info in RefSeq and Ensembl.

  18. Cluster sizes in UniGene. This is a gene with 10 ESTs associated; the cluster size is 10

  19. Using the information • Take the EST sequences and cluster them to full mRNA sequences (UniGene!) • Build the full coding sequences from this (RefSeq and Ensembl) • Translate that into hypothetical proteins (UniProt/trEMBL) • Check for known proteins (UniProt/SwissProt) • Use to find microarray reporter sequences for known and hypothetical proteins • BLAST against the genome to find the location.

  20. Reporter Annotation

  21. DNA sequence useful? Yes, • if you know from population genetics or animal experiments about loci (QTLs) important for traits. Your gene might be in such a locus (check OMIM, RGD) • to find regulatory sequences • to compare genomes (e.g. tumor and healthy). This week's oncology conference in the US: “It is unethical not to sequence a tumor before treatment”

  22. Two steps… • Find the regulated proteins • Find out how they are regulated

  23. Changes in gene expression • Comparison of gene expression shows important pathways and receptors which can be influenced • Different gene expression e.g. • Between healthy and sick conditions • At different stages of disease progression • At different stages of healing • As a response to successful treatment • Between more and less vulnerable individuals

  24. Gene expression: DNA → mRNA → protein • Changes in mRNA (transcriptomics) • Differential expression libraries • Gene expression microarrays • Changes in protein levels (proteomics) • 2D electrophoresis • antibody arrays • GC-MS and HPLC-MS • Epigenetic changes (e.g. DNA methylation) • Changes in regulatory proteins (e.g. ChIP) • Changes in activity

  25. mRNA processing • Genes contain: • Expressed regions (exons) • Non-expressed regions (introns) • During RNA splicing introns are removed and exons connected • A poly-adenosine (poly-A) tail is added • Complete mRNAs leave the nucleus • mRNAs are “attacked” by miRNAs

  26. Figure 9-87. Control of the poly-A tail length affects both mRNA stability and mRNA translation. (A) Most translated mRNAs have poly-A tails that exceed a minimum length of about 30 As. The tails on selected mRNAs can be either elongated or rapidly cleaved in the cytosol, and this will have an effect on the translation of these mRNAs. (B) A model proposed to explain the observed stimulation of translation by an increase in poly-A tail length. The large ribosomal subunits, on finishing a protein chain, may be directly recycled from near the 3' end of an mRNA molecule back to the 5' end to start a new protein by special poly-A-binding proteins (red). Alberts et al. Molecular Biology of the Cell, 3rd edn.

  27. mArray

  28. Layout of a microarray experiment • Get the cells • Isolate RNA • Incorporate fluorescent dye • Hybridize • Laser read out • Analyze image

  29. When not to use microarrays • Expression changes of single known genes (cheaper alternatives) • Visible tissue changes (e.g. inflammation, collagen). Arrays would just be expensive microscopes! Useful at early stages.

  30. Getting the cells. Critical aspects • We need controls (but controls can be pooled) • Cell isolation must be fast (mRNA should be kept) • About 5 µg total RNA needed (with amplification) • Microdissection possible • Tissue changes will result in RNA changes

  31. Understanding Array data • Typical procedure • Annotate the reporters with something useful (UniProt!) • Sort based on fold change • Search for your favorite genes/proteins • Throw away 95% of the array the European Nutrigenomics Organisation

  32. Secondary Analyses • Gene clustering: order the genes according to behavior • Pathway and function finding: use pathways and Gene Ontology

  34. Understanding Array data • “Advanced” procedures • Gene clustering or principal component analysis • Get groups of genes with parallel expression patterns • Useful for diagnosis • Not adding much to understanding (unless combined) the European Nutrigenomics Organisation

  35. Functional Mapping Annotation/coupling the European Nutrigenomics Organisation

  36. That was Step 1… • Find the regulated proteins • Find out how they are regulated

  37. Finding the TF binding sites. Sequence determines binding of transcription factors

  38. TF binding site motifs

  39. Conserved GCNF binding site. If it is important it should be conserved.

      Alignment ruler ticks: human 390 to 440, rat 360 to 400.

      human  TTGGACCTTGAACTTATGTATCATGTGGAGA-AGAGCCAATTTAACAAACTAGGAAGATG
               :||||:|||||||:|||::||||:||::| |||||||||:|:|||:|||||:||
      rat    --AGACCATGAACTTCTGTGCCATGGGGCAACAGAGCCAATGTCACATACTAGAAA----

      Result of rVista (Transfac Pro) analysis
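To put a number on "conserved", one simple measure is the percent identity over the gap-free columns of the human/rat alignment on this slide. The Python sketch below is only that simple measure; it is an assumption for illustration and not the scoring rVista itself applies. The function name `percent_identity` is made up here.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Count identical bases over alignment columns where neither row has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return matches, compared, 100.0 * matches / compared


# Aligned rows as shown on the slide.
human = "TTGGACCTTGAACTTATGTATCATGTGGAGA-AGAGCCAATTTAACAAACTAGGAAGATG"
rat = "--AGACCATGAACTTCTGTGCCATGGGGCAACAGAGCCAATGTCACATACTAGAAA----"

matches, compared, pct = percent_identity(human, rat)
print(f"{matches} of {compared} aligned bases identical ({pct:.1f}%)")
```

The match count agrees with the number of `|` symbols in the slide's alignment, so the function is effectively recomputing that middle row.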

  40. ChIP technology. Immunoprecipitation of DNA with crosslinked TFs. Detect DNA with PCR or arrays

  41. SNPs: sequence variations

  42. SNP in TF binding site?

  43. ClustalW alignment (relevant part shown only), arrow = SNP location:

      HUMAN           CAAGGTTTTTTGGAGGCTT--TTT-GTAAATTGTGA-----TAGGAACTTTGGACCTTG- 395
      CHIMP           CAAGGTTTTTTGGAGGCTT--TTTTGTAAATTGTGA-----TAG-AACTTTGGACCTTGC 396
      RHESUS_MACAQUE  CAAGGTTTCTTGGAGGCTT--TGT-GCAAATTGTGA-----TAACCACTTTGGACCTTC- 395
      RAT             CAAGGTGTTTTG----TTT--TGAAGGGAATT-----------AAAAGAACAGACCATG- 362
      MOUSE           CAAGGT-TTTTG----TTT--TAAAGGGACTTTTAAATTGTCTAAAATATCAGTAGACC- 379
      STICKLEBACK     TCACGC--TACG----TTT--CTGAGTAAGCTGT--------CGCTTCTACGGAGTCAAG 277
      TETRAODON       CGAGGAGTCCCGCTG-TTT--CTTTGTAGCCACTTTAGTACTTTACGGTTGGGGCCAAGC 274
      ZEBRAFISH       TTATATCATGCATCACTCAAGTTAAATGTGTTTTTGTCATATTACCGATGCTGTTTCAGG 315
                        *                                                 *

      HUMAN           AACTTATGTATC----ATGTGG-AGAAGAGCCAATTTAACAAACTAGGAAGATGAAAAGG 450
      CHIMP           AACTTATGTATC----ATGTGG-AGAAGAGCCAATTTAACAAACTAGGAAGATGAAAAGG 451
      RHESUS_MACAQUE  AACTTATGTATCTATCATGTGG-AAAAGAGCCAATTTAGCAAACTAGGAACATGAAAAGG 454
      RAT             AACTTCTGTGCC----ATGGGGCAACAGAGCCAATGTCACATACTAG------AAAAAGA 412
      MOUSE           ATCATCTGTGCC----ATGGGG-GACAGAGCCAATTTCA--------------------- 413
      STICKLEBACK     GCGCTCAGGGTCT--CACTCCCCTTCTCAGCCACTTTATGACTTTGCCTTGGGGGGCCGA 335
      TETRAODON       CTCCGCGACTCCGCCCCCTGGCCTGCTGGGACATGGGAGA----TGGTTTCTGCCAAGGA 330
      ZEBRAFISH       GCCTGAAAGAGGGCACAAGGGCTGTTTGGTGTGCTGTATTTCATTATATTT--GAGCTGC 373

      ▲ T[AC][TC]GT[AG][CT]C
        T M Y GT R Y C

  44. Of course that was just part of step 2… • Find the regulated proteins • Find out how they are regulated • We found the transcription factor • We need the whole path up to the receptor • It might help if part of that path itself showed up in gene expression studies.
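The motif line on this slide, T[AC][TC]GT[AG][CT]C (TMYGTRYC in IUPAC codes), can be turned into a regular expression and searched for in a sequence. The Python sketch below does this for the start of the human and rat rows in the second alignment block. The helper names and the `variant` sequence are hypothetical: the variant is a made-up single-base change that only shows how a SNP inside a site can abolish the match, and it is not the SNP marked on the slide.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (subset plus N).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "R": "[AG]", "Y": "[CT]",
    "W": "[AT]", "S": "[CG]", "K": "[GT]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as 'TMYGTRYC' into a regular expression."""
    return re.compile("".join(IUPAC[symbol] for symbol in motif))


def find_site(aligned_seq, motif):
    """Return (start, matched site) in the ungapped sequence, or None."""
    ungapped = aligned_seq.replace("-", "")
    match = iupac_to_regex(motif).search(ungapped)
    return (match.start(), match.group()) if match else None


MOTIF = "TMYGTRYC"

# First 22 alignment columns of the human and rat rows (second block).
human = "AACTTATGTATC----ATGTGG"
rat = "AACTTCTGTGCC----ATGGGG"
# Hypothetical variant: the G at the motif's fourth position changed to A.
variant = "AACTTATATATC----ATGTGG"

print(find_site(human, MOTIF))    # (4, 'TATGTATC')
print(find_site(rat, MOTIF))      # (4, 'TCTGTGCC')
print(find_site(variant, MOTIF))  # None
```

A regex match is all-or-nothing, whereas tools such as Transfac-based scanners score sites with position weight matrices, so a real SNP might weaken a site rather than remove it outright.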

  45. Still looking for the rest of the path
