A BIOINFORMATIC GENE HUNTING

Presentation Transcript


  1. A BIOINFORMATIC GENE HUNTING

  2. E-learning: "Tools and tips for science teachers" http://ariel.ctu.unimi.it/corsi/bioteach/home

  3. Bioinformatics: when biology meets informatics

  4. What is bioinformatics?
     • Creation and maintenance of databases to store biological information
     • Development of mathematical and statistical tools for analysis, interpretation and continuous updating of biological information
     • Development of new tools to assess relationships among members of large data sets in order to obtain a comprehensive picture of normal cellular activities and their alterations
     • Data sharing

  5. Bioinformatics includes:
     1. Databases collecting experimental data generated in research laboratories
     2. Software for navigating databases

  6. Where does bioinformatics stem from?
     • Experimental efforts to determine structure and function of biological molecules
     • Human Genome Project
     • Production of large data sets
     • Molecular biology databases (genes and proteins)
     • Interpretation
     • Techniques, tools, algorithms for analysis, comparison, classification, interpretation

  7. The global approach to the study of biological data refers to the possibility of analysing and comparing:
     • Genomes (the whole genetic information of a given organism)
     • Transcriptomes (the full set of RNAs of a given organism)
     • Proteomes (the full set of proteins of a given organism)

  8. Applications of bioinformatics analysis
     • Medicine
     • Pharmaceutics
     • Agriculture

  9. Databases
     A database is a collection of information. Databases are made of “entries”.

  10. Biological databases
      • A biological database is a large collection of information and data derived from laboratory studies (in vitro and in vivo analysis), from bioinformatics (in silico analysis) and from the scientific literature.
      • Data are structured so as to enable efficient user access and management of different types of information.

  11. Bioinformatics was essential to obtain the complete sequence of the human genome
      Whole genome shotgun:
      • Genomic DNA
      • Random long (5-20 kb) and short (0.4-1.2 kb) fragments derived from mechanical breakage of DNA were cloned in vectors and sequenced
      • Bidirectional automated sequencing
      • Computerized reconstruction of genomic sequence
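
The "computerized reconstruction" step can be illustrated with a toy greedy overlap assembler. This is only a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats longer than the minimum overlap); it is not the method used for the human genome. The function names and the three reads are hypothetical, and the reads are cut from the short nucleotide example shown on slide 13.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two fragments that share the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy "genome" (the slide 13 example) and three overlapping reads, given out of order.
genome = "ATGTTGAAGTTCAAGTATGGT"
reads = [genome[10:21], genome[0:10], genome[5:15]]
print(greedy_assemble(reads))
```

Real assemblers must also cope with sequencing errors, both DNA strands and repeated regions, which is why the short reads here overlap by five bases rather than by the repeated triplet AAG.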

  12. Primary and specialized databases
      • Primary databases collect nucleotide sequences (DNA, RNA) or protein sequences, together with the general information needed to retrieve the sequences and to identify their species of origin and function.
      • Specialized databases collect large sets of homogeneous records (taxonomic, functional, literature, etc.), with additional annotations and specific information.

  13. • Nucleotide sequence database: ---ATGTTGAAGTTCAAGTATGGT---
      • Amino acid sequence database: --MLKFKYG--
      • 3D structures database
      • Gene expression database
      • Genetic diseases database

  14. How to extract information from a database
      • By entering text in a box (as with a search engine, e.g. Google) or by filling in a given form.
      • We can combine different criteria by means of Boolean operators to intersect (operator AND), add (operator OR) or exclude (operator BUT NOT) information.
      • More Boolean operators are available for more sophisticated searches (IN, NEAR and WITH).
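
The three basic operators behave like set operations. The sketch below assumes a made-up mini-database (the entry IDs and keywords are invented for illustration) and shows AND as intersection, OR as union and BUT NOT as difference; real database search forms have their own query syntax.

```python
# Hypothetical mini-database: entry ID -> keywords (invented for illustration).
entries = {
    "E1": {"human", "kinase", "cancer"},
    "E2": {"mouse", "kinase"},
    "E3": {"human", "globin"},
    "E4": {"human", "kinase"},
}


def having(term):
    """Return the IDs of all entries annotated with the given keyword."""
    return {entry_id for entry_id, keywords in entries.items() if term in keywords}


human_and_kinase = having("human") & having("kinase")        # AND: intersection
human_or_mouse = having("human") | having("mouse")           # OR: union
kinase_but_not_cancer = having("kinase") - having("cancer")  # BUT NOT: difference

print(sorted(human_and_kinase))       # ['E1', 'E4']
print(sorted(human_or_mouse))         # ['E1', 'E2', 'E3', 'E4']
print(sorted(kinase_but_not_cancer))  # ['E2', 'E4']
```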

  15. Algorithms in bioinformatics
      Algorithms to compare sequences:
      - to assess similarities
      - to study molecular evolution and phylogenesis
      Algorithms to predict:
      - genes
      - regulatory elements (promoters, etc.)
      - RNA structures
      - protein structures

  16. Some important results obtained by bioinformatics:
      • Search for homologous genes in the same and in different species
      • Identification of genes and genetic markers
      • Identification of disease-associated genes
      • Prediction of three-dimensional structures of proteins
      • Design of new drugs
      • Data sharing

  17. Genetic-based differences in the response to drugs
      • Comparing two human genomes, single-base differences are found, on average, every 1200-1500 base pairs
      • Each individual is unique
      • A new “omics” discipline: PHARMACOGENOMICS

  18. What is pharmacogenomics for?
      [Figure labels: standard drug; reduced dose of drug, 1/10; thiopurine; patient with genetic defect]

  19. What you need to know to “surf among the genomes” without being submerged by the waves!

  20. Chromosome structure and classification
      [Figure labels: metacentric, submetacentric, acrocentric; short arm (p), centromere, long arm (q), satellite]

  21. Human karyotype and chromosome map

  22. Chromosome banding
      [Figure panels: karyotype with G banding; karyotype with Q banding]

  23. Each chromosome has a specific banding pattern

  24. Chromosome mutations
      [Fig. 10.2.1, Chromosomal mutations. Figure labels: deletion (bases lost), translocation, inversion]
      Gene mutations or point mutations
      • GAC-AAA-GGA-TGA-CTG original sequence
      • GAC-AAA-CGA-TGA-CTG substitution
      • GAC-AAA-TGG-ATG-ACT-G insertion
      • GAC-AA~G-GAT-GAC-TG deletion
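
The three point mutations on this slide can be reproduced with simple string operations. In this minimal sketch the helper names are hypothetical, and the positions (0-based) were read off the slide's example; it shows how an insertion or a deletion shifts every downstream triplet, while a substitution changes only one.

```python
def substitute(seq, pos, base):
    """Replace the base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, base):
    """Insert a base before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


def delete(seq, pos):
    """Remove the base at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def triplets(seq):
    """Split a sequence into hyphen-separated triplets, as on the slide."""
    return "-".join(seq[i:i + 3] for i in range(0, len(seq), 3))


original = "GACAAAGGATGACTG"

print(triplets(original))                      # GAC-AAA-GGA-TGA-CTG
print(triplets(substitute(original, 6, "C")))  # GAC-AAA-CGA-TGA-CTG
print(triplets(insert(original, 6, "T")))      # GAC-AAA-TGG-ATG-ACT-G
print(triplets(delete(original, 5)))           # GAC-AAG-GAT-GAC-TG
```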

  25. Identification of genes and genetic markers

  26. Identification of disorder-associated genes

  27. From gene to protein
      [Figure: double-stranded DNA (5'-3' and 3'-5') with Exons 1-4 and Introns 1-3, marked with start and end of transcription; transcription gives the pre-RNA with 5' UTR and 3' UTR; maturation gives the mRNA (5' to 3'); translation gives the protein (H2N to COOH)]

  28. Prediction of genes within a genomic region
      • Internal exons (---exon---gt---intron---ag---exon---)
      • First exon (5' UTR sequence)
      • Last exon (3' UTR sequence)
      • Unique exons
      • Alternative splicing sites
      • Promoters (TATA and CAAT boxes)
      • Polyadenylation signals (AAUAAA)
      • Start codon ATG
      • Stop codon
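
Two of the signals listed above, the ATG start codon and an in-frame stop codon, are enough for a toy open-reading-frame scan. This is a minimal sketch, not a gene predictor: it ignores introns, promoters, polyadenylation signals and the reverse strand. The function name, the `min_codons` parameter and the test sequence are assumptions made for illustration; the three stop codons are those of the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ATG followed by an in-frame stop codon.

    Positions are 0-based and end is exclusive. min_codons is the minimum
    number of codons before the stop, counting the ATG itself.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs


# Hypothetical test sequence: the out-of-frame TGA at position 6 is ignored.
toy = "CCATGTTGAAGTTCTAAGG"
print(find_orfs(toy))  # [(2, 17, 'ATGTTGAAGTTCTAA')]
```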

  29. Splice sites

  30. Splicing

  31. Alternative splicing

  32. Alternative splicing

  33. Here is a comprehensive view of what you should find among the genome waves... enjoy your surfing!

  34. Finding the Genes
      [Cartoon caption: Dr. Blat helping a gene find itself.]

  35. Chromosome mutations
      [Fig. 10.2.1, Chromosomal mutations. Figure labels: deletion (bases lost), translocation, inversion]
      Gene mutations or point mutations
      • GAC-AAA-GGA-TGA-CTG original sequence
      • GAC-AAA-CGA-TGA-CTG substitution
      • GAC-AAA-TGG-ATG-ACT-G insertion
      • GAC-AA~G-GAT-GAC-TG deletion

  36. Bioinformatics uses algorithms
      Algorithms to compare sequences:
      - to assess similarities
      - to study molecular evolution and phylogenesis
      Algorithms to predict:
      - genes
      - regulatory elements (promoters, etc.)
      - RNA structures
      - protein structures

  37. Sequence Similarity Searches
      Genetic variability
      Genome sequence
      mutations:
      • Genome spequence
      • Ganome sequence
      • Genme sequence
      • Genome sequerce

  38. Sequence conservation and evolution
      • Evolution implies the generation of morphological and molecular variants.
      • At the molecular level, variants are created by errors (mutations) during DNA replication that are not corrected by DNA repair systems.
      • The introduction of mutations (single aa substitutions, deletions, insertions) implies that DNA segments with the same function in different organisms do not share exactly the same sequence.

  39. Sequence alignment programs to study variability
      Sequence alignment establishes a one-to-one correspondence between the positions of two sequences (or parts of them) that minimizes the number of operations necessary to transform one sequence into the other.
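
The "number of operations necessary to transform one sequence into the other" can be computed with the classic edit (Levenshtein) distance. The sketch below is a minimal dynamic-programming version that assumes every substitution, insertion and deletion costs 1; alignment programs use richer scoring schemes. The function name is hypothetical, and the test strings are the mutated phrases from slide 37.

```python
def edit_distance(a, b):
    """Minimum number of single-character substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


reference = "Genome sequence"
for variant in ["Genome spequence", "Ganome sequence", "Genme sequence", "Genome sequerce"]:
    print(variant, edit_distance(reference, variant))  # each variant is 1 operation away
```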

  40. Alignment is obtained by comparing sequences in a pairwise fashion. Each comparison is given a score, which is a measure of the degree of similarity.

  41. When sequences are not identical, the alignment must contain gaps and mismatches
      SA = E V D Q K I S K W D
      SB = E V K K I T R P K W D
      Alignment:
        E V D Q K I S - - K W D
        | |     | |       | | |
        E V - K K I T R P K W D
      Legend: "|" marks a match, "-" a gap; an unmarked pair of letters is a mismatch.
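
The alignment on this slide can be checked column by column. The sketch below counts matches, mismatches and gaps in the two aligned rows and turns the counts into a score; the scoring values (+1, -1, -2) are arbitrary assumptions chosen for illustration, not the values of any particular program, and the function names are hypothetical.

```python
def classify_columns(row_a, row_b, gap="-"):
    """Count match, mismatch and gap columns of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def alignment_score(counts, match=1, mismatch=-1, gap=-2):
    """Sum the column scores; the default values are illustrative assumptions."""
    return counts["match"] * match + counts["mismatch"] * mismatch + counts["gap"] * gap


row_a = "EVDQKIS--KWD"
row_b = "EV-KKITRPKWD"
counts = classify_columns(row_a, row_b)
print(counts)                   # {'match': 7, 'mismatch': 2, 'gap': 3}
print(alignment_score(counts))  # -1
```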

  42. Identity, Similarity and Homology
      • Identity: the extent to which two sequences are invariant
      • Similarity: quantitative parameter defined by the alignment score
      • Homology: origin from a common ancestor sequence

  43. Homologous Sequences
      ATA  GAAKAVALVLPNLKGKLNGIALRVPTPNVSVVDLVVQVSKK-TFAEEVNAAFRDSAEK--  328
      ATB  GAAKAVSLVLPQLKGKLNGIALRVPTPNVSVVDLVINVEKKGLTAEDVNEAFRKAANG--  351
      HS   GAAKAVGKVIPELNGKLTGMAFRVPTANVSVVDLTCRLEKP-AKYDDIKKVVKQASEG--  268
      MM   GAAKAVGKVIPELNGKLTGMAFRVPTPNVSVVDLTCRLEKP-AKYDDIKKVVKQASEG--  266
      XL   GAAKAVGKVIPELNGKITGMAFRVPTPNVSVVDLTCRLQKP-AKYDDIKAAIKTASEG--  266
      DM   GAAKAVGKVIPALNGKLTGMAFRVPTPNVSVVDLTVRLGKG-ASYDEIKAKVQEAANG--  265
      CE   GAAKAVGKVIPELNGKLTGMAFRVPTPDVSVVDLTVRLEKP-ASMDDIKKVVKAAADG--  274
      SP   GAAKAVGKVIPALNGKLTGMAFRVPTPDVSVVDLTVKLAKP-TNYEDIKAAIKAASEG--  268
      ATC  GAAKAVGKVLPALNGKLTGMSFRVPTVDVSVVDLTVRLEKA-ATYEEIKKAIKEESEG--  272
      OS   GAAKAVGKVLPDLNGKLTGMSFRVPTVDVSVVDLTVRIEKA-ASYDAIKSAIKSASEG--  270
      SC   GAAKAVGKVLPELQGKLTGMAFRVPTVDVSVVDLTVKLNKE-TTYDEIKKVVKAAAEG--  266
      ECA  GAAKAVGKVLPELNGKLTGMAFRVPTPNVSVVDLTVRLEKA-ATYEQIKAAVKAAAEG--  266
      HI   GAAKAVGKVLPALNGKLTGMAFRVPTPNVSVVDLTVNLEKP-ASYDAIKQAIKDAAEGKT  268
      ECC  GAAKAIGLVIPELSGKLKGHAQRVPVKTGSVTELVSILGKK-VTAEEVNNALKQATTN--  266
      • Homologous sequence comparison helps in:
        - identifying important structural and functional domains of a given protein
        - identifying aa residues responsible for common features and those responsible for different features of a given protein
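
Pairwise identity can be read directly from two rows of this alignment. The sketch below compares the rows labelled HS and MM, skipping columns where either row has a gap. Dividing by the number of gap-free columns is one common convention and is an assumption here, since other definitions of percent identity exist; the function name is hypothetical.

```python
def pairwise_identity(row_a, row_b, gap="-"):
    """Return (identical, compared) over the columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gap columns are left out of the comparison
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


# Rows HS and MM copied from the alignment on the slide.
hs = "GAAKAVGKVIPELNGKLTGMAFRVPTANVSVVDLTCRLEKP-AKYDDIKKVVKQASEG--"
mm = "GAAKAVGKVIPELNGKLTGMAFRVPTPNVSVVDLTCRLEKP-AKYDDIKKVVKQASEG--"

identical, compared = pairwise_identity(hs, mm)
print(f"{identical}/{compared} identical residues ({100 * identical / compared:.1f}%)")
```

The two rows differ at a single column (A in HS, P in MM), which is the kind of position the slide suggests examining when looking for residues responsible for different features.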

  44. Degree of Sequence Conservation
      • In sequence alignment, both sequence identity and the degree of conservation of the aa residues at positions where the two sequences differ are taken into consideration:
        - Conservative substitutions (two aa with similar chemical properties)
        - Semi-conservative substitutions
        - Non-conservative substitutions
      • Molecules with similar primary aa sequences tend to have similar secondary and tertiary structures.
      • If two proteins share 50% of their sequence, the probability that they have superimposable 3D structures is very high.
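
A substitution can be labelled conservative or not by asking whether the two residues fall in the same chemical group. The grouping below is a deliberately simplified assumption for illustration (textbooks and substitution matrices draw the boundaries differently), and this sketch does not model the intermediate "semi-conservative" class; the names are hypothetical.

```python
# Simplified, illustrative grouping of the 20 amino acids (an assumption, not a standard).
GROUPS = {
    "nonpolar": set("AVLIMGP"),
    "aromatic": set("FYW"),
    "polar": set("STNQC"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def group_of(aa):
    """Return the name of the group that contains the one-letter code aa."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


def classify_substitution(a, b):
    """Label a pair of aligned residues as identical, conservative or non-conservative."""
    if a == b:
        return "identical"
    return "conservative" if group_of(a) == group_of(b) else "non-conservative"


print(classify_substitution("I", "L"))  # conservative
print(classify_substitution("D", "E"))  # conservative
print(classify_substitution("D", "K"))  # non-conservative
```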

  45. Genes in evolution
      Homologous genes are those that evolved from a common ancestral precursor gene:
      • orthologous genes: genes in different species that have evolved directly from an ancestral gene, generally maintaining the same function.
      • paralogous genes: two genes or clusters of genes at different chromosomal locations in the same organism that have structural similarities and have diverged from the parent copy by duplication. In general, their function is different although correlated with that of the ancestral precursor gene.

  46. The three-letter and one-letter amino acid code

  47. Amino acid polarity
      [Figure labels: -, +; non-polar aa; polar aa]

  48. Sequence conservation during evolution
      Reference sequence: -ATGTTGAAGTTT- coding for - M L K F -
      Variants:
      • -ATGTTGAAGTAT- coding for - M L K Y -
      • -ATGTTGAAGGTT- coding for - M L K V -
      • -ATGTTGAAGTTC- coding for - M L K F -
      [Figure labels: aa sequence identity; different aa sequence, conserved structure; different aa sequence, altered 3D structure]
      • Evolution does not act on DNA sequences or on the primary structures of proteins, but only on the 3D structures of proteins.
      • As a consequence of this and of the degeneracy of the genetic code, the 3D structure of proteins is more conserved than the primary structure, which in turn is more conserved than the nucleotide coding sequence.
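
The degeneracy of the genetic code is visible when the four nucleotide sequences on this slide are translated. The sketch below uses only the seven codons that occur in the example (a small excerpt of the standard genetic code, not a full table); the function name is hypothetical.

```python
# Excerpt of the standard genetic code: only the codons used on this slide.
CODON_TABLE = {
    "ATG": "M", "TTG": "L", "AAG": "K",
    "TTT": "F", "TTC": "F",  # two codons, same amino acid
    "TAT": "Y", "GTT": "V",
}


def translate(dna):
    """Translate complete codons from position 0; raises KeyError for codons outside the excerpt."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


for dna in ["ATGTTGAAGTTT", "ATGTTGAAGTTC", "ATGTTGAAGTAT", "ATGTTGAAGGTT"]:
    print(dna, translate(dna))
# ATGTTGAAGTTT MLKF
# ATGTTGAAGTTC MLKF
# ATGTTGAAGTAT MLKY
# ATGTTGAAGGTT MLKV
```

The TTT to TTC change alters the DNA but not the protein, which is why the nucleotide sequence is the least conserved of the three levels named on the slide.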

  49. Model organisms
      • Nematode, Caenorhabditis elegans
      • Mouse, Mus musculus
      • Fruit fly, Drosophila melanogaster
      • Zebrafish, Danio rerio
      [Gene-count labels, in slide order: 19,000 genes; 30,000 genes; 30,000 genes; 13,600 genes]
