Introduction to Bioinformatics II

Introduction to Bioinformatics II, Lecture 6, by Ms. Shumaila Azam. Gene: a sequence of nucleotides coding for a protein. Gene Prediction Problem: determine the beginning and end positions of genes in a genome. Gene Prediction: Computational Challenge.

Presentation Transcript

  1. Introduction to Bioinformatics II, Lecture 6, by Ms. Shumaila Azam

  2. Gene: A sequence of nucleotides coding for a protein. • Gene Prediction Problem: Determine the beginning and end positions of genes in a genome.

  3. Gene Prediction: Computational Challenge aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg Gene!

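  4. Central Dogma: DNA -> RNA -> Protein (transcription turns DNA into RNA; translation turns RNA into protein). Example: the DNA CCTGAGCCAACTATTGATGAA is transcribed into the RNA CCUGAGCCAACUAUUGAUGAA, which is translated into the peptide PEPTIDE.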
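The example on this slide can be checked with a short Python sketch. It assumes the DNA shown is the coding (sense) strand, so transcription reduces to replacing T with U, and it uses the standard genetic code (NCBI translation table 1). The function names are illustrative, not taken from any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with the first base varying slowest, in U, C, A, G order;
# "*" marks a stop codon.
RNA_BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate codon by codon from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # a stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


dna = "CCTGAGCCAACTATTGATGAA"
rna = transcribe(dna)
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```

Reading the RNA in triplets gives CCU GAG CCA ACU AUU GAU GAA, i.e. Pro, Glu, Pro, Thr, Ile, Asp, Glu, which spells PEPTIDE in one-letter codes.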

  5. Gene Prediction • Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. • In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. • Regions of interest include: • protein-coding genes • RNA genes • regulatory regions

  6. Gene Prediction • Statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. • Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. • Determining gene function typically requires in vivo experimentation, such as gene knockout. • Advances in bioinformatics research are making it increasingly possible to predict the function of a gene based on its sequence alone.

  7. Extrinsic approaches • In extrinsic (or evidence-based) gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of the known sequence of a messenger RNA (mRNA) or protein product. • Given an mRNA sequence, it is trivial to derive the unique DNA sequence from which it must have been transcribed. • Given a protein sequence, reverse translation through the genetic code yields a family of possible coding DNA sequences, because most amino acids are encoded by more than one codon.
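Both derivations can be sketched in a few lines of Python, assuming the standard genetic code (NCBI table 1) and treating the derived DNA as the coding strand. The helper names are hypothetical; the point is that an mRNA maps to exactly one DNA sequence, while a protein maps to a product-sized family of them.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI table 1) written with DNA bases; "*" = stop.
DNA_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(DNA_BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa):
    """All DNA codons that encode amino acid `aa` (one-letter code)."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def mrna_to_dna(mrna):
    """The unique coding-strand DNA for an mRNA: replace U with T."""
    return mrna.upper().replace("U", "T")


def count_reverse_translations(protein):
    """Number of distinct coding sequences: product of per-residue codon counts."""
    return prod(len(codons_for(aa)) for aa in protein)


def reverse_translations(protein):
    """Lazily enumerate every coding DNA sequence for `protein`."""
    for combo in product(*(codons_for(aa) for aa in protein)):
        yield "".join(combo)


print(mrna_to_dna("CCUGAGCCAACUAUUGAUGAA"))  # CCTGAGCCAACTATTGATGAA
print(count_reverse_translations("PEPTIDE"))  # 4*2*4*4*3*2*2 = 1536
```

The family grows multiplicatively with protein length, so explicit enumeration is only practical for short peptides.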

  8. Extrinsic approaches • Once candidate DNA sequences have been determined, efficiently searching a target genome for matches (complete or partial, exact or inexact) is a relatively straightforward algorithmic problem. • BLAST (Basic Local Alignment Search Tool) is a widely used system designed for this purpose.
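To make the search step concrete, here is a toy seed-and-verify sketch in the spirit of BLAST's word-based seeding: index every k-mer of the target genome, look up the query's k-mers, and verify each candidate placement by counting mismatches. It is not BLAST (no scoring matrix, no gapped extension, no significance statistics); the function names and default parameters are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map each k-mer of `genome` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_and_verify(genome, query, k=3, max_mismatches=1):
    """Start positions where `query` occurs in `genome` with at most
    `max_mismatches` substitutions (no gaps), found via exact k-mer seeds."""
    index = build_kmer_index(genome, k)
    candidates = set()
    for q in range(len(query) - k + 1):
        # A seed hit at genome position g implies the query would start at g - q.
        for g in index.get(query[q:q + k], []):
            start = g - q
            if 0 <= start <= len(genome) - len(query):
                candidates.add(start)
    # Verify each candidate placement over the full query length.
    return sorted(
        s for s in candidates
        if hamming(genome[s:s + len(query)], query) <= max_mismatches
    )


genome = "aatgcatgcggctatgctaatg"  # prefix of the sequence on slide 3
print(seed_and_verify(genome, "gcggct", max_mismatches=0))  # [7]
print(seed_and_verify(genome, "gcgcct", max_mismatches=1))  # [7]
```

Seeding is a heuristic: a match with at most m mismatches is guaranteed to contain an exact k-mer seed only when the query is at least (m + 1)·k long, so a larger k is faster but can miss divergent matches.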

  9. Ab initio approaches • Ab initio gene prediction is an intrinsic method based on gene content and signal detection. • Because extrinsic evidence is expensive and difficult to obtain for many genes, it is also necessary to resort to ab initio gene finding. • The genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes. • These signs can be broadly categorized as either signals (specific sequences that indicate the presence of a gene nearby) or content (statistical properties of the protein-coding sequence itself).
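A minimal sketch of the two kinds of evidence, assuming a plain DNA string: a signal sensor that lists positions of the start codon ATG and the three stop codons, and a content sensor that reports GC fraction in sliding windows. Both are illustrative stand-ins with hypothetical names and window sizes; practical ab initio gene finders use richer statistical models of coding content, such as Markov chains.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_signals(seq, motifs=("ATG",) + STOP_CODONS):
    """Signal sensor: 0-based start positions of each motif, in all frames."""
    seq = seq.upper()
    return {
        m: [i for i in range(len(seq) - len(m) + 1) if seq.startswith(m, i)]
        for m in motifs
    }


def gc_content_windows(seq, window=100, step=50):
    """Content sensor: (start, GC fraction) for each sliding window."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        result.append((start, (w.count("G") + w.count("C")) / window))
    return result


print(find_signals("ATGAAATAG"))
# {'ATG': [0], 'TAA': [], 'TAG': [6], 'TGA': [1]}
print(gc_content_windows("GGGGAAAA", window=4, step=4))
# [(0, 1.0), (4, 0.0)]
```

The TGA at position 1 is out of frame relative to the ATG at position 0, which is why raw signal hits must be interpreted frame by frame.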

  10. Ab initio approaches (prokaryotes) • In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences (signals). • The sequence coding for a protein occurs as one contiguous open reading frame (ORF), typically hundreds or thousands of base pairs long. • Since 3 of the 64 possible codons are stop codons, one would expect a stop codon approximately every 20–25 codons, or 60–75 base pairs, in a random sequence, so a long ORF is itself an informative sign of a gene. • These characteristics make prokaryotic gene finding relatively straightforward, and well-designed systems are able to achieve high levels of accuracy.
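The 20–25 codon figure follows from a simple model, assuming a random sequence in which all 64 codons are equally likely and independent: each codon is a stop with probability p = 3/64, so the number of codons up to and including the first stop is geometrically distributed.

```latex
p = \frac{3}{64}, \qquad
\mathbb{E}[\text{codons up to the first stop}] = \frac{1}{p} = \frac{64}{3} \approx 21.3 \text{ codons} \approx 64 \text{ bp}

P(\text{no stop in } n \text{ codons}) = \left(\frac{61}{64}\right)^{n},
\qquad \left(\frac{61}{64}\right)^{100} \approx 0.008
```

Under this model a given 100-codon (300 bp) stretch in one frame is stop-free less than 1% of the time, which is why ORFs hundreds of codons long stand out.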

  11. Open Reading Frame Finder (Input)

  12. Output
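Slides 11 and 12 show an ORF finder's input and output. The core scan can be sketched as follows, assuming an ORF is defined as ATG followed by in-frame codons up to the first in-frame stop (TAA, TAG, TGA), searched in all six reading frames. The function names and the 100-codon default threshold are hypothetical choices, not the settings of any particular tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters kept as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return (strand, frame, start, end) for each ORF with at least `min_codons`
    codons before the stop. Coordinates are 0-based, end-exclusive, include the
    stop codon, and refer to the strand searched (the reverse complement for "-")."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Extend codon by codon until an in-frame stop codon.
                j = i + 3
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no stop before the end: not a complete ORF
                if (j - i) // 3 >= min_codons:
                    orfs.append((strand, frame, i, j + 3))
                i = j + 3  # resume after the stop; nested in-frame ATGs are skipped
    return orfs


print(find_orfs("CCATGAAACCCTAAGG", min_codons=3))  # [('+', 2, 2, 14)]
```

Skipping nested ATGs reports only the longest ORF ending at each stop, which is a common convention; a real tool may also offer alternative start codons or report partial ORFs at sequence ends.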

  13. Ab initio approaches (eukaryotes) • Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging. • First, the promoter and other regulatory signals in these genomes are more complex and less well understood. • Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail. • Second, because of splicing, a protein-coding sequence is divided into several exons separated by non-coding introns, so it does not form one contiguous ORF.
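CpG islands are often screened for with the criteria of Gardiner-Garden and Frommer (1987): a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CG dinucleotides × length) / (number of C × number of G). The sketch below applies those thresholds to fixed sliding windows; it is a simplified illustration with hypothetical function names, not the exact procedure of any particular gene finder.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = sum(1 for i in range(n - 1) if w[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Start positions of windows passing both CpG-island thresholds."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            hits.append(start)
    return hits


print(cpg_stats("CGCG"))                 # (1.0, 2.0)
print(cpg_island_windows("CG" * 100))    # [0]
print(cpg_island_windows("GGCCA" * 40))  # [] (GC-rich but no CpG)
```

The observed/expected ratio is what separates true CpG islands from regions that are merely GC-rich, as the last example shows.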

  14. Combined approaches • Combined gene finders integrate extrinsic and ab initio approaches, mapping protein and EST (expressed sequence tag) data to the genome to validate ab initio predictions.
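One simple way to quantify how well extrinsic evidence supports an ab initio prediction is the fraction of predicted exon bases covered by mapped protein or EST alignments. The sketch below assumes both are given as 0-based, half-open (start, end) intervals on the same strand and that predicted exons do not overlap each other; the coordinates in the example are made up for illustration, and real combined gene finders weigh evidence in more sophisticated ways.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def supported_fraction(predicted_exons, evidence_hits):
    """Fraction of predicted exon bases covered by any evidence alignment."""
    evidence = merge_intervals(evidence_hits)  # avoid double-counting overlaps
    total = sum(end - start for start, end in predicted_exons)
    covered = 0
    for ps, pe in predicted_exons:
        for es, ee in evidence:
            covered += max(0, min(pe, ee) - max(ps, es))
    return covered / total if total else 0.0


# Hypothetical predicted exons and mapped EST/protein hits.
print(supported_fraction([(100, 200), (300, 400)],
                         [(150, 250), (300, 350), (340, 420)]))  # 0.75
```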

  15. Comparative genomics approaches • As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a comparative genomics approach. • This is based on the principle that natural selection removes most mutations in genes and other functional elements, so these regions accumulate changes at a slower rate than the rest of the genome. • Genes can thus be detected by comparing the genomes of related species. • This approach was first applied to the mouse and human genomes.
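The idea can be illustrated by scanning a pairwise alignment of two related genomes and computing percent identity in sliding windows: windows under selective constraint, such as coding exons, tend to show higher identity. This sketch assumes the two sequences are already aligned (equal length, "-" for gaps); the function name, window sizes, and example sequences are hypothetical, and real comparative gene finders model conservation far more carefully.

```python
def windowed_identity(aln_a, aln_b, window=50, step=25):
    """(start, fraction of identical non-gap columns) for each alignment window."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for start in range(0, len(aln_a) - window + 1, step):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in cols if x == y and x != "-")
        result.append((start, matches / window))
    return result


print(windowed_identity("AAAAACCCCC", "AAAAAGGGGG", window=5, step=5))
# [(0, 1.0), (5, 0.0)]
```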

  16. GeneMarkS
