
Eukaryotic Genome Annotation






Lieven Sterck¹, Stéphane Rombauts¹, Jeffrey Fawcett¹, Yao-Cheng Lin¹, Steven Robbens¹, Jan Wuyts¹, Francis Dierick¹, Pierre Rouzé² and Yves Van de Peer¹

¹ Bioinformatics & Evolutionary Genomics Division, Plant Systems Biology, VIB/Ugent, Technologiepark 927, B-9052 Gent, Belgium
² INRA-associated to Bioinformatics & Evolutionary Genomics Division, Plant Systems Biology, VIB/Ugent, Technologiepark 927, B-9052 Gent, Belgium

E-mail: yves.vandepeer@psb.ugent.be
URL: http://bioinformatics.psb.ugent.be/

Gene prediction and genome annotation have always been among the main research topics of our group. Over the past years we have demonstrated the strength of our annotation platform and built a reputation in the field of genome annotation through a number of collaborative efforts to annotate newly sequenced plant genomes. Although we are still involved in several annotation projects for higher plants, we are increasingly asked to produce automatic genome annotations for a broader diversity of eukaryotic genomes, such as fungi and algae.

Introduction

Raw sequence data by itself is not useful to biologists. To be meaningful, it has to be converted into biologically significant knowledge: markers, genes, RNAs, protein sequences. Genome annotation is the first step toward acquiring this knowledge.

[Figure labels: example gene structures in feature-location notation, join(9265..9395,9749..99342) and complement(join(10164..10295,10349..10420,10467..10514,10566..10626,10681..10770,10823..10949,11001)); SpliceMachine (start sites, splice sites); coding IMM, intron IMM, intergenic IMM (content potential for coding, intron and intergenic).]
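The two gene structures quoted in the poster's figure labels are written in the feature-location notation used in GenBank/EMBL flat files: `join(...)` lists the exon segments and `complement(...)` marks a gene on the reverse strand. As a minimal sketch, the Python below turns such a string into a strand and a list of segments. The function names `parse_location` and `introns` are hypothetical helpers introduced here, not part of EuGene, and the parser handles only the simple forms that appear on the poster.

```python
import re
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive (start, end)


def parse_location(loc: str) -> Tuple[str, List[Segment]]:
    """Parse a simple feature location into (strand, segments).

    Supports only: N, N..M, join(...) and complement(...), as seen on the poster.
    Segments are returned in the order they are written.
    """
    text = loc.strip().rstrip(".")  # tolerate a trailing sentence period
    strand = "+"
    m = re.fullmatch(r"complement\((.*)\)", text)
    if m:
        strand = "-"
        text = m.group(1)
    m = re.fullmatch(r"join\((.*)\)", text)
    if m:
        text = m.group(1)
    segments: List[Segment] = []
    for part in text.split(","):
        m = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", part.strip())
        if m is None:
            raise ValueError(f"unsupported location element: {part!r}")
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start  # single base, e.g. 11001
        segments.append((start, end))
    return strand, segments


def introns(segments: List[Segment]) -> List[Segment]:
    """Return the gaps between consecutive segments (the implied introns)."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(segments, segments[1:])]


if __name__ == "__main__":
    # Coordinates copied verbatim from the poster.
    forward = "join(9265..9395,9749..99342)"
    reverse = ("complement(join(10164..10295,10349..10420,10467..10514,"
               "10566..10626,10681..10770,10823..10949,11001))")
    for loc in (forward, reverse):
        strand, segs = parse_location(loc)
        print(strand, len(segs), "segments;", len(introns(segs)), "introns")
```

Keeping the segments as plain tuples makes it easy to compare a predicted structure with a reference annotation segment by segment.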
Information incorporation

A thorough annotation must take into account:
• similarities with known sequences (proteins, ESTs, other genomes, …)
• region content analysis
• signal prediction software (ATG, splice sites)
• integrated prediction tools (GenScan, FgenesH, …)
• all available significant biological knowledge

Strengths of EuGene

• Annotation platforms make it possible to automate this integration as much as possible.
• EuGene can be specifically adapted to the particularities of a newly sequenced genome, which leads to higher-quality predictions.
• It exploits probabilistic models such as Markov models to discriminate coding from non-coding sequences.
• It integrates information from several signal prediction programs (splice site, translation start, ...) and other third-party software.
• It exploits the wealth of existing sequences (mRNA, 5'/3' EST pairs, proteins, homologous genomic sequences).
• It integrates each source of information through small independent software components called "plugins".

The EuGene Annotation Platform

Intrinsic approaches

ATCCGTAAGATGGTGCGATGCCCTAAATGGGTCGGTTTATAAAGGCGCGTAGGTAAGTGCAATTTATTCTTCAAGTTCCGAATTTTATATGCGCATATCGTCAGTTCTTCTGTTGCAGTTGGCGCACTTGGACTACCTGCAATTTATTCTTCAAGTTCCGAATTTTATAT

• Each base of the genomic sequence is represented individually as a node in a graph.
• Edges are weighted, removed or added according to the available information.
• A shortest path through the graph corresponds to a possible gene structure.

[Figure: EuGene takes the genomic sequence and produces predicted genes (structural annotation). Extrinsic approaches: Blastx, Blastn and RepeatMasker, run against cDNA & EST, proteins and repeats.]

Schematic representation of the EuGene platform. Depicted above is the basic set-up of EuGene; this scheme can be modified according to the genome to be annotated and the available data.

• Based on all the available information, EuGene outputs the prediction of maximal score, i.e. the one maximally consistent with the provided information.

EuGene is developed by T. Schiex and co-workers (INRA-Toulouse, France) in cooperation with our group.

This work is supported by the European Commission (QLRI-CT-2001-00006).

References

1: Schiex T, Moisan A, and Rouzé P. (2001) EuGène: An Eukaryotic Gene Finder that combines several sources of evidence. Computational Biology, Eds. O. Gascuel and M-F. Sagot, LNCS 2066, pp. 111-125.
2: Tuskan et al. (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray ex Brayshaw). Science 313, 1596-1604.
3: Derelle et al. (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl. Acad. Sci. USA 103, 11647-11652.
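The graph formulation described under "Intrinsic approaches" (one node per base, edges weighted or removed according to the evidence, shortest path = a possible gene structure) can be illustrated with a small dynamic-programming sketch. This is not EuGene's actual implementation. The three-label grammar, the `best_structure` function and all cost values are assumptions made for illustration: each base gets one node per candidate label, per-base content costs stand in for the Markov-model scores, and a label change is only possible where a signal predictor has "added an edge" at that position.

```python
from typing import Dict, List, Tuple

STATES = ("intergenic", "exon", "intron")

# Toy grammar (an assumption): which label changes are meaningful at all.
ALLOWED = {
    ("intergenic", "exon"),  # translation start
    ("exon", "intron"),      # donor splice site
    ("intron", "exon"),      # acceptor splice site
    ("exon", "intergenic"),  # stop
}


def best_structure(
    content_cost: List[Dict[str, float]],
    switch_cost: Dict[Tuple[int, str, str], float],
) -> Tuple[float, List[str]]:
    """Cheapest labelling of a sequence, i.e. a shortest path in a layered graph.

    content_cost[i][s]: cost of giving base i the label s (lower is better).
    switch_cost[(i, s, t)]: cost of changing from s at base i-1 to t at base i;
    a missing key means the edge was removed (no signal at that position).
    The path must start and end in the intergenic state.
    """
    inf = float("inf")
    best = {s: (content_cost[0][s] if s == "intergenic" else inf) for s in STATES}
    back: List[Dict[str, str]] = []
    for i in range(1, len(content_cost)):
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in STATES:
            cost, prev = best[t], t  # staying in the same label is always possible
            for s in STATES:
                if (s, t) in ALLOWED and (i, s, t) in switch_cost:
                    candidate = best[s] + switch_cost[(i, s, t)]
                    if candidate < cost:
                        cost, prev = candidate, s
            new_best[t] = cost + content_cost[i][t]
            pointers[t] = prev
        back.append(pointers)
        best = new_best
    # Trace the shortest path back from the final intergenic node.
    state = "intergenic"
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path


if __name__ == "__main__":
    exon_like = {"intergenic": 1.0, "exon": 0.0, "intron": 1.0}
    background = {"intergenic": 0.0, "exon": 1.0, "intron": 1.0}
    # Eight bases; bases 2..5 look coding according to the (made-up) content costs.
    costs = [background, background] + [exon_like] * 4 + [background, background]
    # Signals (made-up): a start before base 2 and a stop before base 6.
    signals = {(2, "intergenic", "exon"): 0.5, (6, "exon", "intergenic"): 0.5}
    total, path = best_structure(costs, signals)
    print(total, path)
```

In this toy setting, lowering a content cost or adding a signal entry plays the role of weighting or adding an edge, and omitting a signal entry plays the role of removing one. The cheapest path then gives the labelling most consistent with the supplied evidence.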
