Genome annotation techniques: new approaches and challenges Presented by Haili Ping

Genome annotation techniques: new approaches and challenges, Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Loraine AE, Helt GA.

Presentation Transcript


  1. Genome annotation techniques: new approaches and challenges
     Presented by Haili Ping
     Genome annotation techniques: new approaches and challenges, Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Loraine AE, Helt GA.

  2. • The exponential increase in the amount of genomic sequence, from human and from other species, needs to be matched by increases in the accurate annotation of this huge variety of genomes
     • Accurate annotation of the genomes of human and other species is an essential element in supporting current drug discovery efforts
     • Bioinformatics solutions are increasingly required to develop automatic annotation techniques that support and complement the manual curation process

  3. Automatic genome annotation pipelines
     • Primary goal: to deliver highly accurate and reliable genome annotations, using the widest range of evidence from available databases
     • Essence: pipelines integrate suites of bioinformatics software tools with multiple databases, to manage the analysis and storage of genomic sequence automatically
     • Trend: from single-algorithm methods to consensus-based approaches, in which the combined results of gene predictors and similarity search methods are used
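The consensus idea on this slide can be sketched in a few lines. This is only an illustration, assuming each evidence source reports exons as `(start, end)` pairs and that agreement means identical coordinates. The source names and coordinates are made up, and real pipelines combine evidence in more sophisticated ways.

```python
from collections import Counter

def consensus_exons(predictions, min_support=2):
    """Keep exons (start, end) reported by at least `min_support` sources."""
    votes = Counter()
    for source, exons in predictions.items():
        for exon in set(exons):  # each source votes at most once per exon
            votes[exon] += 1
    return sorted(exon for exon, n in votes.items() if n >= min_support)

# Hypothetical evidence from three kinds of sources (illustrative values only).
predictions = {
    "ab_initio_predictor": [(100, 250), (400, 520), (900, 1000)],
    "protein_similarity": [(100, 250), (400, 520)],
    "est_alignment": [(400, 520), (700, 780)],
}

print(consensus_exons(predictions))
```

Raising `min_support` makes the result stricter: fewer exons survive, but each is backed by more independent evidence.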

  4. The generic structure of an automatic genome annotation pipeline and delivery system

  5. Box 1. Useful human genome annotation and browser URLs
     Automated annotation pipelines
     • EBI/Sanger Institute Ensembl Project: http://www.ensembl.org/Homo_sapiens/
     • NCBI Human Genome Browser: http://proxy.library.uiuc.edu:3367/genome/guide/human/
     • The Oak Ridge National Laboratories Genome Channel: http://compbio.ornl.gov/channel/
     • Celera Discovery System: http://cds.celera.com/
     • Incyte Genomics - Genomics Knowledge Platform: http://www.incyte.com/incyte_science/technology/gkp/
     • Paracel GeneMatcher2 System: http://www.paracel.com/products/gm2.html
     Human genome browsers
     • UCSC Human Genome Browser: http://genome.cse.ucsc.edu/cgi-bin/hgGateway/
     • Softberry Genome Explorer: http://www.softberry.com/berry.phtml?topic=genomexp
     • Viaken Enterprise Ensembl Solution: http://www.viaken.com/ns/solutions/ensembl.html
     • LabBook Inc. Genomic Explorer Suite: http://www.labbook.com/products/ExplorerSuite.asp
     • University of Tokyo Gene Resource Locator Browser: http://grl.gi.k.u-tokyo.ac.jp/
     Other useful sites
     • The Institute for Genomic Research (TIGR): http://www.tigr.org/
     • Human Genome Central: http://www.ensembl.org/genome/central/ and http://proxy.library.uiuc.edu:3528/genome/central/

  6. From raw sequence to gene predictions
     • Raw sequence pre-processing
       • masking known repeats and low-complexity sequences using RepeatMasker
       • identifying homology matches using BLAST
       • scanning for other features, such as sequence tagged site (STS) markers and CpG islands
     • Gene prediction
       • predictions based on protein matches
       • predictions based on DNA sequence
       • ab initio gene prediction programs
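As an illustration of the feature scans in the pre-processing step, a CpG island scan can be sketched as a sliding window test on GC content and on the observed/expected CpG ratio. The thresholds below (200 bp window, GC fraction of at least 0.5, ratio of at least 0.6) are commonly cited criteria rather than values from the slides, and the function names are hypothetical. A production scanner would also merge overlapping windows.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                    # observed CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0     # expected CpG count by chance
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio

def cpg_windows(seq, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Yield (start, end) for each full window that passes both thresholds."""
    for start in range(0, len(seq) - window + 1, step):
        gc, ratio = cpg_stats(seq[start:start + window])
        if gc >= min_gc and ratio >= min_ratio:
            yield start, start + window

# Toy input: a CpG-rich stretch is reported, an AT-only stretch is not.
print(list(cpg_windows("CG" * 100)))
print(list(cpg_windows("AT" * 100)))
```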

  7. A simplified schematic of algorithmic gene prediction

  8. Gene function characterization
     • Mapping to known genes
       • RefSeq and SWISS-PROT
       • HUGO (NCBI, UCSC and Ensembl)
     • Protein domain annotation
       • Pfam, PRINTS, PROSITE, ProDom, BLOCKS and SMART
       • InterPro project: creates a unique characterization for a given protein family, domain or functional site. Domains in protein sequences can then be identified using this signature method. InterPro provides the least redundant and most extensive annotation currently available
     • Gene ontology
       • The Gene Ontology (GO) project aims to define common terms for specifying molecular function, biological process and cellular location

  9. Sharing genome annotations
     • Website display and ftp sites
     (Figure: Chromosome 20 overview)

  10. • Pros of website display: it does not require expert bioinformatics skills, so it is accessible to a wide range of researchers wishing to gain access to genomic annotation
      • Cons: it makes large-scale data mining difficult
      • Solution: enable more experienced users to retrieve the data they require and to run analyses locally
      • Open annotation
        • Researchers need access to the annotations available in the community, and a way to share their own contributions with the community
        • Systems need a common protocol that enables genome data to be freely exchanged, as in the AGAVE (Architecture for Genomic Annotation, Visualization and Exchange) and the Distributed Annotation System (DAS) projects
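To make the idea of a common exchange protocol concrete, here is a minimal sketch of a client that builds a feature request for a genomic segment and parses an XML reply. The server name, the URL layout and the XML element names are assumptions modeled loosely on a DAS-style exchange; they are not taken from the slides, and the sample reply is hard-coded so that no network access is needed.

```python
import xml.etree.ElementTree as ET

def features_url(server, source, chrom, start, end):
    """Build a request URL for one segment (URL layout is an assumption)."""
    return f"{server}/das/{source}/features?segment={chrom}:{start},{end}"

# Hand-written sample reply; element names are illustrative assumptions.
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF>
    <SEGMENT id="20" start="1" stop="1000">
      <FEATURE id="exon1"><TYPE>exon</TYPE><START>100</START><END>250</END></FEATURE>
      <FEATURE id="cpg1"><TYPE>cpg_island</TYPE><START>400</START><END>620</END></FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
"""

def parse_features(xml_text):
    """Turn the XML reply into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features

# "das.example.org" and "demo" are placeholder names, not real services.
print(features_url("http://das.example.org", "demo", "20", 1, 1000))
print(parse_features(SAMPLE_RESPONSE))
```

The point of such a protocol is that any client able to issue this kind of request and read this kind of reply can combine annotations from several independent servers.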

  11. Challenges facing automatic annotation systems
      • Data warehousing: a solution for large-scale data mining
        • First, the desired query statement might be too complex to implement
        • Second, the computing power needed for queries on large, monolithic databases might be too expensive in most cases
      • Solution:
        • the business sector uses data warehousing, which segregates information into denormalized databases, enabling fast querying and data retrieval
        • a large variety of data-mining tools can then extract datasets of interest efficiently, feeding subsequent stages of statistical analysis or data mining
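The denormalization step can be sketched with an in-memory SQLite database. The table and column names below are hypothetical, and the data are toy values. The normalized tables are joined once into a wide table, so later mining queries need no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical schema, toy data).
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chrom TEXT);
CREATE TABLE domain_hit (gene_id INTEGER, domain TEXT);
INSERT INTO gene VALUES (1, 'geneA', '20'), (2, 'geneB', '20'), (3, 'geneC', 'X');
INSERT INTO domain_hit VALUES (1, 'kinase'), (2, 'kinase'), (3, 'zinc_finger');

-- Denormalized warehouse table: one wide row per gene/domain pair, built once.
CREATE TABLE gene_mart AS
    SELECT g.gene_id, g.name, g.chrom, d.domain
    FROM gene g JOIN domain_hit d ON d.gene_id = g.gene_id;
CREATE INDEX idx_gene_mart ON gene_mart (chrom, domain);
""")

# A mining query now touches a single indexed table and needs no joins.
rows = conn.execute(
    "SELECT name FROM gene_mart WHERE chrom = ? AND domain = ? ORDER BY name",
    ("20", "kinase"),
).fetchall()
print(rows)
```

The trade-off is redundancy: the wide table repeats gene attributes on every row and has to be rebuilt when the source tables change.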

  12. The requirement to remain flexible
      The development of automated annotation pipelines is an evolving process.
      • As the quality of sequences and assemblies continues to improve, redundant sequences are replaced with new, superior sequences. This demands a flexible system in which new, individual sequences can be added and analysed without disrupting the whole system
      • New, improved algorithms and methodologies demand a pipeline architecture flexible enough to incorporate them into the analysis process without a redesign of the system
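One way to picture this kind of flexibility is a registry of analysis steps combined with a results store that records what has already been computed. This is a hypothetical sketch, not the architecture of any particular pipeline: all names are invented for illustration.

```python
ANALYSES = {}  # registry: analysis name -> function taking a sequence string

def register(name):
    """Decorator that plugs a new analysis into the pipeline."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator

@register("gc_content")
def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

@register("length")
def length(seq):
    return len(seq)

def run_pipeline(sequences, results=None):
    """Run every registered analysis that is still missing for each sequence."""
    results = {} if results is None else results
    for seq_id, seq in sequences.items():
        done = results.setdefault(seq_id, {})
        for name, func in ANALYSES.items():
            if name not in done:       # skip work that was already stored
                done[name] = func(seq)
    return results

results = run_pipeline({"s1": "GGCC"})
run_pipeline({"s2": "ATGC"}, results)  # a new sequence is added; s1 is untouched
print(results)
```

A new algorithm is added by registering one more function. Re-running the pipeline then fills in only the missing results.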

  13. Future opportunities
      • Comparative genomics: as more genomes are sequenced and become publicly available in the next few years, comparative genomics will become one of the greatest areas of development
      • Cross-species analysis (human-mouse): protein-coding genes are likely to be highly conserved between closely related species (e.g. mouse and human), and other regions, such as RNA genes and regulatory regions, could also be elucidated
      • Needs:
        • the development of bioinformatics tools such as Vista, Synplot and FamilyJewels
        • the integration of such tools with the current automated approaches
        • the design of genome browsers and websites that can intelligently display and annotate comparative results
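The basic computation behind a cross-species conservation display can be sketched as a sliding window percent identity over a pairwise alignment. This is a generic illustration under simple assumptions (two pre-aligned strings of equal length, `-` as the gap character). It is not the algorithm of any of the tools named on the slide, and the example sequences are invented.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Return [(window start, percent identity)] along a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        # A column counts as a match only if both residues agree and are not gaps.
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        profile.append((start, 100.0 * matches / window))
    return profile

# Invented toy alignment: identical at the start, diverged at the end.
human = "ACGTACGTACGGGTTTAAAC"
mouse = "ACGTACGTACCCCAAATTTC"
print(window_identity(human, mouse))
```

Windows with high identity are candidates for conserved features such as coding exons. A browser would plot this profile against genomic position.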

  14. Integrating and delivering new data
      • Horizontal integration: genomic systems should be able to cross-match species that can be sensibly compared
      • Vertical integration: new flows of data coming from proteomics and microarray sources will soon have to be incorporated

  15. Concluding remarks
      • The need for automatic genome annotation systems has increased and is increasing
      • These systems are grounded upon central cores of bioinformatics software tools and associated relational databases
      • Sequenced genomes: integration of new genomes into the current systems
      • The demand for openness in the distribution of annotation data
      • The delivery of genomic data in forms suitable for large-scale data mining

  16. References
      1. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Genome annotation techniques: new approaches and challenges. Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576.
      2. Weizhong Li and Adam Godzik. Discovering new genes with advanced homology detection. Trends in Biotechnology, Volume 20, Issue 8, 1 August 2002, Pages 315-316.
      3. Biswas M, O'Rourke JF, Camon E, Fraser G, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R. Applications of InterPro in protein annotation and genome analysis. Brief Bioinform. 2002 Sep;3(3):285-95. PMID: 12230037. http://www.ebi.ac.uk/interpro/
      4. Loraine AE, Helt GA. Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics. 2002 Jul 30;3(1):19. http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=12149135
      5. Oshiro G, Wodicka LM, Washburn MP, Yates JR 3rd, Lockhart DJ, Winzeler EA. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 2002 Aug;12(8):1210-20. PMID: 12176929. http://www.genome.org/cgi/content/full/12/8/1210