
Building a Unified Gene Catalog for the Mouse Reference Genome

Building a Unified Gene Catalog for the Mouse Reference Genome. Carol Bult, The Jackson Laboratory. Mouse Genome Annotation Summit, Bethesda, Maryland, March 2008. How similar are the results of different gene prediction pipelines for Build 37 of the reference mouse genome? Gene Unification.


Presentation Transcript


  1. Building a Unified Gene Catalog for the Mouse Reference Genome
     Carol Bult, The Jackson Laboratory
     Mouse Genome Annotation Summit, Bethesda, Maryland, March 2008

  2. How similar are the results of different gene prediction pipelines for Build 37 of the reference mouse genome?

  3. Gene Unification
     • Compare genome annotations from:
       • NCBI (31,711 annotations)
       • Ensembl (28,167 annotations)
       • VEGA (14,919 annotations)
     • Determine:
       • Equivalent gene models
       • Gene models unique to Ensembl
       • Gene models unique to NCBI
       • Gene models unique to VEGA
       • Etc.

  4. Method: Genome feature overlap analysis
     • Assess genome coordinate overlaps for annotated exons
     • NCBI, Ensembl and Vega provided their annotations in a standardized file format with Build 37 (B37) genome coordinates
     • Richardson, J. “fjoin: Simple and Efficient Computation of Feature Overlaps” J. Comp Biol 13:1457-64 (2006).
     • Overlap of a single nucleotide between two exons is sufficient to call two gene models “equivalent”
     • The overlap parameter is adjustable
     • The feature type used to detect overlaps is configurable
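The overlap rule on this slide can be illustrated with a minimal sketch. This is not fjoin and not the pipeline used in the talk: it is a brute-force pairwise comparison, and the `Exon` record, the function names, the gene IDs and the coordinates are all hypothetical. It assumes 1-based inclusive coordinates and ignores strand, neither of which the slides specify.

```python
from collections import namedtuple
from itertools import product

# Hypothetical record type: one annotated exon and the gene model it belongs to.
Exon = namedtuple("Exon", "gene chrom start end")

def overlap_length(a, b):
    """Number of shared nucleotides between two exons (0 if none).

    Assumes 1-based inclusive coordinates; strand is ignored (an assumption).
    """
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)

def equivalent_gene_pairs(exons_a, exons_b, min_overlap=1):
    """Return (gene_a, gene_b) pairs with at least one exon pair overlapping
    by min_overlap nucleotides. Brute force, O(len(a) * len(b)); a real run
    over whole-genome annotations would need a sort-merge or interval index.
    """
    pairs = set()
    for a, b in product(exons_a, exons_b):
        if overlap_length(a, b) >= min_overlap:
            pairs.add((a.gene, b.gene))
    return pairs

# Made-up example data, not taken from any annotation file.
ncbi = [
    Exon("N1", "11", 100, 200),
    Exon("N1", "11", 300, 400),
    Exon("N2", "11", 1000, 1100),
]
ensembl = [
    Exon("E1", "11", 200, 250),   # shares exactly one nucleotide with N1
    Exon("E2", "11", 500, 600),
]

pairs_1nt = equivalent_gene_pairs(ncbi, ensembl)                  # default: 1 nt suffices
pairs_2nt = equivalent_gene_pairs(ncbi, ensembl, min_overlap=2)   # stricter threshold
```

Raising `min_overlap` corresponds to the adjustable overlap parameter mentioned above; in the made-up data, the single shared nucleotide at position 200 is enough to pair N1 with E1 at the default setting but not at a threshold of 2.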

  5. Caveats
     • Equivalent does not mean identical gene structure
     • The analysis does not evaluate which gene model is “best”; it indicates only that the annotations from different sources likely represent the same gene or transcriptional unit
     • Unique does not mean novel
     • Some known genes are present in one annotation file but not the other

  6. Example: Ensembl and NCBI
     Input: NCBI (31711), Ensembl (28167)
     Unification (Exon Overlap Detection)
     • Equivalent: 23650
       • 1:1: 21528
       • 1:n: 629
       • n:1: 788
       • n:m: 705
     • Unique to NCBI: 8678
     • Unique to Ensembl: 5248

  7. Build 37 Summary

              0:1     1:0     1:1    1:n   n:1   n:m
     E vs V   4764   17923    9322   333   505   433
     N vs E   5248    8678   21528   629   788   705
     V vs N  20208    3409   10606   405   410   535

     E = Ensembl (28167), V = Vega (14919), N = NCBI (31711)
     E unique = 4707, N unique = 6953, V unique = 2986
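One way the 0:1, 1:0, 1:1, 1:n, n:1 and n:m categories could be derived from pairwise overlap calls is sketched below. The slides do not say how the categories were computed, so this is an assumption: gene models from two sources are treated as nodes in a bipartite graph, overlapping pairs as edges, and each connected component is labeled by how many gene models it holds from each side. The function names and the example IDs are hypothetical, and the first number in each label is taken to count the first source.

```python
from collections import Counter, defaultdict

def label(n_a, n_b):
    """Turn per-source gene counts for one cluster into a category label."""
    if n_a > 1 and n_b > 1:
        return "n:m"
    left = "n" if n_a > 1 else str(n_a)
    right = "n" if n_b > 1 else str(n_b)
    return f"{left}:{right}"

def classify(genes_a, genes_b, pairs):
    """Count clusters per category.

    genes_a, genes_b: all gene IDs from source A and source B.
    pairs: (gene_a, gene_b) tuples judged equivalent by exon overlap.
    """
    # Tag nodes with their source so identical IDs in A and B cannot collide.
    adj = defaultdict(set)
    for g in genes_a:
        adj[("A", g)]
    for g in genes_b:
        adj[("B", g)]
    for a, b in pairs:
        adj[("A", a)].add(("B", b))
        adj[("B", b)].add(("A", a))

    seen = set()
    counts = Counter()
    for node in list(adj):
        if node in seen:
            continue
        # Depth-first walk over one connected component.
        stack = [node]
        seen.add(node)
        n_a = n_b = 0
        while stack:
            cur = stack.pop()
            if cur[0] == "A":
                n_a += 1
            else:
                n_b += 1
            for nxt in adj[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        counts[label(n_a, n_b)] += 1
    return counts

# Made-up example: N1~E1 (1:1), N2~E2 and N2~E3 (1:n), N3/N4 and E4 unmatched.
genes_n = ["N1", "N2", "N3", "N4"]
genes_e = ["E1", "E2", "E3", "E4"]
pairs = {("N1", "E1"), ("N2", "E2"), ("N2", "E3")}
counts = classify(genes_n, genes_e, pairs)
```

Under this reading each count is a number of clusters rather than a number of gene models; whether the figures in the table were tallied that way is not stated in the slides.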

  8. Equivalent (1:1:1)
     11:84331455..84340462
     Screenshots from MGI Mouse GBrowse

  9. Equivalent (1:n) 1:58765343..58820514

  10. Equivalent (n:1)
      Clec2g, Clec2f
      6:128876095..128986094
      Some annotations masked out to improve clarity of example

  11. Equivalent (n:m) 2:155895575..155939706

  12. Unique to Ensembl and Vega
      Csmd2
      Chr4:136463772..137119871
      Some annotations in this region are masked to enhance clarity of the example.

  13. Common Issues
      • Gene duplications/gene family
      • Read through transcripts
      • Shared first exons

  14. Gene Duplication/Gene Family
      Rex2: Reduced expression 2, zinc finger protein
      4:145845084..145895083

  15. Rex2?? 4:146339646..146439645

  16. 4:145845084..145895083 Rex2 4:146339646..146439645 Rex2

  17. Read through Transcripts Raver1 and Fdx1l 9:20862521..20912520

  18. Shared Exons Defb41 and novel defensin gene 1:18240353..18255926

  19. 10:21849916..22136785 Raet1a,b,c,d,e Some annotations masked out to improve clarity of example

  20. Importance of Annotation Coordination
      • Genome feature identity
      • Functional annotation associations
      • Experimental genetics
      • KOMP

  21. Gene Identity 16:96582252..96792251 Pcp4 and Igsf5

  22. Pcp4 – Purkinje cell protein 4 (MGI:97509)
      Igsf5 – Immunoglobulin superfamily, member 5 (MGI:1919308)
      There is no Igsf5 in Ensembl, but Igsf5 appears to be used as a synonym for Pcp4

  23. Clec2g Clec2f Clec2f

  24. Functional Annotations Clec2f (MGI:3522133) Clec2g (MGI:1918059)

  25. KOMP
      10:51199649..51217200
      Gp49a and Lilrb4
      www.knockoutmouse.org
      In Ensembl, this gene model is associated only with Lilrb4. In MGI we associated it with Gp49a.

  26. 11:62630999..62696530 Trim16 and Fbxw10 www.knockoutmouse.org

  27. Acknowledgements
      • Joel Richardson
      • Yunzia “Sophia” Zhu
      • Ken Frazer
      • TBK Reddy
      • Bob Sinclair
      • Deb Reed
      • Richard Baldarelli
      • Paul Flicek
      • Steve Searle
      • Deanna Church
      • Donna Maglott
      • Laurens Wilming
      NIH HG00330-P1

  28. Smgc and Muc19 15:91663946..91769797
