Explore the annotation metrics, data types, and gene prediction confidence in the VectorBase BRC4 2006 release. Learn about the manual annotation progress, merging gene sets, canonical gene predictions, and more.
VectorBase annotation metrics
Daniel Lawson
VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
VectorBase BRC4 2006

Topics
• Annotation metrics
  • Numbers (gene numbers & xrefs)
  • Data types (availability & integration)
• Annotation SOPs
  • Genome specific
  • Gene specific
• Gene build profile & prediction confidence

Considerations
• Importance of calculating all metrics using similar methodology from the same data set
• Metrics calculated from Ensembl using BioMart & raw SQL queries
• GO terms: many ways of calculating (InterPro2GO, projection from Drosophila orthologs)
• No VectorBase capability to automatically assign EC numbers

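The first consideration above (one methodology, one data set) can be illustrated with a minimal Python sketch that derives every count from a single database snapshot in one function. The `gene` and `xref` tables, their columns, and the rows are a hypothetical toy schema invented for this illustration; they are not the Ensembl schema and not the queries used for the VectorBase metrics.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, biotype TEXT);
        CREATE TABLE xref (gene_id TEXT, source TEXT);
        """
    )
    # Toy rows, invented purely for illustration.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("g1", "protein_coding"), ("g2", "protein_coding"), ("g3", "ncRNA")],
    )
    conn.executemany(
        "INSERT INTO xref VALUES (?, ?)",
        [("g1", "GO"), ("g1", "GO"), ("g1", "InterPro"), ("g2", "GO")],
    )
    return conn


def gene_metrics(conn):
    """Compute all metrics from the same connection so they stay comparable."""
    total = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
    by_biotype = dict(
        conn.execute("SELECT biotype, COUNT(*) FROM gene GROUP BY biotype")
    )
    # Count each gene once per xref source, however many xrefs it carries.
    genes_with_xref = dict(
        conn.execute(
            "SELECT source, COUNT(DISTINCT gene_id) FROM xref GROUP BY source"
        )
    )
    return {
        "total": total,
        "by_biotype": by_biotype,
        "genes_with_xref": genes_with_xref,
    }


metrics = gene_metrics(build_toy_db())
```

Counting distinct genes per xref source, rather than raw xref rows, is one example of a methodological choice that has to be applied identically across genomes for the numbers to be comparable.
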
Canonical Gene set: VectorBase gene prediction pipeline (SOP)
• Blessed predictions
• Manual annotations, Community submissions (VB:SOP010, VB:SOP007)
• Similarity predictions, Species-specific predictions (VB:SOP002 & SOP003, VB:SOP001)
• Protein family HMMs, ncRNA predictions (VB:SOP009, VB:SOP008)
• Transcript based predictions, Ab initio gene predictions (VB:SOP004, VB:SOP005)

Assignment of SOPs to VectorBase genes: AgamP3.3
Display of Metrics & SOPs
• Metrics
  • VectorBase wiki
  • Species page containing the three tables, available from the VectorBase species homepage
  • Expansion of documents relating to genomic resources (citations, links to primary data where possible)
  • Single collated table for BRC as separate download
• SOPs
  • VectorBase wiki
  • ‘Documents’ section of main site

Manual annotation progress
Merging gene sets
• Inputs: Gene set #1, Gene set #2
• Reduce to single predictions per locus
• Compare exon/intron structures
  • Identical structures
  • Compatible structures
  • Different structures
  • Merge/Split structures
  • Complex
  • No Map
• Add isoform predictions based on EST/Peptide data
• Canonical gene set

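The structure-comparison step of the merging workflow can be sketched in Python. The slide names the outcome categories but does not define them, so the definitions here are assumptions made for illustration: "compatible" means the same intron chain with different outer exon ends, and "merge/split" means one prediction overlaps several in the other set (or shares its partner with another prediction). Strand, chromosome, and the "Complex" category are left out, and all function names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Exons = List[Tuple[int, int]]  # sorted (start, end) pairs, inclusive coordinates


def introns(exons: Exons) -> List[Tuple[int, int]]:
    """Return the introns implied by consecutive exons."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def overlaps(a: Exons, b: Exons) -> bool:
    """True if the two predictions share at least one exonic base."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a for s2, e2 in b)


def classify_pair(a: Exons, b: Exons) -> str:
    """Classify two overlapping predictions (assumed category definitions)."""
    if a == b:
        return "identical"
    if introns(a) == introns(b):
        return "compatible"  # same intron chain, different outer exon ends
    return "different"


def compare_sets(set1: Dict[str, Exons], set2: Dict[str, Exons]) -> Dict[str, str]:
    """Assign a category to every prediction in set1 relative to set2."""
    hits = {n1: [n2 for n2, ex2 in set2.items() if overlaps(ex1, ex2)]
            for n1, ex1 in set1.items()}
    # How many set1 predictions touch each set2 prediction.
    reverse_count: Dict[str, int] = {}
    for partners in hits.values():
        for n2 in partners:
            reverse_count[n2] = reverse_count.get(n2, 0) + 1
    result = {}
    for n1, partners in hits.items():
        if not partners:
            result[n1] = "no map"
        elif len(partners) > 1 or reverse_count[partners[0]] > 1:
            result[n1] = "merge/split"
        else:
            result[n1] = classify_pair(set1[n1], set2[partners[0]])
    return result


# Toy coordinates, invented for illustration.
set1 = {
    "A1": [(100, 200), (300, 400)],
    "A2": [(1000, 1100), (1200, 1300)],
    "A3": [(2000, 2100), (2200, 2300)],
    "A4": [(3000, 3100), (3200, 3300), (3400, 3500)],
    "A5": [(5000, 5100)],
}
set2 = {
    "B1": [(100, 200), (300, 400)],
    "B2": [(950, 1100), (1200, 1350)],
    "B3": [(2000, 2100), (2150, 2300)],
    "B4a": [(3000, 3100)],
    "B4b": [(3400, 3500)],
}
result = compare_sets(set1, set2)
```

In this toy data A1 is identical to B1, A2 is compatible with B2, A3 differs from B3 at one splice site, A4 spans two predictions in the second set, and A5 has no counterpart.
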