VectorBase annotation metrics
Daniel Lawson
VectorBase-EBI, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
VectorBase BRC4 2006
Topics
• Annotation metrics
  • Numbers (Gene numbers & xrefs)
  • Data types (Availability & Integration)
• Annotation SOPs
  • Genome specific
  • Gene specific
• Gene build profile & prediction confidence
Considerations
• All metrics should be calculated with the same methodology from the same data set
• Metrics were calculated from Ensembl using BioMart & raw SQL queries
• GO terms can be calculated in many ways (InterPro2GO, projection from Drosophila orthologs)
• VectorBase has no capability to assign EC numbers automatically
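As a rough illustration of the first two points, the sketch below computes several counts from one snapshot with plain SQL, so that every metric shares the same data set and method. Everything in it is an assumption made for illustration: the two-table layout, the column names, the `gene_metrics` helper, and the rows themselves are toy values, not the Ensembl schema and not VectorBase data.

```python
import sqlite3

# Hypothetical toy schema and rows, for illustration only.
# This is NOT the Ensembl core schema and NOT real VectorBase data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id TEXT PRIMARY KEY, biotype TEXT);
CREATE TABLE xref (gene_id TEXT, source TEXT, accession TEXT);
INSERT INTO gene VALUES
    ('G1', 'protein_coding'),
    ('G2', 'protein_coding'),
    ('G3', 'ncRNA');
INSERT INTO xref VALUES
    ('G1', 'GO', 'GO:0005515'),
    ('G1', 'InterPro', 'IPR000001'),
    ('G2', 'InterPro', 'IPR000002');
""")


def gene_metrics(conn):
    """Compute all counts from the same connection, i.e. the same snapshot."""
    total = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
    with_xref = conn.execute(
        "SELECT COUNT(DISTINCT gene_id) FROM xref"
    ).fetchone()[0]
    with_go = conn.execute(
        "SELECT COUNT(DISTINCT gene_id) FROM xref WHERE source = 'GO'"
    ).fetchone()[0]
    return {"genes": total, "genes_with_xref": with_xref, "genes_with_go": with_go}


metrics = gene_metrics(conn)
print(metrics)
```

Counting distinct gene identifiers, rather than xref rows, keeps a gene with several cross-references from being counted more than once.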
Canonical Gene set: VectorBase gene prediction pipeline (SOP)
[Diagram: prediction sources feeding the canonical gene set; labels and SOP identifiers listed in the order they appear]
• Blessed predictions: Manual annotations, Community submissions (VB:SOP010, VB:SOP007)
• Similarity predictions, Species-specific predictions (VB:SOP002 & SOP003, VB:SOP001)
• Protein family HMMs, ncRNA predictions (VB:SOP009, VB:SOP008)
• Transcript based predictions, Ab initio gene predictions (VB:SOP004, VB:SOP005)
Assignment of SOPs to VectorBase genes: AgamP3.3
Display of Metrics & SOPs
• Metrics
  • VectorBase wiki
  • Species-page containing the three tables available from the VectorBase species homepage
  • Expansion of documents relating to genomic resources (citations, links to primary data where possible)
  • Single collated table for BRC as separate download
• SOPs
  • VectorBase wiki
  • ‘Documents’ section of main site
Manual annotation progress
Merging gene sets
• Inputs: Gene set #1, Gene set #2
• Reduce to single predictions per locus
• Compare exon/intron structures
  • Identical structures
  • Compatible structures
  • Different structures
  • Merge/Split structures
  • Complex
  • No Map
• Add isoform predictions based on EST/Peptide data
• Result: Canonical gene set
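The comparison step can be sketched in code. The slide names the outcome categories but does not define them, so the rules below are assumptions: "identical" means the same exon coordinates, "compatible" means the same intron chain with different outer exon ends, "merge/split" means one prediction overlapping more than one prediction in the other set, and "no map" means no overlap at all. The function names are hypothetical, strand is ignored, and the "complex" category is not modelled.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, all on the same strand


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the intron chain as (exon end, next exon start) pairs."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def spans_overlap(a: List[Exon], b: List[Exon]) -> bool:
    """True if the genomic spans of the two predictions overlap."""
    a_start, a_end = min(s for s, _ in a), max(e for _, e in a)
    b_start, b_end = min(s for s, _ in b), max(e for _, e in b)
    return a_start <= b_end and b_start <= a_end


def classify_pair(a: List[Exon], b: List[Exon]) -> str:
    """Classify two single predictions (assumed category definitions)."""
    if not spans_overlap(a, b):
        return "no map"
    if sorted(a) == sorted(b):
        return "identical"
    if introns(a) == introns(b):
        return "compatible"  # same splice junctions, different outer ends
    return "different"


def classify_locus(gene: List[Exon], other_set: List[List[Exon]]) -> str:
    """Classify one prediction against every prediction in the other set."""
    hits = [g for g in other_set if spans_overlap(gene, g)]
    if not hits:
        return "no map"
    if len(hits) > 1:
        return "merge/split"  # one prediction covers several in the other set
    return classify_pair(gene, hits[0])


set1_gene = [(100, 200), (300, 400)]
print(classify_pair(set1_gene, [(100, 200), (300, 400)]))
print(classify_pair(set1_gene, [(50, 200), (300, 450)]))
print(classify_pair(set1_gene, [(100, 200), (320, 400)]))
print(classify_locus([(100, 900)], [[(100, 200)], [(500, 900)]]))
```

Comparing intron chains rather than exon lists is one common way to treat predictions that differ only in UTR extent as the same underlying structure; whether the pipeline described here does so is not stated on the slide.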