Information System for Bee Gene Annotation
Xin He
Beespace Grouping Meeting, Nov 30, 2005
Motivation
• Analysis of bee microarray expression data requires an information system that provides functions not available elsewhere
• No public database is dedicated to the honey bee
• Non-traditional queries, for example: EST queries, finding similarly expressed genes, etc.
Tasks
• Gene → homologs
• Gene → GO terms
• GO term → genes
• Gene → genes with similar expression
• Gene → genes with similar GO annotation
Database Design: Basic Entities
• Ids: biological sequences, in three subtypes
  • Gene
  • Protein
  • EST
• Gonames: GO terms
Database Design: Basic Relationships
• Homologs: pairwise sequence similarity
• Gos: gene annotation
• Gosims: pairwise similarity of GO annotations
• Exprsims: pairwise similarity of gene expression patterns
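To make the entities and relationships concrete, here is a minimal sketch of how they could be laid out as tables, using Python's built-in sqlite3 module. Only the table names (ids, gonames, homologs, gos, gosims, exprsims) come from the slides; every column name, the choice of SQLite, and the two helper functions are assumptions for illustration and not necessarily the design actually used.

```python
import sqlite3

# Table names follow the slides; all column names are assumed for illustration.
# REFERENCES clauses document intent; SQLite enforces them only when
# "PRAGMA foreign_keys = ON" is set on the connection.
SCHEMA = """
CREATE TABLE ids (
    id      TEXT PRIMARY KEY,
    subtype TEXT NOT NULL CHECK (subtype IN ('gene', 'protein', 'est'))
);
CREATE TABLE gonames (
    go_id TEXT PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE homologs (
    id1    TEXT NOT NULL REFERENCES ids(id),
    id2    TEXT NOT NULL REFERENCES ids(id),
    evalue REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE gos (
    id    TEXT NOT NULL REFERENCES ids(id),
    go_id TEXT NOT NULL REFERENCES gonames(go_id),
    PRIMARY KEY (id, go_id)
);
CREATE TABLE gosims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
CREATE TABLE exprsims (
    id1   TEXT NOT NULL REFERENCES ids(id),
    id2   TEXT NOT NULL REFERENCES ids(id),
    score REAL NOT NULL,
    PRIMARY KEY (id1, id2)
);
"""


def build_db():
    """Create an empty in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def go_terms_for_gene(conn, gene_id):
    """Gene -> GO terms: return (go_id, name) pairs for one gene."""
    rows = conn.execute(
        "SELECT g.go_id, n.name FROM gos g "
        "JOIN gonames n ON n.go_id = g.go_id "
        "WHERE g.id = ? ORDER BY g.go_id",
        (gene_id,),
    )
    return rows.fetchall()


def genes_for_go_term(conn, go_id):
    """GO term -> genes: return the ids annotated with one GO term."""
    rows = conn.execute(
        "SELECT id FROM gos WHERE go_id = ? ORDER BY id", (go_id,)
    )
    return [row[0] for row in rows]
```

With this layout, the two annotation tasks (gene to GO terms, GO term to genes) are both lookups on the same gos table, and the three pairwise relationships share one shape: two ids plus a score or E-value.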
Implementation of Tasks
• Gene → homologs: BLAST all pairs of genes. Choose E-value threshold 10E-10
• Gene → GO terms
  • Fly: downloaded from Gene Ontology
  • Bee: from bee biologists
• GO term → genes
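The homolog step can be sketched as a filter over all-vs-all BLAST results. The example below assumes the hits are available in BLAST's standard 12-column tabular format (query id in column 1, subject id in column 2, E-value in column 11); the slides do not say which output format was used. The function name filter_homologs is hypothetical, and the default cutoff reads the slide's "10E-10" as 1e-10, which is also an assumption.

```python
import csv
import io


def filter_homologs(tabular_text, max_evalue=1e-10):
    """Keep BLAST hits whose E-value is at or below max_evalue.

    Returns {(query_id, subject_id): best_evalue}. Self-hits are dropped,
    and when a pair appears several times the smallest E-value is kept.
    """
    pairs = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if query == subject or evalue > max_evalue:
            continue
        key = (query, subject)
        if key not in pairs or evalue < pairs[key]:
            pairs[key] = evalue
    return pairs
```

The surviving pairs are what would populate the Homologs relationship. The cutoff is a parameter so that a stricter or looser threshold can be tried without changing the parsing.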
Implementation of Tasks
• Gene → genes with similar expression: compute pairwise Pearson correlation. Choose threshold 0.5
• Gene → genes with similar GO annotation
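The expression-similarity step can be sketched in plain Python as follows. The input layout (a dict from gene id to a list of expression values, all of the same length) and both function names are assumptions. The sketch also assumes the threshold applies to the signed correlation r rather than to its absolute value, since the slides do not say which.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        # A constant profile has no defined correlation; treated as 0 here.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def similar_expression_pairs(profiles, threshold=0.5):
    """Return {(gene1, gene2): r} for all pairs with r >= threshold.

    Each unordered pair is reported once, with gene1 < gene2.
    """
    result = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            result[(g1, g2)] = r
    return result
```

Comparing all pairs is quadratic in the number of genes, which is one reason to precompute the result and store only the pairs above the threshold, as the Exprsims relationship does.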
GO-based Similarity
• Idea: two genes are similar if they share some GO terms. Favor specific GO terms
• View each gene as a document and each GO term as a term
• Vector-space model: let t be a term and g be a gene; then
  • TF(t,g) = 1 if g is annotated with t; 0 otherwise
  • IDF(t) = log[n/n(t)], where n is the total number of genes and n(t) is the number of genes annotated with t
  • Similarity of two genes: cosine similarity of their TF-IDF vectors
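The vector-space model above can be sketched directly from the definitions. With a binary TF, the weight of term t in gene g is IDF(t) when g is annotated with t and 0 otherwise, so the cosine only needs the shared terms in the numerator. The input layout (a dict from gene id to a set of GO term ids) and the function names are assumptions, as is taking n to be the number of genes in that dict.

```python
from math import log, sqrt


def idf_weights(annotations):
    """IDF(t) = log(n / n(t)) for every GO term that annotates some gene."""
    n = len(annotations)
    counts = {}
    for terms in annotations.values():
        for t in set(terms):
            counts[t] = counts.get(t, 0) + 1
    return {t: log(n / c) for t, c in counts.items()}


def go_similarity(g1, g2, annotations, idf):
    """Cosine similarity of the TF-IDF vectors of two genes (binary TF)."""
    t1, t2 = set(annotations[g1]), set(annotations[g2])
    dot = sum(idf[t] ** 2 for t in t1 & t2)
    norm1 = sqrt(sum(idf[t] ** 2 for t in t1))
    norm2 = sqrt(sum(idf[t] ** 2 for t in t2))
    if norm1 == 0 or norm2 == 0:
        # No annotations, or only terms shared by every gene (IDF = 0).
        return 0.0
    return dot / (norm1 * norm2)
```

A term that annotates every gene gets IDF 0 and contributes nothing, while a rare term dominates the score, which is how the model favors specific GO terms. Note that this treats GO terms as independent and ignores the parent-child structure of the ontology.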
For Discussion
• Internal database, shared by all Beespace projects. Include: genes, proteins, GO terms, expression
• Ontology-based similarity: applications?
• “Candidate genes” retrieval. Example: find all genes involved in the segmentation clock