
Gene Expression meets Gene Ontology: A novel statistical method for Microarray analysis

Vasanth Singan. Advisors: Dr. John Colbourne & Dr. Haixu Tang.


Presentation Transcript


  1. Gene Expression meets Gene Ontology: A novel statistical method for Microarray analysis Vasanth Singan Advisors: Dr. John Colbourne & Dr. Haixu Tang

  2. OUTLINE • Introduction • Background • Challenge • Previous Work • Methodology • Results • Future Work

  3. INTRODUCTION • Gene expression profiling is producing breakthroughs in both medical and fundamental biology research. • Many statistical approaches have been developed to analyze microarray results and identify genes that are regulated under experimental conditions. • Most of these approaches do not take existing biological knowledge into account. We explore whether existing knowledge can be used to improve the analysis.

  4. BACKGROUND Microarray platforms: 1. cDNA (spotted) arrays 2. High-density oligonucleotide arrays

  5. Drosophila Microarray Experiment [Figure: diagram relating the genes ovo, otu and Sxl] • The Drosophila gene ovo (shavenbaby) is required in the germline for sex determination and for female-specific germline viability and differentiation. • OVO regulates its own transcription and the transcription of the gene otu. • OVO-B is a transcriptional activator and is sufficient for female fertility. • OVO-A is a transcriptional repressor that, when misexpressed, causes dominant-negative female sterility.

  6. Drosophila Microarray Experiment • Goal: to identify additional genes in the germline pathway by probing for both direct and indirect targets of ovo using microarrays. • This microarray analysis searched for differentially expressed genes in dissected ovaries from ovo mutants compared to wildtype. • The microarrays are printed with ~15k spots; PCR primers designed by Incyte Genomics amplify 93% of genes in annotation version 1.0 and 75% in version 3.1.

  7. Significance Analysis of Microarrays (SAM) SAM computes a statistic d_i for each gene i, measuring the strength of the relationship between gene expression and the response variable (here, ovo mutant vs. wildtype). It uses repeated permutations of the data to determine significance and produces a list of genes ranked by d_i. Problem: most statistical analyses treat each gene independently of the others, but in reality genes are co-regulated, and there are many examples where individual genes do not meet statistical cut-off values yet may be significant when their expression profiles are measured as a group.
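The SAM statistic on this slide can be made concrete with a short sketch. Following the published SAM definition (Tusher et al., 2001), d_i is the difference in mean expression between the two groups divided by a gene-specific scatter s_i plus a small constant s0, and genes are called significant by comparing the observed ordered d values against the order statistics expected under label permutations. The fixed values of `s0` and `delta`, the 0/1 label encoding and the function names are illustrative assumptions; the real SAM software chooses s0 from the data and reports an estimated false discovery rate for each threshold.

```python
import random
import statistics


def sam_statistic(group_a, group_b, s0):
    """Relative difference d_i = (mean_b - mean_a) / (s_i + s0) for one gene."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Pooled within-group sum of squares.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    # Gene-specific scatter s_i: standard error of the difference in means.
    s_i = ((1 / na + 1 / nb) / (na + nb - 2) * ss) ** 0.5
    # s0 > 0 damps genes with tiny s_i and avoids division by zero.
    return (mean_b - mean_a) / (s_i + s0)


def sam(expr, labels, s0=0.1, delta=1.0, n_perm=100, seed=0):
    """Rank genes by |d_i| and call genes whose observed d_i departs from the
    permutation-based expected order statistic by more than delta.

    expr:   dict gene -> expression values, one per sample
    labels: 0 (e.g. wildtype) or 1 (e.g. ovo mutant), one per sample;
            assumes at least two samples per group
    """
    genes = list(expr)

    def scores(lab):
        return {
            g: sam_statistic(
                [v for v, grp in zip(expr[g], lab) if grp == 0],
                [v for v, grp in zip(expr[g], lab) if grp == 1],
                s0,
            )
            for g in genes
        }

    observed = scores(labels)
    order = sorted(genes, key=observed.get)  # ascending observed d_i

    # Expected order statistics: mean of the sorted d values over label permutations.
    rng = random.Random(seed)
    expected = [0.0] * len(genes)
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        for k, d in enumerate(sorted(scores(perm).values())):
            expected[k] += d / n_perm

    called = {g for k, g in enumerate(order) if abs(observed[g] - expected[k]) > delta}
    ranked = sorted(genes, key=lambda g: abs(observed[g]), reverse=True)
    return ranked, called
```

With only a few replicates per group the number of distinct permutations is small, so the expected order statistics are coarse; this is one reason the slide's "Problem" matters for weakly changed genes.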

  8. Example of SAM output

  9. CHALLENGE How can existing knowledge about gene relationships be integrated to improve tests of significance in microarray analysis?

  10. Previous Work
    1. Sung Geun Lee, Jung Uk Hur, Yang Seok Kim. A graph-theoretic modeling on GO space for biological interpretation of gene clusters. Bioinformatics 20: 381-388 (2004); Advance Access published January 22, 2004.
    2. Barry R. Zeeberg et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biology (2003).
    3. Sung Geun Lee, Wan Seon Lee, Yang Seok Kim. GOODIES: GO Based Data Mining Tool for Characteristic Attribute Interpretation on a Group of Biological Entities. Genome Informatics 14: 675-676 (2003).
    4. Boris Adryan, Reinhard Schuh. Gene ontology-based clustering of gene expression data. Bioinformatics, Advance Access published April 29, 2004.
    5. Peter N. Robinson, Andreas Wollstein, Ulrike Böhme, Brad Beattie. Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics, Advance Access published February 5, 2004.

  11. Gene Ontology (GO) [Figure: example GO DAG with nodes GO:01 Biological Process, GO:02 Development, GO:03 Behavior, GO:04 Cell differentiation, GO:05 Locomotory behavior, GO:06 Reproductive behavior] • Structured, controlled vocabularies (ontologies) • Organized as a DAG (directed acyclic graph): an edge from node A to node B means A is_a or part_of B

  12. Annotation • GO:01: genes a, b, c, d, k, l • GO:02: genes d, k, l • GO:03: genes a, b, c, d • GO:04: gene k • GO:05: gene d • GO:06: genes a, c
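The annotation sets above follow GO's true path rule: a gene annotated to a node is also annotated to every ancestor of that node. The sketch below propagates annotations up the toy DAG from slide 11. The edges in `PARENTS` and the "direct" annotations in `DIRECT` are hypothetical, chosen only so that propagation reproduces the gene sets listed on this slide.

```python
from collections import defaultdict

# Hypothetical child -> parents edges (is_a / part_of) for the toy GO DAG.
PARENTS = {
    "GO:02": ["GO:01"],
    "GO:03": ["GO:01"],
    "GO:04": ["GO:02"],
    "GO:05": ["GO:03"],
    "GO:06": ["GO:03"],
}

# Hypothetical direct (most specific) annotations.
DIRECT = {
    "GO:02": {"d", "l"},
    "GO:03": {"b"},
    "GO:04": {"k"},
    "GO:05": {"d"},
    "GO:06": {"a", "c"},
}


def ancestors(node, parents):
    """All nodes reachable from `node` by following child -> parent edges."""
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Apply the true path rule: copy each direct annotation to every ancestor."""
    annotated = defaultdict(set)
    for node, genes in direct.items():
        for target in {node} | ancestors(node, parents):
            annotated[target] |= genes
    return dict(annotated)
```

Because a node's gene set always contains its children's sets, nodes near the root (GO:01) become large and generic, which is relevant to the node-size effects noted later in the talk.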

  13. Drosophila Microarray Experiment

  14. Methodology [Figure: the ranked list of genes from SAM (Gene 01, Gene 02, ..., Gene n) mapped onto Gene Ontology DAG nodes (Node 01, Node 02, ..., Node m)]

  15. Iterative Refinement • Input: ranked list of genes from SAM • Task I: compute the significance of GO nodes • Task II: compute the significance of genes • Repeat Tasks I and II for N iterations • Output: ranked lists of genes and nodes
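One way to picture the iterative refinement loop is the skeleton below. The slides do not give the update rules used in Tasks I and II, so the ones here are placeholders: a node's score is the mean score of its genes, and a gene's score is a weighted blend (hypothetical weight `alpha`) of its own SAM score and the score of its best node. Only the alternation between the two tasks for N iterations mirrors the slide.

```python
def refine(sam_scores, node_genes, n_iter=5, alpha=0.5):
    """Alternate Task I (node scores) and Task II (gene scores) for n_iter rounds.

    sam_scores: dict gene -> score from the SAM ranking (higher = more significant)
    node_genes: dict GO node -> non-empty set of genes, all present in sam_scores
    alpha:      hypothetical weight on a gene's own SAM evidence
    """
    gene_nodes = {}
    for node, members in node_genes.items():
        for g in members:
            gene_nodes.setdefault(g, []).append(node)

    genes = dict(sam_scores)
    nodes = {}
    for _ in range(n_iter):
        # Task I (assumed rule): a node's score is the mean score of its genes.
        nodes = {n: sum(genes[g] for g in m) / len(m) for n, m in node_genes.items()}
        # Task II (assumed rule): blend each gene's SAM score with its best node;
        # genes without GO annotation keep their SAM score.
        genes = {
            g: (alpha * s + (1 - alpha) * max(nodes[n] for n in gene_nodes[g]))
            if g in gene_nodes
            else s
            for g, s in sam_scores.items()
        }

    ranked_genes = sorted(genes, key=genes.get, reverse=True)
    ranked_nodes = sorted(nodes, key=nodes.get, reverse=True)
    return ranked_genes, ranked_nodes, genes, nodes
```

In this toy rule a weakly scored gene that shares a node with strongly scored genes moves up the ranking, which is the kind of group effect the SAM slide describes.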

  16. Methodology I (Log-Likelihood) • Task I: for each node N, compute the log-likelihood and the probability that N is differentially expressed. • Task II: for each gene i, compute the posterior probability that i is differentially expressed.
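A minimal sketch of Methodology I, assuming each gene arrives with a probability `p[g]` of being differentially expressed and that `pi` is a genome-wide background rate; both are hypothetical inputs, since the slides do not specify the likelihood model. Task I scores a node by the log-likelihood ratio of a node-specific Bernoulli rate q against `pi`; Task II replaces `pi` with a prior taken from the gene's nodes and updates it with Bayes' rule on the odds scale.

```python
import math


def node_log_likelihood(p, members, pi):
    """Task I (hypothetical model): expected Bernoulli log-likelihood ratio of a
    node-specific DE rate q versus the background rate pi.

    p:       dict gene -> probability in (0, 1) that the gene is DE
    members: non-empty set of genes annotated to the node
    pi:      assumed background probability that any gene is DE
    """
    q = sum(p[g] for g in members) / len(members)
    q = min(max(q, 1e-9), 1 - 1e-9)  # keep the logs finite
    llr = sum(
        p[g] * math.log(q / pi) + (1 - p[g]) * math.log((1 - q) / (1 - pi))
        for g in members
    )
    # When q is not clamped, llr = len(members) * KL(q || pi) >= 0.
    return q, llr


def gene_posterior(p_gene, prior, pi):
    """Task II: swap the background prior pi for a node-informed prior (Bayes on odds)."""
    likelihood_ratio = (p_gene / (1 - p_gene)) / (pi / (1 - pi))
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


def methodology_one(p, node_genes, pi):
    node_q, node_llr = {}, {}
    for node, members in node_genes.items():
        node_q[node], node_llr[node] = node_log_likelihood(p, members, pi)
    posterior = {}
    for g, p_g in p.items():
        qs = [node_q[n] for n, m in node_genes.items() if g in m]
        prior = sum(qs) / len(qs) if qs else pi  # unannotated genes keep pi
        posterior[g] = gene_posterior(p_g, prior, pi)
    return node_q, node_llr, posterior
```

This log-likelihood ratio is non-negative for both enriched and depleted nodes, so the direction has to be read from q relative to `pi`.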

  17. Gene Significance: Original vs. Scrambled

  18. Inferences from Methodology I • Testing against scrambled input shows only marginal significance. • The distribution of gene probabilities within a node is not significantly different from that of the scrambled data set. • Noise is high in lowly expressed genes. • Scores for nodes with very few or very many genes are affected by the relatively low proportion of significant genes they contain.

  19. Example of SAM output

  20. Methodology II (Rank-Based Permutation Test) • Task I: for each node N, compute an E-value based on the average rank of its genes. • Task II: for each gene i, compute the posterior probability based on the E-values of its nodes.
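Task I of the rank-based permutation test can be sketched as follows, assuming the SAM output is a single list ordered from most to least significant (for example, up-regulated genes only). A node's statistic is the average rank of its genes; the p-value is the fraction of random gene sets of the same size whose average rank is at least as good, and the E-value is taken here as p-value times the number of nodes tested. That is one common convention, not necessarily the author's definition. Task II is not sketched because the slide does not give its formula.

```python
import random


def node_evalues(ranked_genes, node_genes, n_perm=1000, seed=0):
    """Permutation E-value per node from the average rank of its genes (sketch).

    ranked_genes: genes ordered from most to least significant (e.g. SAM output)
    node_genes:   dict GO node -> set of annotated genes
    Returns dict node -> E-value; small values mean the node's genes rank near the top.
    """
    rank = {g: r for r, g in enumerate(ranked_genes, start=1)}
    all_ranks = range(1, len(ranked_genes) + 1)
    # Only nodes with at least one gene on the ranked list can be tested.
    testable = {n: [rank[g] for g in m if g in rank] for n, m in node_genes.items()}
    testable = {n: r for n, r in testable.items() if r}
    n_tests = len(testable)

    rng = random.Random(seed)
    evalues = {}
    for node, ranks in testable.items():
        k = len(ranks)
        observed = sum(ranks) / k
        # How often does a random gene set of the same size rank at least as high?
        hits = sum(sum(rng.sample(all_ranks, k)) / k <= observed for _ in range(n_perm))
        p_value = (hits + 1) / (n_perm + 1)
        # Assumed convention: E-value = p-value x number of nodes tested.
        evalues[node] = p_value * n_tests
    return evalues
```

Sampling random gene sets of the node's own size keeps the null distribution matched to node size, which addresses the small- and large-node effect noted for Methodology I.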

  21. Drosophila Microarray Experiment [Figure: ranked list of genes; ranked list of nodes]

  22. RESULTS 1. Functional categories (GO nodes) enriched in up-regulated or down-regulated genes. 2. A ranked list of genes, with scores representing how significantly each gene is up-regulated or down-regulated.

  23. FUTURE WORK • Cut-off values for genes without GO annotations • Jackknife analysis • Analysis of additional data sets

  24. ACKNOWLEDGEMENTS • Dr. John Colbourne (CGB) • Dr. Haixu Tang • Center for Genomics and Bioinformatics • Genome Informatics Laboratory

  25. QUESTIONS?
