
Gene Ontology and Orthology Correlation

I690: Computational Comparative Genomics Alaa Abi Haidar. Gene Ontology and Orthology Correlation. Motivation: Orthologous genes have similar biological and molecular functions. Similar research has been conducted on Molecular Ontology for Human and Mouse.


Presentation Transcript


  1. Gene Ontology and Orthology Correlation
     I690: Computational Comparative Genomics
     Alaa Abi Haidar

  2. Motivation
     Orthologous genes have similar biological and molecular functions. Similar research has been conducted on the Molecular Function ontology for human and mouse. I will compare human, mouse, and rat and look for a correlation with orthology based on Molecular Function and Biological Process, using a more granular GO similarity evaluation.

  3. Orthology Data: https://rodeo.med.harvard.edu/

  4. Gene Ontology
     The Gene Ontology is a controlled vocabulary organized as a directed acyclic graph (DAG). It contains three main ontologies, which describe the Molecular Function, Cellular Component, and Biological Process of gene products.

  5. Gene Ontology (Molecular Function)

  6. Gene Ontology (Cellular Component)

  7. GI2GO process
     • Download gzipped gbs or gbk files for all chromosomes of each genome from the NCBI FTP site (e.g. ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/CHR_01/hs_alt_chr1.gbs.gz)
     • Gunzip the files and extract the GI numbers and associated GO terms (Molecular Function and Biological Process) for each gene
     • Create gi2go files for every genome, for both the Molecular Function and Biological Process ontologies

     Example GenBank record (PEX14):

         gene            7215749..7371533
                         /gene="PEX14"
                         /note="Derived by automated computational analysis using
                         gene prediction method: BestRefseq."
                         /db_xref="GeneID:5195"
                         /db_xref="HGNC:8856"
                         /db_xref="MIM:601791"
         mRNA            join(7215749..7215789,7236065..7236112,7276996..7277080,
                         7340014..7340142,7359109..7359194,7363796..7363898,
                         7365117..7365214,7368049..7368140,7370308..7371533)
                         /gene="PEX14"
                         /product="peroxisomal biogenesis factor 14"
                         /exception="mismatches in transcription"
                         /note="Derived by automated computational analysis using
                         gene prediction method: BestRefseq."
                         /transcript_id="NM_004565.1"
         CDS             join(7215754..7215789,7236065..7236112,7276996..7277080,
                         7340014..7340142,7359109..7359194,7363796..7363898,
                         7365117..7365214,7368049..7368140,7370308..7370764)
                         /gene="PEX14"
                         /note="NF-E2 associated polypeptide 2;
                         go_component: membrane [goid 0016020] [evidence IEA];
                         go_component: peroxisome [goid 0005777] [evidence TAS] [pmid 9653144];
                         go_component: integral to peroxisomal membrane [goid 0005779] [evidence TAS] [pmid 10212238];
                         go_function: protein binding [goid 0005515] [evidence IPI] [pmid 10704444];
                         go_process: protein targeting [goid 0006605] [evidence IEA];
                         go_process: protein transport [goid 0015031] [evidence IEA]"
                         /codon_start=1
                         /product="peroxisomal biogenesis factor 14"
                         /protein_id="NP_004556.1"
                         /db_xref="GI:4758896"
                         /db_xref="GeneID:5195"
                         /db_xref="HGNC:8856"
                         /db_xref="MIM:601791"

     Sample gi2go output (columns appear to be GI number, chromosome, GO IDs):

         56790895 ch1
         44662822 ch1
         44829052 ch1
         56786139 ch1
         47458048 ch1
         44662823 ch1
         42544157 ch1
         47458050 ch1
         5454161 ch1 0000004
         7949163 ch1 0000004
         14150063 ch1
         14150064 ch1
         88952344 ch1
         88952345 ch1
         24308457 ch1
         24308458 ch1
         15812217 ch1 0006810 0006810 0006810 0006810
         4826972 ch1 0006810 0006810 0006810 0006810
         68563511 ch1
         4505718 ch1 0016559
         4505719 ch1 0016559
         38569397 ch1 0007160 0007160
         38569398 ch1 0007160 0007160
         24432064 ch1
         24432065 ch1
         31543399 ch1 0006350 0006350 0006350
         5174629 ch1 0006350 0006350 0006350
         62543566 ch1
         62543567 ch1
         52851451 ch1 0006350 0006350
         21359969 ch1 0006350 0006350
         33859667 ch1
         33859668 ch1
         51702223 ch1 0008283 0008283 0008283
         5901910 ch1 0008283 0008283 0008283
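
The extraction step on this slide can be sketched in Python as follows. This is a minimal illustration, not the script used for the presentation: the function names `parse_cds` and `gi2go_line`, the regular expressions, and the abridged `EXAMPLE_CDS` text are assumptions based only on the annotation style visible in the PEX14 record.

```python
import re

# Patterns matching the annotation style seen in the PEX14 record
# (assumed to hold for the other records as well).
GI_RE = re.compile(r'/db_xref="GI:(\d+)"')
GO_RE = re.compile(r"go_(component|function|process): [^\[]*\[goid (\d{7})\]")


def parse_cds(cds_text):
    """Return (GI number or None, {ontology: [GO ids]}) for one CDS feature."""
    flat = " ".join(cds_text.split())  # undo GenBank line wrapping
    gi_match = GI_RE.search(flat)
    terms = {"component": [], "function": [], "process": []}
    for ontology, goid in GO_RE.findall(flat):
        terms[ontology].append(goid)
    return (gi_match.group(1) if gi_match else None), terms


def gi2go_line(gi, chromosome, go_ids):
    """Format one row in the 'GI chromosome GO...' layout of the sample output."""
    return " ".join([gi, chromosome, *go_ids])


# Abridged copy of the PEX14 CDS feature (coordinates and note shortened).
EXAMPLE_CDS = '''
     CDS             join(7215754..7215789,7236065..7236112)
                     /gene="PEX14"
                     /note="go_component: peroxisome [goid 0005777]
                     [evidence TAS] [pmid 9653144]; go_function: protein
                     binding [goid 0005515] [evidence IPI] [pmid 10704444];
                     go_process: protein targeting [goid 0006605] [evidence
                     IEA]; go_process: protein transport [goid 0015031]
                     [evidence IEA]"
                     /db_xref="GI:4758896"
'''

gi, terms = parse_cds(EXAMPLE_CDS)
print(gi2go_line(gi, "ch1", terms["function"]))  # molecular function row
print(gi2go_line(gi, "ch1", terms["process"]))   # biological process row
```

Keeping the three ontologies in separate lists makes it straightforward to write one gi2go file per ontology, as the last bullet describes.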

  8. A procedure for assessing GO annotation consistency
     Mary E. Dolan, Li Ni, Evelyn Camon and Judith A. Blake. Bioinformatics 2005 21(Suppl 1):i136-i143; doi:10.1093/bioinformatics/bti1019

  9. A procedure for assessing GO annotation consistency
     • Compares the Molecular Function GO annotations of human/mouse orthologous genes
     • MGI for mouse GO annotations (6087 annotated genes based on experimental literature)
     • GOA (EBI) for human GO annotations (9456 annotated genes based on experimental literature)
     • The MGI database has 14,908 orthology pairs between mouse and human (Nov 12 2004)
     • GO_Slim provides a coarse-grained classification of gene products according to their GO annotation (13 categories of GO terms for this research)

  10. A procedure for assessing GO annotation consistency
      Mary E. Dolan, Li Ni, Evelyn Camon and Judith A. Blake. Bioinformatics 2005 21(Suppl 1):i136-i143; doi:10.1093/bioinformatics/bti1019

  11. Data
      This study:
      • NCBI for mouse GO annotations (29164)
      • NCBI for human GO annotations (27325 annotated genes)
      • Rodeo Round-up Orthology (Harvard) has 15539 orthologous mouse/human pairs (only 6485 pairs have GO)
      • GO_Score provides a more granular measure for assessing gene functional similarity
      Dolan et al. (2005):
      • MGI for mouse GO annotations (6087 annotated genes)
      • GOA (EBI) for human GO annotations (9456 annotated genes)
      • The MGI database has 14,908 orthology pairs between mouse and human (Nov 12 2004)
      • GO_Slim provides a coarse-grained classification

  12. GO Scoring
      S = max{2, 3} = 3 (max of shared ancestors)
      [Figure: GO graph with Gene 1 and Gene 2]
      Wu, H., Mao, F., Su, Z., Olman, V., and Xu, Y. (2005) Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res., 33, 2822-37.
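
One way to read "max of shared ancestors" is sketched below in Python: for every pair of root-to-term paths, count the terms the two paths share, and take the maximum. This is an interpretation of the slide, not necessarily the exact definition in Wu et al. (2005). The toy graph, its term names, and the function names `root_paths` and `go_score` are hypothetical; the graph is built so that the two path pairs share 2 and 3 terms, as in the slide's example.

```python
from itertools import product


def root_paths(term, parents):
    """Return every path from a root down to `term` as a tuple of term ids."""
    direct_parents = parents.get(term, [])
    if not direct_parents:
        return [(term,)]
    return [
        path + (term,)
        for parent in direct_parents
        for path in root_paths(parent, parents)
    ]


def go_score(terms_a, terms_b, parents):
    """Maximum number of terms shared by any pair of root paths."""
    best = 0
    for term_a, term_b in product(terms_a, terms_b):
        paths = product(root_paths(term_a, parents), root_paths(term_b, parents))
        for path_a, path_b in paths:
            best = max(best, len(set(path_a) & set(path_b)))
    return best


# Hypothetical toy DAG: child -> list of parents.
TOY_PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "d": ["a"],
    "x": ["b", "d"],  # term annotating gene 1, reachable by two paths
    "y": ["b"],       # term annotating gene 2
}

# Paths to x: root-a-b-x and root-a-d-x; path to y: root-a-b-y.
# Shared terms per path pair: 3 and 2, so the score is max{2, 3} = 3.
print(go_score(["x"], ["y"], TOY_PARENTS))
```

Enumerating all root paths is exponential in the worst case, which is acceptable for a toy graph but would need memoization or an ancestor-set formulation on the full Gene Ontology.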

  13. Output

      GIM       GIH       DIST    GO
      13027410  10048430  0.1699  3
      16758450  10048442  0.0287  3
      16758042  10048446  0.2918  14
      13929056  10048460  0.081   14
      25453412  10092608  0.081   3
      18376837  10181146  0.135   3
      13928944  10181172  0.0547  3
      8394100   10181174  0.1216  14
      13027394  10198600  0.0739  6
      13786136  10242385  0.0326  16
      11560079  10863917  0.084   16
      20806125  10946576  0.2918  14
      8393221   10946582  0.2732  3
      11024678  10946592  0.061   3
      25742799  10946594  0.1633  3
      13928930  10946598  0.0798  16
      13027398  10946634  0.293   3
      11560020  10946702  0.0677  7
      9506841   10946714  0.3506  3
      8394408   10946720  0.0364  3
      13928922  10946800  0.0199  3
      8394053   10946854  0       14
      16758186  10946866  0.0478  3
      18266700  10946928  0.0054  12
      13928998  10946938  0.0878  3
      13929006  10946940  0.0095  6

  14. Pearson's Correlation

  15. Pearson's Correlation Results
      • Molecular Function similarity correlated to orthology
        • -0.08 (rat vs human)
        • -0.08 (mouse vs human)
        • -0.02 (rat vs mouse)
      • Biological Process similarity correlated to orthology
        • 0.013 (rat vs human)
        • 0.004 (mouse vs human)
        • 0.058 (rat vs mouse)
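
A coefficient of this kind can be computed with a plain Pearson correlation, sketched below in Python. The slides do not state which two variables were correlated; the sketch assumes they are the DIST and GO columns of the Output slide and uses the first five rows of that slide as sample data. The function name `pearson` is illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# First five rows of the Output slide (DIST and GO columns).
dist = [0.1699, 0.0287, 0.2918, 0.081, 0.081]
go = [3, 3, 14, 14, 3]
print(round(pearson(dist, go), 3))
```

Values near zero, like those reported on this slide, indicate little linear relationship between the two variables, which is what motivates the distribution-based test on the later slides.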

  16. results

  17. results

  18. Random set test

  19. Probability Distribution Test

  20. Probability Distribution Test (mouse/human Molecular Function)

  21. Probability Distribution Test Interpretation
      • The higher the GO score for a pair of genes, the higher the probability that they are orthologous.
      • For instance, a gene pair with a GO score of 18 is 1500 times more likely to be orthologous.
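
One possible reading of this test is a ratio between how often a GO score occurs among orthologous pairs and how often it occurs among random pairs. The slides do not give the exact computation, so the Python sketch below is an assumption; the function names and the score lists are made up for illustration and are not results from the presentation.

```python
from collections import Counter


def score_distribution(scores):
    """Empirical probability of each GO score in a list of pair scores."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    return {score: count / len(scores) for score, count in counts.items()}


def enrichment(ortholog_scores, random_scores):
    """Ratio P(score | orthologous) / P(score | random) per shared score."""
    p_orth = score_distribution(ortholog_scores)
    p_rand = score_distribution(random_scores)
    return {s: p_orth[s] / p_rand[s] for s in p_orth if s in p_rand}


# Made-up scores, for illustration only.
ortholog_scores = [3, 3, 14, 14]
random_scores = [3, 3, 3, 14]
for score, ratio in sorted(enrichment(ortholog_scores, random_scores).items()):
    print(score, round(ratio, 2))
```

Scores that never occur in the random set are skipped here because the ratio is undefined for them; a larger random set or smoothing would be needed to cover those cases.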

  22. Future Work
      • Generate rat/human and rat/mouse random files for molecular function
      • Generate the corresponding probability distributions
      • Compare the results with human/mouse
      • Repeat the whole process for Biological Process
      • New Orthology similarity measure
      • Retry the first average evaluation method but with a larger set of orthology
      • Publish
      • Have a vacation

  23. Thank You • Enjoy the summer... :)
