
How to Measure Genetic Heterogeneity

How to Measure Genetic Heterogeneity. International Workshop on Statistical-Mechanical Informatics, 2009/09/13-2009/09/16. Ryo Yamada, Unit of Statistical Genetics, Center for Genomic Medicine, Kyoto University.


Presentation Transcript


  1. How to Measure Genetic Heterogeneity. International Workshop on Statistical-Mechanical Informatics, 2009/09/13-2009/09/16. Unit of Statistical Genetics, Center for Genomic Medicine, Kyoto University. Ryo Yamada

  2. What is genetic heterogeneity?

  3. Biological Strategies: Life covers the land. (Figure: land-cover map by Environmental Research and Teaching at the University of Toronto.)

  4. Slime mold changes its shape and moves around, but uses spores to reproduce. (Image: Wikipedia)

  5. Slime mold keeps looking for new (better?) conditions. The space is too big to be covered completely, so multiple places are selected and bridged without a break. Each part seems to act independently.

  6. Slime mold is clever enough to find the shortest route through a labyrinth to food. Its strategy is being investigated as a new model for parallel computing systems.

  7. Phylogenetic tree

  8. LIFE keeps looking for new (better?) conditions. The space is too big to be covered completely, so multiple places are selected and bridged without a break. Each part seems to act independently. (Figure: phylogenetic tree.)

  9. They are bridged without a break. WE are here because WE are all offspring of the “No-break” family, sharing the features of continuous LIFE.

  10. ?? Features of LIFE ?? • Keeps looking for something. • Accepts multiple conditions as good ones. • Stays contiguous with each other. • Acts independently.

  11. Slime mold is distributed in physical space. LIFE is distributed in genetic space. Distribution ~ heterogeneity.

  12. LIFE is distributed in genetic space. What is genetic space?

  13. DNA molecules: 4 letters, {A, T, G, C}; length $L = 3 \times 10^9$ (Homo sapiens). Sequence variations: $4^L$; $L = 1, 2, \dots$
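To make the count $4^L$ concrete, here is a minimal Python sketch (the function names are illustrative, not from the talk). It enumerates every sequence for tiny lengths and, for the human length, only reports how many decimal digits $4^L$ has, since enumeration is impossible.

```python
import math
from itertools import product

# The four DNA letters, as on the slide.
ALPHABET = "ATGC"

def all_sequences(length):
    """Enumerate every DNA sequence of the given length (feasible only for tiny lengths)."""
    return ["".join(letters) for letters in product(ALPHABET, repeat=length)]

def count_sequences(length):
    """Number of possible sequences of the given length: 4**length."""
    return len(ALPHABET) ** length

# Enumeration agrees with the formula for small L.
for L in (1, 2, 3):
    assert len(all_sequences(L)) == count_sequences(L)

# For L = 3 x 10^9, report the size of 4^L in decimal digits instead of building it.
L_human = 3 * 10**9
digits = math.floor(L_human * math.log10(4)) + 1
print(f"4^L has about {digits:,} decimal digits")
```

The number of digits alone is on the order of $10^9$, which is the point of the slide.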

  14. Biological space is a part of physico-chemical space.

  15. Biological space is far smaller than chemical space, but still enormously big.

  16. Environmental fluctuations change the width of pathways in biological space.

  17. Inter-species heterogeneity vs. intra-species heterogeneity. (Figure panels: inter-species, phylogeny; intra-species, recombination graph. Source: “Genealogical trees, coalescent theory and the analysis of genetic polymorphisms”, Nature Reviews Genetics 3, 380-390 (2002); doi:10.1038/nrg795.)

  18. The combination of letters is changed (recombination). Letters are changed (mutation). (Figure label: mutant.)

  19. Letters are changed: mutation. The combination of letters is changed: recombination.
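The two kinds of change can be sketched in a few lines of Python. This is only an illustration: recombination is modeled here as a single crossover at one breakpoint, which is a simplifying assumption, and the function names are hypothetical.

```python
def mutate(seq, position, new_letter):
    """Mutation: the letter at one position is changed."""
    return seq[:position] + new_letter + seq[position + 1:]

def recombine(parent_a, parent_b, breakpoint):
    """Recombination (single-crossover assumption): letters before `breakpoint`
    come from parent_a and the rest from parent_b. No new letters appear;
    only their combination changes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return parent_a[:breakpoint] + parent_b[breakpoint:]

a = "AAAAAA"
b = "TTTTTT"
print(mutate(a, 2, "G"))   # AAGAAA
print(recombine(a, b, 4))  # AAAATT
```

Every position of a recombinant matches one of its parents, whereas a mutation introduces a letter that neither parent had at that position.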

  20. $4^L \to 2^L$ (each variable site treated as having two alleles, coded 0 and 1)

  21. Space is too big to be covered completely. $L = 3 \times 10^9$; variable sites $\sim 10 \times 10^6$; population size of Homo sapiens $\sim 6 \times 10^9$; $2^{10{,}000{,}000} \ggg 6 \times 10^9$.
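The gap between $2^{10{,}000{,}000}$ and $6 \times 10^9$ is easiest to see on a log scale. This short Python sketch uses only the slide's round numbers.

```python
import math

VARIABLE_SITES = 10 * 10**6   # ~10 x 10^6 variable sites (slide value)
POPULATION = 6 * 10**9        # ~6 x 10^9 people (slide value)

# Work in log10 to avoid building a 10-million-bit integer.
log10_space = VARIABLE_SITES * math.log10(2)   # log10(2^10,000,000)
log10_population = math.log10(POPULATION)

print(f"log10(2^k) = {log10_space:,.0f}")
print(f"log10(N)   = {log10_population:.2f}")
# Even if every person carried a different sequence, the occupied fraction
# of sequence space would be about 10^(log10 N - log10 2^k).
print(f"occupied fraction ~ 10^{log10_population - log10_space:,.0f}")
```

The space has a roughly three-million-digit size, while the population has ten digits, so almost all possible sequences never occur.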

  22. $k$ sites → $2^k$ sequence variations, each with a frequency: 00…000: $p(1)$; 00…001: $p(2)$; 00…010: $p(3)$; 00…011: $p(4)$; …; 11…111: $p(2^k) = 1 - (p(1) + \dots + p(2^k - 1))$ • $2^k - 1$ parameters • Flat and equal
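The flat parameterization can be sketched as follows: list the $2^k$ binary sequences in the slide's order and note that the last frequency is fixed by the others. Function names are illustrative only.

```python
from itertools import product

def haplotype_labels(k):
    """All 2^k binary sequences over k biallelic sites, ordered 00...0, 00...1, ..., 11...1."""
    return ["".join(bits) for bits in product("01", repeat=k)]

def free_parameters(k):
    """Frequencies p(1), ..., p(2^k) sum to 1, so only 2^k - 1 of them are free."""
    return 2**k - 1

def last_frequency(first_freqs):
    """p(2^k) = 1 - (p(1) + ... + p(2^k - 1))."""
    return 1.0 - sum(first_freqs)

k = 3
print(haplotype_labels(k))  # ['000', '001', '010', '011', '100', '101', '110', '111']
print(free_parameters(k))   # 7
```

All $2^k - 1$ parameters sit at the same level, with no hierarchy among them, which is what "flat and equal" describes.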

  23. Genetic heterogeneity: dependency or association among variable sites. How can the heterogeneity be summarized, and with how many parameters?

  24. Pairwise relation: $r^2$; variance-covariance matrix • Describes the heterogeneity with $k(k-1)/2$ parameters for individual pairs. • Predicts test statistics of associated markers in association studies.
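As an illustration of the pairwise summary, this Python sketch computes the standard two-site $r^2 = D^2 / (p_x(1-p_x)\,p_y(1-p_y))$ with $D = p_{xy} - p_x p_y$ for every pair of sites, from haplotypes coded 0/1. The example haplotypes are made up.

```python
from itertools import combinations

def r_squared(site_x, site_y):
    """Two-site r^2 from 0/1 alleles observed on the same haplotypes."""
    n = len(site_x)
    p_x = sum(site_x) / n
    p_y = sum(site_y) / n
    p_xy = sum(1 for a, b in zip(site_x, site_y) if a == 1 and b == 1) / n
    d = p_xy - p_x * p_y
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    return d * d / denom

def pairwise_r2(haplotypes):
    """r^2 for every pair of sites: k(k-1)/2 values for k sites."""
    k = len(haplotypes[0])
    sites = [[h[j] for h in haplotypes] for j in range(k)]
    return {(i, j): r_squared(sites[i], sites[j]) for i, j in combinations(range(k), 2)}

# Made-up haplotypes over k = 3 sites: sites 0 and 1 always agree,
# site 2 is independent of both.
haplotypes = [(0, 0, 1), (0, 0, 0), (1, 1, 0), (1, 1, 1)]
r2 = pairwise_r2(haplotypes)
print(r2)  # {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.0}
```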

  25. Ψ: hypercube or lattice • The power set of {1,2,…,k} consists of $2^k$ subsets: • ∅ • {1},{2},…,{k} • {1,2},{1,3},…,{2,3},{2,4},…,{k-1,k} → pairwise • {1,2,3},{1,2,4},…,{2,3,4},…,{k-2,k-1,k} • … • {1,2,…,k} • A full hierarchy of parameters.
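The levels of Ψ can be listed directly. This Python sketch groups the subsets of {1,…,k} by size: level 0 is ∅, level 2 holds the $k(k-1)/2$ pairs used by $r^2$, and all levels together give $2^k$ subsets.

```python
from itertools import combinations
from math import comb

def power_set_by_level(k):
    """Subsets of {1, ..., k} grouped by size (level 0 = empty set, level 2 = pairs)."""
    sites = range(1, k + 1)
    return [list(combinations(sites, size)) for size in range(k + 1)]

k = 4
levels = power_set_by_level(k)
assert sum(len(level) for level in levels) == 2**k
assert len(levels[2]) == comb(k, 2)  # the pairwise level

for size, subsets in enumerate(levels):
    print(size, len(subsets), subsets[:3])
```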

  26. Subsets with tandem elements in Ψ. (Figure labels: {1,2,…,k}; {1,2,3,4}; {1,2,3}; {1,2}, {1,3}, {1,4}, …, {1,k}; pairwise relation: $r^2$.) Tandem pairs are elements of both.
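A hedged sketch of the last sentence, assuming that "tandem" means a run of consecutive sites {i, i+1, …, j} (the slide does not define it). Under that reading, the tandem subsets and the pairwise subsets overlap exactly in the adjacent pairs {i, i+1}. The function names are hypothetical.

```python
from itertools import combinations

def tandem_subsets(k):
    """Assumed meaning of 'tandem': runs of consecutive sites {i, ..., j} with at least two sites."""
    return [tuple(range(i, j + 1)) for i in range(1, k + 1) for j in range(i + 1, k + 1)]

def pairs(k):
    """All pairwise subsets {i, j}: the level of Psi summarized by r^2."""
    return list(combinations(range(1, k + 1), 2))

k = 5
tandem = set(tandem_subsets(k))
both = sorted(tandem & set(pairs(k)))
print(both)  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```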

  27. One parameter for heterogeneity (1) • Entropy $H = -\sum_i p(i) \ln p(i)$; $H / \ln 2$ is the effective number of sites needed to describe the heterogeneity. $H = k \ln 2$ when all sites are independent and both alleles at every site have frequency 1/2; $H = 0$ for a clone (no variation). • Entropy-based standardized measure of allelic association, ε: ε = 0 when all sites are independent; ε = 1 when only 2 types of sequence exist.
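The two limiting cases of $H$ can be checked numerically. This Python sketch builds haplotype frequencies for independent sites as products of per-site allele frequencies. It does not implement ε, whose formula the slide does not give.

```python
import math
from itertools import product

def entropy(freqs):
    """H = -sum p(i) ln p(i); zero frequencies are skipped (0 ln 0 = 0)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def independent_sites(allele1_freqs):
    """Haplotype frequencies when sites are independent: product of per-site allele frequencies."""
    freqs = []
    for bits in product((0, 1), repeat=len(allele1_freqs)):
        p = 1.0
        for bit, q in zip(bits, allele1_freqs):
            p *= q if bit == 1 else 1 - q
        freqs.append(p)
    return freqs

k = 4
h_indep = entropy(independent_sites([0.5] * k))  # equals k ln 2 (maximum)
h_skewed = entropy(independent_sites([0.1] * k))  # independent but skewed: below k ln 2
h_clone = entropy([1.0])                          # a clone: 0
print(h_indep / math.log(2))   # effective number of sites, about 4
print(h_skewed / math.log(2))
print(h_clone)
```

With independent sites but unequal allele frequencies, $H$ falls below $k \ln 2$, which is why the maximum needs both conditions.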

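  28. One parameter for heterogeneity (2) • Entropy-based measure of allelic association, ε: when $k = 2$, $\varepsilon = r^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \sum \frac{\text{obs}^2}{\text{exp}} - 1$ (obs and exp are haplotype frequencies, each summing to 1). • $r_k$ keeps the shape of the equation for $r^2$ and takes values from 0 to 1 for any $k$: $r_k = \sum \frac{\text{obs}^{1 + 1/(k-1)}}{\text{exp}^{1/(k-1)}} - 1$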
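A minimal sketch of $r_k$, assuming that "exp" is the haplotype frequency expected under independence (the product of marginal allele frequencies). For $k = 2$ it reduces to $\sum \text{obs}^2/\text{exp} - 1$ and agrees with the textbook $D^2/(p_A(1-p_A)\,p_B(1-p_B))$. The example frequencies are made up.

```python
from itertools import product

def expected_freqs(haplo_freqs, k):
    """Assumed 'exp': haplotype frequencies if sites were independent
    (product of marginal allele frequencies)."""
    p1 = [sum(f for h, f in haplo_freqs.items() if h[j] == 1) for j in range(k)]
    exp = {}
    for h in product((0, 1), repeat=k):
        e = 1.0
        for bit, q in zip(h, p1):
            e *= q if bit == 1 else 1 - q
        exp[h] = e
    return exp

def r_k(haplo_freqs):
    """r_k = sum obs^(1 + 1/(k-1)) / exp^(1/(k-1)) - 1; unobserved haplotypes add 0."""
    k = len(next(iter(haplo_freqs)))
    if k < 2:
        raise ValueError("r_k needs at least two sites")
    a = 1 / (k - 1)
    exp = expected_freqs(haplo_freqs, k)
    return sum(obs ** (1 + a) / exp[h] ** a for h, obs in haplo_freqs.items() if obs > 0) - 1

def r2_from_d(haplo_freqs):
    """Textbook two-site r^2 = D^2 / (p_A (1-p_A) p_B (1-p_B)), for comparison."""
    p_a = haplo_freqs.get((1, 0), 0) + haplo_freqs.get((1, 1), 0)
    p_b = haplo_freqs.get((0, 1), 0) + haplo_freqs.get((1, 1), 0)
    d = haplo_freqs.get((1, 1), 0) - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

two_sites = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(r_k(two_sites), r2_from_d(two_sites))  # both approximately 0.36

# Two complementary sequences over k = 4 sites: r_k = 1.
print(r_k({(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}))
# All 2^3 sequences equally frequent (independent sites): r_k = 0.
print(r_k({h: 1 / 8 for h in product((0, 1), repeat=3)}))
```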

  29. Space is too big to be covered completely. $2^{10{,}000{,}000} \ggg 6 \times 10^9$: every sequence is unique, so frequency is not useful.

  30. Sparse graph • Sequences can be placed at nodes of a $k$-dimensional hypercube. • The graph distance between two sequences is the number of mutations.

  31. The graph distance between sequences is the number of mutations. What is the distance for recombination? Recombination is a three-term relation (two parents and one recombinant), but a graph encodes two-term relations. A more informative tool is necessary.
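A small Python check of the claim that hypercube distance equals the number of mutations: the shortest path found by breadth-first search on the $k$-dimensional hypercube matches the Hamming distance for every pair of nodes.

```python
from collections import deque
from itertools import product

def hamming(a, b):
    """Number of sites at which two 0/1 sequences differ, i.e. the number of mutations."""
    return sum(x != y for x, y in zip(a, b))

def hypercube_distance(a, b):
    """Shortest-path length between nodes a and b of the k-dimensional hypercube (BFS)."""
    seen = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return seen[node]
        for i in range(len(node)):
            # Neighbors differ at exactly one site: one mutation.
            nxt = node[:i] + (1 - node[i],) + node[i + 1:]
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)

k = 4
nodes = list(product((0, 1), repeat=k))
assert all(hamming(a, b) == hypercube_distance(a, b) for a in nodes for b in nodes)
```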
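To see why recombination does not fit a two-term graph, here is a hedged sketch of a three-term predicate: does `child` arise from `parent_a` and `parent_b` by one crossover? The single-crossover model and the function name are assumptions for illustration.

```python
def is_single_crossover(child, parent_a, parent_b):
    """Three-term relation R(child, parent_a, parent_b): True if
    child == parent_a[:t] + parent_b[t:] or parent_b[:t] + parent_a[t:] for some t."""
    for t in range(len(child) + 1):
        if child == parent_a[:t] + parent_b[t:] or child == parent_b[:t] + parent_a[t:]:
            return True
    return False

a = "000000"
b = "111111"
c = "000111"
print(is_single_crossover(c, a, b))  # True
# Pairwise (graph) distances see 3 mutations from each parent,
# although a single recombination event relates all three sequences.
print(sum(x != y for x, y in zip(c, a)), sum(x != y for x, y in zip(c, b)))  # 3 3
```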

  32. Biological meaning of heterogeneity: • It does not want to lose variations, even when a significant part of the population cannot survive, because they might be useful sometime.

  33. Survival curve of variable sites when a fraction of the population goes extinct • Each sequence set draws a different survival curve. • A measure to represent the curves: • the area above the curve.
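The slide does not give the simulation details, so this Python sketch makes two assumptions: extinction removes a uniformly random fraction $f$ of sequences, and a site "survives" if both alleles are still present. It estimates the curve by repeated sampling and takes the area above it with the trapezoidal rule.

```python
import random
from itertools import product

def polymorphic_sites(haplotypes):
    """Indices of sites where both alleles 0 and 1 are present."""
    k = len(haplotypes[0])
    return [j for j in range(k) if len({h[j] for h in haplotypes}) == 2]

def survival_curve(haplotypes, steps=10, trials=200, seed=0):
    """For f = 0, 1/steps, ..., 1: mean fraction of originally variable sites that stay
    variable after a random fraction f of the sequences is removed (assumed model)."""
    rng = random.Random(seed)
    n = len(haplotypes)
    base = len(polymorphic_sites(haplotypes))
    if base == 0:
        raise ValueError("no variable sites")
    curve = []
    for s in range(steps + 1):
        f = s / steps
        survivors = n - round(f * n)
        total = 0
        for _ in range(trials):
            kept = rng.sample(haplotypes, survivors)
            total += len(polymorphic_sites(kept)) if kept else 0
        curve.append((f, total / (trials * base)))
    return curve

def area_above(curve):
    """Area between y = 1 and the survival curve (trapezoidal rule)."""
    return sum((f1 - f0) * ((1 - y0) + (1 - y1)) / 2
               for (f0, y0), (f1, y1) in zip(curve, curve[1:]))

ld_set = [(0, 0, 0, 0)] * 8 + [(1, 1, 1, 1)] * 8
diverse_set = list(product((0, 1), repeat=4))
for name, hs in (("two sequence types", ld_set), ("all 16 sequences", diverse_set)):
    print(name, round(area_above(survival_curve(hs)), 3))
```

A smaller area means the set keeps its variable sites longer as the population shrinks.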

  34. Various ways to measure (measure: number of parameters; notes) • Pairwise relation $r^2$: $k(k+1)/2$ • Power set Ψ: $2^k - 1$; hierarchic • Entropy $H$: 1 • Entropy-based ε: 1 • $r^2$-generalization $r_k$: 1 • Graph: mutation distance • Graph + α: + recombination distance • Survival curve: 1; simulation, mutation and recombination distance

  35. Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University http://www.genome.med.kyoto-u.ac.jp/wiki_tokyo/index.php/
