How to Measure Genetic Heterogeneity
International Workshop on Statistical-Mechanical Informatics, 2009/09/13-2009/09/16
Ryo Yamada, Unit of Statistical Genetics, Center for Genomic Medicine, Kyoto University
What is genetic heterogeneity?
Biological strategies: life covers the land. (Figure: land-cover map by Environmental Research and Teaching at the University of Toronto.)
Slime mold changes its shape and moves around, but uses spores to reproduce. (Image source: Wikipedia.)
Slime mold keeps looking for new (better?) conditions. The space is too big to be covered completely, so multiple places are selected and bridged without a break. Each part seems to act independently.
Slime mold is clever enough to find the shortest route to food through a labyrinth. Its strategy is being investigated as a new model for parallel computing systems.
LIFE keeps looking for new (better?) conditions. The space is too big to be covered completely, so multiple places are selected and bridged without a break. Each part seems to act independently. (Figure: phylogenetic tree.)
They are bridged without a break. WE are here because WE are all offspring of the “no-break” family, sharing the features of continuous LIFE.
Features of LIFE?
• Keeps looking for something.
• Accepts multiple conditions as good ones.
• Its parts stay contiguous with each other.
• Its parts act independently.
Slime mold distributes in physical space. LIFE distributes in genetic space. Distributions ~ heterogeneity.
LIFE distributes in genetic space. What is genetic space?
DNA molecules are written in 4 letters, {A,T,G,C}, with length $L = 3 \times 10^9$ (Homo sapiens). The number of possible sequence variations is $4^L$; $L = 1, 2, \dots$
Biological space is far smaller than chemical space, but still enormously big.
Environmental fluctuations change the width of pathways in biological space.
Inter-species heterogeneity: phylogeny. Intra-species heterogeneity: recombination graph. (Figures: "Genealogical trees, coalescent theory and the analysis of genetic polymorphisms", Nature Reviews Genetics 3, 380-390 (2002); doi:10.1038/nrg795.)
Letters are changed (mutation), giving a mutant. The combination of letters is changed (recombination).
Mutation: letters are changed. Recombination: the combination of letters is changed.
Space is too big to be covered completely. $L = 3 \times 10^9$. Variable sites $\sim 10 \times 10^6$. Population size of Homo sapiens $\sim 6 \times 10^9$. $2^{10{,}000{,}000} \ggg 6 \times 10^9$.
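The size comparison on this slide can be checked on a log scale. The sketch below is only an illustration; it assumes two alleles per variable site (so 2 to the power of the number of variable sites), and the variable names are hypothetical.

```python
import math

# Figures taken from the slide; two alleles per variable site is assumed.
n_variable_sites = 10_000_000
population = 6e9

# Compare on a log10 scale instead of building the huge integer itself.
log10_sequences = n_variable_sites * math.log10(2)
log10_population = math.log10(population)

# Number of decimal digits of 2 ** n_variable_sites.
digits = math.floor(log10_sequences) + 1

print(f"possible sequences: about 10^{log10_sequences:.0f} ({digits} digits)")
print(f"population size:    about 10^{log10_population:.1f}")
```

The number of possible sequences has millions of decimal digits, while the population size has ten, so almost all of the space is necessarily empty.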
$k$ sites → $2^k$ sequence variations
00…000 p(1)
00…001 p(2)
00…010 p(3)
00…011 p(4)
…
11…111 $p(2^k) = 1 - (p(1) + \dots + p(2^k - 1))$
• $2^k - 1$ parameters
• Flat and equal
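A minimal sketch of this parameterization, assuming binary sites coded as "0" and "1". The function names are hypothetical and only illustrate the counting.

```python
from itertools import product

def enumerate_haplotypes(k):
    """All 2**k binary sequences of length k, ordered 00...0, 00...1, ..."""
    return ["".join(bits) for bits in product("01", repeat=k)]

def flat_frequencies(k):
    """The 'flat and equal' case: every sequence has frequency 1 / 2**k."""
    n = 2 ** k
    return {hap: 1.0 / n for hap in enumerate_haplotypes(k)}

def free_parameters(k):
    """Frequencies sum to 1, so the last one is fixed by the others."""
    return 2 ** k - 1

k = 3
haps = enumerate_haplotypes(k)
freqs = flat_frequencies(k)
print(haps)
print(free_parameters(k))
```

The parameter count doubles with every added site, which is why a shorter summary of the distribution is needed.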
Genetic heterogeneity: dependency or association among variable sites. How can the heterogeneity be summarized, and with how many parameters?
Pairwise relation: $r^2$ (variance-covariance matrix)
• describes the heterogeneity with $k(k-1)/2$ parameters, one for each pair of sites.
• predicts the test statistics of associated markers in an association study.
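As an illustration, the pairwise measure can be computed from a list of sequences. This sketch assumes binary sites coded "0"/"1" and uses the standard definition of $r^2$ as the squared covariance of two sites divided by the product of their variances; sites are indexed from 0 in the code, and the function names are hypothetical.

```python
from itertools import combinations

def allele_freq(haplotypes, i):
    """Frequency of allele '1' at site i."""
    return sum(h[i] == "1" for h in haplotypes) / len(haplotypes)

def pairwise_r2(haplotypes, i, j):
    """r^2 between sites i and j; None if either site is not variable."""
    n = len(haplotypes)
    p_i = allele_freq(haplotypes, i)
    p_j = allele_freq(haplotypes, j)
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j                      # covariance of the two sites
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)  # product of the two variances
    if denom == 0:
        return None
    return d * d / denom

def r2_table(haplotypes):
    """One r^2 value per pair of sites: k(k-1)/2 entries."""
    k = len(haplotypes[0])
    return {(i, j): pairwise_r2(haplotypes, i, j)
            for i, j in combinations(range(k), 2)}

sample = ["000", "011", "100", "111"]
table = r2_table(sample)
for pair, value in table.items():
    print(pair, value)
```

In this sample the last two sites always carry the same allele, while the first site varies independently of both.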
Ψ: hypercube or lattice
• The power set Ψ of {1,2,…,k} consists of $2^k$ subsets.
• ∅
• {1},{2},…,{k}
• {1,2},{1,3},…,{2,3},{2,4},…,{k-1,k} → Pairwise
• {1,2,3},{1,2,4},…,{2,3,4},…,{k-2,k-1,k}
• …
• {1,2,…,k}
• Hierarchic parameters in full.
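The levels of the power set can be listed by subset size. This is a small sketch using the standard library; the function name is hypothetical.

```python
from itertools import combinations

def power_set_by_size(k):
    """Subsets of {1, ..., k} grouped by size: level 0 is the empty set,
    level 2 holds the pairs, level k holds the full set."""
    sites = range(1, k + 1)
    return [list(combinations(sites, size)) for size in range(k + 1)]

levels = power_set_by_size(4)
for size, subsets in enumerate(levels):
    print(size, subsets)
```

Level 2 is exactly the set of pairs used by the pairwise measure; the other levels carry the higher-order terms.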
Subsets with tandem elements in Ψ: {1,2}, {1,2,3}, {1,2,3,4}, …, {1,2,…,k}. Pairwise relation $r^2$: {1,2}, {1,3}, {1,4}, …, {1,k}. Tandem pairs are elements of both.
One parameter for heterogeneity (1)
• Entropy: $H = -\sum_i p(i) \ln p(i)$. Effective number of sites needed to describe the heterogeneity. $H = k \ln 2$ when all sites are independent and both alleles at every site are equally frequent. $H = 0$ for a clone (no variation).
• Entropy-based standardized measure of allelic association: ε. ε = 0 when all sites are independent. ε = 1 when only 2 types of sequence exist.
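A minimal sketch of the entropy $H$ computed from observed sequences, with frequencies estimated as simple proportions. The function name is hypothetical, and ε is not implemented here because its formula is not given on the slide.

```python
import math
from collections import Counter
from itertools import product

def haplotype_entropy(haplotypes):
    """H = -sum p(i) ln p(i), with p(i) the observed sequence frequencies."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

k = 3
all_sequences = ["".join(bits) for bits in product("01", repeat=k)]
clone = ["010"] * 8

h_flat = haplotype_entropy(all_sequences)   # every sequence once: maximal H
h_clone = haplotype_entropy(clone)          # a single sequence: no variation
print(h_flat, k * math.log(2))
print(h_clone)
```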
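One parameter for heterogeneity (2)
• Entropy-based measure of allelic association: ε. When $k = 2$, $\varepsilon = r^2 = \sum \frac{(\mathrm{obs} - \mathrm{exp})^2}{\mathrm{exp}} = \sum \frac{\mathrm{obs}^2}{\mathrm{exp}} - 1$.
• $r_k$ keeps the shape of the equation for $r^2$ and takes values from 0 to 1 for any $k$: $r_k = \sum \frac{\mathrm{obs}^{1 + 1/(k-1)}}{\mathrm{exp}^{1/(k-1)}} - 1$.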
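The formula for $r_k$ can be sketched as follows. One assumption is made that the slide does not state explicitly: "obs" is the observed frequency of a sequence and "exp" is its expected frequency under independence, i.e. the product of the allele frequencies at the k sites. The function name is hypothetical.

```python
from collections import Counter

def generalized_r(haplotypes):
    """r_k = sum(obs**(1 + a) / exp**a) - 1 with a = 1 / (k - 1).

    obs: observed sequence frequency.
    exp: product of per-site allele frequencies (assumed meaning of 'exp').
    """
    n = len(haplotypes)
    k = len(haplotypes[0])
    if k < 2:
        raise ValueError("at least two sites are required")
    p1 = [sum(h[i] == "1" for h in haplotypes) / n for i in range(k)]
    a = 1.0 / (k - 1)
    total = 0.0
    # Unobserved sequences have obs = 0 and contribute nothing to the sum.
    for hap, count in Counter(haplotypes).items():
        obs = count / n
        exp = 1.0
        for i, allele in enumerate(hap):
            exp *= p1[i] if allele == "1" else 1.0 - p1[i]
        total += obs ** (1 + a) / exp ** a
    return total - 1.0

two_types = ["000", "111"]                       # only two sequence types
independent = ["000", "001", "010", "011",
               "100", "101", "110", "111"]       # all sites independent
print(generalized_r(two_types))
print(generalized_r(independent))
print(generalized_r(["00", "01", "11", "11"]))   # k = 2 reduces to r^2
```

Under this reading the value is 1 for two complementary sequence types, 0 for fully independent sites, and equal to the usual $r^2$ when $k = 2$.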
Space is too big to be covered completely. $2^{10{,}000{,}000} \ggg 6 \times 10^9$. Every sequence is unique. Frequency is not useful.
Sparse graph
• Sequences can be placed at the nodes of a k-dimensional hypercube.
• The graph distance between sequences is the number of mutations.
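On the hypercube, the graph distance between two sequences is the number of sites at which they differ (the Hamming distance). A short sketch, assuming binary sequences of equal length and using hypothetical function names:

```python
def mutation_distance(seq_a, seq_b):
    """Number of differing sites = graph distance on the k-dimensional hypercube."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def one_mutation_neighbors(seq):
    """The k nodes adjacent to seq: flip exactly one site."""
    flip = {"0": "1", "1": "0"}
    return [seq[:i] + flip[seq[i]] + seq[i + 1:] for i in range(len(seq))]

print(mutation_distance("0000", "0110"))
print(one_mutation_neighbors("000"))
```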
The graph distance between sequences is the number of mutations. What is the distance for recombination? Recombination is a three-term relation (two parents and one offspring), but a graph describes two-term relations. A more informative tool is necessary.
Biological meaning of heterogeneity: • LIFE does not want to lose variations, even when a significant part of the population cannot survive, because those variations might become useful someday.
Survival curve of variable sites when a fraction of the population goes extinct • Each set of sequences draws a different survival curve. • A measure to represent the curves: the area above the curve.
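To see why recombination does not fit a two-node edge, note that the check below needs three sequences at once. This is a sketch under a simplifying assumption that is not in the slides: a single crossover, with the offspring taking the left part of one parent and the right part of the other. The function name is hypothetical.

```python
def is_single_crossover(child, parent_a, parent_b):
    """True if child = parent_a[:c] + parent_b[c:] for some interior point c.

    Single crossover only (an assumption); the order of the parents matters.
    """
    k = len(child)
    if not (len(parent_a) == len(parent_b) == k):
        raise ValueError("sequences must have the same length")
    return any(child == parent_a[:c] + parent_b[c:] for c in range(1, k))

print(is_single_crossover("0011", "0000", "1111"))
print(is_single_crossover("0101", "0000", "1111"))
```

The relation takes two parents and one offspring as arguments, so it cannot be stored as an ordinary edge between two nodes.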
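One possible simulation of such a curve is sketched below. The slides do not specify the procedure, so everything here is an assumption: sequences are removed uniformly at random, a site "survives" if it is still variable among the survivors, the curve is averaged over repeated draws, and the area above the curve is taken by the trapezoidal rule. All names are hypothetical.

```python
import random

def variable_sites(haplotypes):
    """Indices of sites where more than one allele is present."""
    if not haplotypes:
        return set()
    k = len(haplotypes[0])
    return {i for i in range(k) if len({h[i] for h in haplotypes}) > 1}

def survival_curve(haplotypes, fractions, n_rep=200, seed=0):
    """Mean fraction of variable sites that stay variable after a random
    fraction f of the sequences goes extinct, for each f in fractions."""
    rng = random.Random(seed)
    n = len(haplotypes)
    n_var = len(variable_sites(haplotypes))
    curve = []
    for f in fractions:
        n_lost = round(f * n)
        total = 0.0
        for _ in range(n_rep):
            survivors = rng.sample(haplotypes, n - n_lost)
            if n_var:
                total += len(variable_sites(survivors)) / n_var
        curve.append(total / n_rep)
    return curve

def area_above(fractions, curve):
    """Trapezoidal area between the curve and the line y = 1."""
    area = 0.0
    for i in range(1, len(fractions)):
        width = fractions[i] - fractions[i - 1]
        area += width * ((1 - curve[i - 1]) + (1 - curve[i])) / 2
    return area

haps = ["0000", "0011", "0101", "0110", "1001", "1111"]
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = survival_curve(haps, fractions)
area = area_above(fractions, curve)
print(curve)
print(area)
```

Under these assumptions, a smaller area means that variable sites are lost later as the population shrinks.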
Various ways to measure (measure: number of parameters; remarks)
• Pairwise relation $r^2$: $k(k+1)/2$
• Power set Ψ: $2^k - 1$; hierarchic
• Entropy $H$: 1
• Entropy-based ε: 1
• $r^2$ generalization $r_k$: 1
• Graph: mutation distance
• Graph + α: mutation distance + recombination distance
• Survival curve: 1; simulation, mutation and recombination distance
Unit of Statistical Genetics, Center for Genomic Medicine, Graduate School of Medicine, Kyoto University http://www.genome.med.kyoto-u.ac.jp/wiki_tokyo/index.php/