Genealogies of time structured data, an application on cave bear ancient DNA
Frantz Depaulis, UMR 7625 Laboratoire d’écologie, Paris 6/ENS. Ludovic Orlando and Catherine Hänni, UMR 5534 Centre de Génétique Moléculaire et Cellulaire, Université Claude Bernard, Lyon I.
Outline of the presentation
• Introduction: gene genealogies
• Results
  - 1. Exploratory simulation results
  - 2. Cave bear application
• Conclusions
-Coalescence- Wright-Fisher neutral model: assumptions
• Selective neutrality (Ne s << 1)
• Demography
  - Isolated panmictic population
  - Constant size N
  - Poisson distribution of offspring number, P(1)
  - Same sampling time
• Mutation, sequence data: infinite site model (ISM)
  - No recombination
  - Independent mutations
  - Constant mutation rate µ, along the sequence and across time
  - Each mutation affects a new nucleotide site
-Coalescence- Genealogy of a gene sample
[Figure: genealogy of a gene sample, showing the ancestral lineages, the coalescences (coalescence = common ancestor) and the most recent common ancestor (MRCA).]
-Coalescence- Coalescent
[Figure: coalescent of a sample of “genes” (of individuals) a to f, showing the common ancestors (CA), the most recent common ancestor of the sample (MRCA) and the neutral mutations carried by the branches.]
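-Coalescence- Constructing coalescents
• Additional assumption: n << N
• 1°) Ages of the nodes: with n lineages, the time to the next coalescence follows Exp(p), with p = (n(n-1)/2)/2N
• Last interval t5 (two lineages left): p = 1/2N
[Figure: genealogy of genes a to f, with the successive intervals t1 to t5.]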
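The node ages can be drawn directly from these exponential laws. Below is a minimal Python sketch, assuming time is rescaled in units of 2N generations, so that the rate with k lineages becomes k(k-1)/2. The function name `coalescent_times` and its arguments are hypothetical, chosen for illustration only.

```python
import random

def coalescent_times(n, rng=random):
    """Draw the waiting times of a coalescent for a sample of n genes.

    Returns [t1, ..., t(n-1)]: t1 is the interval with n lineages and the
    last value is the interval with two lineages. Time is in units of 2N
    generations (an assumption of this sketch), so the rate with k lineages
    is k(k-1)/2, i.e. p = (k(k-1)/2)/2N per generation.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times

# Example: a sample of six genes gives five intervals, t1 to t5.
intervals = coalescent_times(6)
tmrca = sum(intervals)  # age of the MRCA, in units of 2N generations
```

Summing the intervals gives the age of the MRCA, whose expectation is 2(1 - 1/n) in units of 2N generations.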
-Coalescence- Constructing-deconstructing coalescents
• 2°) Topology of the tree
• 3°) Uniform distribution of mutations
• Repeated 100 000 times: neutral distribution of sequence polymorphism
[Figure: genealogy of the gene sample a to f with intervals t1 to t5, the common ancestors (CA), the MRCA and the neutral mutations placed on the branches.]
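The three construction steps can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' program. It assumes time in units of 2N generations and a fixed number S of mutations, each dropped on a branch with probability proportional to its length. The name `simulate_sample` is hypothetical.

```python
import random

def simulate_sample(n, s, rng=random):
    """Simulate n haplotypes with exactly s polymorphic sites.

    Haplotypes are tuples of 0 (ancestral) and 1 (derived), one entry per site.
    """
    # Each active lineage: (set of sampled genes below it, start time of its branch).
    active = [({i}, 0.0) for i in range(n)]
    branches = []  # (branch length, set of sampled genes below the branch)
    now = 0.0
    while len(active) > 1:
        k = len(active)
        now += rng.expovariate(k * (k - 1) / 2)   # 1) age of the next node
        i, j = rng.sample(range(k), 2)            # 2) topology: a random pair coalesces
        (below_i, start_i), (below_j, start_j) = active[i], active[j]
        branches.append((now - start_i, below_i))
        branches.append((now - start_j, below_j))
        active = [a for idx, a in enumerate(active) if idx not in (i, j)]
        active.append((below_i | below_j, now))
    # 3) mutations: uniform along the tree, so a branch is hit with a
    #    probability proportional to its length; each hit is a new site (ISM).
    weights = [length for length, _ in branches]
    hits = rng.choices(branches, weights=weights, k=s)
    return [tuple(int(gene in below) for _, below in hits) for gene in range(n)]

# Example: one simulated sample of six genes with eight polymorphic sites.
sample = simulate_sample(6, 8)
```

Repeating the call many times yields the neutral distribution of any statistic computed on the simulated samples.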
-Coalescence- Haplotype tests: simulations*
• Parameters‡: S = 8, n = 6
• 10 000 simulations
• Haplotype number K: K = 4, 5 or 6 in the simulated samples shown
• Haplotype diversity H = 1 - Σ f_i² (f_i: frequency of haplotype i): H = 0.72, 0.78 or 0.83 in the same samples
• Observed H: P = 0.03
[Figure: simulated sequence samples and the density of the distribution of simulated H, plotted from 0.1 to 0.9.]
* Depaulis and Veuille MBE 1998
‡ Hudson 1993
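Both haplotype statistics are simple to compute from a list of sequences. The sketch below uses hypothetical names (`haplotype_stats`, `lower_tail_p`) and a made-up toy sample. It also assumes that the P value is the proportion of simulated values no larger than the observed one, which the slide does not state.

```python
from collections import Counter

def haplotype_stats(haplotypes):
    """Return (K, H): haplotype number and haplotype diversity H = 1 - sum(f_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    k = len(counts)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return k, h

def lower_tail_p(observed, simulated):
    """Proportion of simulated values <= observed (lower tail: an assumption)."""
    return sum(1 for x in simulated if x <= observed) / len(simulated)

# Toy sample of n = 6 sequences: one haplotype seen twice, four seen once.
toy = ["AAC", "AAC", "AGC", "TGC", "TGA", "AGA"]
k, h = haplotype_stats(toy)
print(k, round(h, 2))  # prints: 5 0.78
```

With n = 6, five distinct haplotypes necessarily give H = 1 - 8/36, which is the 0.78 value shown on the slide.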
-Coalescence- Alignment of polymorphic sites: frequencies of mutations
n = 7, S = 15

GCCCGCGAATCCATT
GCGTGCGATCCGATT
GCGTACAATCCCGTC
GTGTACAATCTCGAC
GTGTACAATCTCGAC
GCGTGGAATCCCGTT
CCGCGCGGTCCCATT
GCGCGCGAACCCATT  outgroup
121531416121423  frequencies
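The frequency line can be recomputed from the alignment by counting, at each site, the sequences that differ from the outgroup. This is a minimal sketch, assuming the outgroup carries the ancestral state and each site has only two states. The function name `derived_counts` is hypothetical.

```python
def derived_counts(sequences, outgroup):
    """Number of sampled sequences carrying the derived state at each site."""
    counts = []
    for column, ancestral in zip(zip(*sequences), outgroup):
        counts.append(sum(1 for base in column if base != ancestral))
    return counts

sample = [
    "GCCCGCGAATCCATT",
    "GCGTGCGATCCGATT",
    "GCGTACAATCCCGTC",
    "GTGTACAATCTCGAC",
    "GTGTACAATCTCGAC",
    "GCGTGGAATCCCGTT",
    "CCGCGCGGTCCCATT",
]
outgroup = "GCGCGCGAACCCATT"

freqs = derived_counts(sample, outgroup)
print("".join(str(f) for f in freqs))  # prints: 121531416121423
```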
-Coalescence- Frequency spectrum of mutations & neutrality tests
• θ = 4Neµ
• H = θπ - θH, expected to be 0 under neutrality
• Tests: Tajima (Genetics 1989), Fu and Li (Genetics 1993), Fay and Wu (Genetics 2000)
[Figure: frequency spectrum, number of polymorphic sites as a function of f_i, the number of occurrences in a sample.]
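Two of these tests can be written directly from the frequencies of the derived mutations. The sketch below computes three estimators of θ, Tajima's D and the unnormalised Fay and Wu's H, using the published formulas as recalled here. The name `neutrality_stats` and the dictionary keys are hypothetical. The example counts are the frequency line of the alignment slide (n = 7, S = 15).

```python
import math

def neutrality_stats(derived_counts, n):
    """Theta estimators, Tajima's D and Fay and Wu's H from derived-allele counts."""
    s = len(derived_counts)                       # number of polymorphic sites S
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pairs = n * (n - 1) / 2
    theta_s = s / a1                              # estimator based on S
    theta_pi = sum(f * (n - f) for f in derived_counts) / pairs
    theta_h = sum(f * f for f in derived_counts) / pairs
    # Variance constants of Tajima (1989)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (theta_pi - theta_s) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return {
        "theta_s": theta_s,
        "theta_pi": theta_pi,
        "theta_h": theta_h,
        "tajima_d": d,
        "fay_wu_h": theta_pi - theta_h,
    }

counts = [1, 2, 1, 5, 3, 1, 4, 1, 6, 1, 2, 1, 4, 2, 3]
stats = neutrality_stats(counts, n=7)
```

Rare variants weigh more on the estimator based on S than on θπ, so an excess of rare variants pushes D below zero.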
Mitochondria, correlation LD/distance: recombination or mutational effects?
• r² decreases with the distance d between sites
• Pearson’s statistic, tested by permutations of sites
• Awadalla et al. (Science 1999)
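A permutation test of this kind can be sketched as follows. It assumes biallelic sites coded 0/1, r² as the measure of LD for each pair of sites, and a one-sided test in which site positions are shuffled to obtain the null distribution of Pearson's correlation. The function names and the toy data are hypothetical, and the details may differ from the procedure of Awadalla et al.

```python
import math
import random
from itertools import combinations

def r_squared(x, y):
    """Squared allelic correlation between two polymorphic sites coded 0/1."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    d = pxy - px * py
    return d * d / (px * (1 - px) * py * (1 - py))

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def ld_distance_test(sites, positions, n_perm=1000, rng=random):
    """Correlation between r² and distance, with a permutation P value.

    The P value is the share of permutations of the site positions whose
    correlation is at least as negative as the observed one.
    """
    pairs = list(combinations(range(len(sites)), 2))
    r2 = [r_squared(sites[i], sites[j]) for i, j in pairs]

    def statistic(pos):
        return pearson(r2, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = statistic(positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) <= observed:
            hits += 1
    return observed, hits / n_perm

# Toy data: two adjacent, fully linked sites and one distant, unlinked site.
toy_sites = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
toy_positions = [0, 1, 100]
obs, p_value = ld_distance_test(toy_sites, toy_positions, n_perm=200)
```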
-Coalescence- Time structured data & genealogies
- Parasites during disease evolution (virus…)
- Microbial experimental evolution
- Ancient DNA
• Issues:
  - To what extent are the analyses affected by time structure?
  - How can this be corrected for?
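- Simulations- Algorithm for time structured coalescent
• The exponential law is memoryless!
[Figure: genealogy of genes a to f, with a subset of n1 = 3 genes separated from the other genes by a time t1; the number of lineages n (from 2 to 5) is given for each interval of the tree.]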
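The memoryless property is what makes the algorithm simple: when a drawn waiting time would cross a sampling time, the draw is discarded, the older genes are added, and a new waiting time is drawn with the updated number of lineages. Below is a minimal sketch for two sampling times, assuming time in units of 2N generations. The names `serial_coalescent_times`, `n_recent` and `n_old` are hypothetical, and the slide does not say which subset it calls n1.

```python
import random

def serial_coalescent_times(n_recent, n_old, t1, rng=random):
    """Ages of the coalescences for two subsets sampled t1 apart.

    n_recent genes are sampled at time 0 and n_old genes at time t1 in the
    past. Returns the ages of the n_recent + n_old - 1 nodes, in units of
    2N generations.
    """
    k = n_recent          # lineages currently present in the genealogy
    now = 0.0
    old_added = False
    ages = []
    while k > 1 or not old_added:
        # With fewer than two lineages nothing can coalesce before t1.
        wait = rng.expovariate(k * (k - 1) / 2) if k > 1 else float("inf")
        if not old_added and now + wait >= t1:
            # Memoryless: drop this draw, move to t1, add the older genes
            # and draw again with the new number of lineages.
            now = t1
            k += n_old
            old_added = True
            continue
        now += wait
        k -= 1
        ages.append(now)
    return ages

# Example matching the slide's sample sizes: three genes in each subset.
ages = serial_coalescent_times(3, 3, t1=0.2)
```

The same loop extends to more than two sampling times by checking the next sampling time at each step.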
- Simulations- Age structure effect on gene genealogies
• Contemporaneous sample
• Limited time structure: excess of rare variants, deficit of LD
• Two subsets with large time spacing: deficit of rare variants, excess of LD, differentiation
[Figure: genealogies for the three cases, with a subset of n1 = 4 genes and a time t1 between the subsets.]
- Simulations- Effect of subset size on statistical tests: mean
t1 = 0.2 Ne generations; n = 40, S = 20
[Figure: mean of each statistic (y axis, -0.6 to 1.2) as a function of n1/n (x axis, 0 to 1). Series: Dt, D*fl, Hfw, ZnS, K, H, Pearson, Fst, pi/pi0, S/S0.]
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1, corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations- Effect of subset size on statistical tests: significance rate
t1 = 0.2 Ne generations; n = 40, S = 20
[Figure: significance rate (y axis, 0 to 0.15) as a function of n1/n (x axis, 0 to 1). Series: Dt_inf, D*fl_inf, Hfw_inf, ZnS_inf, K_sup, H_sup, Fst.]
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Simulations- Effect of the age of a half subset on statistical tests: mean
n1 = n/2
[Figure: mean of each statistic (y axis, -1 to 3) as a function of t1 in 2Ne generations (x axis, 0.001 to 10). Series: Dt, D*fl, Hfw, K, H, ZnS, Pearson, Fst, Pi/Theta0, S/S0.]
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1, corresponding to θ); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations- Effect of the age of a half subset on statistical tests: significance rates
n1 = n/2
[Figure: significance rate (y axis, 0 to 0.35) as a function of t1 in 2Ne generations (x axis, 0.001 to 10). Series: Dt_inf, Dt_sup, D*fl_inf, D*fl_sup, ZnS_inf, K_sup, H_sup, Fst.]
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics. Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Application- Cave bear: Ursus spelaeus (12-300 kYA)
- Application- Sampling sites
- Application- Alignment of polymorphic sites: D-loop of cave bear
n = 41, S = 22, Ne = 13 000

REF                   TTGTCAACTT TCGAATTGAA GT
NOASC3500_40-45       ..A....T.C ..A....... ..
NOASC3800_40-45       ..A....T.C ..A....... ..
NOASC85F16_40-45      .......... .......... ..
NOASC95456_40-45      ..A....T.C ..A....... ..
NOASC92386_40-45      ..A....T.C ..A....... ..
NOASC92413_40-45      C.A....T.C ..A....... ..
NOASC92152_40-45      C.A....T.C ..A....... A.
NOASC5300_50-60       ..A....T.C ..A....... ..
NOASC11600_80         .......... .......... ..
NOASC12500_80         .......... .......... ..
NOASC13800_80         .......... .......... ..
NOASC100801_80        .......... .......... ..
NOASC12400_80         ..A....T.C ..A....... ..
NOASC11800_80         .CA....T.C ..A.G..... ..
NOASC11700_80         C.A....T.C ..A....... A.
NOASC84E16_90-130     C.A....T.C ..A....... ..
NOASC84G19_90-130     C.A....T.C ..A....... ..
NOASCbrC5-02_90-130   C.A....T.C ..A....... ..
NOASC15400_90-130     C.A....T.C ..A......G ..
NOASC15700_90-130     ....T.G.C. .TA..C..G. ..
NOATAB2_40            .......... .......... ..
NOAGrotteMerve_?      .......... .T........ ..
NOAAZE_80-130         .......... .......... .C
NOAGigny189F3_?       ..A....T.C ..A....... ..
NOAJAL104_?           C.A....T.C ..A....... ..
NOATAB15_25-35        ..A......C ..A....... ..
NOAGailenreuth_?      ..A......C ..A....... ..
NOA47910_30           ..A....T.C ..A....A.. ..
NOAHohleFels_?        ..A....T.C ..A..C.... ..
NOACLA_35             ..A....T.C C.A....... ..
NOACLB_35             ..A....T.C C.A....... ..
NOAChiemsee_35        ..A..G.... ..A...C... ..
NOARamesch1_?         ..A..G.... ..A...C... ..
NOARamesch2_?         ..A..G.... ..A...C... ..
NOAGeissenklt1_?      ...CT..... .T.G.C.... ..
NOAGeissenklt2_?      ...CT..... .T.G.C.... ..
NOANixloch_?          ...CT..... .T...C.... ..
--------------------------------------------- Alp barrier
SOAPoto_?             ...CT..... .T...C.... ..
SOAVind1_?            ...CT..... .T...C.... ..
SOAVind2_?            ...CT..... .T...C.... ..
SOAConturi_?          .......T.. .......... ..

(Loreille et al. 2001) (Orlando et al. 2002) (Hofreiter et al. 2002) (Kühn et al. 2001)
- Application- Neutrality tests, Belgium cave
Scladina, n = 20, S = 15

Statistic    Dt            D*fl          Hfw           K             H             ZnS           Pearson(a)
Observed     -0.82         -1.55         -1.32         7             0.79          0.24          -0.39 (2.8*)

No time structure
P value (%)  (21.0)        (5.3)         (18.4)        (16.4)        (37.7)        (43.7)        (2.8*)
Mean         0.06          -0.05         0.30          8.3           0.79          0.26          0.00
CI           [-1.42;1.51]  [-1.89;1.18]  [-4.46;2.62]  [5;11]        [0.64;0.88]   [0.10;0.55]   [-0.25;0.20]
% rejected   (4.9;5.5)     (5.2;2.8)     (5.4;4.8)     (1.7;3.9)     (4.9;4.6)     (5.5;5.1)     (5.0;/)

Average time structure
P value (%)  (30.0)        (8.8)         (17.2)        (8.6)         (31.2)        (31.7)        (2.7*)
Mean         -0.30         -0.38         0.39          9.1           0.80          0.22          0.00
CI           [-1.56;1.26]  [-1.89;0.84]  [-4.04;2.56]  [6;12]        [0.66;0.89]   [0.08;0.47]   [-0.29;0.23]
% rejected   (7.8;3.0)     (8.2;1.0)     (4.2;3.7)     (0.8;9.5)     (3.3;7.8)     (11.5;2.9)    (4.9;/)

Uncertainty in time structure
P value (%)  (30.0)        (8.6)         (17.4)        (7.9)         (30.9)        (31.9)        (2.8*)
Mean         -0.33         -0.42         0.37          9.1           0.80          0.22          0.00
CI           [-1.59;1.18]  [-1.89;0.84]  [-4.20;2.54]  [6;12]        [0.66;0.89]   [0.08;0.48]   [-0.29;0.24]
% rejected   (9.3;2.8)     (9.3;0.8)     (4.5;3.6)     (0.7;9.8)     (3.7;7.5)     (11.6;2.8)    (4.8;/)

(a) permutation test
- Application- Neutrality tests, dated subsample
All dated, n = 27, S = 20

Statistic    Dt            D*fl          Hfw           K             H             ZnS           Pearson(a)
Observed     -1.21         -2.28         -0.69         12            0.86          0.14          -0.27 (11.4)

No time structure
P value (%)  (10.5)        (0.6**)       (25.7)        (16.5)        (32.1)        (24.3)        (11.5)
Mean         -0.09         -0.08         0.29          10.3          0.82          0.23          0.00
CI           [-1.49;1.50]  [-1.98;1.32]  [-5.66;3.18]  [7;14]        [0.69;0.90]   [0.09;0.48]   [-0.19;0.16]
% rejected   (5.0;5.2)     (3.6;1.4)     (5.3;4.7)     (4.0;2.8)     (5.3;4.7)     (5.7;5.0)     (4.7;/)

Average time structure
P value (%)  (17.7)        (1.7*)        (24.3)        (38.2)        (42.6)        (41.8)        (11.2)
Mean         -0.42         -0.59         0.35          11.8          0.84          0.18          0.00
CI           [-1.69;1.11]  [-2.28;0.72]  [-5.34;2.98]  [8;15]        [0.71;0.91]   [0.07;0.39]   [-0.23;0.20]
% rejected   (9.3;2.1)     (6.9;0.3)     (4.7;2.6)     (1.2;11.1)    (3.4;9.5)     (13.7;2.4)    (4.9;/)

Uncertainty in time structure
P value (%)  (18.5)        (1.9*)        (23.4)        (39.9)        (43.2)        (41.1)        (11.9)
Mean         -0.44         -0.61         0.37          11.8          0.84          0.18          0.00
CI           [-1.70;1.09]  [-2.28;0.72]  [-5.23;2.99]  [8;16]        [0.71;0.91]   [0.07;0.40]   [-0.24;0.19]
% rejected   (9.3;2.4)     (7.0;0.2)     (4.6;2.7)     (1.2;11.7)    (3.5;9.7)     (14.1;2.5)    (5.4;/)

(a) permutation test
- Application- Neutrality tests, total sample
n = 41, S = 22

Statistic    Dt            D*fl          Hfw           K             H             ZnS           Pearson(a)    Fst(a)
Observed     -0.45         -0.88         1.35          17            0.91          0.10          -0.09 (22.0)  0.32 (0.4**)

No time structure
P value (%)  (37.1)        (14.7)        (47.1)        (1.7*)        (3.7*)        (18.1)        (21.5)        (0.4**)
Mean         -0.09         -0.09         0.30          12.3          0.83          0.19          0.00          -0.03
CI           [-1.44;1.52]  [-1.85;1.38]  [-5.84;3.15]  [8;16]        [0.70;0.90]   [0.07;0.41]   [-0.20;0.17]  [-0.38;0.27]
% rejected   (4.5;5.3)     (4.1;1.1)     (4.8;4.7)     (3.0;4.3)     (4.8;4.9)     (5.5;4.6)     (4.8;/)       (/;4.6)

Average time structure
P value (%)  (45.5)        (35.6)        (45.6)        (7.8)         (5.5)         (36.6)        (21.8)        (1.3*)
Mean         -0.45         -0.74         0.32          13.9          0.84          0.15          0.00          -0.01
CI           [-1.71;1.10]  [-2.49;0.73]  [-5.38;2.93]  [9;18]        [0.71;0.91]   [0.05;0.34]   [-0.23;0.20]  [-0.40;0.38]
% rejected   (10.2;2.2)    (10.7;0.1)    (4.2;2.4)     (0.8;16.1)    (4.3;7.9)     (15.2;2.2)    (4.9;/)       (/;8.9)

Uncertainty in time structure
P value (%)  (42.1)        (40.7)        (44.9)        (10.3)        (6.2)         (39.2)        (21.8)        (1.7*)
Mean         -0.54         -0.90         0.26          14.3          0.84          0.14          0.00          -0.01
CI           [-1.76;0.96]  [-2.81;0.73]  [-5.70;2.90]  [10;18]       [0.71;0.91]   [0.05;0.32]   [-0.24;0.21]  [-0.40;0.41]
% rejected   (12.2;1.4)    (14.2;0.1)    (4.5;2.3)     (0.5;19.8)    (4.0;7.9)     (16.7;2.1)    (4.7;/)       (/;9.7)

(a) permutation test
- Application- LD as a function of distance
[Figure: r² (log scale, 0.01 to 1) as a function of distance (nt, 0 to 70); R² = 0.4174.]
Time structure, Conclusion
• Can substantially bias the results
• Even if within 10% of the age of the MRCA: the bottom of the tree has more branches and carries a non random subset of mutations (rare ones)
• Small time structure: long external branches, excess of rare variants (negative D, deficit of LD)
• Great time structure: a long internal branch, apparent differentiation, excess of intermediate frequency variants (positive D, excess of LD) if the subsets are equilibrated
Acknowledgements
• CNRS
• Nick Barton