Molecular Evolution and Phylogeny Examples
Weakly deleterious mutations • Weakly deleterious mutations can reach high frequencies in local populations and, thus, may contribute significantly to genetic variance in disease susceptibility.
Sequencing of human polymorphisms • Using exon-specific polymerase chain reaction (PCR) amplification, a team at Celera Genomics sequenced 20,362 loci in 20 European Americans, 19 African Americans and one male chimpanzee, with the initial aim of finding novel nonsynonymous single nucleotide polymorphisms (SNPs) based on their 2001 build of the human genome.
Divergence between human and chimpanzee • A total of 34,099 fixed synonymous differences between the 39 humans and the chimpanzee yield a genomic average synonymous divergence of dS = 1.02%. • 20,467 non-synonymous differences yield dN = 0.242% across 11.81 megabases (Mb) of aligned coding DNA.
Polymorphisms • 15,750 synonymous and 14,311 non-synonymous SNPs among the human subjects, yielding average synonymous and non-synonymous SNP densities of pS = 0.470% and pN = 0.169%.
Polymorphism exceeds divergence • There is a highly significant excess of amino acid variation (polymorphism) relative to divergence.
Can you comment on the following? • Evolution of human populations since sharing a last common ancestor with chimpanzees • Type of nonsynonymous mutations (very deleterious or mildly deleterious) in human populations • Positive selection • Negative selection • Disease associations?
Non-neutral evolution • dN/dS = 1: neutral evolution • dN/dS > 1: positive selection • dN/dS < 1: negative selection
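The criteria above can be applied directly to the human and chimpanzee numbers quoted on the earlier slides. The Python sketch below is only an illustration: the function name `selection_regime` and the variable names are made up here, and the last quantity (the ratio of nonsynonymous/synonymous polymorphism to nonsynonymous/synonymous divergence, often called a neutrality index) is one common way to express "more amino acid polymorphism than divergence", not necessarily the statistic used in the original study.

```python
# Counts and per-site rates as quoted on the slides above.
fixed_syn, fixed_nonsyn = 34_099, 20_467   # human-chimpanzee differences
poly_syn, poly_nonsyn = 15_750, 14_311     # SNPs among the 39 humans
dS, dN = 1.02, 0.242                       # divergence, in percent
pS, pN = 0.470, 0.169                      # SNP density, in percent


def selection_regime(ratio, tol=1e-9):
    """Classify a dN/dS ratio using the three cases listed on the slide."""
    if abs(ratio - 1.0) <= tol:
        return "neutral evolution"
    return "positive selection" if ratio > 1.0 else "negative selection"


dn_ds = dN / dS
pn_ps = pN / pS
# Values above 1 mean relatively more amino acid polymorphism than divergence.
neutrality_index = (poly_nonsyn / poly_syn) / (fixed_nonsyn / fixed_syn)

print(f"dN/dS = {dn_ds:.3f} ({selection_regime(dn_ds)})")
print(f"pN/pS = {pn_ps:.3f}")
print(f"neutrality index = {neutrality_index:.2f}")
```

Both ratios are well below 1, and pN/pS is larger than dN/dS, which is the pattern the "excess of amino acid variation" slide refers to.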
What makes us a vertebrate? • Neural crest? • Highly sophisticated nervous system? • Bones/cartilage? • Vertebrate specific genes?
Origin of bilateria • Some vertebrate genes date prior to the origin of bilateria
Bilateria • Bilateria: a monophyletic group of metazoan animals characterized by bilateral symmetry.
Radial symmetry • Bilateria excludes the Cnidaria, Ctenophora (sea gooseberries), Porifera (sponges) and Placozoa.
Cnidaria • Cnidaria: a basal phylum with two body layers and radial symmetry, at the tissue grade of morphological organization. • There are two basic morphologies: the sessile polyp and the swimming medusa (jellyfish). • The phylum contains four classes; examples include jellyfish, sea anemones and hydra.
Body Axis • Oral–aboral axis: the single obvious body axis of the two ‘radiate’ phyla (Cnidaria and Ctenophora), marked at one end by the mouth or oral pore.
Wnt signaling http://www.stanford.edu/~rnusse/reviews/NaVReviewFinal438747a.pdf
Wnt Signaling • In the Wnt signaling pathway, ligand binding triggers the formation of a receptor complex, and protein kinases modify the receptor tails, leading to recruitment of cytoplasmic factors. • In other signaling pathways, receptor-induced protein phosphorylation amplifies the signal, and the receptor-associated kinase acts as a catalyst for the modification of many substrate molecules.
Wnt genes • Mammals have 19 Wnt genes • The sea anemone Nematostella vectensis, a diploblast, has 12 • Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J, Schmidt HA, Technau U, von Haeseler A, Hobmayer B, Martindale MQ, Holstein TW (2005) Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433:156-160.
Nematostella vectensis http://www.nematostella.org/
Expression of wnts The original bilaterian was equipped with a fairly elaborate set of molecular tools.
Endoderm, ectoderm, mesoderm • For example, the Nematostella ectodermal genes, NvWnt1, NvWnt2, NvWnt4 and NvWnt7 correspond to the neuroectodermal Wnt genes in the higher Bilateria. • NvWnt5, NvWnt6 and NvWnt8 are expressed in the endoderm, whereas the corresponding genes in deuterostomes are all expressed in the mesoderm.
Collagen • Bone is significantly linked to cartilage, both in development and evolution, with earlier forms having a cartilaginous skeleton that is replaced by bone. In vertebrates, cartilage also contains threads of collagen running through it.
Collagen • Bone is a living tissue that continually remodels its mineral matrix; collagen fibers threaded through the matrix give it strength (in mature bone these are predominantly type I collagen).
Collagen • Collagen is an ancient protein (800 million years ago?). • There are about 27 different types of collagen in at least a dozen different classes. • http://web.indstate.edu/thcme/mwking/extracellularmatrix.html • One particular type, type II collagen, is an essential part of the cartilage matrix.
A primitive jawless fish from the late Devonian, around 370 million years ago. Do lampreys have collagen?
Initially it was thought that lampreys lack collagen • Zhang et al. screened a library of lamprey sequences and isolated two forms of collagen II, Col2α1a and Col2α1b. • The presence of a lamprey homolog of human collagen II indicates that the gene arose before the lamprey (jawless)-gnathostome (true jaws) split. • Col2α1 is used in the developing branchial cartilaginous skeleton. • Zhang G, Miyamoto MM, Cohn MJ. Lamprey type II collagen and Sox9 reveal an ancient origin of the vertebrate collagenous skeleton. Proc Natl Acad Sci U S A. 2006 Feb 21.
Bootstrapping • The bootstrap is a procedure that involves choosing random samples with replacement from a data set and analyzing each sample the same way.
Bootstrapping • Sampling with replacement means that every sampled data point is returned to the data set before the next draw. So a particular data point from the original data set could appear multiple times in a given bootstrap sample.
Bootstrapping • The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates we obtain allows us to establish the uncertainty of the quantity we are estimating.
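The three bootstrapping slides above can be summarized in a few lines of code. This is a minimal Python sketch, assuming nothing beyond the standard library; the data values are made up for illustration and the function name `bootstrap_estimates` is hypothetical.

```python
import random
import statistics


def bootstrap_estimates(data, statistic, n_boot=1000, seed=0):
    """Apply `statistic` to n_boot resamples of `data` drawn with replacement."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Each bootstrap sample has the same size as the original data set;
        # a given data point may appear several times or not at all.
        sample = rng.choices(data, k=n)
        estimates.append(statistic(sample))
    return estimates


values = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]   # made-up numbers
means = bootstrap_estimates(values, statistics.mean)

# The spread of the bootstrap estimates measures the uncertainty of the mean.
print("bootstrap mean of means:", round(statistics.mean(means), 3))
print("bootstrap standard error:", round(statistics.stdev(means), 3))
```

Any other statistic (median, correlation, alignment score) can be passed in place of `statistics.mean`, as long as every sample is analyzed the same way.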
Reliability of a tree • One way to assess the reliability of an estimated tree is to examine the reliability of each interior branch.
Bootstrap • The reliability of an inferred tree is examined by using Efron’s bootstrap resampling technique. • A set of nucleotide sites is randomly sampled with replacement from the original set, and this random set is used for constructing a new phylogenetic tree. • This process is repeated many times, and the proportion of replications in which a given sequence cluster appears is computed. • If this proportion (PB) is high (say, PB > 0.95) for a sequence cluster, this cluster is considered to be statistically significant.
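The procedure above can be sketched for the smallest possible case, four sequences. In the Python example below, the toy alignment is invented for illustration, and tree construction is replaced by a simple stand-in: the four-point comparison of summed pairwise distances, which picks one of the three possible groupings of four taxa. A real analysis would rebuild a full tree (distance, MP or ML) for every replicate. All function names are hypothetical.

```python
import random

# Toy alignment, made up for illustration: four aligned sequences.
alignment = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "TCGTACGTAGGAACGTACGT",
    "C": "ACCTATGTTCGTACGAACTT",
    "D": "TCCTATGTTGGTTCGAACTA",
}


def hamming(s, t):
    """Number of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def supports_ab_cd(seqs, a, b, c, d):
    """Four-point test: True if grouping (a,b) vs (c,d) has the smallest distance sum."""
    grouped = hamming(seqs[a], seqs[b]) + hamming(seqs[c], seqs[d])
    alt1 = hamming(seqs[a], seqs[c]) + hamming(seqs[b], seqs[d])
    alt2 = hamming(seqs[a], seqs[d]) + hamming(seqs[b], seqs[c])
    return grouped < min(alt1, alt2)   # ties count as no support


def bootstrap_support(seqs, a, b, c, d, n_boot=1000, seed=0):
    """Proportion of column-resampled alignments that recover the (a,b) cluster."""
    rng = random.Random(seed)
    n_sites = len(seqs[a])
    hits = 0
    for _ in range(n_boot):
        # Sample alignment columns (sites) with replacement.
        cols = rng.choices(range(n_sites), k=n_sites)
        resampled = {name: [seq[i] for i in cols] for name, seq in seqs.items()}
        hits += supports_ab_cd(resampled, a, b, c, d)
    return hits / n_boot


p_b = bootstrap_support(alignment, "A", "B", "C", "D")
print(f"P_B for cluster (A,B): {p_b:.3f}")
```

The toy alignment contains a few sites that conflict with the (A,B) cluster, so the proportion comes out below 1; whether it clears the 0.95 threshold depends on how many sites support each grouping.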
Bootstrapping • Open Matlab • Open Help • Type bootstrap and read
Example > load lawdata > plot(lsat,gpa,'+') > lsline
Calculate the correlation between lsat and gpa > rhohat = corrcoef(lsat,gpa) • Output: rhohat = [1.0000 0.7764; 0.7764 1.0000] (the off-diagonal entry is the correlation coefficient)
Is 0.78 significant? • Now we have a number, 0.7764, describing the positive association between LSAT and GPA. Although 0.7764 may seem large, we do not yet know whether it is statistically significant.
Bootstrp function • Using the bootstrp function we can resample the lsat and gpa vectors as many times as we like and consider the variation in the resulting correlation coefficients.
Generate 1000 lsat and gpa vectors by resampling from the original vectors • rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa); • hist(rhos1000(:,2),30)
What is the uncertainty associated with the observed correlation? >> mean(rhos1000(:,2)) ans = 0.7711 >> std(rhos1000(:,2)) ans = 0.1350 >> 0.1350*1.96 ans = 0.2646 • Approximate 95% interval: mean ± 1.96 × std = 0.7711 ± 0.2646
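The same resampling can be sketched outside MATLAB. The Python version below is an illustration, not a reimplementation of `bootstrp`. The 15 (LSAT, GPA) pairs are typed in from the published law-school data set as I recall it and should be checked against `lawdata`; the helper names `pearson_r` and `bootstrap_r` are made up. It reports a percentile interval instead of mean ± 1.96 × std, because the normal approximation here gives an upper bound above 1 (0.7711 + 0.2646 = 1.0357), which is not a possible correlation.

```python
import math
import random
import statistics

# Law-school data as recalled from the published example (verify against lawdata).
lsat = [576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594]
gpa = [3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13,
       3.12, 2.74, 2.76, 2.88, 2.96]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r(x, y, n_boot=1000, seed=0):
    """Correlations from n_boot resamples of the (x, y) pairs, with replacement."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < n_boot:
        idx = rng.choices(range(n), k=n)   # resample pairs, keeping x and y together
        if len(set(idx)) < 2:              # skip a degenerate one-point resample
            continue
        rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    return rs


r_hat = pearson_r(lsat, gpa)
rs = sorted(bootstrap_r(lsat, gpa))
lower = rs[int(0.025 * len(rs))]           # 2.5th percentile
upper = rs[int(0.975 * len(rs)) - 1]       # 97.5th percentile

print(f"observed r = {r_hat:.4f}")
print(f"bootstrap std = {statistics.stdev(rs):.4f}")
print(f"95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

If the lower end of the interval stays above 0, the positive correlation is unlikely to be an artifact of which 15 schools happened to be sampled.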
You have data on the expression pattern of two genes • HOXA1 and CDK6 expression values in different tissues are collected. • Open the Excel file named data.xls • Copy and paste the numerical data columns (two of them) into the workspace as follows, naming the data ‘a’: • >> a =[ paste….here and close bracket];
Calculate the uncertainty associated with the correlation between the HOXA1 and CDK6 genes • Plot the expression values (x, HOXA1; and y, CDK6). • Add a least-squares line to the plot with lsline • Calculate the correlation coefficient between the genes • Generate 1000 bootstrapped samples to estimate the sample correlation coefficient. • Determine the 95% confidence interval around the bootstrapped correlation coefficient.
Bootstrap of align2.m • Generate 1000 bootstrapped samples of the alignment score and determine its 95% confidence interval using the ‘bootstrp’ function.
Bayesian Inference • Three basic methods have been used to estimate phylogeny: distance, maximum parsimony (MP), and maximum likelihood (ML). • Bayesian statistics differs in that, in addition to the current data, prior knowledge is included in the testing of the hypothesis.
Medical tests and Bayesian Stats • Assume that previous studies have evaluated the accuracy of this test and have shown that, if you are in fact ill, there is a 99% likelihood that the test will give a true positive result (and thus, a 1% likelihood that the test will give a false negative).
Medical tests and Bayesian Stats • It was also found that if you are healthy, there is a 0.1% likelihood of a false positive result from the test. If we were simply using the “data” (i.e., the test result), we would then conclude that a positive test result had approximately a 99% chance of being correct.
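How much the prior matters can be made concrete with Bayes' theorem. The Python sketch below uses the two test characteristics given above (99% true positive rate, 0.1% false positive rate). The slides shown here do not state how common the illness is, so the prior probabilities in the loop are hypothetical values chosen only to show the effect; the function name is also made up.

```python
def posterior_ill_given_positive(prior, sensitivity, false_positive_rate):
    """P(ill | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prior                    # ill and tests positive
    false_pos = false_positive_rate * (1.0 - prior)   # healthy and tests positive
    return true_pos / (true_pos + false_pos)


sensitivity = 0.99           # P(positive | ill), from the slide
false_positive_rate = 0.001  # P(positive | healthy), from the slide

# Hypothetical priors (fraction of the population that is ill).
for prior in (0.5, 0.01, 0.001):
    post = posterior_ill_given_positive(prior, sensitivity, false_positive_rate)
    print(f"prior = {prior:<6} -> P(ill | positive) = {post:.3f}")
```

With a prior of 0.5 the posterior is close to the naive 99% reading, but for a rare condition (prior 0.001) roughly half of the positive results come from healthy people, so the same test result supports a much weaker conclusion.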