
Assessing Tree and Branch Reliability
(1) Consistency and retention indices
(2) Decay Indices
(3) Randomization Tests
(4) Nonparametric Bootstrapping
(5) Jackknifing
(6) Bayesian Posterior Probabilities

Consistency Index
* for a character: c.i. = minimum # changes / observed # changes
  e.g., if a two-state character evolves three times on a tree, c.i. will be 1/3 = 0.333
* for a tree or ensemble: c.i. = (minimum # steps for all characters) / (# observed steps for all characters)
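The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a published implementation: the function name `consistency_index` and the step counts in the ensemble example are hypothetical, and the per-character minimum and observed step counts are assumed to have been obtained already from a parsimony program.

```python
def consistency_index(min_steps, observed_steps):
    """Return c.i. = summed minimum steps / summed observed steps.

    Each argument is a list with one entry per character. Passing
    single-element lists gives the c.i. of one character.
    """
    if len(min_steps) != len(observed_steps):
        raise ValueError("need one minimum and one observed count per character")
    return sum(min_steps) / sum(observed_steps)


# The slide's example: a two-state character (minimum 1 change)
# that evolves three times on the tree.
single = consistency_index([1], [3])

# Hypothetical ensemble of three characters: (1 + 1 + 2) / (3 + 1 + 2).
ensemble = consistency_index([1, 1, 2], [3, 1, 2])

print(round(single, 3), round(ensemble, 3))
```

Summing before dividing matters: the ensemble c.i. is a ratio of totals, not the average of the per-character values.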

Retention Index
* measures only synapomorphy (ignores autapomorphy)
* r.i. = (g-s)/(g-m), where
  m = minimum amount of change possible
  s = observed change (in a given tree)
  g = maximum amount of change possible on any tree
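A short sketch of the r.i. formula, using the symbols m, s, and g as defined above. The function name `retention_index` and the numbers in the example are made up for illustration.

```python
def retention_index(m, s, g):
    """Return r.i. = (g - s) / (g - m).

    m: minimum amount of change possible
    s: observed change on the given tree
    g: maximum amount of change possible
    """
    if g == m:
        # The character cannot show homoplasy, so the ratio is undefined.
        raise ValueError("r.i. is undefined when g equals m")
    return (g - s) / (g - m)


# Hypothetical character: m = 1, s = 3, g = 4.
ri = retention_index(1, 3, 4)

# Boundary cases: no homoplasy gives 1, maximum change gives 0.
best = retention_index(1, 1, 4)
worst = retention_index(1, 4, 4)

print(round(ri, 3), best, worst)
```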

Decay Index or Bremer Support (Bremer 1995)
* Bremer support value = the # of extra steps needed to lose a branch in the consensus tree of all most parsimonious and near-most-parsimonious trees
* Watch for branch loss among strict consensus trees as steps are added to the tree lengths.
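One way to picture the decay index in code: if every candidate tree is stored as its length plus the set of clades it contains, the support for a clade is the length of the shortest tree lacking that clade minus the length of the most parsimonious tree. The sketch below assumes the tree list already includes the shortest tree that lacks the clade (in practice this takes a dedicated search); the representation, the function name `bremer_support`, and the toy trees are all hypothetical.

```python
def bremer_support(trees, clade):
    """Return the # of extra steps needed to lose `clade`.

    trees: list of (length, clades) pairs, where clades is a set of
           frozensets of taxon names.
    Returns None if no tree in the list lacks the clade.
    """
    clade = frozenset(clade)
    shortest = min(length for length, _ in trees)
    lacking = [length for length, clades in trees if clade not in clades]
    if not lacking:
        return None  # support is larger than this sample of trees can show
    return min(lacking) - shortest


# Toy sample: one most parsimonious tree (100 steps) and two longer trees.
trees = [
    (100, {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}),
    (101, {frozenset({"A", "B"}), frozenset({"A", "B", "D"})}),
    (103, {frozenset({"A", "C"}), frozenset({"A", "B", "C"})}),
]

support_ab = bremer_support(trees, {"A", "B"})
support_abc = bremer_support(trees, {"A", "B", "C"})
print(support_ab, support_abc)
```

A clade that is missing from one of the most parsimonious trees gets a value of 0, which matches the branch collapsing in the strict consensus before any steps are added.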

PTP = Permutation Tail Probability (Archie 1989; Faith & Cranston 1991)
Method:
1) randomize character state information in data matrix
   * done by randomizing states within characters (columns), thus randomizing states across taxa
   * same frequencies of states are retained within each character
   * make new data set of equal size
2) generate MPT for the permuted data set and record its length
3) repeat for many permuted data sets and compare observed MPT length to the lengths from the permuted data
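The permutation step and the final comparison can be sketched as follows. Finding the MPT for each permuted matrix is left to a parsimony program and is not shown. The function names are hypothetical, and the tail probability is computed here as the proportion of data sets (permuted plus the original) whose MPT is as short as or shorter than the observed one, which is an assumption about the exact convention.

```python
import random


def permute_within_characters(matrix, rng):
    """Shuffle states within each character (column), independently.

    matrix: list of rows, one row per taxon, one column per character.
    State frequencies in every column are unchanged.
    """
    columns = [list(col) for col in zip(*matrix)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def ptp(observed_length, permuted_lengths):
    """Proportion of data sets with an MPT as short as the observed MPT."""
    as_short = sum(1 for length in permuted_lengths if length <= observed_length)
    return (as_short + 1) / (len(permuted_lengths) + 1)


rng = random.Random(1)
matrix = [list("0011"), list("0101"), list("1100"), list("1010")]
permuted = permute_within_characters(matrix, rng)

# Hypothetical MPT lengths from four permuted data sets.
p = ptp(100, [120, 118, 100, 125])
print(permuted, p)
```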

Skewness & the g1 Statistic (Huelsenbeck and Hillis 1991)
g1 = [sum over all n trees of (T - mean T)^3] / (n * s^3)
- n is the number of trees
- T is the tree length
- s is the standard deviation of the tree lengths
- g1 gets more negative as the distribution becomes skewed to the left
- left skewness results from greater phylogenetic signal/structure
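As a rough illustration, g1 can be computed as the usual skewness statistic from the quantities listed above: the sum of cubed deviations of the tree lengths from their mean, divided by n times the cube of s. The sketch assumes s is the standard deviation with n (not n - 1) in the denominator; the function name and the tree lengths are invented.

```python
import math


def g1(tree_lengths):
    """Skewness of a tree-length distribution."""
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    # Assumption: s uses n in the denominator.
    s = math.sqrt(sum((t - mean) ** 2 for t in tree_lengths) / n)
    if s == 0:
        raise ValueError("all trees have the same length; g1 is undefined")
    return sum((t - mean) ** 3 for t in tree_lengths) / (n * s ** 3)


# A few very short trees and many long ones: long tail to the left.
left_skewed = g1([10, 19, 20, 20, 21])

# A symmetric distribution has no skew.
symmetric = g1([1, 2, 3])

print(left_skewed < 0, symmetric)
```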

(Non-parametric) Bootstrap (Felsenstein 1985)
Method:
(1) characters are randomly sampled from the data set with replacement until a data set of equal size is obtained
    * some characters are sampled more than once, others not at all
(2) generate one tree from each replicate
(3) repeat the process, i.e., resample the data 100 to 1000 times
(4) save the trees from each resampling
(5) generate a majority rule consensus tree
(6) note the fraction of times each branch was recovered: if 763 trees out of 1000 bootstrap replicates had a clade of interest, then the bootstrap value for that clade would be 76.3%
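Steps (1) and (6) can be sketched directly; the tree building in step (2) is left to a phylogenetics program and is not shown. The sketch assumes each replicate tree is summarized as a set of clades (frozensets of taxon names), and the function names and toy data are hypothetical.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Sample characters (columns) with replacement to the original size."""
    n_chars = len(matrix[0])
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    # The same picked columns are used for every taxon (row).
    return [[row[i] for i in picks] for row in matrix]


def bootstrap_support(replicate_trees, clade):
    """Percentage of replicate trees that contain the clade."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_trees if clade in clades)
    return 100.0 * hits / len(replicate_trees)


rng = random.Random(0)
matrix = [list("ACGTAC"), list("ACGTTC"), list("AGGTAC")]
replicate = bootstrap_replicate(matrix, rng)

# The slide's example: 763 of 1000 replicate trees contain the clade.
clade = {"A", "B"}
trees = [{frozenset(clade)}] * 763 + [set()] * 237
support = bootstrap_support(trees, clade)
print(replicate, support)
```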

Bootstrap values are not confidence intervals:
* they are a measure of internal branch support, or “branch reliability”
* they measure not whether a tree is right but the probability of getting this same branch if more data were collected
* any systematic error in the data will not be corrected by collecting more data: bootstrap values get higher, but for a topology that is not the right one

Another way to think about bootstrap branch estimates (P = bootstrap proportion for the branch):
* 1-P = probability of getting that much evidence if the group, in fact, did not exist
* Thus, if a branch comes up supported 95% of the time: 1 - 95% = 5% (5% of the time you can expect to see this branch supported this well when in fact the group does not exist)

Jackknifing
* a resampling procedure without replacement
* trees are built from smaller data sets
* compares trees built from random subsets of the data
  - some implementations delete characters
  - others delete taxa
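The difference from the bootstrap is that each character or taxon appears at most once in a jackknife replicate. Below is a minimal sketch of both deletion schemes; the function names and the `fraction_deleted` and `n_deleted` parameters are hypothetical and do not correspond to any particular program's options.

```python
import random


def jackknife_characters(matrix, fraction_deleted, rng):
    """Delete a random fraction of characters (columns), without replacement."""
    n_chars = len(matrix[0])
    n_keep = n_chars - int(round(fraction_deleted * n_chars))
    kept = sorted(rng.sample(range(n_chars), n_keep))
    return [[row[i] for i in kept] for row in matrix]


def jackknife_taxa(matrix, n_deleted, rng):
    """Delete a given number of taxa (rows), without replacement."""
    n_taxa = len(matrix)
    kept = sorted(rng.sample(range(n_taxa), n_taxa - n_deleted))
    return [matrix[i] for i in kept]


rng = random.Random(0)
# Column values double as column labels so the subsets are easy to read.
matrix = [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]

fewer_characters = jackknife_characters(matrix, 0.5, rng)
fewer_taxa = jackknife_taxa(matrix, 1, rng)
print(fewer_characters, len(fewer_taxa))
```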

Bayesian Posterior Probabilities
* sample tree space by changing one parameter at a time: change a parameter, build a tree, change another parameter, build a tree, and so forth (tweak and build)
* the algorithm is encouraged to find the most likely tree given the data (and a model of evolution)
* the Bayesian approach yields the set of trees that are most probable given the sequences, or formally P[H|D] (the probability of the hypothesis being correct given the data)

Bayesian Posterior Probabilities
* a tree is saved at regular intervals during the run, i.e., at every interval determined by the “samplefreq” command
* common to run 5 million generations and save/sample one tree every 1000 generations
* makes a tree file from the sampled trees
* builds a majority rule consensus tree
* the number on a branch tells us what proportion of the sampled trees had a given clade
* unlike bootstrap values, Bayesian posterior probabilities are meant to be an estimate of the true probability of that clade, given the data and the model
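With the figures above, 5,000,000 generations sampled every 1000 generations give about 5000 saved trees. The clade proportions then come from simple counting, sketched below. This is not how any particular program summarizes its tree file; it assumes each sampled tree has already been reduced to a set of clades (frozensets of taxon names), and the function names and the four toy trees are hypothetical.

```python
from collections import Counter


def clade_proportions(sampled_trees):
    """Proportion of sampled trees that contain each clade."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))
    n = len(sampled_trees)
    return {clade: count / n for clade, count in counts.items()}


def majority_rule(proportions, threshold=0.5):
    """Keep clades found in more than `threshold` of the sampled trees."""
    return {clade: p for clade, p in proportions.items() if p > threshold}


n_sampled = 5_000_000 // 1000  # trees saved in the slide's example run

ab, cd, ac = frozenset({"A", "B"}), frozenset({"C", "D"}), frozenset({"A", "C"})
sampled_trees = [{ab, cd}, {ab, cd}, {ab}, {ac}]

proportions = clade_proportions(sampled_trees)
consensus = majority_rule(proportions)
print(n_sampled, proportions[ab], sorted(len(c) for c in consensus))
```

A clade present in exactly half of the sampled trees is left out here because the threshold is a strict majority.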
