Multivariate Analysis

jin

Presentation Transcript


  1. Multivariate Analysis: Pattern Analysis
• Finding patterns among objects on which two or more independent variables have been measured
• Principal Coordinates Analysis (PCO)
• Principal Components Analysis (PCA) (Flury 1988)
• Cluster analysis (Everitt 1992)
• These methods allow the projection of multivariate phenotypic or genotypic measurements into lower-dimensional spaces, so that the underlying patterns or structures can be described and visually displayed
• The 'genetic' patterns among a set of entities (genetic materials) are difficult to discern from DNA fingerprints (raw multivariate data)
• Patterns among the entities can be 'extracted' by PCA, PCO or cluster analysis of pairwise genetic distance matrices

  2. [Figures: Principal Components Analysis (PCA) plot; Neighbor-Joining cladogram]

  3. Similarity and Dissimilarity (Genetic Distance) Measures
Applications include:
• Assessment of genetic relationships
• Prediction of heterosis
• Heterotic group definition
• Identification of duplicates in collections
• Assessment of genetic diversity
• Plant variety protection

  4. Similarity and Dissimilarity (Genetic Distance) Measures
The choice of distance measure is affected by:
• Properties of the marker system
• Genealogy of the germplasm
• Whether the entities are lines or populations
• Objectives of the study
• Subsequent multivariate analysis

  5. Genetic distance (dissimilarity) measures based on allele frequency data
• The first step is to build a matrix of pairwise measures of dissimilarity
• Multiple indexes can be used to estimate dissimilarity
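The first step can be sketched as a small helper that fills a symmetric matrix from any pairwise dissimilarity function. The names `pairwise_matrix` and `mismatch`, and the toy 0/1 marker profiles, are hypothetical and only illustrate the idea.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(items: Sequence[T], dist: Callable[[T, T], float]) -> list[list[float]]:
    """Fill a symmetric n x n dissimilarity matrix; the diagonal stays 0."""
    n = len(items)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is computed once and mirrored across the diagonal.
            m[i][j] = m[j][i] = dist(items[i], items[j])
    return m


def mismatch(x: Sequence[int], y: Sequence[int]) -> float:
    """Hypothetical example index: proportion of marker scores that differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


profiles = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1]]
d = pairwise_matrix(profiles, mismatch)
```

Any of the dissimilarity indexes listed in this presentation could be passed as `dist`, provided it takes two entities and returns a number.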

  6. Genetic distance measures based on allele frequency data (Reif et al. 2005. Crop Science 45(1): 1-7)

  7. Genetic distance measures based on allele frequency data
• Euclidean (dE): no underlying genetic concept; can be used with multivariate methods that require Euclidean distances
• Rogers (1972) (dR): linearly related to the coefficient of coancestry
• Modified Rogers' (dW): dW² is linearly related to panmictic-midparent heterosis
• Cavalli-Sforza and Edwards (1967) (dCE): based on Kimura's (1954) model of selective drift
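A minimal sketch of three of these distances, assuming each population is given as a list of loci and each locus as a list of allele frequencies (same loci and allele order in both populations). The formulas are the common textbook forms; scaling constants differ between sources, so they should be checked against Reif et al. (2005) before use. dCE is omitted because its scaling varies most between sources.

```python
import math

# A population is a list of loci; each locus is a list of allele frequencies summing to 1.
Pop = list[list[float]]


def _sq_diffs(p: Pop, q: Pop) -> list[float]:
    """Per-locus sum over alleles of squared frequency differences."""
    return [sum((a - b) ** 2 for a, b in zip(pl, ql)) for pl, ql in zip(p, q)]


def euclidean(p: Pop, q: Pop) -> float:
    """Unscaled Euclidean distance over all alleles at all loci."""
    return math.sqrt(sum(_sq_diffs(p, q)))


def rogers(p: Pop, q: Pop) -> float:
    """Rogers (1972): mean over loci of sqrt(0.5 * sum of squared differences)."""
    s = _sq_diffs(p, q)
    return sum(math.sqrt(x / 2) for x in s) / len(s)


def modified_rogers(p: Pop, q: Pop) -> float:
    """Modified Rogers' distance: sqrt of the pooled squared differences / (2m)."""
    s = _sq_diffs(p, q)
    return math.sqrt(sum(s) / (2 * len(s)))


# Two loci: identical at the first, fixed for different alleles at the second.
P = [[0.5, 0.5], [1.0, 0.0]]
Q = [[0.5, 0.5], [0.0, 1.0]]
```

The difference between dR and dW is where the square root is taken: dR averages per-locus roots, while dW takes one root over the pooled loci, which is what makes dW² a simple sum of squared frequency differences.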

  8. Genetic distance measures based on allele frequency data
• Reynolds et al. (1983) (dRE): based on a model where mutation and selection can be neglected and drift is the major evolutionary force
• Nei (1972) (dN72): based on the infinite-allele model (Kimura and Crow, 1964)
• Nei et al. (1983) (dN83): for homozygous inbred lines, dN83 = dR; hence dN83 is also linearly related to the coancestry coefficient
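A sketch of the three measures on this slide, under the same data layout as before (list of loci, each a list of allele frequencies). The forms used are Nei's standard distance D = -ln(Jxy / sqrt(Jx Jy)), the weighted Reynolds coancestry estimator with D = -ln(1 - θ), and Nei's DA. A small copy of Rogers' distance is included only to check the dN83 = dR claim on inbred lines.

```python
import math


def nei_1972(p, q):
    """Nei (1972) standard distance; infinite if the populations share no alleles."""
    m = len(p)
    jx = sum(sum(a * a for a in pl) for pl in p) / m
    jy = sum(sum(b * b for b in ql) for ql in q) / m
    jxy = sum(sum(a * b for a, b in zip(pl, ql)) for pl, ql in zip(p, q)) / m
    return -math.log(jxy / math.sqrt(jx * jy))


def reynolds_1983(p, q):
    """Reynolds et al. (1983): theta = sum (p-q)^2 / (2 * sum over loci of (1 - sum pq))."""
    num = sum(sum((a - b) ** 2 for a, b in zip(pl, ql)) for pl, ql in zip(p, q))
    den = 2 * sum(1 - sum(a * b for a, b in zip(pl, ql)) for pl, ql in zip(p, q))
    return -math.log(1 - num / den)


def nei_1983_da(p, q):
    """Nei et al. (1983) DA = 1 - mean over loci of sum_i sqrt(p_i * q_i)."""
    m = len(p)
    return 1 - sum(sum(math.sqrt(a * b) for a, b in zip(pl, ql)) for pl, ql in zip(p, q)) / m


def rogers(p, q):
    """Rogers (1972) distance, repeated here for the comparison below."""
    s = [sum((a - b) ** 2 for a, b in zip(pl, ql)) for pl, ql in zip(p, q)]
    return sum(math.sqrt(x / 2) for x in s) / len(s)


P = [[0.5, 0.5], [1.0, 0.0]]
Q = [[0.5, 0.5], [0.0, 1.0]]

# Homozygous inbred lines: every frequency is 0 or 1; they differ at 1 of 3 loci.
line1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
line2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
```

For the inbred lines, both DA and dR reduce to the proportion of loci at which the lines carry different alleles (1/3 here), which is the equality stated on the slide.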

  9. Similarity Measures for Binary Data
• Dice (1945)
• Simple matching
• Jaccard (1908)
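The slide's formulas were not preserved; the sketch below uses the usual definitions based on the counts a (both 1), b (1, 0), c (0, 1) and d (both 0) for two 0/1 marker profiles. A dissimilarity can be taken as 1 minus the similarity.

```python
def binary_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def dice(x, y):
    """Dice: shared presences counted twice; shared absences ignored."""
    a, b, c, _ = binary_counts(x, y)
    return 2 * a / (2 * a + b + c)


def jaccard(x, y):
    """Jaccard: shared presences over all non-(0,0) positions."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c)


def simple_matching(x, y):
    """Simple matching: shared presences and shared absences both count."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


x = [1, 1, 0, 0, 1]
y = [1, 0, 1, 0, 1]
```

The main design difference is the treatment of shared absences (d): only simple matching counts them, which matters for dominant markers where a shared band absence is weak evidence of relatedness.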

  10. Shared allele distance (Bowcock et al. 1994)
• S = No. of shared alleles
• u = No. of loci
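The slide's formula was not preserved. A minimal sketch, assuming the common proportion-of-shared-alleles form D = 1 - ΣS / (2u) for diploid individuals, where S (0, 1 or 2) is counted per locus and summed over the u loci. Genotypes are represented here as pairs of allele labels, which is an illustrative choice.

```python
from collections import Counter


def shared_alleles(g1, g2):
    """Alleles (0, 1 or 2) two diploid genotypes share at one locus, counting multiplicity."""
    return sum((Counter(g1) & Counter(g2)).values())


def shared_allele_distance(ind1, ind2):
    """1 - (total shared alleles) / (2 * number of loci)."""
    u = len(ind1)
    s = sum(shared_alleles(a, b) for a, b in zip(ind1, ind2))
    return 1 - s / (2 * u)


ind1 = [("A", "A"), ("A", "B"), ("C", "D")]
ind2 = [("A", "B"), ("A", "B"), ("E", "F")]
```

Here the individuals share 1, 2 and 0 alleles at the three loci, so the distance is 1 - 3/6 = 0.5.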

  11. Similarity Measures for Binary Data

  12. PRINCIPAL COORDINATES ANALYSIS (PCO or PCoA)
[Figures: distance between Oregon towns (miles); genetic distance between barley varieties (Nei et al. 1983 index)]

  13. Principal Coordinates Analysis is a method to visualize similarities or dissimilarities in data.
• It starts with a distance (dissimilarity) matrix and assigns each item a location in a 2- or 3-dimensional space
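A minimal sketch of classical (Torgerson-Gower) principal coordinates analysis with NumPy: double-centre -½d², then scale the leading eigenvectors by the square roots of their eigenvalues. The function name `pcoa` is hypothetical. Negative eigenvalues, which arise when the distance is not Euclidean, are simply clipped to zero here.

```python
import numpy as np


def pcoa(d: np.ndarray, k: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Return k-dimensional coordinates and the k leading eigenvalues for distance matrix d."""
    n = d.shape[0]
    # Double-centre the matrix of -0.5 * squared distances.
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n
    b = j @ a @ j
    # eigh returns eigenvalues of a symmetric matrix in ascending order; reverse them.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates are eigenvectors scaled by sqrt(eigenvalue).
    coords = vecs * np.sqrt(np.clip(vals, 0, None))
    return coords, vals


# Four points on a 3 x 4 rectangle, known only through their pairwise distances.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, vals = pcoa(d, k=2)
```

As with the Oregon towns example, a Euclidean distance table is enough to recover the map, up to rotation and reflection.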

  14. PRINCIPAL COMPONENTS ANALYSIS (PCA) Transforms a number of possibly correlated variables (in this case allelic states) into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible

  15. The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variance of the observed variables:
• Reduces the number of observed variables to a smaller number of principal components that account for most of the variance
• When the observed variables are standardized (mean = 0, standard deviation = 1), the total variance in PCA equals the number of observed variables being analyzed
• The first principal component accounts for the largest share of the variance in the data. The second component accounts for the second-largest share and is uncorrelated with the first, and so on
• Components accounting for most of the variance are retained; components accounting for a trivial amount of variance are dropped
• Eigenvalues indicate the amount of variance explained by each component
• Eigenvectors are the weights used to calculate component scores
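The points above can be sketched with NumPy: standardize the variables, eigen-decompose their covariance (which is then the correlation matrix), and use the eigenvectors as weights for the component scores. The function name `pca` and the random test data are illustrative assumptions; a constant column would cause division by zero.

```python
import numpy as np


def pca(x: np.ndarray):
    """Return eigenvalues, eigenvectors and component scores, largest component first."""
    # Standardize each column to mean 0 and standard deviation 1.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)  # equals the correlation matrix of x
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Eigenvectors are the weights; scores are the projections of the standardized data.
    scores = z @ vecs
    return vals, vecs, scores


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
x[:, 1] += x[:, 0]  # introduce some correlation between variables 0 and 1
vals, vecs, scores = pca(x)
explained = vals / vals.sum()  # proportion of variance per component
```

The eigenvalues sum to 4, the number of variables, and the score columns are mutually uncorrelated with variances equal to the eigenvalues.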

  16. Cluster Analysis: individuals with similar descriptions are mathematically grouped into the same cluster.
• Distance-based methods (starting from a distance matrix)
  • UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
  • Neighbor-Joining
• Model-based methods
[Figure: Neighbor-Joining cladogram]
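A minimal pure-Python sketch of UPGMA: repeatedly merge the two closest clusters, then set the distance from the new cluster to every other cluster as the size-weighted average of the merged clusters' distances, which equals the mean over all pairs of original items. The function name and the 4-item distance matrix are hypothetical; ties are broken by iteration order.

```python
from itertools import combinations


def upgma(dist, labels):
    """Return merges as (cluster_a, cluster_b, merge_distance) in merge order."""
    clusters = [(lab,) for lab in labels]  # each active cluster is a tuple of labels
    d = {}
    for i, j in combinations(range(len(labels)), 2):
        d[frozenset((clusters[i], clusters[j]))] = dist[i][j]
    merges = []
    while len(clusters) > 1:
        # Closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        merged = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # Size-weighted average of the two merged clusters' distances to c.
            d[frozenset((merged, c))] = (
                len(a) * d[frozenset((a, c))] + len(b) * d[frozenset((b, c))]
            ) / (len(a) + len(b))
        clusters.append(merged)
        merges.append((a, b, height))
    return merges


labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 8, 12],
    [6, 8, 0, 4],
    [10, 12, 4, 0],
]
merges = upgma(dist, labels)
```

A and B join at distance 2, C and D at 4, and the two pairs at 9, the average of the four cross distances (6, 10, 8, 12). Neighbor-Joining differs in that it corrects for unequal rates along branches instead of averaging.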

  17. POPULATION STRUCTURE
• Hypothesis 1: there is one population with intermediate allele frequencies at all loci, and all individuals come from that population
• Hypothesis 2: there are two populations (blue and pink), each with high allele frequencies at some loci and low allele frequencies at others
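The two hypotheses can be compared through the likelihood of the genotypes. This is a sketch only, assuming diploid biallelic loci coded as 0/1/2 copies of one allele, Hardy-Weinberg proportions, independent loci and a known split of individuals into two groups. Because Hypothesis 2 has more free parameters, its likelihood can never be lower; model-based clustering methods handle that formally rather than by a raw comparison like this.

```python
import math


def allele_freqs(genos):
    """Frequency of the counted allele at each locus (genotypes coded 0/1/2)."""
    n = len(genos)
    return [sum(g[locus] for g in genos) / (2 * n) for locus in range(len(genos[0]))]


def log_lik(genos, freqs):
    """Log-likelihood under Hardy-Weinberg proportions with independent loci."""
    total = 0.0
    for g in genos:
        for x, p in zip(g, freqs):
            total += math.log(math.comb(2, x) * p ** x * (1 - p) ** (2 - x))
    return total


# Hypothetical data: two groups fixed for opposite alleles at two loci.
pop_blue = [[2, 0], [2, 0], [2, 0]]
pop_pink = [[0, 2], [0, 2], [0, 2]]
everyone = pop_blue + pop_pink

ll_h1 = log_lik(everyone, allele_freqs(everyone))  # one population, intermediate frequencies
ll_h2 = log_lik(pop_blue, allele_freqs(pop_blue)) + log_lik(pop_pink, allele_freqs(pop_pink))
```

Under Hypothesis 1 the pooled frequencies are 0.5, so every homozygous genotype has probability 0.25 and ll_h1 = 12 ln 0.25; under Hypothesis 2 every genotype is fully explained and ll_h2 = 0.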

  18. POPULATION STRUCTURE
It is important to estimate:
• How many subpopulations there are
• To which subpopulation each individual belongs (%)
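A simplified sketch of the second question: if the subpopulations' allele frequencies were already known, Bayes' rule gives the probability that an individual came from each one. This is an assignment test under stated assumptions (diploid biallelic loci coded 0/1/2, Hardy-Weinberg proportions, independent loci, equal priors by default). It is not an estimate of admixture proportions, which requires fitting a model-based clustering method to the whole sample.

```python
import math
from typing import Optional


def genotype_prob(x: int, p: float) -> float:
    """Hardy-Weinberg probability of carrying x copies of an allele with frequency p."""
    return math.comb(2, x) * p ** x * (1 - p) ** (2 - x)


def membership(geno: list[int], pop_freqs: list[list[float]],
               priors: Optional[list[float]] = None) -> list[float]:
    """Posterior probability that an individual comes from each candidate population."""
    k = len(pop_freqs)
    if priors is None:
        priors = [1.0 / k] * k
    # Multilocus genotype likelihood in each population, weighted by its prior.
    liks = [pr * math.prod(genotype_prob(x, p) for x, p in zip(geno, freqs))
            for freqs, pr in zip(pop_freqs, priors)]
    total = sum(liks)
    return [lk / total for lk in liks]


# Hypothetical frequencies of the counted allele at two loci in two subpopulations.
freqs = [[0.9, 0.9], [0.1, 0.1]]
post_homozygote = membership([2, 2], freqs)
post_heterozygote = membership([1, 1], freqs)
```

An individual homozygous for the common allele of the first subpopulation is assigned to it with probability above 0.99, whereas a double heterozygote is equally likely under both, so it is split 50/50.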
