Multivariate Statistics

Multivariate Statistics • Least Squares ANOVA & ANCOV • Repeated Measures ANOVA • Cluster Analysis

Least Squares ANOVA • Do ANOVA as a multiple regression. • Each factor with k levels is represented by k-1 dichotomous dummy variables. • Interactions are represented as products of dummy variables.

A x B Factorial: Dummy Variables • Two levels of A • A1 = 1 if at Level 1 of A, 0 if not • Three levels of B • B1 = 1 if at Level 1 of B, 0 if not • B2 = 1 if at Level 2 of B, 0 if not • A x B interaction (2 df) • A1B1 codes one df • A1B2 codes the other df

A x B Factorial: The Model • Y = a + b1A1 + b2B1 + b3B2 + b4A1B1 + b5A1B2 + error. • Do the multiple regression. • The regression SS represents the combined effects of A and B (and their interaction).

Partitioning the Sums of Squares • Drop A1 from the full model. • The decrease in the regression SS is the SS for the main effect of A. • Drop B1 and B2 from the full model. • The decrease in the regression SS is the SS for the main effect of B. • Drop A1B1 and A1B2 from the full model • The decrease in the regression SS is the SS for the interaction.

Unique Sums of Squares • This method produces a unique sum of squares for each effect, representing the effect after eliminating overlap with any other effects in the model. • In SAS these are Type III sums of squares • Overall and Spiegel called them Method I sums of squares.
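The drop-and-refit procedure above can be sketched in a few lines of NumPy. This is only an illustration: the data are made up, and the names regression_ss, design, and unique_ss are hypothetical helpers, not part of any package. One assumption to flag: the sketch uses effect coding (1, 0, -1) rather than 0/1 dummies. With 0/1 coding, dropping A1 while its products with B1 and B2 stay in the model tests A only at the reference level of B, not the main effect of A.

```python
import numpy as np

def regression_ss(X, y):
    """Regression sum of squares for y fitted on X (X must contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((X @ beta - y.mean()) ** 2))

# Made-up, unbalanced 2 x 3 data: (level of A, level of B, score Y).
data = [(1, 1, 4), (1, 1, 6), (1, 2, 7), (1, 2, 9), (1, 2, 8), (1, 3, 5), (1, 3, 7),
        (2, 1, 3), (2, 1, 5), (2, 1, 4), (2, 2, 4), (2, 2, 6), (2, 3, 9), (2, 3, 11)]
a, b, y = (np.array(col, dtype=float) for col in zip(*data))

# Effect coding (an assumption of this sketch): the last level of each factor
# is scored -1 on that factor's columns.
A1 = np.where(a == 1, 1.0, -1.0)
B1 = np.where(b == 1, 1.0, np.where(b == 3, -1.0, 0.0))
B2 = np.where(b == 2, 1.0, np.where(b == 3, -1.0, 0.0))
predictors = {"A1": A1, "B1": B1, "B2": B2, "A1B1": A1 * B1, "A1B2": A1 * B2}

def design(names):
    """Design matrix: an intercept column plus the named predictor columns."""
    return np.column_stack([np.ones_like(y)] + [predictors[n] for n in names])

ss_full = regression_ss(design(predictors), y)

def unique_ss(dropped):
    """Decrease in regression SS when the dropped columns leave the full model."""
    kept = [n for n in predictors if n not in dropped]
    return ss_full - regression_ss(design(kept), y)

ss_a = unique_ss({"A1"})              # main effect of A (1 df)
ss_b = unique_ss({"B1", "B2"})        # main effect of B (2 df)
ss_ab = unique_ss({"A1B1", "A1B2"})   # A x B interaction (2 df)
print(f"full model: {ss_full:.3f}  A: {ss_a:.3f}  B: {ss_b:.3f}  A x B: {ss_ab:.3f}")
```

Because the cell sizes here are unequal, the three unique sums of squares need not add up to the full-model regression SS; the overlap among effects is assigned to none of them.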

Analysis of Covariance • Simply put, this is a multiple regression where there are both categorical and continuous predictors. • In the ideal circumstance (the grouping variables are experimentally manipulated), there will be no association between the covariate and the grouping variables. • Adding the covariate to the model may reduce the error sum of squares and give you more power.
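A small sketch of why the covariate can buy power, using made-up data in which the covariate is unrelated to the grouping variable (the ideal case described above). The helper names fit_sse and f_for_group are hypothetical, and the numbers are invented for illustration only.

```python
import numpy as np

def fit_sse(X, y):
    """Error (residual) sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Made-up data: two groups of six with identical covariate scores in each group,
# so the covariate is uncorrelated with the grouping variable.
c = np.array([1, 2, 3, 4, 5, 6] * 2, dtype=float)
g = np.array([0] * 6 + [1] * 6, dtype=float)
noise = np.array([0.5, -0.5, 0.3, -0.3, 0.2, -0.2, -0.4, 0.4, -0.1, 0.1, 0.6, -0.6])
y = 2.0 * c + 3.0 * g + noise
ones = np.ones_like(y)
n = len(y)

def f_for_group(full, reduced, n_params_full):
    """F (1 numerator df) for the group effect: drop the group dummy, compare error SS."""
    sse_full = fit_sse(full, y)
    ss_group = fit_sse(reduced, y) - sse_full
    return ss_group / (sse_full / (n - n_params_full)), sse_full

# ANOVA: group only.  ANCOV: group plus the covariate.
f_anova, sse_anova = f_for_group(np.column_stack([ones, g]), np.column_stack([ones]), 2)
f_ancov, sse_ancov = f_for_group(np.column_stack([ones, g, c]), np.column_stack([ones, c]), 3)
print(f"ANOVA: error SS = {sse_anova:.2f}, F = {f_anova:.2f}")
print(f"ANCOV: error SS = {sse_ancov:.2f}, F = {f_ancov:.2f}")
```

In this constructed example the covariate absorbs most of what was error variance, so the same group difference yields a much larger F.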

Big Error = Small F, Large p

Add Covariate, Lower Error

Big F = Happy Researcher

Confounded ANCOV • If the data are nonexperimental, or the covariate is measured after manipulating the independent variables, then the covariate will be correlated with the grouping variables. • Including it in the model will change the treatment sums of squares • and make interpretation rather slippery.

A Simple Example • One independent variable (A) with three levels. • One covariate (C). • Y = a + b1A1 + b2A2 + b3C + b4A1C + b5A2C + error. • A1C and A2C represent the interaction between the independent variable and the covariate.

Covariate x IV Interaction • We drop the two interaction terms from the model. • If the regression SS decreases markedly, then the relationship between the covariate and Y varies across levels of the IV. • This violates the homogeneity of regression assumption of the traditional ANCOV.
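The check for homogeneity of regression can be sketched as a comparison of the model with and without A1C and A2C. The data below are simulated with a common slope, and every name (fit_sse, f_interaction, the seed, the coefficients) is an assumption made for illustration.

```python
import numpy as np

def fit_sse(X, y):
    """Error (residual) sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Simulated data (hypothetical): three groups of ten, one covariate C,
# generated with the same covariate slope in every group.
rng = np.random.default_rng(0)
group = np.repeat([1, 2, 3], 10)
C = rng.normal(50, 10, size=30)
A1 = (group == 1).astype(float)
A2 = (group == 2).astype(float)
Y = 5 + 2 * A1 - 1 * A2 + 0.3 * C + rng.normal(0, 2, size=30)
ones = np.ones_like(Y)
n = len(Y)

full = np.column_stack([ones, A1, A2, C, A1 * C, A2 * C])   # with covariate x IV terms
reduced = np.column_stack([ones, A1, A2, C])                # interaction terms dropped

sse_full = fit_sse(full, Y)
sse_reduced = fit_sse(reduced, Y)

# The drop in regression SS equals the rise in error SS when terms are removed.
ss_interaction = sse_reduced - sse_full
df_num, df_den = 2, n - full.shape[1]
f_interaction = (ss_interaction / df_num) / (sse_full / df_den)
print(f"F({df_num}, {df_den}) for covariate x IV interaction = {f_interaction:.3f}")
```

A large F here would suggest the slopes differ across groups; a small one is consistent with the homogeneity of regression assumption.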

Wuensch & Poteat, 1998 • Decision (stop or continue the research) was not the only dependent variable. • Subjects also were asked to indicate how justified the research was. • Predict justification scores from • Idealism and relativism (covariates) • Sex and purpose of research (grouping variables)

Covariates Not Necessarily Nuisances • Psychologists often think of covariates as nuisance variables. • They want their effects taken out of error variance. • For my research, however, I had a genuine interest in the effects of idealism and relativism.

The Results • There were no significant interactions. • Every main effect was significant. • Idealism was negatively related to justification. • Relativism was positively related to justification. • Men thought the research more justified than did women.

Purpose of the research had a significant effect. The cosmetic testing and neuroscience theory testing received mean justification ratings significantly less than those of the medical research. Hmmm, our students think the cosmetic testing not justified, but they vote to continue it anyhow.

Repeated Measures ANOVA • In the traditional (“univariate”) approach, “subjects” is treated as an additional classification variable. • A one-way RM ANOVA is really a two-way ANOVA, with subjects being the second factor. • This analysis assumes sphericity.

Sphericity • Suppose we have five levels of repeated factor A. • Find the standard error for the difference between level j and level k. • We assume that standard error is constant across jk pairs. • This assumption is frequently violated with behavioral data.
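To make the assumption concrete, the sketch below computes the standard error of the mean difference for every pair of levels in a made-up data set (six subjects, five levels of A). The scores and the helper name se_of_difference are invented for illustration; under sphericity these ten values would be about equal in the population.

```python
from itertools import combinations
from math import sqrt
from statistics import stdev

# Made-up scores: rows are subjects, columns are the five levels of A.
scores = [
    [10, 12, 13, 15, 20],
    [8, 11, 12, 13, 14],
    [12, 12, 15, 18, 25],
    [9, 10, 10, 12, 13],
    [11, 14, 15, 15, 22],
    [7, 9, 11, 14, 12],
]
n_levels = len(scores[0])

def se_of_difference(j, k):
    """Standard error of the mean difference between level j and level k (0-based)."""
    diffs = [row[j] - row[k] for row in scores]
    return stdev(diffs) / sqrt(len(diffs))

# One standard error per jk pair, keyed by 1-based level numbers.
se = {(j + 1, k + 1): se_of_difference(j, k)
      for j, k in combinations(range(n_levels), 2)}

for pair, value in se.items():
    print(f"levels {pair}: SE of difference = {value:.3f}")
```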

Corrections • There are procedures that correct for violation of the assumption of sphericity. • They reduce the degrees of freedom, much as is done in the Welch ANOVA. • Greenhouse-Geisser is the more conservative procedure. • Huynh-Feldt is the less conservative procedure.
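As a rough illustration of how the degrees of freedom get reduced, here is a sketch of the Greenhouse-Geisser epsilon computed from the covariance matrix of the repeated measures. The scores are made up and gg_epsilon is a hypothetical helper name; the Huynh-Feldt adjustment is not shown.

```python
import numpy as np

def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from the k x k covariance matrix of the repeated measures."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    centering = np.eye(k) - np.ones((k, k)) / k
    centered = centering @ cov @ centering      # double-centered covariance matrix
    return float(np.trace(centered) ** 2 / ((k - 1) * np.sum(centered ** 2)))

# Made-up scores: rows are subjects, columns are five levels of the repeated factor.
scores = np.array([
    [10, 12, 13, 15, 20],
    [8, 11, 12, 13, 14],
    [12, 12, 15, 18, 25],
    [9, 10, 10, 12, 13],
    [11, 14, 15, 15, 22],
    [7, 9, 11, 14, 12],
], dtype=float)
n, k = scores.shape

eps = gg_epsilon(np.cov(scores, rowvar=False))
df_effect = eps * (k - 1)              # corrected numerator df
df_error = eps * (k - 1) * (n - 1)     # corrected denominator df
print(f"epsilon = {eps:.3f}, corrected df = ({df_effect:.2f}, {df_error:.2f})")

# Equal variances and equal covariances (compound symmetry) satisfy sphericity,
# so epsilon is 1 and no reduction of the df takes place.
compound = np.full((5, 5), 2.0) + 3.0 * np.eye(5)
print(f"epsilon under compound symmetry = {gg_epsilon(compound):.3f}")
```

Epsilon ranges from 1/(k-1), the worst possible violation, up to 1, where sphericity holds.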

The Multivariate Approach • Suppose you have a one-way RM design with five levels of the grouping variable (G). • You treat the scores at any one level of G as one variable, so you now have five variables (G1 through G5), not two variables (G and Y).

Orthogonal Contrasts • Behind the scenes, your statistical program creates a complete set of orthogonal contrasts for the RM factor. • It then tests the null that every one of those contrasts has a mean of zero. • If that null is rejected, you conclude the RM factor has a significant effect. • There is no sphericity assumption with the multivariate-approach analysis.
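What the program does behind the scenes can be approximated as follows: form k-1 contrast scores per subject, then test with Hotelling's T-squared that all of their means are zero. This is a sketch under assumptions: Helmert-type contrasts are one convenient orthogonal set (a given package may use another), the data are simulated, and the function names are hypothetical.

```python
import numpy as np

def helmert_contrasts(k):
    """k-1 mutually orthogonal contrasts across k levels (each row sums to zero)."""
    rows = []
    for j in range(1, k):
        rows.append([1.0] * j + [-float(j)] + [0.0] * (k - 1 - j))
    return np.array(rows)

def multivariate_rm_test(scores, contrasts):
    """Hotelling T-squared test that every contrast has a population mean of zero."""
    n, k = scores.shape
    d = scores @ contrasts.T                    # n x (k-1) matrix of contrast scores
    mean_d = d.mean(axis=0)
    cov_d = np.cov(d, rowvar=False)
    t2 = float(n * mean_d @ np.linalg.solve(cov_d, mean_d))
    p = k - 1
    f = (n - p) / (p * (n - 1)) * t2            # exact F transformation of T-squared
    return t2, f, (p, n - p)

# Simulated data (hypothetical): 12 subjects, five levels G1 through G5.
rng = np.random.default_rng(1)
scores = rng.normal(size=(12, 5)) + np.array([0.0, 0.5, 1.0, 1.0, 2.0])

t2, f, df = multivariate_rm_test(scores, helmert_contrasts(5))
print(f"T-squared = {t2:.3f}, F{df} = {f:.3f}")
```

The result does not depend on which complete set of contrasts is used; any k-1 linearly independent contrasts give the same T-squared.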

Doubly Multivariate Analysis • Suppose that you have a design with one or more RM factor(s) • And you also have multiple dependent variables. • If you take the multivariate approach to analysis of the RM factor(s), then you have a doubly multivariate analysis.

Effects of Cross-Species Rearing • Wuensch (1992) • Newborn Mus fostered onto Mus, Peromyscus, or Rattus. • Tested in an apparatus where they could visit four tunnels, which smelled like • Clean pine shavings • Mus • Peromyscus • Rattus

Mus musculus

Peromyscus maniculatus

Rattus norvegicus

The Design • Dependent variables were • Latency to first visit of each tunnel • Number of visits to each tunnel • Cumulative time spent in each tunnel • Independent variables were • Scent of tunnel (4 levels, within-subjects) • Foster species (3 levels, between-subjects)

Doubly Multivariate Results • There were significant effects of Foster Species, Scent of Tunnel, and their interaction. • This was followed by univariate ANOVA, Foster Species x Scent of Tunnel, on each of the three dependent variables.

Results of the Univariate ANOVAs • The interaction was significant for each dependent variable. • Conducted simple main effects analysis. • Mus reared by Rattus had significantly more visits to and cumulative time in the rat-scented tunnel than did the other groups, and shorter latencies as well. • The other groups avoided the rat-scented tunnel.

Cluster Analysis • Goal is to cluster cases into groups based on shared characteristics. • Start out with each case being a one-case cluster. • The clusters are located in k-dimensional space, where k is the number of variables. • Compute the squared Euclidean distance between each case and each other case.

Squared Euclidean Distance • The sum across variables (from i = 1 to v) of the squared difference between the score on variable i for the one case (Xi) and the score on variable i for the other case (Yi).
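The verbal definition above translates directly into code. A minimal sketch, with a hypothetical function name and made-up scores:

```python
def squared_euclidean(x, y):
    """Sum, across variables, of the squared difference between two cases' scores."""
    if len(x) != len(y):
        raise ValueError("both cases need a score on every variable")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Two made-up cases measured on v = 3 variables.
case_x = (1, 2, 3)
case_y = (4, 6, 3)
print(squared_euclidean(case_x, case_y))  # 3**2 + 4**2 + 0**2 = 25
```

In practice the variables are often standardized first, so that one measured in large units does not dominate the distance.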

Agglomerate • The two cases closest to each other are agglomerated into a cluster. • The distances between entities (clusters and cases) are recomputed. • The two entities closest to each other are agglomerated. • This continues until all cases end up in one cluster.
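The agglomeration steps can be sketched in plain Python. The slide does not say how the distance to a multi-case cluster is recomputed, so this sketch assumes average linkage (the mean squared Euclidean distance over all between-cluster pairs of cases); other rules exist. The cases and function names are made up for illustration.

```python
from itertools import combinations

def squared_euclidean(x, y):
    """Sum, across variables, of the squared difference between two cases' scores."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def average_linkage(c1, c2, cases):
    """Mean squared Euclidean distance over all between-cluster pairs (an assumed rule)."""
    total = sum(squared_euclidean(cases[i], cases[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(cases):
    """Return every cluster solution, from n one-case clusters down to one cluster."""
    clusters = [(i,) for i in range(len(cases))]
    history = [list(clusters)]
    while len(clusters) > 1:
        # Find the two entities (cases or clusters) closest to each other.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], cases))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append(list(clusters))
    return history

# Five made-up cases measured on two variables.
cases = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (4.0, 15.0)]
history = agglomerate(cases)
for solution in history:
    print(len(solution), "clusters:", solution)
```

Each entry of history is one level of the analysis, so history[-2] is the two-cluster solution and history[-3] the three-cluster solution.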

What is the Correct Solution? • You may have theoretical reasons to expect a certain k cluster solution. • Look at that solution and see if it matches your expectations. • Alternatively, you may try to make sense out of solutions at two or more levels of the analysis.

Faculty Salaries • Subjects were faculty in Psychology at ECU. • Variables were rank, experience, number of publications, course load, and salary. • The 2 cluster solution was adjuncts versus everybody else. • Adjuncts had lower rank, experience, number of publications, course load, and salary.

Three Cluster Solution • Non-adjuncts were split into senior faculty and junior faculty. • Senior faculty had higher salary, experience, rank, and number of pubs.

Four Cluster Solution • The senior faculty were split into two groups: The acting chair of the department and all of the rest of the senior faculty. • The acting chair had a higher salary and number of publications.

Workaholism • Aziz & Zickar (2005) • Workaholics may be defined as those • High in work involvement, • High in drive to work, and • Low in work enjoyment. • For each case, a score was obtained for each of these three dimensions.

The Three Cluster Solution • Workaholics • High work involvement • High drive to work • Low work enjoyment • Positively engaged workers (KLW) • High work involvement • Medium drive to work • High work enjoyment

Unengaged workers • Low work involvement • Low drive to work • Low work enjoyment • Past research/theory indicated there should be six clusters, but the theorized six clusters were not obtained.
