
Plotting Multivariate Data

Harry R. Erwin, PhD, School of Computing and Technology, University of Sunderland.



  1. Plotting Multivariate Data • Harry R. Erwin, PhD • School of Computing and Technology • University of Sunderland

  2. Resources • Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. • Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

  3. Edward Tufte’s Recommendations • Show the data • Induce the viewer to think about the substance of the data • Avoid distorting what the data have to say • Present many numbers in a small space • Make large data sets coherent • Encourage comparison • Reveal the data at several levels of detail • Serve a clear purpose • Be closely integrated with the statistical and verbal descriptions of the data • Tufte, E R (2001), The Visual Display of Quantitative Information, Graphics Press.

  4. Tufte’s Points • Graphics reveal data. • Graphics can be more precise and revealing than conventional statistics. • Anscombe’s data • Anscombe, F J (1973) “Graphs in Statistical Analysis”, American Statistician, 27:17-21. • All four data sets are described by the same linear model.

  5. The Anscombe Graphics
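The figure for this slide did not survive in the transcript. As a hedged reconstruction, the sketch below uses the `anscombe` data frame that ships with R (columns x1 to x4 and y1 to y4) to fit the same straight-line model to each of the four data sets and to redraw the four scatterplots. The variable names `fits` and `coefs` are illustrative and do not come from the lecture.

```r
# Fit y ~ x separately to each of Anscombe's four data sets, using the
# built-in `anscombe` data frame (columns x1..x4 and y1..y4).
fits <- lapply(1:4, function(i) {
  lm(anscombe[[paste0("y", i)]] ~ anscombe[[paste0("x", i)]])
})

# One row per data set: intercept and slope of the fitted line.
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# The four scatterplots, each with its fitted line, in a 2 x 2 layout.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

The fitted coefficients agree to about two decimal places across the four sets, while the plots differ sharply, which is the point of the previous slide.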

  6. Ways of Looking at Data • Scatterplots • Demonstration • “The convex hull of bivariate data” • Demonstration • Chiplot • Demonstration • Bivariate Boxplot • Demonstration

  7. And More Multivariate Graphics • Bivariate Densities • Demonstration • Other Variables in a Scatterplot • Demonstration • Scatterplot Matrix • Demonstration of pairs • 3-D Plots • Demonstration • Conditioning Plots • Demonstration

  8. Demonstration • Launch R • Set the working directory to Statistics/RSPCMA/Data • airpoll<-source("chap2airpoll.dat")$value • Review exercises on pages 19-22

  9. Convex Hull of Bivariate Data • A scatterplot usually accompanies the calculation of the correlation coefficient of two variables. • It helps to detect outliers. • Convex hull trimming (dropping the points on the convex hull before computing the correlation) gives a robust estimate of the correlation coefficient. • Demonstration • attach(airpoll) • cor(SO2, Mortality)

  10. Robust Estimation of the Correlation • hull<-chull(SO2, Mortality) # indices of the points on the convex hull • plot(SO2, Mortality, pch=1) • polygon(SO2[hull], Mortality[hull], density=15, angle=30) # shade the hull • cor(SO2[-hull], Mortality[-hull]) # correlation without the hull points • Here the trimmed and untrimmed correlations are almost identical, which is unusual.
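Since the airpoll file is not included here, the following is a minimal self-contained sketch of the same convex hull trimming on simulated data. The data, the two planted outliers, and the names `r_all` and `r_trim` are assumptions made for illustration; only `chull`, `polygon` and `cor` come from the slide.

```r
# Simulated stand-in for the airpoll variables: 50 positively correlated
# points plus two planted outliers that pull the correlation down.
set.seed(1)
x <- rnorm(50)
y <- 0.8 * x + rnorm(50, sd = 0.5)
x <- c(x, 4, -4)
y <- c(y, -4, 4)

r_all <- cor(x, y)                 # correlation using every point

hull <- chull(x, y)                # indices of the points on the convex hull
r_trim <- cor(x[-hull], y[-hull])  # correlation after dropping hull points

plot(x, y, pch = 1)
polygon(x[hull], y[hull], density = 15, angle = 30)  # shade the hull

print(c(all = r_all, trimmed = r_trim))
```

With outliers like these the two estimates differ noticeably, in contrast to the airpoll result reported on the slide.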

  11. Chiplot • A way of augmenting the scatterplot to spot dependence/independence. • See Statistics/RSPCMA/functions.txt • chiplot(SO2, Mortality, vlabs=c("SO2", "Mortality")) • For independent data, the points will be scattered in a horizontal band centered around 0. • Departure from independence here is shown by the points missing from (-0.25, 0.25)
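The `chiplot` function lives in the book's functions.txt, which is not reproduced in this transcript. As a rough sketch of what a chi-plot computes, the code below follows the Fisher and Switzer definitions of the two plotted quantities. The helper name `chi_stats`, the simulated data, and the 1.78 multiplier for an approximate 95% band are assumptions of this sketch, not taken from the lecture.

```r
# Hypothetical helper (not the book's chiplot()): chi-plot coordinates
# following Fisher and Switzer's definitions.
chi_stats <- function(x, y) {
  n <- length(x)
  Hxy <- Fx <- Gy <- numeric(n)
  for (i in seq_len(n)) {
    # Empirical proportions among the other n - 1 points.
    Fx[i]  <- mean(x[-i] <= x[i])
    Gy[i]  <- mean(y[-i] <= y[i])
    Hxy[i] <- mean(x[-i] <= x[i] & y[-i] <= y[i])
  }
  # chi: local measure of dependence (a correlation of indicator variables).
  chi <- (Hxy - Fx * Gy) / sqrt(Fx * (1 - Fx) * Gy * (1 - Gy))
  # lambda: signed distance of each point from the bivariate median.
  lambda <- 4 * sign((Fx - 0.5) * (Gy - 0.5)) *
    pmax((Fx - 0.5)^2, (Gy - 0.5)^2)
  keep <- is.finite(chi)   # drop points where the denominator is zero
  data.frame(lambda = lambda[keep], chi = chi[keep])
}

set.seed(42)
x <- rnorm(100)
y <- rnorm(100)            # simulated independent data
cs <- chi_stats(x, y)

plot(cs$lambda, cs$chi, xlim = c(-1, 1), ylim = c(-1, 1),
     xlab = expression(lambda), ylab = expression(chi))
# Approximate 95% band under independence: +/- 1.78 / sqrt(n).
abline(h = c(-1, 1) * 1.78 / sqrt(length(x)), lty = 2)
```

For independent data most points should fall inside the dashed band; for dependent data they move away from it, as the slide describes for SO2 and Mortality.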

  12. Bivariate Boxplot • Two-dimensional analogue of the boxplot • A pair of concentric ellipses: the inner ellipse (the “hinge”) holds half the data, and the outer ellipse (the “fence”) identifies outliers. • Regression lines of x on y and y on x are shown. • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality") • Cleaned up (more robust): • bvbox(cbind(SO2,Mortality), xlab="SO2", ylab="Mortality", method="O")

  13. Bivariate Densities • The goal of examining a scatterplot is to identify clusters and outliers. • Humans are not particularly good at this, so graphical aids help. • Adding a bivariate density estimate to the scatterplot helps. • A two-dimensional histogram is too rough for this, so a smooth estimate is preferred.

  14. Demo of Bivariate Density • den1<-bivden(SO2,Mortality) • persp(den1$seqx, den1$seqy, den1$den, xlab="SO2", ylab="Mortality", zlab="Density", lwd=2) • plot(SO2, Mortality) • contour(den1$seqx, den1$seqy, den1$den, lwd=2, nlevels=20, add=TRUE)
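`bivden` comes from the book's function file and is not available here. A comparable picture can be sketched with `kde2d` from the MASS package, a kernel density estimator that is swapped in for `bivden` and is not claimed to give the same estimate. The built-in `faithful` data set stands in for the airpoll variables.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R installations

# Stand-in data: Old Faithful eruption length and waiting time.
x <- faithful$eruptions
y <- faithful$waiting

# Kernel density estimate on a 50 x 50 grid; returns a list with x, y, z.
den <- kde2d(x, y, n = 50)

# Perspective view of the estimated density surface.
persp(den$x, den$y, den$z, theta = 30, phi = 30,
      xlab = "Eruption time", ylab = "Waiting time", zlab = "Density")

# Scatterplot with density contours overlaid.
plot(x, y, xlab = "Eruption time", ylab = "Waiting time")
contour(den$x, den$y, den$z, nlevels = 20, add = TRUE)
```

The contours make the two clusters in these data easier to see than the raw scatterplot does.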

  15. Adding a Third Variable to the Scatterplot • The bubble plot • plot(SO2, Mortality, pch=1, lwd=2, ylim=c(700,1200), xlim=c(-5,300)) # basic scatterplot. • symbols(SO2, Mortality, circles=Rainfall, inches=0.4, add=TRUE, lwd=2) # adding Rainfall to each point.
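A self-contained variant of the bubble plot, using the built-in `mtcars` data as a stand-in for airpoll (the choice of variables is an assumption for illustration). In `symbols`, the `circles` argument gives radii, so this sketch passes the square root of the third variable to make circle area, rather than radius, proportional to it. That is a design choice of the sketch and differs from the slide, which passes Rainfall directly.

```r
# Stand-in data: car weight, fuel economy, and horsepower from mtcars.
wt  <- mtcars$wt
mpg <- mtcars$mpg
radius <- sqrt(mtcars$hp)   # sqrt so that circle AREA tracks horsepower

# Basic scatterplot, then one circle per point for the third variable.
plot(wt, mpg, pch = 1, xlab = "Weight (1000 lb)", ylab = "Miles per gallon")
symbols(wt, mpg, circles = radius, inches = 0.3, add = TRUE)
```

The `inches` argument sets the radius of the largest circle, so the circles are scaled relative to each other rather than to the axes.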

  16. Scatterplot Matrix • pairs(airpoll) • To add a regression line and a lowess smooth to each panel: • pairs(airpoll, panel=function(x,y) { abline(lsfit(x,y)$coef, lwd=2); lines(lowess(x,y), lty=2, lwd=2); points(x,y) }) • For 3D graphics, use cloud from the lattice package: • library(lattice) • cloud(Mortality~SO2+Rainfall)

  17. Conditioning Plots • coplot(Mortality~SO2|Popden) • To add a local regression fit: • coplot(Mortality~SO2|Popden, panel=function(x,y,col,pch) panel.smooth(x,y,span=1))
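To see how the conditioning intervals are formed, the sketch below builds them explicitly with `co.intervals` and passes them to `coplot`. It uses the built-in `quakes` data as a stand-in for airpoll; the choice of four intervals and the name `given` are assumptions for illustration.

```r
# Four overlapping depth intervals, each holding roughly the same number
# of observations; the result is a matrix with one row per interval.
given <- co.intervals(quakes$depth, number = 4, overlap = 0.5)
print(given)

# One latitude/longitude panel per depth interval, each with a local
# regression smooth drawn by panel.smooth().
coplot(lat ~ long | depth, data = quakes, given.values = given,
       panel = function(x, y, ...) panel.smooth(x, y, span = 1, ...))
```

Each panel shows only the observations whose depth falls in the corresponding interval, so a change in the smooth from panel to panel suggests that the relationship depends on the conditioning variable.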

  18. Conclusions • The purpose of graphics is to aid your intuition. • Explore them: the appropriate graphics reflect your questions and the structure of the data. • Next week: graphic presentations to avoid, because they mislead you and your audience. • Look at the books by Edward Tufte in the library.
