This resource provides a comprehensive overview of multivariate data presentation using R, focusing on best practices and visual representation techniques. It emphasizes the importance of accurately displaying data to avoid misleading representations. Key principles, such as maximizing the data/ink ratio and avoiding chartjunk, are discussed alongside practical R plotting commands and the Lattice graphics model for creating multi-panel plots. Ideal for individuals looking to enhance their data visualization skills and present clear, effective graphics.
Presenting Multivariate Data Harry R. Erwin, PhD School of Computing and Technology University of Sunderland
Resources • Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. • Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer. • Tukey’s seminal paper: http://www.edwardtufte.com/tufte/tukey • Tufte’s work: http://en.wikipedia.org/wiki/Edward_Tufte • Murrell, P (2006) R Graphics, Florida: Chapman & Hall/CRC.
Do not lie about the data. • How do people lie? • By presenting data selectively • By distorting the visual representation of the data • By failing to extrapolate scales from one portion of the image to another. • By changing scales • By inflating vertical scales • By failing to show the zero point or 100% point of an axis. • By representing linear data by area • By representing areal data linearly • By omitting data
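One item above, representing linear data by area, can be illustrated with a small sketch. The two values and the variable names here are made up for illustration. When a value is mapped to a circle's radius, the drawn area grows with the square of the value, so a doubling looks like a quadrupling. Mapping the value to area instead keeps the visual ratio honest.

```r
# Two illustrative values: the second is twice the first.
values <- c(10, 20)

# Misleading: value mapped to radius, so area grows with value^2.
radius_wrong <- values
# Faithful: value mapped to area, so radius grows with sqrt(value).
radius_right <- sqrt(values)

# Ratio of the drawn areas (area is proportional to radius^2).
area_ratio_wrong <- (radius_wrong[2] / radius_wrong[1])^2
area_ratio_right <- (radius_right[2] / radius_right[1])^2

# Draw both versions side by side with symbols().
op <- par(mfrow = c(1, 2))
symbols(c(1, 2), c(1, 1), circles = radius_wrong, inches = 0.5,
        xlim = c(0, 3), xlab = "", ylab = "",
        main = "Radius ~ value (exaggerates)")
symbols(c(1, 2), c(1, 1), circles = radius_right, inches = 0.5,
        xlim = c(0, 3), xlab = "", ylab = "",
        main = "Area ~ value (faithful)")
par(op)
```

In the left panel the larger circle covers four times the area for a value that is only twice as large.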
Five Principles • Above all else, show the data • Maximize the data/ink ratio • Erase non-data ink • Erase redundant data ink • Revise and edit freely
Avoid Chartjunk • Avoid moiré effects—shimmering • Mute the grid • Dump the duck—avoid self-promoting graphics
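As a rough sketch of "mute the grid" and "erase non-data ink" in base R, the following uses the built-in pressure dataset. The specific colour and box choices are only one possible reading of the advice, not a prescription from the slides.

```r
# Set up the axes without drawing data yet; bty = "l" drops the top and
# right sides of the box, las = 1 keeps axis labels horizontal.
plot(pressure ~ temperature, data = pressure, type = "n",
     bty = "l", las = 1)

# A pale, solid grid stays in the background instead of competing
# with the data.
grid(col = "grey90", lty = "solid")

# Draw the data last so it sits on top of the grid.
points(pressure ~ temperature, data = pressure, pch = 16)
```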
A Laundry-List of Plotting Commands in R • Standard scatterplot commands • plot(dataset) # for two-column data • text(xval, yval, TextString) # adds text to an existing plot • barplot() produces bar plots • hist() produces histograms • boxplot() produces box plots • pie() produces pie charts • pairs() produces a scatterplot matrix (pairs plot)
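The commands above can be tried on datasets that ship with R. This is a minimal sketch; the choice of datasets (cars, mtcars, faithful, iris) and the label text are illustrative and not taken from the slides.

```r
# Arrange five plots on one page.
op <- par(mfrow = c(2, 3))

plot(cars)                          # two-column data frame: scatterplot
text(10, 100, "cars data")          # add a label at (10, 100)

barplot(table(mtcars$cyl), xlab = "Cylinders")   # counts per category
h <- hist(faithful$eruptions, main = "Eruption length")
b <- boxplot(mpg ~ cyl, data = mtcars)           # one box per group
pie(table(mtcars$gear))                          # shares per category

par(op)

# pairs() manages its own layout, so it goes on a separate page.
pairs(iris[, 1:4])
```

hist() and boxplot() return the computed summaries invisibly, which is useful for checking what was actually drawn.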
What can you plot? • The first argument(s) to any plot command is (are) very flexible • A data frame • A pair of vectors • A formula describing a relationship, e.g. plot(pressure~temperature, data=pressure) • A fitted model
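A short sketch of the four kinds of argument listed above, using the built-in pressure and cars datasets. The linear model is only an example of "a model"; the slides do not say which model class was shown.

```r
# 1. A data frame with two columns.
plot(pressure)

# 2. A pair of vectors (x first, then y).
plot(pressure$temperature, pressure$pressure)

# 3. A formula, read as "y as a function of x".
plot(pressure ~ temperature, data = pressure)

# 4. A fitted model: plot() on an lm object draws diagnostic plots.
fit <- lm(dist ~ speed, data = cars)
op <- par(mfrow = c(2, 2))   # room for the four default diagnostics
plot(fit)
par(op)
```

plot() is generic, so what gets drawn depends on the class of its first argument.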
Adding details to a plot • text(locx, locy, TextString) to add text • points(vecx, vecy) to add points • lines(vecx, vecy) to draw connecting lines • matplot() to plot the columns of a matrix (matpoints() and matlines() add them to an existing plot) • par() to customise the graphics • axis() to add an axis • grid() to add a grid • abline() to add a straight line to the plot • arrows() to add arrows to the plot • mtext() to add marginal text • title() to add a title • legend() to add a legend
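The following sketch builds one plot step by step with several of the functions above. The data (a sine curve) and all label text are invented for illustration.

```r
x <- seq(0, 2 * pi, length.out = 50)
peak <- x[which.max(sin(x))]        # x position of the highest point
idx <- seq(1, 50, by = 7)           # every seventh point, for markers

# Start with an empty frame: no data, no axes, no labels.
plot(x, sin(x), type = "n", axes = FALSE, xlab = "", ylab = "")

axis(1, at = c(0, pi, 2 * pi), labels = c("0", "pi", "2 pi"))
axis(2, las = 1)
grid(col = "grey85")

lines(x, sin(x))                           # the curve
points(x[idx], sin(x[idx]), pch = 16)      # a few marked samples
abline(h = 0, lty = "dashed")              # reference line at y = 0

arrows(x0 = 2.5, y0 = 0.9, x1 = peak + 0.1, y1 = 1, length = 0.1)
text(2.5, 0.9, "maximum", pos = 4)

mtext("illustrative annotation", side = 3, line = 0.5, adj = 1, cex = 0.8)
title(main = "sin(x)", xlab = "x", ylab = "sin(x)")
legend("bottomleft", legend = c("curve", "samples"),
       lty = c(1, NA), pch = c(NA, 16), bty = "n")
```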
Trellis Graphics • library(lattice) # plotting functions return objects of class trellis • Developed by Deepayan Sarkar • Generates complete plots • Used much like traditional graphics, but with defaults chosen following Bill Cleveland’s recommendations. • Aims to ensure accurate and faithful communication of information • Supports ‘multi-panel conditioning’, which will be very useful.
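The Lattice Graphics Model • tplot<-xyplot(lat~long, data=quakes, pch=".") • print(tplot) # the plot is drawn only when the object is printed • tplot2<-update(tplot, main="Earthquakes in the Pacific Ocean (since 1964)") • print(tplot2) • Use trellis.par.get() and trellis.par.set() instead of par(); trellis.device() opens a graphics device with the lattice settings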
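A runnable version of the commands above, as a sketch. lattice keeps its own graphical settings, which can be read and changed with trellis.par.get() and trellis.par.set(). The change to the symbol colour is only an example.

```r
library(lattice)

# xyplot() returns a trellis object; nothing is drawn yet.
tplot <- xyplot(lat ~ long, data = quakes, pch = ".")
print(tplot)                       # printing the object draws it

# update() returns a modified copy rather than redrawing in place.
tplot2 <- update(tplot,
                 main = "Earthquakes in the Pacific Ocean (since 1964)")
print(tplot2)

# Read a lattice setting, modify it, and write it back.
sym <- trellis.par.get("plot.symbol")
sym$col <- "black"
trellis.par.set("plot.symbol", sym)
```

At the interactive prompt the print() calls happen automatically. Inside functions, loops, or scripts they must be written out.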
Trellis Plots (Trellis function: standard equivalent) • barchart(): barplot() • bwplot(): boxplot() • densityplot(): none • dotplot(): dotchart() • histogram(): hist() • qqmath(): qqnorm() • stripplot(): stripchart()
More Trellis Plots (Trellis function: standard equivalent) • qq(): qqplot() • xyplot(): plot() • levelplot(): image() • contourplot(): contour() • cloud(): none • wireframe(): persp() • splom(): pairs() • parallel(), named parallelplot() in current versions of lattice: none
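To make the correspondence concrete, here is one pair from the tables drawn both ways. This is a sketch using the built-in ToothGrowth dataset, which the slides do not mention.

```r
library(lattice)

# Standard graphics: boxplot() draws immediately and returns the
# computed statistics invisibly.
bp <- boxplot(len ~ dose, data = ToothGrowth)

# Lattice: bwplot() returns a trellis object, drawn when printed.
# dose is numeric, so it is converted to a factor to get one box per dose.
bw <- bwplot(len ~ factor(dose), data = ToothGrowth)
print(bw)
```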
Multipanel Conditioning • depthgroup<-equal.count(quakes$depth, number=3, overlap=0) • xyplot(lat~long|depthgroup, data=quakes, pch=".")
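The two commands above, written out as a runnable sketch. equal.count() splits depth into three non-overlapping intervals holding roughly equal numbers of observations, and the | in the formula asks for one panel per interval. The layout argument is an added assumption to place the panels in a single row.

```r
library(lattice)

# A "shingle": three depth intervals with (roughly) equal counts.
depthgroup <- equal.count(quakes$depth, number = 3, overlap = 0)

# One panel of lat against long for each depth interval.
p <- xyplot(lat ~ long | depthgroup, data = quakes, pch = ".",
            layout = c(3, 1))
print(p)

# Inspect the interval boundaries used for the panels.
levels(depthgroup)
```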