
how to speak ggplot2 like a native

DC R Meetup, Predictive Analytics World, October 19th, 2010. Harlan D. Harris, PhD (harlan@harris.name). ggplot's philosophy: graphics are (should be!) created by combining a specification with data (Wilkinson, 2005).


Presentation Transcript


  1. how to speak ggplot2 like a native. Harlan D. Harris, PhD (harlan@harris.name). DC R Meetup, Predictive Analytics World, October 19th, 2010

  2. ggplot's philosophy
     • Graphics are (should be!) created by combining a specification with data. (Wilkinson, 2005)
     • The specification is not the name of the visual form (bar graph, scatterplot, histogram).
     • The specification is a collection of rules that together describe how to build a graph: a Grammar of Graphics.

  3. graphics as grammar
     [Diagram: data with columns date, ct, sz, z, combined with the specification x=date, y=ct/sz, bars, group by z, yields the graph.]

  4. advantages
     • Flexible
       • can define new graph types by changing specifications
       • can combine many forms into single graphs
     • Smart
       • compact: rules have useful defaults
       • graphs always have meaning
     • Reusable
       • can plug new data into old specification
       • can explore many types of plots from a set of data
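The "reusable" point can be sketched in a few lines of R. This is an illustration only: the data frames `d1` and `d2`, their column names, and the names `spec_mapping` and `spec_layers` are invented here and do not come from the talk.

```r
library(ggplot2)

# Toy data, invented for illustration; the column names are assumptions.
d1 <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5),
                 z = rep(c("a", "b"), length.out = 5))
d2 <- data.frame(x = 1:8, y = sqrt(1:8),
                 z = rep(c("a", "b"), each = 4))

# The specification: a mapping plus a list of layers, with no data attached.
spec_mapping <- aes(x = x, y = y, color = z)
spec_layers  <- list(geom_line(), geom_point(size = 3))

# Plug two different data sets into the same specification.
p1 <- ggplot(d1, spec_mapping) + spec_layers
p2 <- ggplot(d2, spec_mapping) + spec_layers

# Explore another form from the same data: keep the mapping, swap the layers.
p3 <- ggplot(d2, spec_mapping) + geom_point()
```

Keeping the mapping and the layers in ordinary R objects is one way to make the separation between specification and data visible; printing `p1`, `p2`, or `p3` draws the corresponding graph.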

  5. ggplot2
     • Hadley Wickham (Rice Univ.)
       • also: reshape2, plyr, etc.
     • Extends & implements The Grammar of Graphics (Wilkinson, 1999, 2005)
     • Focus on layers; based on grid
     • Specification as R objects constructed by functions
     • Large library of components with good defaults
     • ggplot2: Elegant Graphics for Data Analysis (Wickham, 2009)

  6. my gripes
     • Specification is a hierarchical structure; grammar is a left-to-right R expression; graph is spatial
       • Can't see the structure (usefully)
     • Abuses both notation and R semantics
       • Deep Magic with lazy evaluation, proto objects
     • Existing tutorials lead to conceptual confusion and require relearning of fundamentals
       • Start with the structure, not with the shortcuts

  7. goal

  8. data to plot

  9. ggplot likes "long" data
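A minimal sketch of what "long" means, assuming the reshape2 package mentioned later in the deck is installed. The wide table `d.wide`, its values, and the `Model`/`Empirical` column names are invented for illustration; only the `Parameter`, `Errors`, and `Condition` names echo the talk.

```r
library(reshape2)

# Hypothetical wide data: one row per parameter value,
# one error column per condition.
d.wide <- data.frame(
  Parameter = c(1, 2, 3),
  Model     = c(40, 30, 25),
  Empirical = c(42, 33, 24)
)

# Long form: one row per (Parameter, Condition) pair, so that Condition
# becomes a variable that can be mapped to color, shape, or facets.
d.long <- melt(d.wide, id.vars = "Parameter",
               variable.name = "Condition", value.name = "Errors")
```

In the long form every aesthetic has a single column to map from, which is why ggplot2 prefers it.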

  10. will plot model vs. empirical

  11. simplest plot
      • aes = "aesthetics" = "create mapping"

  12. structure (you don't need to know this!)
      [Diagram: a ggplot object holds data, layers, mapping, scales, coords, facets, and options. The plot-level mapping is x=Param., y=Errs, color=Cond. layer[1] holds data, mapping (copy), geom (line), stat (identity), geom_params, and stat_params.]
      • ggplot(data=d.long.EI, mapping=aes(x=Parameter, y=Errors, color=Condition)) + layer(geom="line")
      • structure(p), str(p) Ø
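The slide's call predates later API changes: in current ggplot2 releases, `layer()` expects the stat and position to be given explicitly. The sketch below shows one way the same structure might be written today. The contents of `d.long.EI` are invented stand-in values, since the talk's data is not in the transcript.

```r
library(ggplot2)

# Toy stand-in for the talk's d.long.EI; the values are invented.
d.long.EI <- data.frame(
  Parameter = rep(1:4, times = 2),
  Errors    = c(60, 45, 38, 35, 58, 50, 47, 46),
  Condition = rep(c("A", "B"), each = 4)
)

# Core object (data + mapping) plus one explicitly specified layer.
p <- ggplot(data = d.long.EI,
            mapping = aes(x = Parameter, y = Errors, color = Condition)) +
  layer(geom = "line", stat = "identity", position = "identity")

# The layer inherits the plot-level mapping (the "(copy)" box in the
# diagram); layer_data() returns the data as the layer will draw it.
built <- layer_data(p)
```

Inspecting `built` is usually more readable than `str(p)`, which prints the full internal object.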

  13. add empirical data and chance

  14. structure so far (you don't need to know this!)
      [Diagram: the ggplot object (data, layers, mapping, scales, coords, facets, options) with plot-level mapping x=Param., y=Errs, color=Cond., copied into four layers, each holding data, mapping, geom, stat, geom_params, and stat_params. Layer annotations, in slide order: line, identity (U); point, identity, size=3 (K); hline, hline, yint=Errs, size=2, color="black"; linetype=2, size=.5, hline, hline, yint=[64].]

  15. scales

  16. coordinates & scales
      • coordinates affect display of axes
        • cartesian, polar, map, etc.
      • scales affect data mapping
        • colors, shapes, lines
      • source of confusion
        • set axis ticks/breaks and labels with scale_x_continuous() or scale_y_discrete(), but
        • restrict DATA range with scale_*(limits=c(1,10))
        • restrict AXIS (plotted) range with coord_cartesian(xlim=c(1,10))
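The DATA versus AXIS distinction can be checked directly. This is a small sketch with invented data (`d`, `x`, `y` are not from the talk): scale limits discard out-of-range values before anything is computed or drawn, while `coord_cartesian()` only zooms the view.

```r
library(ggplot2)

# Invented data: x runs from 1 to 20.
d <- data.frame(x = 1:20, y = (1:20)^2)
p <- ggplot(d, aes(x, y)) + geom_point()

# Restrict the DATA range: x values outside 1..10 are turned into NA.
p_scale <- p + scale_x_continuous(limits = c(1, 10))

# Restrict the AXIS range: all 20 points are kept, the view is cropped.
p_coord <- p + coord_cartesian(xlim = c(1, 10))

# Count the x values that survive in each version of the plot.
n_scale <- sum(!is.na(suppressWarnings(layer_data(p_scale))$x))
n_coord <- sum(!is.na(layer_data(p_coord)$x))
```

The difference matters most with stats: a smoother or summary added to `p_scale` is fitted to the truncated data, whereas on `p_coord` it is fitted to all of it and merely displayed in part.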

  17. options

  18. shortcuts
      • All those layer() calls are tedious!
      • geom_*() creates a layer with a specific geom (and various defaults, including a stat)
      • stat_*() creates a layer with a specific stat (and various defaults, including a geom)
      • qplot() creates a ggplot and a layer
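The equivalence between shortcuts and explicit layers can be inspected on the layer objects themselves. This is a hedged sketch against a recent ggplot2 release: the variable names are invented, and reading the `$geom` and `$stat` fields relies on how layer objects are currently built, which is an implementation detail rather than something shown in the talk. `qplot()` is left out because recent releases deprecate it.

```r
library(ggplot2)

# An explicit layer and the corresponding geom_*() shortcut.
explicit <- layer(geom = "line", stat = "identity", position = "identity")
shortcut <- geom_line()

# Both layers carry the same kind of geom and stat.
same_geom <- identical(class(explicit$geom), class(shortcut$geom))
same_stat <- identical(class(explicit$stat), class(shortcut$stat))

# A stat_*() shortcut works the other way round: it fixes the stat
# and fills in a default geom.
summary_layer <- stat_summary(fun.data = mean_se)
default_geom  <- class(summary_layer$geom)[1]
```

Either form can be added to a plot with `+`; the shortcut simply saves spelling out the defaults.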

  19. quick note on stats
      • stat="identity"
      • stat="lm"
        • fit y=f(x) with lm(), generate new data to be plotted by geom_line(), CIs with geom_ribbon()
      • stat="smooth"
        • fit y=f(x) with loess()
      • stat="summary"
        • y=f(x) with arbitrary f()
      • stat="bin"
        • histograms
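A rough sketch of these stats in use, written against a recent ggplot2 release. The data is simulated, all variable names are invented, and the linear fit is expressed as `stat_smooth(method = "lm")`, which is an assumption about the present-day equivalent of the slide's stat="lm".

```r
library(ggplot2)

# Simulated data: three noisy observations at each of ten x values.
set.seed(1)
d <- data.frame(x = rep(1:10, each = 3))
d$y <- 2 * d$x + rnorm(nrow(d))

base <- ggplot(d, aes(x, y))

# Linear fit: the stat generates new data (y, ymin, ymax) for the line and band.
p_lm <- base + geom_point() + stat_smooth(method = "lm", formula = y ~ x)

# Loess fit through the same points.
p_loess <- base + geom_point() + stat_smooth(method = "loess", formula = y ~ x)

# Summary: one mean and standard-error interval per distinct x.
p_summary <- base + stat_summary(fun.data = mean_se, geom = "crossbar")

# Bin: counts of y, i.e. a histogram.
p_hist <- ggplot(d, aes(y)) + stat_bin(bins = 10)

# The data computed by the smoothing stat in layer 2 of p_lm.
fit <- layer_data(p_lm, 2)
```

In each case the geom draws whatever the stat returns, so `fit` contains the fitted curve rather than the original 30 observations.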

  20. simplest faceted plot

  21. everything else (+alpha)

  22. other things I find useful
      • scale_x_continuous(breaks=seq(1,9,2), labels=c("one", "", "five", "", "nine"))
      • geom_text(aes(x=.., y=.., label=..))
      • annotate(geom="text", x=14, y=19, label="outlier!")
      • geom_density()
      • stat_summary(fun.data="mean_cl_boot", geom="crossbar")
      • geom_jitter(position=position_jitter(width=.5))

  23. "fizzy bubbly" plot
      • rated.movies <- subset(movies, mpaa!="")
      • rated.movies$mpaa <- factor(rated.movies$mpaa)
      • p <- ggplot(rated.movies, aes(mpaa, rating)) + geom_jitter(alpha=.5) + stat_summary(fun.data="mean_sdl", geom="crossbar", color="red", size=1)
      • ggsave("movies.png", p, dpi=150)

  24. takehomes
      • a ggplot graph is generated by a specification + data
      • ggplot specifications are a core object plus layers
      • mappings among data, x/y, scales, and other attributes are fundamental
      • geom and stat shortcuts allow smart/compact construction of graphs
      • ggplot encourages good graphs, with facets, good use of color, minimal chartjunk

  25. 2010 case study competition winner

  26. resources
      • Wickham, H. (2009) ggplot2: Elegant Graphics for Data Analysis. Springer.
      • http://had.co.nz/ggplot2/
      • http://groups.google.com/group/ggplot2
      • http://stackoverflow.com/questions/tagged/r
      • http://github.com/hadley/ggplot2/wiki

  27. thanks!
