
Statistical Software Programming


Presentation Transcript


  1. Statistical Software Programming

  2. STAT 6360 – Statistical Software Programming: SAS Graphics
     SAS has two main facilities for producing graphics:
     • ODS Graphics
       • Included in Base SAS (as of V.9.3).
       • Easy to use.
       • Based on templates for commonly used plot types.
       • Includes several graphics PROCs, but also allows “canned” graphs to be generated as part of the execution of non-graphics PROCs (e.g., residual plots generated from PROC REG).
     • SAS/GRAPH
       • A separately licensed product.
       • Challenging to learn.
       • Flexible and powerful.
     • ODS Graphics was developed much more recently to make SAS graphics easier to use.
     • We will concentrate on ODS Graphics.

  3. ODS
     ODS stands for Output Delivery System.
     • The system that SAS uses to send results from PROCs to different destinations.
     • ODS is running all the time under its default settings.
     • In addition, ODS commands can be given to ask for additional output to be sent to certain destinations.

  4. ODS: Opening and closing a destination
     • In SAS V9.3 and later, the HTML destination is used by default.
     • Recall we changed this default by using the menus in SAS DM.
       • We went to Tools → Options → Preferences → Results and checked “Create Listing” and unchecked “Create HTML” and “Use ODS Graphics”.
     • We will stick with our changes to the defaults so that, when a SAS session begins, the HTML destination is closed, the LISTING destination is open, and ODS Graphics are turned off.
     • However, in our program we can turn on or off any destination we like. The general syntax is as follows:
         ODS destination <FILE="filename.ext"> <options>;
       • Here destination is one of HTML, RTF, PDF, LISTING, etc.
       • The FILE= option is optional. If it is omitted and LISTING is chosen, no external file is created; for other destinations, SAS creates a file with a default name and uses it (e.g., sashtml.htm if the HTML destination is chosen).
       • Options are destination-specific.
     • Multiple destinations can be open simultaneously.

  5. ODS: Opening and closing a destination (continued)
     • To close a destination the syntax is as follows:
         ODS destination CLOSE;
     • Some destinations (e.g., PDF, RTF) need to be closed before you can view the output.
     • Exceptions are the LISTING and HTML destinations, where output is generated and can be viewed in the Results Viewer without closing the destination.
     • The OUTPUT destination is used to capture specific SAS results (e.g., the regression coefficients from a multiple linear regression fitted in PROC REG) to a dataset. It also does not need to be closed for the output dataset to be generated.
     Output Styles:
     • SAS has quite a few choices of styles that control how output looks (colors, fonts, font sizes, etc.).
     • Each destination has a default style. E.g., the default HTML style is called HTMLBLUE.
     • Optionally, the user can switch from the default style when opening a destination, but we will rarely do so.
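As a minimal sketch of the open/close pattern described on the two slides above, the following assumes the SASHELP.CLASS sample dataset and a hypothetical output file name (reg_output.pdf, written to the current working directory); neither comes from the course materials. It also uses the OUTPUT destination to capture PROC REG's coefficient table, whose ODS table name is ParameterEstimates.

```sas
/* Open the PDF destination; the file name is a hypothetical example */
ods pdf file="reg_output.pdf";

/* Ask the OUTPUT destination to save the coefficient table as a dataset */
ods output ParameterEstimates=work.coefs;

proc reg data=sashelp.class;
   model weight = height age;
run;
quit;

/* The PDF file can be viewed only after the destination is closed */
ods pdf close;

/* The captured coefficients are now an ordinary dataset */
proc print data=work.coefs;
run;
```

Any destination that was already open (LISTING under the settings chosen in this course) keeps receiving output alongside the PDF file.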

  6. ODS Graphics
     Turning ODS Graphics On:
     • To turn ODS Graphics on in any SAS session, the syntax is:
         ODS GRAPHICS ON;
     • In the DM’s preferences, one can set ODS Graphics on or off, which will control the default behavior when SAS starts up, but the ODS statement can be submitted or included in a SAS program to turn graphics on or off at any time.
     • Note that GRAPHICS is not a destination. We turn it OFF or ON; we don’t open or close it.
     • When ODS Graphics are on, procedures that have the capability to generate ODS graphics will do so. ODS graphics PROCs that generate stand-alone graphics work whether ODS Graphics is on or off.
     • For the LISTING destination, ODS Graphics are generated, but not automatically displayed. To view them, you must double-click them in the Results window.
     • For other destinations, ODS graphics are shown in the Results Viewer window automatically.
     • ODS Graphics are written to the WORK library, but may be saved permanently.
     • ODS Graphics do not have to be on to use SAS/GRAPH (e.g., PROC GPLOT), which is not part of the ODS Graphics system.
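A short sketch of switching ODS Graphics on around a non-graphics PROC, assuming the SASHELP.CLASS sample dataset (an illustrative choice, not taken from the slides). With graphics on, PROC REG should add its default plots (such as a fit plot and residual diagnostics) to the usual tables.

```sas
/* Turn ODS Graphics on for the steps that follow */
ods graphics on;

/* A simple linear regression; the plots are produced automatically */
proc reg data=sashelp.class;
   model weight = height;
run;
quit;

/* Turn ODS Graphics back off to restore the session default used here */
ods graphics off;
```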

  7. ODS Graphics
     The most commonly used ODS Graphics PROCs are
     • PROC SGPLOT – creates single-cell plots.
     • PROC SGPANEL – like SGPLOT (very similar syntax), but creates multi-cell plots.
     • PROC SGSCATTER – creates scatter plots and scatter plot matrices.
     • We will concentrate on SGPLOT.
     Examples of ODS Graphics produced by PROCs include:
     • PROC UNIVARIATE: CDF plots, histograms, Q-Q plots, …
     • PROC TTEST: Boxplots, histograms, Q-Q plots, plots of CIs for mean, …
     • PROC FREQ: Frequency plots (side-by-side bar charts, essentially), others.
     • PROC CORR: Scatter plots, scatter plot matrices.
     • PROC REG: Fitted line with confidence bands, several residual plots (residuals vs. fitted values, histogram of residuals, Q-Q plots of residuals, etc.), Cook’s D plot, leverage plot, …
     • PROC ANOVA: Side-by-side boxplots of response within each treatment.
     • PROC GLM: Most plots given by REG and ANOVA, plus interaction plots (2-way models), analysis of covariance plots (for ANCOVA models), …

  8. PROC SGPLOT
     Syntax:
         PROC SGPLOT DATA=dsname <options>;
            PLOTSTATEMENT1 specifications </ options>;
            PLOTSTATEMENT2 specifications </ options>;
            ⁞
         RUN;
     • Here, PLOTSTATEMENTn denotes one of many plot statements valid in the PROC. These plot statements include statements that produce different types of plots (boxplots, histograms, scatter plots, etc.) and statements that control features within a plot (axes, reference lines, the legend, etc.).
     • There must be at least one plot statement. Overlays can be achieved by using multiple plot statements.
     • Specifications differ from one plot statement to the next.
     • Which plot statements can be used in combination depends on the plot type.
     • Global PROC statements such as BY, WHERE, LABEL and FORMAT can also be used.
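The general syntax above can be illustrated with a small overlay. This is only a sketch: the SASHELP.CLASS dataset, the age cutoff, and the label text are illustrative assumptions. Two plot statements (HISTOGRAM and DENSITY) are overlaid in one graph, and the WHERE and LABEL statements show that global PROC statements are accepted.

```sas
proc sgplot data=sashelp.class;
   where age >= 12;                 /* global statement: subset the rows   */
   histogram height;                /* first plot statement                */
   density height / type=kernel;    /* second plot statement, overlaid     */
   label height="Height (inches)";  /* global statement: relabel the axis  */
run;
```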

  9. PROC SGPLOT
     Some of the available plot statements that produce plots:
     • DOT – produces dot plots.
     • DENSITY – plots an estimated density (useful as an overlay).
     • HBAR / VBAR – produce horizontal or vertical bar charts.
     • HBOX / VBOX – produce horizontal or vertical boxplots.
     • HISTOGRAM – produces histograms.
     • SCATTER – produces scatterplots.
     • SERIES – produces series plots (like a scatter plot but with connected data points).
     • LOESS / REG / PBSPLINE – produce scatterplots with a fitted parametric or nonparametric regression line (also useful as overlays).
     Some of the available plot statements that modify plots:
     • INSET – adds an inset (a text box within the plotting region).
     • KEYLEGEND – modifies the plot legend.
     • REFLINE – adds a vertical or horizontal reference line to the plot.
     • LINEPARM – adds a line (of any slope) to the plot.
     • XAXIS / YAXIS – control the labeling, tick marks, etc. for the bottom and left axes.
     • X2AXIS / Y2AXIS – control the labeling, tick marks, etc. for the top and right axes.

  10. PROC SGPLOT – Bar Charts
     Syntax for vertical bars:
         VBAR var / options;
     • Use HBAR instead of VBAR for a horizontal bar chart.
     • In simplest form, plots a frequency distribution for var.
       • Each bar corresponds to a value of var, with bar height the frequency of that value.
       • Joint frequency distributions can be obtained with the GROUP= option.
     Options include:
     • RESPONSE= – makes bar height equal to the mean or sum of a response variable for each value of var.
     • STAT= – specifies whether the MEAN or SUM of the response variable (if specified) is plotted, or the FREQ of var is plotted.
     • LIMITSTAT= – specifies how “error bars” for the top of each bar are computed (STDDEV, CLM or STDERR).
     • GROUP= – specifies a categorical grouping variable. If specified, stacked or clustered bars are drawn, with summaries (FREQ, SUM or MEAN) at each combination of values of var and the grouping variable.
     • TRANSPARENCY= – takes a value between 0 and 1. 0 = opaque, 1 = fully transparent.
     • See Bar Chart Examples in Lec6Examps.sas.
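The bar chart options above might be combined as follows. This is a hedged sketch using the SASHELP.CARS sample dataset, which is an assumption and not the data in Lec6Examps.sas. GROUPDISPLAY=CLUSTER is an extra option not mentioned on the slide; it requests clustered rather than stacked bars.

```sas
/* Simplest form: frequency of each vehicle type */
proc sgplot data=sashelp.cars;
   vbar type;
run;

/* Bar height = mean city MPG per type, with standard-error limits */
proc sgplot data=sashelp.cars;
   vbar type / response=mpg_city stat=mean limitstat=stderr;
run;

/* Joint frequency distribution of type and origin, clustered bars */
proc sgplot data=sashelp.cars;
   vbar type / group=origin groupdisplay=cluster transparency=0.3;
run;
```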

  11. Collectively, these specify the binning scheme. If you specify NBINS, SAS will determine BINWIDTH and vice versa.
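The slide's own list of options did not survive in this transcript, so the following is only a sketch under the assumption that it refers to the NBINS= and BINWIDTH= options of the HISTOGRAM statement in PROC SGPLOT. The SASHELP.CLASS dataset and the particular values are illustrative.

```sas
/* Fix the number of bins; SAS chooses the bin width */
proc sgplot data=sashelp.class;
   histogram height / nbins=6;
run;

/* Fix the bin width; SAS chooses the number of bins */
proc sgplot data=sashelp.class;
   histogram height / binwidth=5;
run;
```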

  12. PROC SGPLOT – Box Plots
     Syntax for vertical boxes:
         VBOX var / options;
     • Use HBOX instead of VBOX for a horizontal box plot.
     • Plots a single box plot or side-by-side box plots across the levels of one or two factors (categorical explanatory variables).
     Options include:
     • CATEGORY= – specifies a categorical variable. Box plots will be created for each category.
     • GROUP= – specifies a second categorical variable. Box plots will be created for values of this variable within each level of the CATEGORY variable.
     • EXTREME – by default, whiskers extend to the smallest and largest data points inside the lower and upper “fence”, and more extreme values are identified as outliers. This option makes the whiskers extend to the true min and max.
     • Many other options to control appearance and box plot features.
     • See anatomy of a box plot on next page.
     • See Box Plot Examples in Lec6Examps.sas.
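A brief sketch of the box plot options above, again assuming the SASHELP.CARS sample dataset rather than the course data. The first step draws side-by-side boxes for one factor; the second adds a second factor with GROUP= and extends the whiskers to the extremes.

```sas
/* Side-by-side box plots of city MPG across levels of origin */
proc sgplot data=sashelp.cars;
   vbox mpg_city / category=origin;
run;

/* Two factors: drivetrain within origin; whiskers run to the min and max */
proc sgplot data=sashelp.cars;
   vbox mpg_city / category=origin group=drivetrain extreme;
run;
```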

  13. Anatomy of a Box Plot

  14. PROC SGPLOT – Scatter Plots
     Syntax:
         SCATTER X=xvar Y=yvar / options;
     • Plots yvar versus xvar.
     Options include:
     • GROUP= – specifies a grouping variable. Data values from different groups will be plotted with different symbols.
     • DATALABEL= – allows you to use the values of a variable to label the points being plotted. If no variable is specified, DATALABEL uses the value of yvar.
     Adding Curves or Lines to Scatter Plots:
     • These statements are useful overlays:
       • LOESS / REG / PBSPLINE – produce or overlay fitted parametric (REG) or nonparametric (LOESS, PBSPLINE) regression lines. Same syntax as SCATTER.
         • Pointwise confidence limits for the mean or for individual values can be added with CLM and CLI.
       • LINEPARM – adds a fixed line to the plot. Three arguments: X=, Y=, SLOPE=, which specify a point for the line to go through (X and Y) and a slope for the line.
       • REFLINE – adds a vertical or horizontal reference line.
         • Syntax: REFLINE value / AXIS= <options>;
         • AXIS choices are X, Y, X2, Y2.
     • See Scatter Plot Examples in Lec6Examps.sas.
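The scatter plot statements and overlays above could be combined as in this sketch. It assumes the SASHELP.CLASS sample dataset; the reference value (100) and the LINEPARM point and slope are arbitrary illustrative numbers. NOMARKERS is an extra REG option, not on the slide, used here so the points are not drawn twice.

```sas
proc sgplot data=sashelp.class;
   /* Points marked by sex and labeled with each student's name */
   scatter x=height y=weight / group=sex datalabel=name;
   /* Least-squares line with confidence limits for the mean (CLM)
      and for individual values (CLI) */
   reg x=height y=weight / nomarkers clm cli;
   /* Horizontal reference line at weight = 100 */
   refline 100 / axis=y;
   /* A fixed line through (50, 50) with slope 1 */
   lineparm x=50 y=50 slope=1;
run;
```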

  15. PROC SGPLOT – Series Plots
     Syntax:
         SERIES X=xvar Y=yvar / options;
     • Plots yvar versus xvar and connects the points with straight lines.
     • Consecutive data values in the input dataset will be connected, so it is crucial that the data be sorted correctly (usually it is appropriate to sort by xvar).
     Options include most of those from SCATTER plus:
     • MARKERS – adds a marker (plotting symbol) for each point.
     • CURVELABEL= – adds a label for the series being plotted.
     • GROUP= – plots (and connects) separate series for each value of the GROUP variable.
     • See Series Plot Examples in Lec6Examps.sas.
     • In the gall bladder example, we use the BY statement to get separate graphs for each treatment. An alternative is to produce a single graph with separate panels for each treatment. This can be done with PROC SGPANEL.
       • Syntax is largely the same as SGPLOT (nearly all of the same plot statements are available), but with one additional statement: PANELBY, which determines the groups for which separate panels are constructed.
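A sketch of a series plot and its SGPANEL counterpart, assuming the SASHELP.STOCKS sample dataset in place of the gall bladder data (which is not reproduced in this transcript). The data are sorted first, since SERIES connects observations in dataset order.

```sas
/* Sort so that points within each stock are connected in date order */
proc sort data=sashelp.stocks out=work.stocks;
   by stock date;
run;

/* One graph, one connected series per stock */
proc sgplot data=work.stocks;
   series x=date y=close / group=stock markers;
run;

/* Same plot statement, but one panel per stock */
proc sgpanel data=work.stocks;
   panelby stock;
   series x=date y=close;
run;
```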

  16. PROC SGPLOT – Customizing Graphs
     Axis Statements – XAXIS, YAXIS, X2AXIS, Y2AXIS
     • Syntax: XAXIS options;
     • Options:
       • LABEL= – specifies a label for the axis. If not specified, the variable’s label (if it has one) or name is used.
       • TYPE= – choices include DISCRETE, LINEAR, TIME, and LOG.
       • VALUES= – specifies where tick marks should be placed. Can be a list (0, 5, 10, 15, 20, 25) or a range (0 TO 25 BY 5).
     Controlling Legends (KEYLEGEND) and Insets (INSET)
     • Syntax:
         KEYLEGEND </ options>;
         INSET ‘string1’ ‘string2’ … </ options>;
     • Options:
       • POSITION= – specifies the position for the legend or inset. Choices: TOP, TOPLEFT, etc.
       • BORDER / NOBORDER – puts a border around the legend/inset or not.
     • Additional Options for KEYLEGEND:
       • ACROSS= / DOWN= – specifies the number of columns / rows of legend elements.
       • LOCATION= – specifies the legend location as INSIDE or OUTSIDE the axis area.
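The axis, legend, and inset statements above might be used together as in this sketch. The SASHELP.CLASS dataset, the label text, the tick range, and the inset strings are all illustrative assumptions.

```sas
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
   /* Axis label and tick marks at 50, 55, ..., 75 */
   xaxis label="Height (inches)" values=(50 to 75 by 5);
   yaxis label="Weight (pounds)" type=linear;
   /* Legend inside the axis area, one column, no border */
   keylegend / position=bottomright location=inside across=1 noborder;
   /* Two lines of text in a bordered box at the top left */
   inset "SASHELP.CLASS" "n = 19" / position=topleft border;
run;
```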

  17. PROC SGPLOT – Customizing Graphs
     Setting Graph Attributes (styles for lines, markers, etc.):
     For most plot statements the options may include
     • FILLATTRS= (COLOR=value)
     • LABELATTRS= (attribute=value)
     • LINEATTRS= (attribute=value)
     • MARKERATTRS= (attribute=value)
     • VALUEATTRS= (attribute=value) [controls style of tick labels]
     • Multiple attributes can be specified in each case.
     Attributes and some of their possible values:
     • COLOR= – literally millions of colors are possible. SAS’s color naming schemes are quite complicated. Run colors.sas (download from eLC) for an HTML display of some of the named colors. Hover over a color to see the name.
     • SIZE=, THICKNESS= – specify a number with units CM, IN, MM, PCT, PT, or PX.
     • STYLE= – takes values ITALIC or NORMAL.
     • WEIGHT= – takes values BOLD or NORMAL.
     • SYMBOL=, PATTERN= – see marker symbols and line patterns on next slide.
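A sketch of the attribute options above applied to markers, lines, and axis text. The dataset (SASHELP.CLASS) and the specific colors, sizes, symbol, and pattern are illustrative choices, not values from the course examples.

```sas
proc sgplot data=sashelp.class;
   /* Filled navy circles, 8 points in size */
   scatter x=height y=weight /
      markerattrs=(symbol=circlefilled color=navy size=8pt);
   /* Dashed red regression line, 2 points thick */
   reg x=height y=weight / nomarkers
      lineattrs=(color=red pattern=dash thickness=2pt);
   /* Bold axis label and italic tick labels on the X axis */
   xaxis labelattrs=(weight=bold size=12pt) valueattrs=(style=italic);
run;
```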

  18. Some of the Symbols and Patterns in SAS/GRAPH

  19. ODS Graphics – Saving Graphs
     • For the HTML destination, graphs in the Results Viewer can be copied and pasted into many other applications.
     • In the HTML and LISTING destinations, every graph created is saved in a separate file. SAS has a default folder (the graphics path or GPATH) and uses sequentially numbered files in that folder.
       • E.g., SGPlot1.png, SGPlot2.png, …
     • One can change the default GPATH with the ODS statement:
       • E.g.: ODS HTML GPATH="E:\STAT 4-6360\Fall 14\SASgraphs";
     • One can change the file naming scheme too:
       • E.g.: ODS GRAPHICS / RESET IMAGENAME='MyGraph' OUTPUTFMT=PNG HEIGHT=4in;
       • RESET restarts the sequential numbering at the beginning, which may result in overwriting previously created files.
       • The height, width and output format (PNG, JPEG, TIFF, etc.) can be controlled too. I recommend PNG.
     • The statements above create files for each graph called MyGraph.png, MyGraph1.png, … in the folder specified by GPATH.
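Put together, the saving steps above might look like the following sketch. The folder C:\temp\SASgraphs and the image name ClassScatter are hypothetical; the folder is assumed to exist already, and SASHELP.CLASS is an illustrative dataset.

```sas
/* Hypothetical folder for image files; it must already exist */
ods html gpath="C:\temp\SASgraphs";

/* Restart numbering, name the files, and set format and size */
ods graphics on / reset imagename="ClassScatter"
                  outputfmt=png height=4in width=6in;

/* This graph should be written as ClassScatter.png in the GPATH folder */
proc sgplot data=sashelp.class;
   scatter x=height y=weight;
run;
```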

  20. ODS Graphics – Saving Graphs
     • To illustrate how to save graphs while using the HTML destination, two of the graphs we created in Lec6Examps.sas are saved as .png files at the end of the program.
     • For the PDF and RTF destinations, graphs and all other output will be integrated into a single file, so these destinations are not as convenient for saving graphs as individual files.
       • To specify the file in which to save all output in these destinations, use an ODS statement like this:
           ODS PDF FILE="E:\STAT 4-6360\Fa14\output.pdf";
         or this:
           ODS RTF FILE="E:\STAT 4-6360\Fa14\output.rtf";

  21. ODS Graphics – Editing Graphs
     Graphs produced with ODS Graphics can be edited interactively with the SAS ODS Graphics Editor.
     • This editor can be launched from within a SAS session or opened as a stand-alone application.
     • Only two types of files can be edited with this tool:
       • .PNG files. Not fully editable; annotations can be added, but existing elements cannot be changed.
       • .SGE files. All graphical elements and annotations can be edited, and annotations can be added.
     Creating a .SGE version of your graph:
     • Can only be done using the LISTING or HTML destination.
     • Add the SGE=ON option to the ODS destination statement. E.g.:
         ODS HTML GPATH="E:\STAT4360\SASgraphs" SGE=ON;
     • With this option, two files will be created for each graph: a .sge file and a file in the requested output format (e.g., .png if using the default).
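A minimal sketch of requesting editable .sge files, here with the LISTING destination. The folder name is hypothetical and must already exist, and SASHELP.CLASS is an illustrative dataset.

```sas
/* Ask for an .sge file alongside each image; hypothetical folder */
ods listing gpath="C:\temp\SASgraphs" sge=on;

proc sgplot data=sashelp.class;
   vbox height / category=sex;
run;

/* Stop creating .sge files for later graphs */
ods listing sge=off;
```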

  22. ODS Graphics – Editing Graphs
     Once the .SGE file exists, it can be opened in the ODS Graphics Editor by
     • Double-clicking the file name in Windows Explorer (outside of SAS).
     • Opening the Editor application outside of SAS and then opening the file from the editor.
     • Within SAS, double-clicking the graph in the Results window. It will have the following icon:
     • From the Editor, you can change titles, labels, etc.; add text boxes; change the style of lines, markers, fills, etc.; add annotations such as arrows, ovals, etc.
     • After editing, you can save the file as a .SGE file so that it can be further edited, or as a .PNG file for inclusion in other documents.
     • See Lec6Examps.sas.
