
Data Preparation for Analytics Using SAS

Data Preparation for Analytics Using SAS. Gerhard Svolba, Ph.D. Reviewed by Madera Ebby, Ph.D. What is the purpose of this book? It introduces the reader to data preparation and explains why data preparation is not only important but a must prior to data analysis.


Presentation Transcript


  1. Data Preparation for Analytics Using SAS Gerhard Svolba, Ph.D. Reviewed by Madera Ebby, Ph.D.

  2. What is the purpose of this book? • Introduces the reader to data preparation • Explains why data preparation is not only important but a must prior to data analysis • Leads from the data preparation process to data analytics

  3. The Analysis Path: From raw data to results that can be implemented

  4. The Analysis Path: From raw data to results that can be implemented [Diagram labels: Good Results, Clever Modeling, Adequate Preparation, Data Availability]

  5. Four Dimensions for Analytic Data Preparation [Diagram: Analytic Data Preparation, surrounded by Business and Process Knowledge, Analytical Knowledge, Efficient SAS Coding, and Documentation and Maintenance]

  6. Business question: How did students who met the provincial standard in grade 3 perform in grade 6? • Generates many other questions • Work with people in other departments such as IT to carry out a data analytic process

  7. Why is this author qualified or not qualified to address this topic? • He is an experienced SAS user, as the many macros in the book show • He addresses issues by presenting examples from different backgrounds

  8. What are the strengths or weaknesses of this book? • The book is written clearly and is easy to read • It provides the reader with many code examples, with their inputs and outputs

  9. Would you recommend this book? If so, who would you recommend it to and for what purpose? • Those who prepare data marts for statistics, data mining, or time series analyses • Those in IT and data warehousing who provide the data used in creating data marts • Both new and experienced SAS users who perform data analyses using data marts • Those who prepare data in relational databases with SQL

  10. Does the book achieve its purpose? Absolutely! It enables one to: • Understand the business environment in which data preparation occurs • Extract and structure your data • Create derived variables from different tables • Program SAS in an efficient way

  11. What is the best tip or technique addressed in this book? • There are many new techniques that I learnt from this book. For example: • Examine the mean math scores by board_mident

  12. Continued… • proc means data=datalib.boards noprint nway; • class board_mident; • var Math_score; • output out=datalib.aggr_static(drop=_type_ _freq_) • mean= sum= n= std= min= max= / autoname; • run;

  13. Continued… • To run the analysis by board_mident, we use a CLASS statement. A BY statement could also be used, but the data would then have to be sorted by board_mident • NWAY suppresses the grand total mean and all other totals, so that the output data set contains only rows for the 5 boards, which are the analysis subjects • NOPRINT suppresses the printed output (the procedure's listing, not the log), which can grow to thousands of descriptive measures when there are many analysis subjects • In the OUTPUT statement we specify the statistics to be calculated. The AUTONAME option creates the new variable names in the form VARIABLENAME_STATISTIC • If we want to calculate different statistics for different input variables, we can specify this in the OUTPUT statement, e.g. SUM(variable)=sum_variable • In the OUTPUT statement we drop the _TYPE_ and _FREQ_ variables, although we could keep _FREQ_ and omit N from the statistics list • Chapter 18, Multiple Interval-Scaled Observations per Subject, page 183.
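The two alternatives mentioned on this slide, a BY statement instead of CLASS and different statistics for different variables, can be sketched as follows. This is only an illustration under stated assumptions: the slides do not show datalib.boards, so a small WORK table with made-up values stands in for it, and Reading_score and the output variable names are hypothetical.

```sas
/* Hypothetical sample data: the contents of datalib.boards are not   */
/* shown in the slides, so this WORK table and its values are assumed. */
data work.boards;
  input board_mident $ Math_score Reading_score;
  datalines;
B02 3.0 3.4
B01 2.8 3.1
B02 2.6 3.0
B01 3.2 2.9
;
run;

/* A BY statement requires the input to be sorted by the BY variable. */
proc sort data=work.boards out=work.boards_sorted;
  by board_mident;
run;

/* BY-group version: there is no CLASS statement, so NWAY is not      */
/* needed; the output data set has one row per board_mident value.    */
proc means data=work.boards_sorted noprint;
  by board_mident;
  var Math_score Reading_score;
  /* Different statistics for different variables, with explicit      */
  /* output names instead of AUTONAME.                                */
  output out=work.aggr_by(drop=_type_ _freq_)
         mean(Math_score)=math_mean
         sum(Reading_score)=reading_sum
         n(Math_score)=math_n;
run;
```

With CLASS, the PROC SORT step would be unnecessary, which is one reason the slide's version prefers it for unsorted data.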

  14. CONTINUED…

  15. Are there other books (or sources of information) available with similar content? • Yes, but they tend to present only bits and pieces of information • E.g. resources on the internet • The Little SAS Book by Delwiche and Slaughter • If so, how does this book compare? • It offers a comprehensive, well-illustrated presentation of the material

  16. What will your SAS log look like?

  17. or

  18. or

  19. or
