
Statistics: Dangers and Wonders

Learn about the importance of knowing your data, analyzing datasets, and using statistics effectively with SAS. Discover how to estimate population values, handle messy data, and use survey procedures for accurate results.





Presentation Transcript


  1. The Dangers and Wonders of Statistics Using SAS, or: You Can Have It Right and You Can Have It Right Away. AnnMaria De Mars, Ph.D., The Julia Group, http://www.thejuliagroup.com. Presentation at the Los Angeles Basin SAS Users Group meeting.

  2. Statistics are Wonderful

  3. Three common statistical plots • Analyzing a large-scale dataset for population estimates • Small datasets for market information • Comparison of two groups to determine effectiveness

  4. How much time do people spend alone? 1. National Survey Example

  5. Very quick, wrong answer

  6. First problem: Data are wrong

  7. Know your data. Know your data. I said it twice because it was important. Get to intimately know your data before you do ANYTHING.

  8. American Time Use Survey • Conducted by U.S. Census Bureau • Study of how a nationally representative sample of Americans spend their time

  9. Common survey issue: Samples are not simple random samples but are often multi-stage stratified samples, meaning that … “Users need to apply weights when computing estimates with the ATUS data because simple tabulations of unweighted ATUS data produce misleading results.”

  10. Equation provided by ATUS: $T_j = \frac{\sum_i fwgt_i \, T_{ij}}{\sum_i fwgt_i}$. In other words, the average amount of time the population spends in activity j, $T_j$, is equal to • the sum, over respondents i, of each respondent's final weight $fwgt_i$ multiplied by that respondent's reported time in activity j, $T_{ij}$ • divided by the sum of the weights.
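As a cross-check on the formula, here is a minimal Python sketch of the same arithmetic. The respondents, minutes, and weights are hypothetical; in the ATUS file the weight would be the final weight variable (`tufinlwgt` in the SAS code on the next slide).

```python
def weighted_mean(times, weights):
    """ATUS-style weighted average: sum(fwgt_i * T_ij) / sum(fwgt_i)."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * t for w, t in zip(weights, times)) / total_weight


# Hypothetical respondents: minutes alone per day and their final weights
minutes_alone = [300, 120, 480]
final_weights = [1000.0, 3000.0, 1000.0]

print(weighted_mean(minutes_alone, final_weights))  # 228.0 (weighted)
print(sum(minutes_alone) / len(minutes_alone))      # 300.0 (unweighted)
```

The heavily weighted respondent reports the fewest minutes, so the unweighted mean overstates the population value. That is the kind of misleading tabulation the ATUS guidance warns about.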

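  11. Really easy answer:
      /* BY-group processing needs the data sorted by the BY variables */
      PROC SORT DATA=in.atus OUT=atus_sorted;
        BY sex child;
      RUN;
      PROC MEANS DATA=atus_sorted;
        BY sex child;
        WEIGHT tufinlwgt;   /* ATUS final weight */
      RUN;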

  12. Right procedure, wrong answer: Missing answers are coded with negative numbers, e.g., -1 = blank, -2 = don't know, -3 = refused to answer. As a result, some procedures report means that are negative amounts of time spent in an activity.

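  13. Fixing the data:
      DATA atus;
        SET mylib.atus;
        ARRAY rec{*} _NUMERIC_;           /* all numeric variables from mylib.atus */
        DO i = 1 TO DIM(rec);
          IF rec{i} < 0 THEN rec{i} = .;  /* -1, -2, -3 codes become missing */
        END;
        DROP i;                           /* loop counter is not data */
      RUN;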
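Here is a rough Python analogue of the DATA step, assuming the values arrive as plain lists. Negative reserve codes become `None`, and the mean skips missing values, much as PROC MEANS excludes missing values from the analysis. The data are hypothetical and chosen to show how a negative mean can arise.

```python
def recode_negative(values):
    """Replace negative reserve codes (-1 blank, -2 don't know, -3 refused) with None."""
    return [None if v is not None and v < 0 else v for v in values]


def weighted_mean_nonmissing(values, weights):
    """Weighted mean over non-missing values only."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no non-missing values with positive weight")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical minutes in one activity; two respondents refused (-3)
raw = [5, -3, -3, 0]
weights = [1.0, 1.0, 1.0, 1.0]

print(weighted_mean_nonmissing(raw, weights))                   # -0.25: codes treated as minutes
print(weighted_mean_nonmissing(recode_negative(raw), weights))  # 2.5: codes treated as missing
```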

  14. How this impacts output: number of minutes per day alone, with and without weights

  15. Generalizing to the Population

  16. The problem: I would like to get estimates of population values. How many children are in the average household? How many hours does the average employed person work?

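  17. A bigger problem: It is not acceptable to just calculate means and frequencies, not even ones weighted by each group's share of the population, because I do not have a simple random sample. My sample was stratified by gender and education, so standard errors that ignore the sampling design are not appropriate, even when the weighted point estimates are.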

  18. Some common “messy data” examples • Small, medium, and large hospitals in rural and urban areas • Students selected within classrooms in high-, low-, and average-performing schools

  19. Data requiring special handling • Cluster samples - subjects are not sampled individually, e.g., classrooms or hospitals are selected and then every person within that group is sampled. • Non-proportional stratified samples - a fixed number is selected from, e.g., each ethnic group

  20. SAS procedure for a stratified sample:
      PROC SURVEYMEANS DATA=in.atus40 TOTAL=strata_count;
        WEIGHT samplingweight;     /* each respondent's sampling weight */
        STRATA sex educ;           /* design strata: gender by education */
        VAR hrsworked numchildren;
      RUN;
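PROC SURVEYMEANS does the design-based arithmetic for you. To show what that arithmetic is, here is the textbook stratified estimator as a hedged Python illustration. The mean is the population-share-weighted average of the stratum means. Its variance adds up each stratum's contribution, with a finite population correction. This assumes simple random sampling within strata and weights of N_h / n_h; under that assumption, the procedure's Taylor-series estimate should reduce to the same numbers. The stratum labels and values are hypothetical.

```python
import math
from statistics import mean, variance


def stratified_mean_se(samples, population_sizes):
    """Stratified estimate of a population mean and its standard error.

    samples: dict mapping stratum -> sampled values (at least 2 per stratum)
    population_sizes: dict mapping stratum -> N_h, population count in that stratum
    """
    N = sum(population_sizes[h] for h in samples)
    estimate = 0.0
    var = 0.0
    for h, ys in samples.items():
        n_h = len(ys)
        N_h = population_sizes[h]
        W_h = N_h / N            # stratum's share of the population
        fpc = 1 - n_h / N_h      # finite population correction
        estimate += W_h * mean(ys)
        # variance() is the (n_h - 1) sample variance within the stratum
        var += W_h ** 2 * fpc * variance(ys) / n_h
    return estimate, math.sqrt(var)


# Hypothetical values, stratified by (sex, education)
samples = {("F", "college"): [2, 4], ("M", "college"): [6, 8, 10]}
population_sizes = {("F", "college"): 10, ("M", "college"): 30}

est, se = stratified_mean_se(samples, population_sizes)
print(round(est, 2), round(se, 4))  # 6.75 0.8515
```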

  21. PROC SURVEYMEANS with the TOTAL= dataset:
      /* TOTAL= names a dataset (strata_count) holding the population total for each stratum */
      PROC SURVEYMEANS DATA=in.atus40 TOTAL=strata_count;
        WEIGHT samplingweight;
        STRATA sex educ;
        VAR hrsworked numchildren;
      RUN;
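The strata totals are what link the sample back to the population. The sketch below assumes a simple random sample within each stratum. It shows how per-stratum population counts N_h and sample counts n_h give each respondent a weight of N_h / n_h. The names and counts are hypothetical. In SAS, the `TOTAL=` dataset is expected to contain the STRATA variables plus a variable named `_TOTAL_`.

```python
from collections import Counter


def sampling_weights(sample_strata, population_totals):
    """Weight for each sampled unit: N_h / n_h for the unit's stratum h."""
    n_by_stratum = Counter(sample_strata)
    return [population_totals[h] / n_by_stratum[h] for h in sample_strata]


# Hypothetical non-proportional design: 2 of 600 women and 4 of 400 men sampled
strata = ["F", "F", "M", "M", "M", "M"]
totals = {"F": 600, "M": 400}

w = sampling_weights(strata, totals)
print(w)       # [300.0, 300.0, 100.0, 100.0, 100.0, 100.0]
print(sum(w))  # 1000.0, the population total
```

Because the design is non-proportional, women get larger weights than men. The weights still add back up to the population size.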

  22. SURVEYMEANS output

  23. SURVEYMEANS output

  24. Dataset with total counts

  25. Answers Price List • Answers: $1 • Answers, correct: $100 • Answers, requiring thought: $1,000

  26. Survey procedures • PROC SURVEYMEANS: provides estimates of means, standard errors, and confidence intervals • PROC SURVEYFREQ: provides estimates of population totals, standard errors, and confidence limits
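The point estimates behind a weighted frequency table are simple. Each category's estimated population total is the sum of its respondents' weights, and its percent is that total's share of the grand total. This minimal Python sketch covers only those point estimates, using hypothetical data. The standard errors and confidence limits that SURVEYFREQ reports also need the strata and cluster information, which this sketch ignores.

```python
from collections import defaultdict


def weighted_frequencies(categories, weights):
    """Estimated population total (sum of weights) and percent for each category."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    grand_total = sum(totals.values())
    return {c: (t, 100 * t / grand_total) for c, t in totals.items()}


# Hypothetical answers to "employed?" with sampling weights
answers = ["yes", "no", "yes"]
weights = [300.0, 100.0, 100.0]
print(weighted_frequencies(answers, weights))
# {'yes': (400.0, 80.0), 'no': (100.0, 20.0)}
```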

  27. And now for something completely different … 2. Using SAS Enterprise Guide to analyze target market survey data in the hour before your meeting

  28. It’s not always rocket science There may be a tendency to use the most sophisticated statistical techniques we can find when what the customer really wants is a bar chart

  29. Customer Need Our target market is Native Americans with chronic illness in the Great Plains region. We want to know how people get most of their information so that we can develop a marketing strategy.

  30. Questions • How often do people read the newspaper versus use the Internet? • Is it the same people who are using a lot of media, e.g. email, radio, Internet, or do different people use different sources of information?

  31. Creating Enterprise Guide graphs • Double-click the SAS dataset to open it • Select Graph > Bar Chart > Colored Bars • Select Task Roles • Click Internet_Use • Select Analysis Variable • Repeat these steps to create a second chart for newspaper readership

  32. Correlations • Select Analyze > Correlations • Select variables from list • Click RUN
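The correlation task answers the question from slide 30. If people who use one source heavily also use the others, the media-use variables will correlate positively. This minimal Python sketch computes the Pearson correlation matrix that such a task reports. The variable names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise correlations for a dict of name -> values."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical days per week each respondent uses each source
media = {
    "internet": [0, 1, 2, 3, 4],
    "email": [0, 2, 4, 6, 8],
    "newspaper": [4, 3, 2, 1, 0],
}
for pair, r in correlation_matrix(media).items():
    print(pair, round(r, 2))
```

In this made-up data, Internet and email users are the same people (r = 1), while newspaper readers are a different group (r = -1).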

  33. Frequency Distribution • Select Describe > One-Way Frequencies
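A one-way frequency table is just counts, percents, and their running totals. The columns in this Python sketch are modeled on a PROC FREQ one-way table (Frequency, Percent, Cumulative Frequency, Cumulative Percent). For simplicity, levels are sorted alphabetically here, and the data are hypothetical.

```python
from collections import Counter


def one_way_frequencies(values):
    """Rows of (level, frequency, percent, cumulative frequency, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for level in sorted(counts):
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / n,
                     cumulative, 100 * cumulative / n))
    return rows


# Hypothetical answers to "How often do you read the newspaper?"
for row in one_way_frequencies(["daily", "weekly", "daily", "never"]):
    print(row)
```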

  34. Recommendations • Create a website and an email list to contact potential customers on the reservations • Advertise on the radio and in the newspaper. That will be $4,000, please.

  35. Nice theory, but does it work? 3. Evaluating program effectiveness

  36. We changed something. Did it work? A two-day staff training program was offered. A test was given before training began (pre-test) and again at the conclusion of training (post-test). The test consisted of multiple-choice questions and case studies.
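A common way to ask "did it work?" with pre- and post-test scores from the same people is a paired t statistic on each person's change. This is a swapped-in illustration, not the author's analysis (the presentation uses PROC GLM). The scores are hypothetical, and turning t into a p-value would also require the t distribution.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Mean change, paired t statistic, and degrees of freedom for pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length lists with at least 2 people")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return d_bar, d_bar / se, n - 1


# Hypothetical scores for four trainees
pre = [20, 25, 30, 22]
post = [24, 27, 35, 25]
change, t, df = paired_t(pre, post)
print(change, round(t, 2), df)  # 3.5 5.42 3
```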

  37. Just for fun … I decided to do the whole project using only two procedures, PROC CORR and PROC GLM.

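  38. Wonder of SAS: one step produces several stages of psychometric analysis
      PROC CORR DATA=tests ALPHA;   /* ALPHA adds Cronbach's coefficient alpha */
        WHERE test_type = "pre";    /* pre-test records only */
        VAR q1--q40;                /* all variables from q1 through q40, in dataset order */
      RUN;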
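The ALPHA option reports Cronbach's coefficient alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the total score. The minimal Python sketch below computes the raw (unstandardized) version of that formula. PROC CORR also reports a standardized alpha, which this sketch does not compute. The item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Raw Cronbach's alpha.

    items: list of k lists; items[i][r] is respondent r's score on item i.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))        # identical items: 1.0
print(cronbach_alpha([[1, 2, 1, 2], [1, 1, 2, 2]]))  # uncorrelated items: 0 (up to rounding)
```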

  39. Descriptive statistics output from PROC CORR Check for data entry errors, restriction in range, low variance
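The checks named on this slide can be sketched as a small screening function. It flags items with values outside the valid range, which suggests data entry errors, and items with almost no variance. The valid range (0 to 1, as for items scored right/wrong) and the variance cutoff are hypothetical choices, not values from the presentation.

```python
from statistics import pvariance


def screen_items(items, valid_min, valid_max, min_variance=0.01):
    """Return {item: [issues]} for items that fail a range or variance check."""
    problems = {}
    for name, scores in items.items():
        issues = []
        if any(s < valid_min or s > valid_max for s in scores):
            issues.append("out of range")   # likely data entry error
        if pvariance(scores) < min_variance:
            issues.append("low variance")   # item barely discriminates
        if issues:
            problems[name] = issues
    return problems


# Hypothetical 0/1-scored items; q3 contains a typo (7)
items = {"q1": [0, 1, 1, 0], "q2": [1, 1, 1, 1], "q3": [0, 1, 7, 1]}
print(screen_items(items, valid_min=0, valid_max=1))
# {'q2': ['low variance'], 'q3': ['out of range']}
```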

  40. My alpha is not very good and I am sad

  41. Item Analysis (continued): Are two different factors being measured?

  42. Inspect the correlation matrix. Items with negative item-total correlations are not positively correlated with the rest of the test.
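As we understand the PROC CORR output, the ALPHA option also reports, for each item, its correlation with the total of the remaining items ("Correlation with Total" in the deleted-variable table). This minimal Python sketch computes that corrected item-total correlation. The item scores are hypothetical: q4 is written to run opposite to the other items.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlate each item with the total of the *other* items."""
    names = list(items)
    n_people = len(items[names[0]])
    result = {}
    for name in names:
        rest_total = [
            sum(items[other][r] for other in names if other != name)
            for r in range(n_people)
        ]
        result[name] = pearson(items[name], rest_total)
    return result


items = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [1, 2, 3, 5, 4],
    "q3": [2, 1, 3, 4, 5],
    "q4": [5, 4, 3, 2, 1],  # hypothetical miskeyed or off-construct item
}
for name, r in corrected_item_total(items).items():
    print(name, round(r, 2))
```

Here q1 to q3 each correlate 0.8 with the rest of the test, while q4 correlates about -0.98. That is the pattern the slide tells you to look for.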

  43. The General Linear Model It really is general. You may now jump for joy at this obvious revelation.
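Here is one concrete sense in which the model is general. A two-group comparison is a linear model with a single dummy-coded predictor, so the pooled two-sample t test and the one-way ANOVA F test are the same test, with F = t². This minimal Python check of that identity uses hypothetical scores.

```python
import math
from statistics import mean


def two_group_t_and_f(group_a, group_b):
    """Pooled two-sample t and one-way ANOVA F for the same two groups."""
    na, nb = len(group_a), len(group_b)
    ma, mb = mean(group_a), mean(group_b)
    grand = mean(group_a + group_b)
    ss_within = sum((x - ma) ** 2 for x in group_a) + sum((x - mb) ** 2 for x in group_b)
    ms_within = ss_within / (na + nb - 2)  # pooled variance
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    f = ss_between / 1 / ms_within         # between-groups df = 1 for two groups
    t = (ma - mb) / math.sqrt(ms_within * (1 / na + 1 / nb))
    return t, f


t, f = two_group_t_and_f([1, 2, 3], [4, 5, 6])
print(round(t ** 2, 6), round(f, 6))  # 13.5 13.5
```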
