
Better Than Average


kdawson

Presentation Transcript


  1. Better Than Average: Finding Geometric Means Using SAS

  2. What are we trying to say with an “average”? Expected Value

  3. Common Types of “Averages”. Median: the middle element of ordered data. Mode: the value seen most often in a data set. • Main advantages: not influenced by extreme values; can be used for any type of distribution • Main disadvantage: insensitive to most of the data, so it gives little information about your distribution

  4. Common Types of “Averages”. Arithmetic Mean: the sum of the values divided by the number of values in a data set. • Main advantages: easy to understand; uses all observations; can give information about your distribution • Main disadvantages: easily skewed by outliers; can be misleading for non-Normal (skewed) data

  5. What is a geometric mean? Geometric Mean: the nth root of the product of n positive observations in a data set. • Main advantages: precise, yet less influenced by extreme values than the arithmetic mean • Main disadvantages: more difficult to understand; all values must be positive (non-zero)
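In symbols, for positive observations x_1, ..., x_n, the definition above can be written as follows. The second form works on the log scale and is the one that the log-and-exponentiate steps on slides 19 to 21 rely on.

```latex
\[
\mathrm{GM}(x_1,\dots,x_n)
  = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}
  = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Bigr),
  \qquad x_i > 0 .
\]
```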

  6. Geometric Series: each number increases by the same proportion (here, a factor of 3): 3, 9, 27, 81, 243
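As a worked illustration using the five numbers on this slide, the geometric mean lands on the middle term of the series, while the arithmetic mean is pulled toward the largest value:

```latex
\[
\mathrm{GM} = (3 \cdot 9 \cdot 27 \cdot 81 \cdot 243)^{1/5}
            = (3^{15})^{1/5} = 3^{3} = 27,
\qquad
\mathrm{AM} = \frac{3 + 9 + 27 + 81 + 243}{5} = \frac{363}{5} = 72.6 .
\]
```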

  7. When should I use the geometric mean? • Non-Normal/skewed data • Ratios or proportions/scaled data • Small sample sizes

  8. • Bioassays/dose-response curves • Population growth • Compounding interest • Decay rates • Scaled bioequivalence • Survival analysis

  9. ...but data is messy! Use your judgement in selecting your “average.”

  10. What if my data contains zeros? • Add 1 to every number in the data set, compute the geometric mean, and then subtract 1 from the result. • Ignore zeros or missing data in your calculations. • Convert zeros to a very small number (for example, a stand-in for values “below the detection limit”) that is less than the smallest non-zero number in the data set.
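A small worked example of the first option (add 1, then subtract 1), using the made-up values 0, 1 and 3, which are not from the presentation:

```latex
\[
\mathrm{GM}_{\text{shifted}}
  = \Bigl(\prod_{i=1}^{n} (x_i + 1)\Bigr)^{1/n} - 1
  = (1 \cdot 2 \cdot 4)^{1/3} - 1
  = 2 - 1 = 1 .
\]
```

Without the shift, the single zero would force the product, and therefore the geometric mean, to zero.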

  11. What if my data contains negative numbers? • If all values are negative, convert them to positive numbers, calculate the geometric mean, and then make the result negative. • If your data set contains both positive and negative values, separate them, find the geometric mean of each group, and then take the weighted average of the two geometric means to get the geometric mean for the full data set.
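For the all-negative case, the rule above amounts to the following, shown here with the made-up values -2 and -8 (not from the presentation):

```latex
\[
\mathrm{GM}_{\text{neg}}
  = -\Bigl(\prod_{i=1}^{n} |x_i|\Bigr)^{1/n}
  = -(2 \cdot 8)^{1/2}
  = -4 .
\]
```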

  12. How can I use SAS to compute geometric means? • GEOMEAN() or GEOMEANZ() functions • PROC SURVEYMEANS • Manual calculations

  13. Finding the geometric mean for an observation/row: GEOMEAN() or GEOMEANZ() functions. Returns the geometric mean of its numeric arguments (constants, variables, or expressions). • If any arguments are negative, the result is a missing value • If any arguments are zero, the result is zero • Fuzzes the values of arguments that are extremely small and approximately zero; if you do not want this, use the GEOMEANZ() function • Skips missing values
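The rules listed on this slide can be checked with a short DATA step. This is a minimal sketch: the argument values and the variable names (gm_basic and so on) are made up for illustration, and the results are written to the SAS log rather than to a data set.

```sas
/* Sketch: each call exercises one of the rules listed on slide 13. */
data _null_;
  gm_basic   = geomean(1, 2, 2, 4);  /* fourth root of the product 16   */
  gm_missing = geomean(2, ., 8);     /* the missing argument is skipped */
  gm_zero    = geomean(0, 2, 8);     /* a zero argument gives zero      */
  gm_neg     = geomean(-1, 2, 8);    /* a negative argument gives missing */
  put gm_basic= gm_missing= gm_zero= gm_neg=;
run;
```

Expect a message in the log for the negative argument, since SAS treats it as an invalid argument to the function.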

  14. DATA my_data;
        input studyid var1 var2 var3 var4 var5;
        geometric_mean = geomean(of var1-var5); *Calculates geometric mean;
        datalines;
      1 102.3 96.2 88.9 100.4 101.7
      2 87.6 85.4 88.3 89.9 82.3
      3 100.5 72.9 95.6 98.7 89.2
      4 101.1 102.8 101.7 100.9 100.5
      5 95.6 92.4 96.7 95.9 98.1
      ;
      run;
      PROC PRINT data=my_data;
        id studyid;
      run;
      Note that you can use “OF” for a list of variables.

  15. Finding the geometric mean for a population/column: PROC SURVEYMEANS. The GEOMEAN option within PROC SURVEYMEANS returns the geometric mean of the specified variables (values must be positive and non-zero). You can also request confidence limits for the geometric mean: • GMCLM requests the two-sided confidence limits • LGMCLM requests the one-sided lower confidence limit • UGMCLM requests the one-sided upper confidence limit

  16. PROC SURVEYMEANS data=my_data geomean;
        var var1 var2 var3 var4 var5;
      run;
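To also request the two-sided confidence limits described on slide 15, the GMCLM keyword can be added next to GEOMEAN. A minimal sketch, assuming the my_data data set from slide 14; restricting the VAR statement to var1 and spelling out ALPHA=0.05 (the default 95% level) are choices made here for illustration.

```sas
/* Sketch: geometric mean of var1 with two-sided confidence limits. */
proc surveymeans data=my_data geomean gmclm alpha=0.05;
  var var1;
run;
```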

  17. Which measure of variation should I use? • Standard error: how precise is the calculation of the geometric mean • Standard deviation: how spread out is the data around the geometric mean • Coefficient of variation: how does the variation for this geometric mean compare with another data set

  18. Geometric Standard Error: output by default in PROC SURVEYMEANS

  19. Geometric Standard Deviation. 1. Find the natural log of your variable using the LOG() function in the DATA step:
      DATA my_data2;
        set my_data;
        ln_var1 = log(var1); *Calculates the natural log of variable 1;
      run;

  20. Geometric Standard Deviation. 2. Use PROC MEANS to find the arithmetic mean and standard deviation of your newly log-transformed variable:
      PROC MEANS data=my_data2 mean stddev; *Specifies output;
        var ln_var1;
        output out=meansout mean=a_mean stddev=a_stddev; *Creates new data set;
      run;

  21. Geometric Standard Deviation. 3. Exponentiate the arithmetic mean and standard deviation to find the geometric mean and geometric standard deviation, using the EXP() function in the DATA step:
      DATA my_data3;
        set meansout;
        geo_mean = exp(a_mean); *Converts to geometric mean;
        geo_stddev = exp(a_stddev); *Converts to geometric standard deviation;
      run;
      PROC PRINT data=my_data3 noobs;
        var geo_mean geo_stddev;
      run;
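The three steps (log, summarize, exponentiate) can also be collapsed into a single query. This PROC SQL version is an alternative sketch, not the method shown on the slides; it assumes the my_data data set from slide 14 and that every var1 value is positive.

```sas
/* Sketch only: PROC SQL alternative to the three DATA/PROC steps. */
proc sql;
  select exp(mean(log(var1))) as geo_mean,    /* geometric mean of var1 */
         exp(std(log(var1)))  as geo_stddev   /* geometric SD of var1   */
  from my_data;
quit;
```

With the slide 14 data this should reproduce the values used on slide 23 (a geometric mean of about 97.26 and a geometric standard deviation of about 1.066).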

  22. Applying the Geometric Standard Deviation. Geometric Mean ± Geometric Standard Deviation = INCORRECT!

  23. Applying the Geometric Standard Deviation. The geometric standard deviation is multiplicative, NOT additive: • Lower bound = geometric mean ÷ geometric standard deviation = 97.2637 ÷ 1.06605 = 91.2375 • Upper bound = geometric mean × geometric standard deviation = 97.2637 × 1.06605 = 103.6880 • Resulting range for one geometric standard deviation: (91.24, 103.69)
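The multiplicative form follows from the log scale, where the usual "mean plus or minus one standard deviation" does apply. Writing m and s for the mean and standard deviation of the logged values (a_mean and a_stddev on slide 20), and exponentiating the interval:

```latex
\[
\bigl(e^{\,m - s},\; e^{\,m + s}\bigr)
  = \Bigl(\frac{e^{m}}{e^{s}},\; e^{m} \cdot e^{s}\Bigr)
  = \Bigl(\frac{\mathrm{GM}}{\mathrm{GSD}},\; \mathrm{GM} \times \mathrm{GSD}\Bigr),
\qquad
\mathrm{GM} = e^{m}, \quad \mathrm{GSD} = e^{s} .
\]
```

Subtraction on the log scale becomes division on the original scale, which is why the lower bound divides by the geometric standard deviation.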

  24. Geometric Coefficient of Variation. Raise the geometric standard deviation to the power of the reciprocal of the geometric mean in the DATA step:
      DATA my_data4;
        set my_data3;
        geo_cv = geo_stddev**(1/geo_mean); *Calculates geometric CV;
      run;
      PROC PRINT data=my_data4 noobs;
        var geo_cv;
      run;
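A widely used alternative definition of the geometric coefficient of variation for log-transformed data is sqrt(exp(s**2) - 1), where s is the standard deviation of the logged values. This is a different formula from the one on the slide. A minimal sketch, assuming my_data3 still carries a_stddev from slide 20; the names my_data5 and geo_cv_alt are hypothetical.

```sas
/* Sketch: alternative geometric CV, assuming a_stddev is still in my_data3. */
data my_data5;                               /* hypothetical data set name   */
  set my_data3;
  geo_cv_alt = sqrt(exp(a_stddev**2) - 1);   /* CV implied by the log-scale SD */
run;

proc print data=my_data5 noobs;
  var geo_cv_alt;
run;
```

For the var1 data this comes out at roughly 0.064, that is, a CV of about 6.4%.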

  25. To sum up: • Use geometric means for data that are lognormal or that are expressed as ratios or proportions • Make sure your values are positive and non-zero • Use your judgement in choosing the mean to express your expected values when working with messy data

  26. Questions?

  27. Contact Information • Name: Kimberly Roenfeldt • Company: Henry M Jackson Foundation for the Advancement of Military Medicine • City/State: San Diego, CA • Phone: (619) 767-4584 • Email: kimroenfeldt@gmail.com
