Evidence Based Medicine MDCN 440 • Epidemiology Unit • Biostatistics • April 13, 2010 • Jeffrey P Schaefer MSc MD FRCPC
The peril of teaching biostatistics…
Section 1 • Types of Data • Measures of Central Tendency • Measures of Dispersion • Expressing Results
Steven Wright • I'm addicted to placebos. I'd give them up, but it wouldn't make any difference.
Types of Data • Nominal • Ordinal • Ranked • Discrete • Continuous
Nominal Data • Data is placed into ‘named’ categories. • E.g. • 1 = pneumonia • 2 = heart disease • 3 = abdominal pain • Mathematical analysis usually inappropriate. (An exception might be 0 = male, 1 = female.)
Ordinal Data • Data relates to a logical order. • Example: • 5 = fatal • 4 = severe • 3 = moderate • 2 = mild • 1 = none • Mathematical analysis usually inappropriate. Does mild + moderate = fatal?
Ranked Data • Data relates to position within a sequence. E.g. Causes of death… • 1 = cardiovascular disease • 2 = neoplasm • Mathematical analysis usually inappropriate. However, information is usually useful and often quoted.
Discrete Data • Data represents ‘counts’. • E.g. • number of children • number of accidents • number dying of heart failure • Mathematics is appropriate, although the result may not be a possible value. e.g. 2.4 children / family
Continuous Data • Data can take any numerical value (ratio data) • E.g. • cholesterol values • blood pressures • Mathematics is usually appropriate. e.g. Average hemoglobin was 120 g/L
Who cares? • Mathematical (biostatistical) analysis requires that we know the nature of the data. • Reminds us about the nature of scoring systems.
e.g. Chi Square • Cross-Sectional survey: • Exercise Stress Test Status (counts) • Sex (counts)
The difference IS statistically significant • A t-test may not be used in this situation, because both variables are counts in categories.
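A minimal sketch of the chi-square calculation for a 2 × 2 table of counts. The counts are hypothetical, invented purely for illustration (they are not the data from the slide), and no continuity correction is applied. The value 3.841 is the standard chi-square critical value for 1 degree of freedom at p = 0.05.

```python
# Hypothetical 2 x 2 table of counts (NOT the slide's data):
# rows = sex, columns = exercise stress test status.
observed = [
    [30, 70],  # male:   positive, negative
    [15, 85],  # female: positive, negative
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

CRITICAL_VALUE_DF1 = 3.841  # 1 degree of freedom, p = 0.05
significant = chi_square > CRITICAL_VALUE_DF1
print(f"chi-square = {chi_square:.2f}, significant at 0.05: {significant}")
```

The test works on counts in categories, which is why it suits this kind of table where a t-test does not.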
Staging of Heart Failure NYHA Cardiac Status • Class I: uncompromised • Class II: slightly compromised • Class III: moderately compromised • Class IV: severely compromised • updated from old NYHA Classification • ‘usual activities’ ‘minimal exertion’
Specific Activity Scale (Goldman, Circulation 64:1227, 1981) • Stage I • patients can perform to completion any activity requiring 7 metabolic equivalents • can carry 24 lb up eight steps • carry objects that weigh 80 lb • do outdoor work [shovel snow, spade soil] • do recreational activities [skiing, basketball, squash, handball, jog/walk 5 mph]
Specific Activity Scale (Goldman, Circulation 64:1227, 1981) • Stage II • patients can perform to completion any activity requiring 5 metabolic equivalents • have sexual intercourse without stopping • garden, rake, weed, roller skate • dance fox trot, walk at 4 mph on level ground • but cannot and do not perform to completion activities requiring 7 metabolic equivalents
Specific Activity Scale (Goldman, Circulation 64:1227, 1981) • Stage III • patients can perform to completion any activity requiring 2 metabolic equivalents • dress, shower without stopping, strip and make bed, clean windows • walk 2.5 mph, bowl, play golf, dress without stopping • but cannot and do not perform to completion any activities requiring 5 metabolic equivalents
Specific Activity Scale (Goldman, Circulation 64:1227, 1981) • Stage IV • patients cannot or do not perform to completion activities requiring 2 metabolic equivalents • CAN’T: • dress without stopping • shower without stopping • strip and make bed • walk 2.5 mph • bowl, play golf
Prognosis varies with Class • Stage IV is NOT 4 × more serious than Stage I heart failure.
Steven Wright • Boycott shampoo! Demand the REAL poo!
Measures of Central Tendency • Mean • Median • Mode • others exist • truncated mean • geometric mean • weighted mean
Mean • Average = (sum of all observations) / (number of observations) • Example: 2, 3, 6, 8, 10, 12 • 41 / 6 = 6.83
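The same arithmetic as a short Python sketch, using the six observations from the slide (the variable names are illustrative only):

```python
observations = [2, 3, 6, 8, 10, 12]

total = sum(observations)             # sum of all observations
average = total / len(observations)   # divide by number of observations

print(f"{total} / {len(observations)} = {average:.2f}")
```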
Truncated Mean • Truncated mean = (sum of all observations, restricted in some way) / (number of permitted observations) • 42, 56, 69, 43, 53, 55, 56, 99 (mean = 59.1) • e.g. remove the highest and lowest numbers • 56, 69, 43, 53, 55, 56 (truncated mean = 55.3)
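A minimal sketch of the truncated mean for the slide's example, assuming the restriction is "drop the single highest and single lowest value" as in the example; other trimming rules are possible.

```python
values = [42, 56, 69, 43, 53, 55, 56, 99]

full_mean = sum(values) / len(values)

# Sort, then drop the lowest and highest observation before averaging.
permitted = sorted(values)[1:-1]
truncated_mean = sum(permitted) / len(permitted)

print(f"mean = {full_mean:.1f}, truncated mean = {truncated_mean:.1f}")
```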
Note: I hate this nomenclature and will avoid its use. We are doctors; we have our own code!
Median • The 50th percentile (or ‘middlemost’ value). • Unsorted: 3, 6, 7, 19, 10, 13, 2, 1, 21, 4, 22 • Sorted: 1, 2, 3, 4, 6, 7, 10, 13, 19, 21, 22 • Median = 7 • (Use the average of the two middle values if there is an even number of observations.) • 1, 2, 3, 4, 6, 6, 7, 10, 13, 19, 21, 22 • Median = (6 + 7)/2 = 6.5
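Both median examples can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even:

```python
from statistics import median

odd_count = [3, 6, 7, 19, 10, 13, 2, 1, 21, 4, 22]       # 11 observations
even_count = [1, 2, 3, 4, 6, 6, 7, 10, 13, 19, 21, 22]   # 12 observations

median_odd = median(odd_count)     # middle value of the sorted list
median_even = median(even_count)   # average of the two middle values

print(median_odd, median_even)
```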
Mode • Most common value. 3, 6, 7, 4, 19, 4, 10, 13, 10, 2, 1, 21, 4, 22 Mode = 4
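A short sketch that finds the mode by counting how often each value occurs, using the values from the slide:

```python
from collections import Counter

values = [3, 6, 7, 4, 19, 4, 10, 13, 10, 2, 1, 21, 4, 22]

counts = Counter(values)
# most_common(1) returns the single (value, count) pair with the highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(f"mode = {mode_value} (appears {mode_count} times)")
```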
Measures of Central Tendency • Medicine and Health • mainly mean and median • Mean: • sensitive to outliers • does not convey multimodal distributions • Median: • less intuitive • less suitable for mathematical analysis
Hospital Length of Stay: a typical example of data where a few patients (e.g. with a complication of surgery) require longer stays
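A small illustration of why the median is often preferred for length-of-stay data. The lengths of stay below are hypothetical, invented for this sketch: most patients stay a few days, and two have prolonged stays.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days (NOT real data).
length_of_stay = [2, 3, 3, 4, 4, 5, 5, 6, 30, 45]

los_mean = mean(length_of_stay)      # pulled upward by the two long stays
los_median = median(length_of_stay)  # barely affected by them

print(f"mean = {los_mean} days, median = {los_median} days")
```

With these made-up numbers the mean is more than double the median, although eight of the ten patients stayed six days or fewer.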
Normal Distribution • mean = median = mode • bell shaped (single peak) and symmetrical
Steven Wright • If at first you don't succeed, destroy all evidence that you tried.
Measures of Dispersion (variability) • Range • Variance • Standard Deviation • Standard Error • Confidence Intervals
Range • The difference between largest and smallest values. (Usually expressed as smallest to largest) 2, 4, 6, 10, 12, 14, 17, 20 range = 18 The range was 2 to 20.
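The range example as a brief sketch, reporting both the single number and the "smallest to largest" form:

```python
values = [2, 4, 6, 10, 12, 14, 17, 20]

smallest, largest = min(values), max(values)
value_range = largest - smallest

print(f"range = {value_range}; the range was {smallest} to {largest}.")
```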
Interquartile Range • the distance between the 25th percentile and the 75th percentile 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 IQR = 4 to 9
Variance (sample) • 115, 116, 118, 114, 117 • mean = 116 • range = 4 • 44, 80, 110, 180, 166 • mean = 116 • range = 136 • Range is helpful but depends only on two numbers.
Variance (sample) • 115, 116, 118, 114, 117 • mean = 116 • range = 4
observations:   115   116   118   114   117
mean:           116   116   116   116   116
difference:      -1     0     2    -2     1   (sum = 0)
diff squared:     1     0     4     4     1   (sum = 10)
divide by (number of observations - 1): 10 / (5 - 1) = 2.5 = variance
take square root of variance: √2.5 = 1.58 = standard deviation
Variance (sample) • 44, 80, 110, 180, 166 • mean = 116 • range = 136
observations:    44    80   110   180   166
mean:           116   116   116   116   116
difference:     -72   -36    -6    64    50   (sum = 0)
diff squared:  5184  1296    36  4096  2500   (sum = 13,112)
divide by (number of observations - 1): 13,112 / (5 - 1) = 3,278 = variance
take square root of variance: √3,278 = 57.3 = standard deviation
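The step-by-step calculation can be sketched in Python for both data sets. The function name `sample_variance` is illustrative; it follows the steps on the slides (difference from the mean, square, sum, divide by n - 1).

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared differences from the mean, divided by (n - 1)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (len(values) - 1)

tight = [115, 116, 118, 114, 117]
wide = [44, 80, 110, 180, 166]

for name, data in (("tight", tight), ("wide", wide)):
    var = sample_variance(data)
    print(f"{name}: variance = {var:.1f}, std dev = {sqrt(var):.2f}")
```

Both data sets have the same mean (116), yet their standard deviations differ by a factor of more than 30.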
Normal Distribution • +/- 1 sd: 68% • +/- 2 sd: 95% • +/- 3 sd: 99.7%
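For a normal distribution, the proportion of values within k standard deviations of the mean equals erf(k / √2). A minimal sketch using only the standard library (the helper name `coverage_within` is illustrative):

```python
from math import erf, sqrt

def coverage_within(k):
    """Proportion of a normal distribution within +/- k standard deviations."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} sd: {coverage_within(k):.1%}")
```

The exact figures are roughly 68.3%, 95.4% and 99.7%; exactly 95% corresponds to +/- 1.96 sd, the multiplier used for 95% confidence intervals.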
Variance (Population) • Variance of a Population • population is where everyone is measured • denominator = number of observations • Variance of a Sample • a sample of the population is selected • denominator = number of observations - 1
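Python's standard `statistics` module offers both denominators: `pvariance` divides by n (population) and `variance` divides by n - 1 (sample). A short sketch using the five observations from the earlier variance example:

```python
from statistics import pvariance, variance

values = [115, 116, 118, 114, 117]

population_var = pvariance(values)  # sum of squares 10 / 5 observations
sample_var = variance(values)       # sum of squares 10 / (5 - 1)

print(f"population variance = {population_var}, sample variance = {sample_var}")
```

The sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.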
Standard Error • Imagine a data set with 1,000 values • Select 100 values, calculate mean • Select 100 values, calculate mean • Select 100 values, calculate mean • Select 100 values, calculate mean • and so on, and so on… • Plot the means • Calculate the standard deviation of these means
Standard Error Another method: Standard Dev / √ sample size
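The two methods can be compared by simulation. This is a sketch under stated assumptions: the data set of 1,000 values is hypothetical (randomly generated), samples of 100 are drawn with replacement, and the number of repeats (2,000) is arbitrary.

```python
import random
from math import sqrt
from statistics import pstdev, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data set with 1,000 values (invented for illustration).
data = [rng.gauss(120, 15) for _ in range(1000)]
sample_size = 100

# Method 1: repeatedly select 100 values, calculate the mean each time,
# then take the standard deviation of those means.
sample_means = []
for _ in range(2000):
    sample = rng.choices(data, k=sample_size)  # sampling with replacement
    sample_means.append(sum(sample) / sample_size)
empirical_se = stdev(sample_means)

# Method 2: standard deviation / sqrt(sample size).
formula_se = pstdev(data) / sqrt(sample_size)

print(f"simulated SE = {empirical_se:.2f}, formula SE = {formula_se:.2f}")
```

The two values should agree closely. Sampling without replacement from only 1,000 values would give a slightly smaller simulated figure.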
Confidence Interval • General Formula: 95% Confidence Interval = mean - (1.96 × Standard Error) to mean + (1.96 × Standard Error)
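A worked sketch of the general formula. The summary values (mean hemoglobin 120 g/L, standard deviation 15 g/L, 100 patients) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

# Hypothetical summary statistics (NOT from a real study).
sample_mean = 120.0   # g/L
std_dev = 15.0        # g/L
sample_size = 100

standard_error = std_dev / sqrt(sample_size)   # 15 / 10 = 1.5
margin = 1.96 * standard_error

ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

print(f"95% CI = {ci_lower:.2f} to {ci_upper:.2f} g/L")
```

The 1.96 multiplier assumes a reasonably large sample. For small samples a larger multiplier from the t distribution is normally used.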
So what does this actually mean? • Confidence Interval • a range calculated so that, if the study were repeated many times, about 95% of such ranges would cover the TRUE VALUE.
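The "95% of the time" reading can be checked by simulation. In this sketch the true mean and standard deviation are hypothetical values we choose ourselves, so we can count how often the calculated interval actually covers the true value.

```python
import random
from math import sqrt

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 120.0, 15.0   # hypothetical "truth", known only to the simulation
sample_size, trials = 100, 2000

covered = 0
for _ in range(trials):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(sample_size)]
    mean = sum(sample) / sample_size
    # Sample standard deviation (denominator n - 1), then standard error.
    sd = sqrt(sum((x - mean) ** 2 for x in sample) / (sample_size - 1))
    se = sd / sqrt(sample_size)
    if mean - 1.96 * se <= TRUE_MEAN <= mean + 1.96 * se:
        covered += 1

coverage = covered / trials
print(f"intervals covering the true value: {coverage:.1%}")
```

Any single interval either covers the true value or it does not. The 95% describes how often the procedure succeeds over many repetitions.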
Steven Wright • Everyone has a photographic memory. Some just don't have film.
Expressing Our Results • Outcome Measures • Point Estimate • Confidence Interval
Medical Outcomes • Harm? • Diagnosis? • Therapy? • Prognosis? • Prevention?