Demystifying Statistical Biases in Animal Populations

The Myth of Small Numbers (and Other Sources of Bias)
Bob L. Larson, DVM, PhD Kansas State University

I. Normal Variation

What does Normal variation mean to animal populations? ½ below the mean, ½ above the mean. Mean ± standard deviation

What does Normal variation mean to animal populations? 68% within 1 SD of the mean Mean ± 1 standard deviation

What does Normal variation mean to animal populations? 95% within 2 SD of the mean Mean ± 2 standard deviations

What does Normal variation mean to animal populations? 99.7% within 3 SD of the mean Mean ± 3 standard deviations
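
The 68%, 95%, and 99.7% figures can be checked directly from the Normal distribution. A minimal sketch using Python's standard library; the helper name `fraction_within` is only for illustration:

```python
from statistics import NormalDist

def fraction_within(k_sd: float) -> float:
    """Fraction of a Normal distribution lying within k_sd SDs of the mean."""
    z = NormalDist()  # standard Normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```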

Tempting Way To Think! Relative to the goal: "Something is wrong! I must do something different" or "Everything is OK. I must be doing everything right" (time to raise fees)

Naïve View of Biology (Animal Population): People, Methods, Material, Environment, Equipment → Product / Output

Real View of Biology (Animal Population): People, Methods, Material, Environment, Equipment → Product / Output

Alternate Way To Think 1. What is the mean? 2. What is the standard deviation? 3. Where does this data fall?

Law of Large Numbers With increasing data points, the sample mean and distribution approach the true population mean and distribution. About 1,500 data points randomly drawn from a larger population will give a mean and distribution that are basically equivalent to those of the entire population (the approach used by pollsters).
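
A small simulation can make this concrete. The sketch below assumes a hypothetical population in which the true rate of some yes/no outcome is 50%; the function name, seed, and rate are illustrative and are not taken from the demonstration itself.

```python
import random

def sample_mean(n: int, true_rate: float, rng: random.Random) -> float:
    """Mean of n random yes/no observations drawn with the given true rate."""
    return sum(rng.random() < true_rate for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_RATE = 0.50         # hypothetical population value

for n in (5, 15, 50, 1500):
    estimate = sample_mean(n, TRUE_RATE, rng)
    print(f"n={n:>4}: sample mean = {estimate:.1%}")
```

Small samples can land far from 50%; the estimate from 1,500 draws should sit close to it.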

Results of Demonstration n=50 Mean = 48.88% SD = 6.82 Mean ± 2 SD = 35.2-62.5%

Myth of Small Numbers Myth = a small data set tells me something about the population Reality = small data sets can be very misleading

Results of Demonstration n=5 Mean = 44.00% A small number of samples is not Normally distributed A small number of samples gives no indication of Mean or Standard Deviation

Results of Demonstration n=15 Mean = 46.53% SD = 6.91 Mean ± 2 SD = 32.7-60.4%

Myth of Small Numbers So how do I make a wise conclusion without data or with a small amount of data? You can’t… You are guessing. Luckily, we are seldom proven wrong. Unfortunately, guessing doesn’t differentiate you from other people providing the same service
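
One way to see why small data sets mislead is the standard error of the mean, SD / √n. A rough sketch, assuming independent observations and an SD of about 6.9 (close to the 6.82 and 6.91 reported in the demonstration):

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

SD = 6.9  # assumed; roughly the SD seen in the demonstration

for n in (5, 15, 50, 1500):
    se = standard_error(SD, n)
    print(f"n={n:>4}: standard error of the mean = {se:.2f} percentage points")
```

Under these assumptions, the mean of 5 observations is about three times less precise than the mean of 50.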

Errors in Research Findings (and Reasoning)

Errors in Research Findings “Mistakes”: wrong study design, wrong analysis strategy, conclusions that do not follow from results, etc. (extremely common in medical research literature). Statistical (chance) errors: Type I and Type II errors. Bias: a systematic error that causes a conclusion to be incorrect. Myth of Small Numbers & Other Issues

Bias & Confounding (a form of bias) Issues of internal validity This is important ! So you don’t get fooled by wrong information from research studies due to bias or confounding

Bias & Confounding (a form of bias) Issues of internal validity Bias Systematic error (vs. random error) that results in mistaken conclusions regarding the relationship between the exposure (or explanatory factors) and the outcome Random (non-systematic) errors are not bias – these errors are randomly distributed amongst groups/observations Lack of bias → internal validity
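
The difference between random and systematic error can be sketched with a simulated scale. Everything here is hypothetical: a true weight of 100 kg, random noise with an SD of 5 kg, and a scale that always reads 3 kg high.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable
TRUE_WEIGHT = 100.0     # hypothetical true value, kg
N = 10_000

# Random error only: noise centered on zero averages out.
random_only = [TRUE_WEIGHT + rng.gauss(0, 5) for _ in range(N)]

# Systematic error: the same kind of noise plus a constant 3 kg offset.
biased = [TRUE_WEIGHT + 3.0 + rng.gauss(0, 5) for _ in range(N)]

print(f"random error only: mean = {mean(random_only):.2f}")
print(f"systematic error:  mean = {mean(biased):.2f}")
```

More data shrinks the random error but leaves the 3 kg offset untouched.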

Bias & Confounding (a form of bias) Issues of internal validity Bias Confounding The mixing of the effects of one risk factor with another Identifying a spurious relationship between a risk factor and a disease that is due to the effects of a separate factor

Bias & Confounding (a form of bias) Why is this important to understand? To understand what research studies mean (and don’t mean) – to be an educated consumer of research information Clinical practice – To better understand the causes of disease and appropriate treatments Research – To use appropriate study design, analysis, and interpretation

Bias & Confounding (a form of bias) Why is this important to understand? Bias and confounding can (and do!) completely distort study results and can lead to interpretations that are completely wrong!! Why does this happen? Multi-factorial nature of disease Lack of understanding of the roles of bias and confounding by researchers and clinicians

Bias & Confounding (a form of bias) Challenge for researchers and health practitioners? Obtain valid study results i.e. results that represent the true nature of the relationship between exposure and disease This requires consideration of all possible errors due to bias and/or confounding

Bias & Confounding (a form of bias) How to control for bias and confounding Appropriate study design Statistical analytic techniques Understanding and accommodating for limitations (don’t over-interpret!)

Bias & Confounding (a form of bias) Take Home Beware! When you read results from a health study…an apparent link between a risk factor and a disease may be real, or just an anomaly of how the study was done.

Bias Three main types of bias Selection bias Information bias Confounding (not always considered ‘bias’) Note: lack of generalizability not usually considered type of bias “Bias” usually related to internal validity Generalizability related to external validity Distinctions between types of biases not always clear-cut

Selection Bias

Selection Bias Distortion in the estimate of a relationship between exposure and disease that is the result of how subjects are selected for the study Distortions that arise from... The procedures used to select subjects Factors that influence study participation

Selection Bias Distortions that arise from… The procedures used to select subjects Factors that influence study participation Systematic error in selecting subjects… If relationship between exposure/explanatory factor and disease is different… Between cases and controls Between participants and those who should be eligible for the study but don’t participate (are not selected) Examples are…

Selection Bias Self-selection bias Self-selection may be associated with the outcome under study Volunteers may be more likely to have disease you are interested in e.g. If one did a survey on dogs with leg pain – would owners who have recognized lameness in their dogs be more likely to respond to the survey?
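
The leg-pain survey can be worked through with made-up numbers. The sketch assumes a true lameness prevalence of 10%, a 60% response rate among owners who have noticed lameness, and 20% among the rest; all three values are hypothetical.

```python
def prevalence_among_respondents(true_prev: float,
                                 resp_if_lame: float,
                                 resp_if_sound: float) -> float:
    """Prevalence seen in returned surveys when response depends on the outcome."""
    lame = true_prev * resp_if_lame           # lame dogs whose owners respond
    sound = (1 - true_prev) * resp_if_sound   # sound dogs whose owners respond
    return lame / (lame + sound)

observed = prevalence_among_respondents(0.10, 0.60, 0.20)
print(f"true prevalence 10%, prevalence among respondents {observed:.0%}")
# true prevalence 10%, prevalence among respondents 25%
```

If both groups responded at the same rate, the survey would recover the true 10%.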

Selection Bias Problematic in selecting control group Want them to differ only on the exposure (for cohort and some cross-sectional studies) Want them to differ only on outcome (for case-control and some cross-sectional studies) By excluding animals that don’t have the exposure or outcome of interest – we are at risk of creating selection bias

Selection Bias Diagnostic bias Also occurs before subjects are identified for study Diagnoses may be influenced by veterinarians’ knowledge of exposure Legitimate for process of diagnosis and treatment, but inconvenient for research This makes medical records and data-bases less valuable for research than one might think (case-control and retrospective cohort)

Selection Bias Response bias Differential loss to follow-up Differential consent rates Especially problematic in prospective cohort studies Retrospective cohorts also require ascertainment of outcome in cohort

Selection Bias vs. Selective Sample Selection bias Selective differences between groups that impact the relationship between explanatory factors/exposure and outcome Violates internal validity Selective sample Strict inclusion / exclusion criteria Not a threat to internal validity (may enhance internal validity) Not necessarily representative of population as a whole Potential threat to external validity

Controlling Selection Bias Appropriate study design Random selection of subjects from subject “pool” Once study is completed (or even started) – it is impossible to correct for selection bias Study should be thrown out/destroyed/ignored But it probably won’t be

Information Bias

Information Bias Method of gathering information which yields systematic errors regarding exposures and outcomes Using an “invalid” measure e.g., database that has not been validated Is this information bias? Yes, if information is more likely to be wrong for one group than for another Some would consider “not biased” if inaccuracies randomly distributed Either way, study is seriously flawed

Information Bias Examples: Misclassification bias Observer or Interviewer bias Recall bias Reporting bias (wish bias) Surveillance bias Observer bias Measurement error Loss to follow-up

Information Bias Misclassification of exposures A problem that occurs when study subjects are erroneously categorized according to the disease and/or the exposure being studied

Information Bias Misclassification of exposures Differential Proportion of misclassified depends on exposure e.g., exposure = exposure to “kennel cough” Owners of dogs who develop kennel cough are more likely to identify any potential exposure than those who do not (recall bias)

Information Bias Misclassification of exposures Differential Proportion of misclassified depends on exposure Non-differential Misclassification independent of exposure (same between treatment and control) Acts to dilute true effects Can also act to inflate effects (non-effects)
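
The diluting effect of non-differential misclassification can be shown with arithmetic. The counts and the 80% sensitivity and specificity below are hypothetical; the same error rates are applied to cases and controls.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Observed (exposed, unexposed) counts after imperfect exposure classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed

# Hypothetical true counts
true_or = odds_ratio(60, 40, 40, 60)

# Same sensitivity and specificity in both groups: non-differential
case_exp, case_unexp = misclassify(60, 40, 0.8, 0.8)
ctrl_exp, ctrl_unexp = misclassify(40, 60, 0.8, 0.8)
observed_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
# true OR = 2.25, observed OR = 1.62
```

The observed odds ratio moves toward 1, the "no effect" value.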

Information Bias Misclassification of outcome e.g., outcome = failure to thrive exposure = use of animal health products Owners asked about “failure to thrive” Dogs conscientiously treated with animal health products may be misclassified more often as failure to thrive (intensity of owner interaction) Differential misclassification of outcome

Information Bias Observer or Interviewer Bias Recall bias Problem for retrospective studies (case-control and cohort). Study subjects are required to report specific experiences or exposures that happened in the past Cases are more likely to recall potential exposures than are controls Higher OR than the true association - can result in a study that shows an exposure ‘causing’ an outcome even if it does not
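
Recall bias can be sketched the same way. Assume, hypothetically, that half of both cases and controls were truly exposed (true OR = 1), that owners of cases recall 90% of real exposures, that owners of controls recall only 60%, and that nobody reports an exposure that did not happen.

```python
def observed_odds_ratio(true_exposed_frac: float,
                        recall_cases: float,
                        recall_controls: float) -> float:
    """Odds ratio from reported exposure when recall differs by case status."""
    p_cases = true_exposed_frac * recall_cases        # cases reporting exposure
    p_controls = true_exposed_frac * recall_controls  # controls reporting exposure
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls

biased_or = observed_odds_ratio(0.5, 0.9, 0.6)
print(f"true OR = 1.00, observed OR = {biased_or:.2f}")
# true OR = 1.00, observed OR = 1.91
```

Under these assumptions, differential recall alone produces an apparent near-doubling of the odds.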

Information Bias Observer or Interviewer Bias Recall bias Problem for retrospective studies (case-control and cohort). Study subjects are required to report specific experiences or exposures that happened in the past Cases are more likely to recall potential exposures than are controls 1999 Febreze question (or just internet hoax)

Information Bias Observer or Interviewer Bias Recall bias Reporting bias (aka Wish bias) Owners and veterinarians may see what they want to see (and not report what they are embarrassed to share) Important concern in case-control studies and poorly blinded experimental trials

Information Bias Observer or Interviewer Bias Recall bias Reporting bias (aka Wish bias) Surveillance bias Animals with a particular exposure may be more closely monitored than animals without the exposure e.g. Women with a family history of breast cancer may be more closely monitored for breast cancer than the general population – therefore, even if no familial risk exists – the closely monitored population is more likely to be diagnosed

Information Bias Observer or Interviewer Bias Recall bias Reporting bias (aka Wish bias) Surveillance bias Animals with a particular exposure may be more closely monitored than subjects without the exposure Important concern in cohort studies

Information Bias Observer or Interviewer Bias Recall bias Reporting bias (aka Wish bias) Surveillance bias Observer bias Potential problem where judgment is required in assessing exposure or outcome Important concern anytime medical records are used (veterinary personnel are not ‘blinded’ to history, other exposures, etc.)

Information Bias Observer or Interviewer Bias Recall bias Reporting bias (aka Wish bias) Surveillance bias Observer bias Measurement bias (error) Use of invalid or poorly validated measurements (i.e. diagnostic test for either the exposure or outcome)

Controlling Information Bias Appropriate study design Careful collection of information/data Blinding

Blinding of Treatment Groups Single-blinded trial The person giving the treatment (owner, technician, veterinarian) is blinded (does not know which treatment the animal received) Often by use of placebo

Blinding of Treatment Groups Single-blinded trial Double-blinded trial Both the person giving the treatment and the person evaluating the animal are unaware of which treatment each animal received

Blinding of Treatment Groups Single-blinded trial Double-blinded trial Triple-blinded trial The person giving the treatment, the animal evaluator, and the diagnostician or statistician are all unaware of which treatment is received

Blinding of Treatment Groups Assures that patients in different treatment groups are not assessed differently – a potential source of bias (“wish” bias) Errors due to patient pre-conceptions or investigator bias will be avoided or equally distributed between treatment groups

Blinding of Treatment Groups It is not always possible to blind treatments e.g. if the ‘treatment’ is surgery Fatal error: non-blinded + subjective outcome

Controlling Information Bias Appropriate study design Careful collection of information/data Blinding Limit interpretation of flawed study VERY WEAK evidence of anything

Bias & Confounding (a form of bias) Take Home Beware! When you read results from a health study…an apparent link between a risk factor and a disease may be real, or just an anomaly of how the study was done.

Confounding

Confounding A third factor which is related to both exposure and outcome, and which accounts for some/all of the observed relationship between the two Confounder is not a result of the exposure e.g. association between birth rank and Down syndrome confounder = mother’s age

Confounding Exposure ? Disease

Confounding ANOTHER PATHWAY TO GET TO THE DISEASE (a mixing of effects) Exposure ? Confounding Variable Disease

Confounding Confounder is distributed differently between exposed and un-exposed populations Exposure ? Confounding Variable Disease

Confounding Exposure ? Confounding Variable Confounder must be a risk factor or surrogate for a cause of the disease, independent of the exposure of interest Disease

Confounding Confounder cannot be in the causal pathway between the exposure and disease (a factor on the causal pathway is an intermediate variable, not a confounder) Exposure ? Confounding Variable Disease

Birth order & Down Syndrome Risk

Maternal age & Down Syndrome Risk

Birth order & Down Syndrome Risk

Confounding Exposure= lighter in front pocket Outcome= lung cancer Smoking a confounder? Is smoking associated with lung cancer (outcome)? Y/N Is smoking associated with carrying a lighter (i.e. is smoking unequally distributed between people who do and don’t carry lighters)? Y/N If yes to both – smoking is a potential confounder
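
The lighter example can be sketched with invented counts. In this hypothetical population, smokers mostly carry lighters and have a 10% lung cancer risk; non-smokers rarely carry lighters and have a 1% risk. Carrying a lighter has no effect of its own.

```python
# Hypothetical counts: (people, lung cancer cases) by smoking status and lighter carrying
groups = {
    ("smoker", "lighter"): (800, 80),
    ("smoker", "no lighter"): (200, 20),
    ("non-smoker", "lighter"): (100, 1),
    ("non-smoker", "no lighter"): (900, 9),
}

def risk(keys):
    """Lung cancer risk pooled over the listed groups."""
    people = sum(groups[k][0] for k in keys)
    cases = sum(groups[k][1] for k in keys)
    return cases / people

def risk_ratio(smoking_levels):
    """Risk in lighter carriers divided by risk in non-carriers."""
    exposed = [(s, "lighter") for s in smoking_levels]
    unexposed = [(s, "no lighter") for s in smoking_levels]
    return risk(exposed) / risk(unexposed)

crude = risk_ratio(["smoker", "non-smoker"])
among_smokers = risk_ratio(["smoker"])
among_non_smokers = risk_ratio(["non-smoker"])

print(f"crude RR = {crude:.2f}")                       # crude RR = 3.41
print(f"RR among smokers = {among_smokers:.2f}")       # RR among smokers = 1.00
print(f"RR among non-smokers = {among_non_smokers:.2f}")  # RR among non-smokers = 1.00
```

Smoking passes both checks (associated with the outcome, unequally distributed by exposure), and the apparent effect of the lighter disappears once smokers and non-smokers are examined separately.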

Confounding Exposure= fed colostrum/milk replacer Outcome = diarrhea Dystocia a confounder? Is dystocia associated with diarrhea (outcome)? Y/N Is dystocia associated with colostrum feeding (i.e. is dystocia unequally distributed between colostrum-fed and non-colostrum-fed animals) Y/N If yes to both – dystocia is a potential confounder

Methods to Prevent Confounding Exposure X ? Confounding Variable X Disease

Methods to Prevent Confounding In RCT, random allocation controls for confounding If it is possible to randomize – do it….it is the best method to reduce confounding Distribution of any variable is theoretically the same in the exposed and unexposed groups This is almost always true with large samples (but may be violated with small sample size)

Methods to Prevent Confounding Exposure X ? Confounding Variable Disease

Randomization to Reduce Confounding Exposed Applicable only for intervention (experimental) studies Randomization controls for both known and unknown confounding factors! Because distribution of any variable theoretically the same across randomization groups Other methods to control confounding can only deal with known (suspected) confounders Randomize All subjects Unexposed

Randomization to Reduce Confounding Exposed Applicable only for intervention (experimental) studies Randomization controls for both known and unknown confounding factors! Does not, however, always eliminate confounding! By chance alone, there can be imbalance Less of a problem in large studies Techniques exist to ensure balance of certain variables Randomize All subjects Unexposed
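
Chance imbalance after randomization can be simulated. The sketch assumes a hypothetical binary trait (a potential confounder) present in about half of all subjects, and measures how far apart the two arms end up, on average, in small versus large trials.

```python
import random

def proportion(flags):
    """Share of subjects carrying the trait."""
    return sum(flags) / len(flags)

def average_imbalance(n_subjects: int, n_trials: int, rng: random.Random) -> float:
    """Mean absolute between-arm difference in trait prevalence over repeated trials."""
    gaps = []
    for _ in range(n_trials):
        # Hypothetical trait present in roughly half of subjects
        subjects = [rng.random() < 0.5 for _ in range(n_subjects)]
        rng.shuffle(subjects)  # random allocation: first half treated, rest control
        half = n_subjects // 2
        treated, control = subjects[:half], subjects[half:]
        gaps.append(abs(proportion(treated) - proportion(control)))
    return sum(gaps) / len(gaps)

rng = random.Random(1)  # fixed seed so the run is repeatable
small = average_imbalance(20, 200, rng)
large = average_imbalance(2000, 200, rng)
print(f"average imbalance, 20 subjects:   {small:.1%}")
print(f"average imbalance, 2000 subjects: {large:.1%}")
```

The small trials should show a much larger typical gap between arms, which is why randomization alone does not guarantee balance when the sample is small.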

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria (aka Restriction or Specification) Restrict enrollment to only those subjects who have a specific value/range of the confounding variable e.g. when age is confounder – include only subjects of same narrow age range

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria (aka Restriction or Specification) Restrict enrollment to only those subjects who have a specific value/range of the confounding variable Not always effective, reasonable, practical, or useful to exclude all potential confounders

Restriction to Reduce Confounding Exposure X ? Confounding Variable X Disease

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria (aka Restriction or Specification) Advantage – very straightforward Disadvantages Reduces the number of animals that are eligible Takes more work to sift through animals to find those with the level of confounder you want (inefficient) Moderate restriction may not work Reduces the generalizability of the study

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria (aka Restriction or Specification) Matching Advantages Good for complex nominal variable (e.g. herd/kennel) Statistical precision because number of cases and controls is balanced

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria (aka Restriction or Specification) Matching Disadvantages Finding matches may be difficult or time-consuming May have to throw out a case if an appropriate match cannot be found In a case-control study – the factor used to match subjects cannot be evaluated as a risk factor Decisions are irrevocable – if you match on an intermediary factor, you lose ability to evaluate it
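
Matching on herd can be sketched as a simple pairing routine. The animal IDs, herd labels, and the one-control-per-case rule are all hypothetical choices for illustration.

```python
from collections import defaultdict

def match_cases_to_controls(cases, controls):
    """Pair each case with one unused control from the same herd.

    cases, controls: lists of (animal_id, herd).
    Returns (pairs, unmatched_case_ids).
    """
    available = defaultdict(list)
    for animal_id, herd in controls:
        available[herd].append(animal_id)

    pairs, unmatched = [], []
    for animal_id, herd in cases:
        if available[herd]:
            pairs.append((animal_id, available[herd].pop(0)))
        else:
            unmatched.append(animal_id)  # no match left: the case is dropped
    return pairs, unmatched

cases = [("c1", "A"), ("c2", "A"), ("c3", "B"), ("c4", "C")]
controls = [("k1", "A"), ("k2", "B"), ("k3", "B")]
pairs, unmatched = match_cases_to_controls(cases, controls)
print(pairs)      # [('c1', 'k1'), ('c3', 'k2')]
print(unmatched)  # ['c2', 'c4']
```

Two of the four cases are lost here, which illustrates the cost of not finding an appropriate match.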

Methods to Prevent Confounding In RCT, random allocation controls for confounding In other study types - control confounding by: Exclusion criteria Matching Statistical analysis

Statistical Control of Confounding Multivariable analyses Any analysis technique that simultaneously adjusts for several variables Known potential confounders should be included in the model ANCOVA, MANCOVA Generalized Linear Models Multiple linear regression Multivariate logistic regression Multivariate Cox Proportional Hazards Regression etc.

Multivariable results Provides relationship between outcome and exposure, adjusted for the potential confounders such as: Gender Age Herd Diet Health related factors (body condition score, etc.) Other diseases, health conditions etc. Must have gathered the information during the experiment!
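
What "adjusted for" means can be sketched with multiple linear regression on invented data. Here the outcome depends only on a confounder, which is more common among exposed animals. The data and helper functions are hypothetical, and the fit uses plain least squares rather than any particular statistics package.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical (exposure, confounder) pairs; the outcome depends only on the confounder
data = [(0, 0)] * 4 + [(1, 0)] * 1 + [(0, 1)] * 1 + [(1, 1)] * 4
y = [2 + 3 * c for _, c in data]

crude = least_squares([[1, e] for e, _ in data], y)        # intercept, exposure
adjusted = least_squares([[1, e, c] for e, c in data], y)  # intercept, exposure, confounder

print(f"crude exposure coefficient:    {crude[1]:.2f}")
print(f"adjusted exposure coefficient: {abs(adjusted[1]):.2f}")
```

The crude model credits the exposure with an effect of about 1.8; once the confounder is in the model, the exposure coefficient falls to essentially zero. This only works because the confounder was recorded.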

Statistical Control of Confounding Multivariable analyses ANCOVA, MANCOVA Generalized Linear Models Multiple linear regression Multivariate logistic regression Multivariate Cox Proportional Hazards Regression Etc. Stratification Stratify results by confounder

Statistical Control of Confounding Stratification Stratify results by confounder Create strata that are homogeneous with respect to the different levels of the confounder Results in a mini-restriction within each stratum Effective for small number of confounders

Stratification of Results Example: Exposure = gender Outcome = acceptance to professional school At a particular university, 38.5% of women applying to the professional schools (medicine, veterinary medicine, and law) and 47.9% of men applying to professional schools are admitted. Is there evidence for a gender-bias lawsuit (prior cases used a 15% difference between genders to establish sex-bias)?

Stratification of Results Example: Exposure = gender Outcome = acceptance to professional school Non-stratified Results (% acceptance): Male 47.9%, Female 38.5%; 19.6% reduction in acceptance risk
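
The 19.6% figure is the relative reduction in acceptance for women compared with men. A short check using the two percentages from the example; the risk ratio and odds ratio are added for comparison and are computed from the rounded percentages only.

```python
men, women = 0.479, 0.385  # acceptance proportions from the example

relative_reduction = (men - women) / men
risk_ratio = men / women
odds_ratio = (men / (1 - men)) / (women / (1 - women))

print(f"relative reduction for women: {relative_reduction:.1%}")  # 19.6%
print(f"risk ratio (men vs. women):   {risk_ratio:.2f}")          # 1.24
print(f"odds ratio (men vs. women):   {odds_ratio:.2f}")          # 1.47
```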

Stratification to Reduce Confounding Gender Types of Professional Schools ? Professional School Acceptance

Stratified Results

Statistical Control of Confounding Stratification Stratify results by confounder

Stratification: Gender, Application School, & Likelihood of Acceptance

Stratification: Gender, Application School, & Likelihood of Acceptance Interpretation: Men are 48% more likely to be admitted to professional school compared to women (statistically significant since 95% CI does not include 1)

Stratification: Gender, Application School, & Likelihood of Acceptance Interpretation: No statistical difference between men and women for admission to medical school (women with numerical advantage)

Stratification: Gender, Application School, & Likelihood of Acceptance Interpretation: No statistical difference between men and women for admission to vet school (women with numerical advantage)

Stratification: Gender, Application School, & Likelihood of Acceptance Interpretation: No statistical difference between men and women for admission to law school (women with numerical advantage)

Statistical Control of Confounding Stratification Stratify results by confounder Create a single un-confounded (adjusted) estimate for the relationship in question Summarize the un-confounded estimates from the two (or more) strata to form a single overall un-confounded “summary estimate”
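
One common way to form such a summary estimate is the Mantel-Haenszel pooled odds ratio. The source does not say which method was used, so the method is an assumption here, and the admission counts below are invented for illustration rather than taken from the example.

```python
# Hypothetical counts per school:
# (men admitted, men rejected, women admitted, women rejected)
strata = {
    "law": (360, 240, 124, 76),
    "medicine": (60, 140, 124, 276),
    "veterinary medicine": (50, 150, 104, 296),
}

def odds_ratio(a, b, c, d):
    """Odds of admission for men divided by odds of admission for women."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata, weighting each 2x2 table by its size."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

totals = [sum(col) for col in zip(*strata.values())]  # collapse over schools
crude = odds_ratio(*totals)
pooled = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude:.2f}")
for school, table in strata.items():
    print(f"  {school}: OR = {odds_ratio(*table):.2f}")
print(f"Mantel-Haenszel OR = {pooled:.2f}")
```

With these made-up counts the crude odds ratio favors men, every school-specific odds ratio is slightly below 1, and the pooled estimate sits close to 1. That is the same pattern the stratified slides describe.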

Stratification: Gender, Application School, & Likelihood of Acceptance

Stratification: Gender, Application School, & Likelihood of Acceptance Interpretation: No statistical difference between men and women for admission to professional school

Controlling Confounding Study design Randomization (in clinical trials) Restriction (known confounders) Matching (known confounders) Data analysis Multivariate analysis (appropriate models) Stratification (analyze data by subgroups) Studies often use a combination of these

Bias, Confounding (a form of bias), and Interaction Take Home Beware! When you read results from a health study…an apparent link between a risk factor and a disease may be real, or just an anomaly of how the study was done.
