This guide provides an overview of essential statistical terms and methods used in data analysis. It explains the differences between categorical and continuous variables, as well as concepts such as explanatory and response variables. Various statistical tests, including T-tests and ANOVA, are discussed, highlighting their applications for continuous and categorical data. We also cover hypothesis testing, significance levels, and regression analysis, including assumptions and interpretation of results. This resource is ideal for those seeking to enhance their understanding of statistical analysis.
A Few Necessary Terms
Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool).
Continuous Variable: Measurements along a continuum, such as Flow Velocity.
What type of variable would “Mottled Sculpin/m²” be? What type of variable is “Substrate Type”? What type of variable is “% of bank that is undercut”?
A Few Necessary Terms
Explanatory Variable: The independent variable, plotted on the x-axis; the variable you use as a predictor.
Response Variable: The dependent variable, plotted on the y-axis; the variable hypothesized to depend on, or be predicted by, the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
t-test: a categorical explanatory variable with 2 options (groups).
ANOVA: a categorical explanatory variable with more than 2 options.
Regression: a continuous explanatory variable.
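The selection rule above is simple enough to write as a small function. This is a minimal Python sketch; the function name `choose_test` and its parameters are hypothetical, and it assumes, as the slide does, that the response variable is always continuous.

```python
def choose_test(explanatory_is_categorical: bool, n_groups: int = 0) -> str:
    """Pick a test for a continuous response variable.

    explanatory_is_categorical: True for groups such as Riffle/Run/Pool,
        False for a measurement such as flow velocity.
    n_groups: number of categories (ignored when the explanatory
        variable is continuous).
    """
    if not explanatory_is_categorical:
        return "regression"
    if n_groups == 2:
        return "t-test"
    if n_groups > 2:
        return "ANOVA"
    raise ValueError("a categorical explanatory variable needs at least 2 groups")


print(choose_test(True, 2))  # two sites
print(choose_test(True, 3))  # riffle, run, pool
print(choose_test(False))    # flow velocity
```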
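Statistical Tests
Hypothesis Testing: In statistics, we always test a null hypothesis (H0) against an alternative hypothesis (Ha).
Test Statistic: A number computed from the sample data (such as t in a t-test or F in ANOVA) that measures how far the data depart from what the null hypothesis predicts.
p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
Statistical Significance: We reject the null hypothesis if the p-value is below a preset threshold (the significance level), usually 0.05.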
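The p-value definition can be made concrete with a permutation test, a resampling technique swapped in here purely for illustration (it is not one of the tests covered in this guide). If the null hypothesis of no difference is true, the group labels are interchangeable, so shuffling them many times shows how often a difference at least as extreme as the observed one arises by chance. The function name `permutation_p_value`, its parameters, and the sample values are all hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Under the null hypothesis the group labels are arbitrary, so we shuffle
    them and count how often the shuffled difference is at least as extreme
    as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Small tolerance so floating-point rounding does not hide ties
        if diff >= observed - 1e-12:
            extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero
    return (extreme + 1) / (n_perm + 1)


# Hypothetical densities (fish/m²) at two sites
p = permutation_p_value([2.1, 3.4, 2.8, 3.9, 3.0], [1.2, 1.8, 0.9, 1.5, 1.1])
print(p, "reject H0" if p < 0.05 else "fail to reject H0")
```

Every value in the first sample exceeds every value in the second, so very few shuffles match the observed difference and the p-value lands well below 0.05.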
Student’s t-Test
Tests the statistical significance of the difference between the means of two independent samples.
Compares the mean of a continuous response between the two groups of a categorical explanatory variable.
[Figure: Mottled Sculpin/m² at Cross Plains vs. Salmo Pond]
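A minimal sketch of Student's (pooled-variance) t-test using only Python's standard library. The helper names `students_t_test` and `t_two_sided_p` are hypothetical; the p-value comes from numerically integrating the Student t density with Simpson's rule, an approximation chosen here to avoid third-party packages. The sample values are made up for illustration, not real sculpin counts.

```python
import math
from statistics import fmean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic with df degrees of freedom.

    Integrates the Student t density from 0 to |t| with Simpson's rule
    (steps must be even) and uses symmetry: p = 1 - 2 * area(0, |t|).
    """
    t = abs(t)
    # Log-gamma keeps the normalizing constant finite for large df
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1 - 2 * area)


def students_t_test(a, b):
    """Student's t-test for two independent samples.

    Returns (t, degrees of freedom, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooling the two sample variances relies on the equal-variance assumption
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


# Hypothetical sculpin densities (fish/m²) at two sites
site_a = [2.1, 3.4, 2.8, 3.9, 3.0]
site_b = [1.2, 1.8, 0.9, 1.5, 1.1]
t, df, p = students_t_test(site_a, site_b)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

In practice a statistics package would do this; the sketch only shows where t, the degrees of freedom, and the p-value come from.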
Precautions and Limitations
• Meet the assumptions:
  • Observations come from normally distributed data (check with a histogram)
  • Samples are independent
  • The two groups have equal variance (check with a boxplot)
  • No other sample biases
• Interpret the p-value correctly
Analysis of Variance (ANOVA)
Tests the statistical significance of the differences among the means of two or more independent samples.
[Figure: Mottled Sculpin/m² in riffles, pools, and runs, compared with the grand mean]
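A sketch of the one-way ANOVA F statistic, assuming the groups are stored in a dict keyed by category. The function name `one_way_anova` and the numbers are hypothetical. It computes the grand mean, the between-group and within-group sums of squares, and their ratio F; turning F into a p-value requires the F distribution with (df_between, df_within) degrees of freedom, which this sketch leaves to a table or statistics package.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups maps each category (e.g. "Riffle") to a list of response values.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(vals) * (fmean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    # Within groups: scatter of observations around their own group mean
    ss_within = sum((v - fmean(vals)) ** 2
                    for vals in groups.values() for v in vals)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical sculpin densities by reach type
print(one_way_anova({"Riffle": [4, 5, 6], "Run": [3, 4, 5], "Pool": [1, 2, 3]}))
```

With exactly two groups, F equals the square of the pooled t statistic, so ANOVA on two groups gives the same p-value as Student's t-test.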
Precautions and Limitations
• Meet the assumptions:
  • Observations come from normally distributed data
  • Samples are independent
  • The groups have equal variance
  • No other sample biases
• Interpret the p-value correctly
• Follow a significant result with pairwise t-tests to find which groups differ
Simple Linear Regression
• What is it? The least squares line
• When is it appropriate to use?
• What are its assumptions?
• What does the p-value mean? The R-squared value?
• How to do it in Excel
Simple Linear Regression
Tests the statistical significance of a relationship between two continuous variables: an explanatory variable and a response variable.
Precautions and Limitations
• Meet the assumptions:
  • Residuals are normally distributed
  • Samples are independent
  • Residuals have equal variance across the range of the explanatory variable
  • The relationship is linear
  • No other sample biases
• Interpret the p-value and R-squared value
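Residual Plots
Residuals are the vertical distances from the observed points to the best-fit line (observed value minus fitted value).
For a least squares line with an intercept, the residuals always sum to zero.
Regression chooses the best-fit line that minimizes the sum of squared residuals, which is why it is called the least squares line.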
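A minimal sketch of fitting the least squares line with the closed-form formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄, then checking the two residual properties described above. The function names `least_squares_line` and `sum_sq_residuals` and the data values are hypothetical.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def sum_sq_residuals(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, e.g. flow velocity (x) and a continuous response (y)
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.2, 4.8, 6.1]
slope, intercept = least_squares_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]  # observed - fitted
print(slope, intercept)
print(abs(sum(residuals)) < 1e-9)  # True: residuals sum to zero
# Nudging the slope away from the least squares value increases the sum of squares
print(sum_sq_residuals(x, y, slope + 0.1, intercept) > sum_sq_residuals(x, y, slope, intercept))  # True
```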
Residual vs. Fitted Value Plots
[Figure: observed values (points) and model values (line)]
Residual Plots Can Help Test Assumptions
[Figure, residuals plotted around 0: a “normal” random scatter (assumptions met); a curve (the linearity assumption is violated); a fan shape (unequal variance)]
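A residual plot is normally judged by eye. As a rough numeric stand-in for spotting a fan shape, one can compare the spread of residuals at high versus low fitted values. This ratio is a heuristic assumed here for illustration, not a formal test from this guide, and the function name `residual_spread_ratio` and the example residuals are hypothetical.

```python
from statistics import median, stdev


def residual_spread_ratio(fitted, residuals):
    """Residual SD above the median fitted value divided by the SD at or below it.

    A rough heuristic: a ratio near 1 is consistent with equal variance,
    while a ratio well above 1 echoes a fan shape that widens to the right.
    Each half needs at least two points.
    """
    cut = median(fitted)
    low = [r for f, r in zip(fitted, residuals) if f <= cut]
    high = [r for f, r in zip(fitted, residuals) if f > cut]
    return stdev(high) / stdev(low)


fitted = [1, 2, 3, 4, 5, 6]
print(residual_spread_ratio(fitted, [1, -1, 1, -1, 1, -1]))         # even spread
print(residual_spread_ratio(fitted, [0.1, -0.1, 0.2, -1, 1.5, -2]))  # widening spread
```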
R-Squared and P-value
[Figure, four example regressions:
• High R-squared, low p-value (significant relationship)
• Low R-squared, low p-value (significant relationship)
• High R-squared, high p-value (no significant relationship)
• Low R-squared, high p-value (no significant relationship)]
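The p-value indicates how strong the evidence is that a relationship exists between the two variables (that the true slope is not zero); it does not measure how strong that relationship is.
R-squared indicates the proportion of the variance in the response variable that is explained by the explanatory variable. You can think of it as a measure of predictability. If R-squared is low, other variables likely play a role. If R-squared is high, that alone DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP!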
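To see how R-squared and the p-value can disagree, here is a minimal sketch that computes both for a simple linear regression: R-squared = 1 - SSE/Syy, and the p-value from the slope's t statistic with n - 2 degrees of freedom. The function names `regression_summary` and `t_two_sided_p` are hypothetical, the t-distribution tail is approximated with Simpson's rule to stay within the standard library, and both datasets are made up.

```python
import math
from statistics import fmean


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for t with df degrees of freedom (Simpson's rule on the t density)."""
    t = abs(t)
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * area * h / 3)


def regression_summary(x, y):
    """Return (R-squared, two-sided p-value for H0: slope = 0)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    sse = syy - slope * sxy            # sum of squared residuals
    r_squared = 1 - sse / syy          # share of response variance explained
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return r_squared, t_two_sided_p(slope / se_slope, n - 2)


# Hypothetical: 3 points close to a line -> high R-squared, yet p > 0.05
print(regression_summary([1, 2, 3], [1, 3, 4]))  # roughly (0.96, 0.12)
# Hypothetical: 100 noisy points with a weak trend -> low R-squared, yet p < 0.05
xs = list(range(100))
ys = [0.05 * xi + (3 if xi % 2 else -3) for xi in xs]
print(regression_summary(xs, ys))  # R-squared about 0.2, p far below 0.05
```

The first case has too few points to rule out chance despite the tight fit; the second has enough data to detect a weak trend even though most of the variation is unexplained.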