Logical Line Fitting: One Step in the EDA Process

by Shannon Guerrero, Northern Arizona University. NCTM 2008 Annual Meeting & Exposition, Salt Lake City, UT, April 2008.

Presentation Transcript


  1. Logical Line Fitting: One Step in the EDA Process • by Shannon Guerrero, Northern Arizona University • NCTM 2008 Annual Meeting & Exposition, Salt Lake City, UT, April 2008

  2. EDA (Exploratory Data Analysis) • Mostly graphical approach to data analysis • Emphasizes uncovering the underlying structure of the data, extracting important variables, detecting outliers/anomalies, testing underlying assumptions, and maximizing insight into the data set • Graph the data, graph the data, graph the data • Focus on sense-making rather than theory

  3. Why curve fitting? • Applications in data analysis & algebra • “Analyses of the relationships between two sets of measurement data are central in high school mathematics” (NCTM PSSM, p. 328) • Modeling, prediction, symbolic representation, correlation, regression, residuals

  4. “Line of Best Fit” • Explains the relationship between two variables with a straight line that “best fits” the data • Line may pass through some, none, or all of the points • Used to predict unknown values from existing values (interpolating within the range of the data vs. extrapolating beyond it)
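
The interpolate/extrapolate distinction on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `predict_from_line`, the data, and the slope and intercept are hypothetical and do not come from the presentation.

```python
def predict_from_line(slope, intercept, observed_xs, x_new):
    """Predict y at x_new and report whether x_new lies inside the observed x range."""
    y_hat = slope * x_new + intercept
    inside = min(observed_xs) <= x_new <= max(observed_xs)
    return y_hat, "interpolation" if inside else "extrapolation"


# Hypothetical observed x values and an already-fitted line (assumed, not from the slides).
observed_xs = [1, 2, 3, 4, 5]
slope, intercept = 2.0, 0.5

for x_new in (3.5, 8):
    y_hat, kind = predict_from_line(slope, intercept, observed_xs, x_new)
    print(f"x = {x_new}: predicted y = {y_hat} ({kind})")
```

Predictions labeled as extrapolation deserve extra caution, since the data give no evidence that the linear pattern continues outside the observed range.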

  5. Outliers • An observation that lies outside the overall pattern of a distribution • For one variable, a convenient definition is a point that falls more than 1.5 times the IQR above the 3rd quartile or below the 1st quartile • Examine outliers carefully and understand why they appear in your data set • Need to decide what to do with outliers: include or discard?
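
A minimal sketch of the 1.5 × IQR rule for a single variable, using only the Python standard library. The data are made up for illustration, and the quartile convention is an assumption: `statistics.quantiles` defaults to the "exclusive" method, while calculators and textbooks may compute quartiles slightly differently and so flag slightly different points.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    # Quartile convention is an assumption: the default "exclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data set with one unusually large value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 10, 40]
print(iqr_outliers(data))  # → [40]
```

The function only flags candidates. Whether to include or discard a flagged point remains a judgment call that depends on why the point appears in the data.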

  6. Curve Fitting vs. Regression • Power of curve fitting is often lost when we jump straight to regression calculations • Curve fitting is more general and yields an approximation • The equation found (using either method) can help uncover the underlying structure of the data, predict future values from past ones, model causal relationships, and maximize insight into a data set

  7. Linear Regression • Statistical approach to finding the relationship between two variables • Least squares regression minimizes the sum of the squared residuals (residual: difference between the observed value and the value given by the model) • Assumption: for a fixed value of x, the value of y is normally distributed, with equal variance across x
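
The least squares line can be computed directly from the standard closed-form formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The sketch below assumes a simple one-predictor model y = slope · x + intercept. The function name `least_squares_line` and the data are hypothetical, chosen only for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements that lie close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The fitted line always passes through the point (x̄, ȳ), which gives a quick check on a hand calculation.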

  8. r² and residuals • Residual: difference between an observed value and the value predicted by the regression line • A residual plot is a scatterplot of the regression residuals against the explanatory variable • Helps us assess the fit of the regression line • r² is another way to assess how well the line fits the data (the closer to 1, the better the fit)
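
As a rough illustration, residuals and r² can be computed for any candidate line. The sketch assumes residual = observed − predicted and r² = 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squared deviations of y from its mean. The data and the line's coefficients are hypothetical.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the line, for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in y accounted for by the line (1 - SSE/SST)."""
    mean_y = sum(ys) / len(ys)
    sse = sum(r ** 2 for r in residuals(xs, ys, slope, intercept))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data and a line fitted to it (coefficients assumed for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = 1.99, 0.05

for x, r in zip(xs, residuals(xs, ys, slope, intercept)):
    print(f"x = {x}: residual = {r:+.2f}")
print(f"r^2 = {r_squared(xs, ys, slope, intercept):.4f}")
```

Plotting the residuals against x (rather than just listing them) is what the residual plot on this slide refers to. Residuals scattered randomly around zero suggest a line is a reasonable model, while a curved pattern suggests it is not.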
