R Programming

wmetz

Presentation Transcript


  1. R Programming Data Analysis Module: Correlation and Regression

  2. Data Analysis Module • Basic Descriptive Statistics and Confidence Intervals • Basic Visualizations • Histograms • Pie Charts • Bar Charts • Scatterplots • T-tests/Bivariate Testing • One Sample • Paired • Independent Two Sample • ANOVA • Chi-Square and Odds • Regression Basics

  3. Data Analysis Module: Correlation and Regression

  4. Data Analysis Module: Correlation and Regression • A correlation coefficient (Pearson's r) measures the strength and direction of the linear relationship between two quantitative variables. • The correlation ranges from -1 to +1. • A negative correlation means that X and Y are inversely related: as X increases, Y tends to decrease. • A positive correlation means that X and Y are directly related: as X increases, Y tends to increase. • A correlation of zero means that X and Y are not linearly related (they may still have a nonlinear relationship). • A correlation of +1 indicates that X and Y are directly related and that all the points fall on the same straight line. • A correlation of -1 indicates that X and Y are inversely related and that all the points fall on the same straight line. • Plot a scatter diagram of each predictor variable against the dependent variable. • Look for departures from linearity. • Look for extreme data points (outliers). • Examine partial correlations. • Correlation cannot establish causality, but partial correlation can help isolate the effect of confounding variables.
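The correlation described on this slide is Pearson's r. The sketch below is written in Python for illustration; in R, `cor(x, y)` computes the same Pearson coefficient by default. It implements r from its definition. It also implements the standard first-order partial correlation formula, which removes the linear effect of one third variable z. The function names `pearson_r` and `partial_r` are hypothetical. The small data sets are made up so the results can be checked by hand.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy = pearson_r(x, y)
    rxz = pearson_r(x, z)
    ryz = pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


if __name__ == "__main__":
    age = [1, 2, 3, 4, 5]
    print(pearson_r(age, [2 * a + 1 for a in age]))   # points on a rising line: 1.0
    print(pearson_r(age, [10 - 3 * a for a in age]))  # points on a falling line: -1.0
    # Hypothetical data in which x and y are both driven by z
    z = [1, 2, 3, 4]
    x = [2, 1, 2, 5]
    y = [2, -1, 6, 3]
    print(pearson_r(x, y))     # about 0.33
    print(partial_r(x, y, z))  # about 0
```

In the last example, x and y share a common driver z. Their raw correlation of about 0.33 drops to about 0 once z is controlled for. This is the confounding pattern that the partial-correlation bullets refer to.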

  5. Data Analysis Module: Correlation and Regression For example, let's take two variables and evaluate their correlation. Open the stats98 dataset in Excel. What would you expect the correlation of the Verbal SAT scores and the Math SAT scores to be? Why? What would you expect the correlation of the Math SAT scores and the percent of students taking the test to be? Why?

  6. Data Analysis Module: Correlation and Regression What would you expect the correlation of the Verbal SAT scores and the Math SAT scores to be? Why?

  7. Data Analysis Module: Correlation and Regression What would you expect the correlation of the Math SAT scores and the percent of high school students who took the test to be? Why?

  8. Data Analysis Module: Correlation and Regression

  9. Data Analysis Module: Correlation and Regression On the previous slide, the “regression line” is superimposed on the relationship between Price and Age of car. The equation of this line takes the general form \(y = mx + b\), where: • \(y\) is the dependent variable (Price) • \(m\) is the slope of the line • \(x\) is the independent variable (Age) • \(b\) is the y-intercept. When we discuss regression models, we rewrite this equation as \(y = b_0 + b_1 x_1 + \dots + b_n x_n\), where \(b_0\) is the y-intercept and each \(b_i\) is the slope for predictor \(x_i\). With a single predictor, \(b_1\) is simply the slope of the line. The slope is also the effect of a one-unit change in \(x\) on \(y\).
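For a single predictor, the least-squares slope and intercept have closed forms: \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). Here \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\) and \(S_{xx} = \sum (x_i - \bar{x})^2\). The minimal Python sketch below computes them; in R, `lm(y ~ x)` fits the same line. The function name `fit_simple_ols` is hypothetical. The ages and prices are made-up values placed exactly on the line \(y = -924x + 12320\), so the fit should recover those coefficients. The slide's actual car data set is not reproduced here.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    b1 = sxy / sxx     # slope: change in y per one-unit change in x
    b0 = my - b1 * mx  # intercept: predicted y when x = 0
    return b0, b1


if __name__ == "__main__":
    # Hypothetical ages and prices lying exactly on price = 12320 - 924 * age
    ages = [1, 2, 3, 4, 5]
    prices = [12320 - 924 * a for a in ages]
    b0, b1 = fit_simple_ols(ages, prices)
    print(f"intercept b0 = {b0:.1f}, slope b1 = {b1:.1f}")
    # prints: intercept b0 = 12320.0, slope b1 = -924.0
```

With several predictors, the same least-squares idea needs a matrix solution, which this sketch does not cover.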

  10. Data Analysis Module: Correlation and Regression • On the previous slide, the fitted model is presented as the equation of a line: \(y = -924x + 12320\). • From this, we would say: • For every additional year of a car’s age, the predicted price decreases by $924. • A car of age 0 has a predicted (“starting”) price of $12,320; this is the intercept. • If a car is 2 years old, the expected price is $12,320 - 2 × $924 = $10,472. • The \(R^2\) value of .8937 is interpreted as “89.37% of the variation in the price of the cars can be explained by this linear model, where age is the only predictor”.
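The interpretations above can be checked with a little arithmetic. The Python sketch below does two things:

- It plugs an age into the fitted line \(y = -924x + 12320\).
- It computes \(R^2 = 1 - SS_{res}/SS_{tot}\), the share of the variation in y explained by a least-squares fit with an intercept. In R, `summary()` of an `lm` fit reports this as "Multiple R-squared".

The function names are hypothetical, and so is the three-point toy data used for `r_squared`. The car data behind the slide's .8937 is not available here.

```python
def predict_price(age, b0=12320.0, b1=-924.0):
    """Predicted price from the fitted line: price = b0 + b1 * age."""
    return b0 + b1 * age


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((a - my) ** 2 for a in y)                 # total variation around the mean
    if ss_tot == 0:
        raise ValueError("y has no variation; R^2 is undefined")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    print(predict_price(2))                 # 10472.0
    print(r_squared([2, 4, 6], [3, 4, 5]))  # toy check: 1 - 2/8 = 0.75
```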

More Related