Learn how scatterplots can effectively display data and visually identify associations between two sets of variables. Understand the direction, form, and strength of the association, and explore the concept of correlation. Discover how to straighten scatterplots and interpret the correlation coefficient. Analyze real-world examples of roller coaster heights and ride durations, as well as the positions and distances of planets in the solar system.
 
                
Chapter 7: Scatterplots, Association, and Correlation • Stats: Modeling the World, Second Edition • Raymond Dahlman IV
Scatterplots • Scatterplots are an effective way of displaying data, and they let us see whether there is an association between two quantitative variables. • When looking at a scatterplot, we look for the direction, form, strength, and any unusual features, such as outliers, in the data.
Features of Scatterplots • Direction • The direction of the association tells us whether the variables are positively associated (y tends to increase as x increases) or negatively associated (y tends to decrease as x increases).
Features of Scatterplots • Form • The form, or the overall shape of the data, tells us how the two variables are related, either linearly or by a non-linear function.
Features of Scatterplots • Strength • The strength of the association describes how closely the data points cluster around the overall form, such as the line of best fit.
Determining Variables • When presented with two variables, it is necessary to assign each one a role. The variable that you believe determines the other is the independent, or explanatory, variable. The other is the dependent, or response, variable. • When graphing, the explanatory variable goes on the x-axis and the response variable goes on the y-axis.
Correlation • Correlation measures the strength of the linear association between two quantitative variables. • To find the measure of correlation, or r, we use the correlation coefficient.
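The formula shown on this slide did not survive in the transcript. One standard way to write the correlation coefficient, consistent with the z-score procedure used in Problem #33, is given below. The subscripted symbols are notation assumed here: the sample size is n, the means are \bar{x} and \bar{y}, and the sample standard deviations are s_x and s_y.

```latex
r = \frac{\sum z_x z_y}{n - 1},
\qquad
z_x = \frac{x - \bar{x}}{s_x},
\qquad
z_y = \frac{y - \bar{y}}{s_y}
```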
Correlation Conditions • Because the correlation coefficient is only useful for linear trends, the data must satisfy these conditions. • The Quantitative Variables Condition: Both variables must be quantitative, measured in appropriate units. • The Straight Enough Condition: The data must have a linear trend. If they do not, there are methods to straighten the data. • The Outlier Condition: Be aware of outliers that could distort the correlation coefficient.
Correlation Properties • Correlation is always between -1 and +1, with the sign of r determining whether it is a negative or positive correlation. • When r is equal to 1 or -1, all the data points fall along a single straight line, a rare occurrence. • A correlation near 0 corresponds to a weak linear association. • Correlation treats the variables symmetrically: the correlation of x with y is equal to the correlation of y with x. • Correlation depends only on the z-scores, so it does not change when the units, center, or scale of either variable change.
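The properties above can be checked numerically. The sketch below is a minimal illustration, not the textbook's code: the helper `correlation` and the height and weight values are made up for this example.

```python
import statistics


def correlation(xs, ys):
    """Pearson r computed as the sum of z-score products divided by n - 1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)  # sample SDs
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(products) / (len(xs) - 1)


# Hypothetical measurements, invented for illustration only.
heights_in = [60, 62, 65, 68, 70, 73]
weights_lb = [115, 130, 128, 155, 160, 181]

r = correlation(heights_in, weights_lb)

# Symmetry: swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Unit changes rescale the data but not the z-scores, so r is unchanged.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = correlation(heights_cm, weights_kg)

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, metric = {r_metric:.4f}")
```

All three printed values should agree up to floating-point rounding, and each lies between -1 and 1.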
Straightening Scatterplots • When a scatterplot shows a bent form that consistently increases or decreases, we can often straighten the form of the plot by re-expressing one or both variables, enabling us to apply the correlation coefficient.
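As a rough sketch of re-expression, the example below uses synthetic data that follow y = x squared exactly, an assumption chosen so the effect is easy to see. Taking the square root of y straightens the curve. The helper `correlation` is hypothetical and defined here only for the illustration.

```python
import math
import statistics


def correlation(xs, ys):
    """Pearson r computed as the sum of z-score products divided by n - 1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(products) / (len(xs) - 1)


# Synthetic, consistently increasing but curved data (illustration only).
x = list(range(1, 11))
y = [v ** 2 for v in x]

r_raw = correlation(x, y)

# Re-express the response with a square root to straighten the form.
sqrt_y = [math.sqrt(v) for v in y]
r_reexpressed = correlation(x, sqrt_y)

print(f"raw r = {r_raw:.4f}, re-expressed r = {r_reexpressed:.4f}")
```

The raw r is already high even though the form is curved, which is why the Straight Enough Condition is checked on the scatterplot and not on r alone.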
Problem #33 People who responded to a July 2004 Discovery Channel poll named the 10 best roller coasters in the United States. The table below shows the length of the initial drop (in feet) and the duration of the ride (in seconds). What do these data indicate about the height of a roller coaster and the length of the ride you can expect?
After making a scatterplot, we must determine the mean and standard deviation of each variable. Then we take these values and convert the data into z-scores using $z = (x - \bar{x})/s$. Now we can plug the z-scores into our correlation coefficient equation, and we come up with an r value of 0.35. The overall trend is a weak positive correlation between the drop length and the duration of the ride.
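The procedure just described (means, standard deviations, z-scores, then r) might be sketched as follows. The poll's table is not reproduced in this transcript, so the drop and duration values below are placeholders invented for illustration. They are not the roller coaster data and will not reproduce r = 0.35.

```python
import statistics

# Placeholder values (NOT the poll data): initial drop in feet, duration in seconds.
drops = [100, 150, 200, 250, 300]
durations = [120, 90, 160, 140, 180]

# Step 1: mean and sample standard deviation of each variable.
mean_drop, sd_drop = statistics.mean(drops), statistics.stdev(drops)
mean_dur, sd_dur = statistics.mean(durations), statistics.stdev(durations)

# Step 2: convert every value to a z-score, z = (value - mean) / sd.
z_drop = [(d - mean_drop) / sd_drop for d in drops]
z_dur = [(t - mean_dur) / sd_dur for t in durations]

# Step 3: r is the sum of the paired z-score products divided by n - 1.
n = len(drops)
r = sum(zx * zy for zx, zy in zip(z_drop, z_dur)) / (n - 1)

print(f"r = {r:.3f}")
```

Replacing the two placeholder lists with the values from the problem's table would carry out the same calculation on the real data.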
Problem #35 Is there any pattern to the locations of the planets in our solar system? The table shows the average distance of each of the nine planets from the sun. a) Make a scatterplot and describe the association. (Remember: direction, form, and strength!) b) Why would you not want to talk about the correlation between planet position and distance from the sun? c) Make a scatterplot showing the logarithm of distance vs. position. What is better about this scatterplot?
A: The relation between position and distance is non-linear, with a positive association. There is very little scatter in the data. • B: The relation is not linear, and correlation measures only linear association.
C: The relation between the position and the log of the distance appears to be roughly linear.
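The effect of the logarithm in part (c) can be sketched numerically. The problem's table is not included in this transcript, so the distances below are approximate, rounded average distances from the sun in millions of miles, supplied here as an assumption and not copied from the source. The helper `correlation` is likewise hypothetical.

```python
import math
import statistics


def correlation(xs, ys):
    """Pearson r computed as the sum of z-score products divided by n - 1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ]
    return sum(products) / (len(xs) - 1)


# Position 1 (Mercury) through 9 (Pluto).
positions = list(range(1, 10))

# Approximate average distances in millions of miles (assumed, rounded values).
distances = [36, 67, 93, 142, 484, 887, 1784, 2794, 3674]

# Re-express distance with a base-10 logarithm, as in part (c).
log_distances = [math.log10(d) for d in distances]

r_raw = correlation(positions, distances)
r_log = correlation(positions, log_distances)

print(f"position vs distance:      r = {r_raw:.3f}")
print(f"position vs log(distance): r = {r_log:.3f}")
```

With these approximate values the log plot gives a correlation noticeably closer to 1, and it applies to a form that is roughly straight, so r is a meaningful summary there.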