Another Regression Line

Presentation Transcript

  1. Another Regression Line

  2. Statistics 1 EDEXCEL

  3. The reason we had to keep using the phrase “y on x” is that there are two regression lines. So far, we have calculated the regression line for y on x, e.g. if we had a value of x, we used that line to estimate y. If we want to estimate x for a given y, we use the x on y regression line. It may seem strange to have 2 regression lines depending on which quantity we want to estimate. Previously, in Pure Maths, if we wanted to find x for a given y we just rearranged the equation. However, in Statistics the data are spread around a line, and we want to estimate with as little uncertainty as possible.

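  4. For the height and foot length data that we used before, the x on y regression line is given by [equation not reproduced in the transcript]. For the x on y line, the lengths that matter are the horizontal distances (measured in the x direction) from each data point to the line. The sum of the squares of these lengths is made as small as possible.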
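
To make the "sum of the squares of these lengths" concrete, here is a minimal sketch that compares two candidate lines by their squared distances in the x direction. It borrows the bean data from the worked example later in this presentation; the helper names (`s`, `horizontal_sse`) are illustrative assumptions, not part of the original slides.

```python
# Bean data from the worked example (x = weight in g, y = length in cm).
xs = [0.7, 1.2, 0.9, 1.4, 1.2, 1.1, 1.0, 0.9, 1.0, 0.8]
ys = [1.7, 2.2, 2.0, 2.3, 2.4, 2.2, 2.0, 1.9, 2.1, 1.6]


def mean(values):
    return sum(values) / len(values)


def s(us, vs):
    """Sum of products of deviations from the means (S_uv)."""
    u_bar, v_bar = mean(us), mean(vs)
    return sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))


def horizontal_sse(intercept, slope):
    """Sum of squared x-direction distances to the line x = intercept + slope * y."""
    return sum((x - (intercept + slope * y)) ** 2 for x, y in zip(xs, ys))


x_bar, y_bar = mean(xs), mean(ys)

# x on y line, x = a2 + b2 * y, fitted by least squares in the x direction.
b2 = s(xs, ys) / s(ys, ys)
a2 = x_bar - b2 * y_bar

# y on x line, y = a1 + b1 * x, then rearranged to x = -a1/b1 + (1/b1) * y.
b1 = s(xs, ys) / s(xs, xs)
a1 = y_bar - b1 * x_bar
turned_intercept, turned_slope = -a1 / b1, 1 / b1

sse_x_on_y = horizontal_sse(a2, b2)
sse_turned = horizontal_sse(turned_intercept, turned_slope)
print(f"x on y line:            {sse_x_on_y:.4f}")
print(f"y on x line rearranged: {sse_turned:.4f}")
```

The x on y line gives the smaller total, which is why simply turning the y on x equation around is not the best way to estimate x from y.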

  5. [Figure: Foot length and height of UK children. Scatter diagram of foot length (cm) against height (cm), showing the y on x regression line and the x on y regression line. The point of intersection of the lines is the mean point $(\bar{x}, \bar{y})$.] So, the two regression lines are the y on x regression line and the x on y regression line [equations not reproduced in the transcript]. If the length of a child’s foot was 20 cm, we would use the x on y regression line to estimate the child’s height.

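  6. For y on x we had $y = a + bx$, where $b = \dfrac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$. We can easily adapt the previous calculations in order to find the least squares regression line for x on y. Swapping x and y gives $x = a' + b'y$, where $b' = \dfrac{S_{xy}}{S_{yy}}$ and $a' = \bar{x} - b'\bar{y}$.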
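
The "swap x and y" step can be sketched in code: one function fits the y on x line, and the x on y line is obtained by calling it with the roles of the two lists exchanged. The function names and the four-point data set below are hypothetical, chosen only to keep the arithmetic easy to check by hand.

```python
def s(us, vs):
    """S_uv = sum(uv) - sum(u) * sum(v) / n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def y_on_x(xs, ys):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(xs)
    b = s(xs, ys) / s(xs, xs)
    a = sum(ys) / n - b * sum(xs) / n
    return a, b


def x_on_y(xs, ys):
    """Return (a', b') for the least squares line x = a' + b' * y.

    This is the same calculation with the roles of x and y swapped.
    """
    return y_on_x(ys, xs)


# Hypothetical illustrative data, not taken from the presentation.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

a, b = y_on_x(xs, ys)
a_dash, b_dash = x_on_y(xs, ys)
print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {a_dash:.2f} + {b_dash:.2f}y")
```

For this small data set $S_{xx} = 5$, $S_{yy} = 10$ and $S_{xy} = 7$, so the two gradients are $7/5$ and $7/10$.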

  7. e.g. The following data give the weights and lengths of a sample of beans:

     Weight (g)    0·7  1·2  0·9  1·4  1·2  1·1  1·0  0·9  1·0  0·8
     Length (cm)   1·7  2·2  2·0  2·3  2·4  2·2  2·0  1·9  2·1  1·6

     Source: O.N.Bishop

     (a) Taking the weight to be x and the length to be y, calculate both least squares regression lines.
     (b) Use the appropriate line to estimate the weight of a bean of length 1·5 cm.
     (c) Comment on your answer to (b).

     Solution: (a) Using the calculator functions for the y on x regression line, $y = 0.927 + 1.09x$ (coefficients to 3 s.f., computed from the table above).

  8. If your calculator doesn’t give the constants for the x on y line, then use the formula booklet as follows. Summary data (totals computed from the table in slide 7): $n = 10$, $\sum x = 10.2$, $\sum y = 20.4$, $\sum y^2 = 42.20$, $\sum xy = 21.24$.

  9. $S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 21.24 - \dfrac{10.2 \times 20.4}{10} = 0.432$ and $S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 42.20 - \dfrac{20.4^2}{10} = 0.584$.

  10. $b' = \dfrac{S_{xy}}{S_{yy}} = \dfrac{0.432}{0.584} \approx 0.740$ and $a' = \bar{x} - b'\bar{y} = 1.02 - 0.7397 \times 2.04 \approx -0.489$, so the x on y regression line is $x = -0.489 + 0.740y$.
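
The formula booklet route works from totals alone, which is useful when only summary data are given. The following is a minimal sketch under that assumption; the function name `x_on_y_from_totals` and its parameter names are hypothetical, and the totals are built from the bean table so the result can be compared with a direct fit.

```python
def x_on_y_from_totals(n, sum_x, sum_y, sum_y2, sum_xy):
    """Return (a', b') for x = a' + b' * y using summary totals only."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_yy = sum_y2 - sum_y ** 2 / n
    b_dash = s_xy / s_yy
    a_dash = sum_x / n - b_dash * sum_y / n
    return a_dash, b_dash


# Totals built from the bean table (x = weight in g, y = length in cm).
xs = [0.7, 1.2, 0.9, 1.4, 1.2, 1.1, 1.0, 0.9, 1.0, 0.8]
ys = [1.7, 2.2, 2.0, 2.3, 2.4, 2.2, 2.0, 1.9, 2.1, 1.6]

a_dash, b_dash = x_on_y_from_totals(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_y2=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(f"x on y: x = {a_dash:.3f} + {b_dash:.3f}y")
```

Note that $\sum x^2$ is not needed for the x on y line, because the gradient divides by $S_{yy}$ rather than $S_{xx}$.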

  11. The two regression lines look like this: [Figure: Weight and Length of beans, showing the (x on y) and (y on x) regression lines.]

  12. (b) Use the appropriate line to estimate the weight of a bean of length 1·5 cm. We are given y and want to find x, so we use the x on y regression line: $x = -0.489 + 0.740 \times 1.5 \approx 0.62$, i.e. a weight of about 0·62 g (coefficients computed from the slide 7 data). (c) Comment on your answer to (b). The answer is unreliable as the length 1·5 cm lies outside the range of the data (the lengths run from 1·6 cm to 2·4 cm).
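
Parts (b) and (c) can be combined in one small routine that returns the estimate together with a flag saying whether the given y value lies inside the observed range. This is a sketch only; `x_on_y_line` and `estimate_x` are hypothetical helper names.

```python
weights = [0.7, 1.2, 0.9, 1.4, 1.2, 1.1, 1.0, 0.9, 1.0, 0.8]  # x, in g
lengths = [1.7, 2.2, 2.0, 2.3, 2.4, 2.2, 2.0, 1.9, 2.1, 1.6]  # y, in cm


def x_on_y_line(xs, ys):
    """Return (a', b') for the least squares line x = a' + b' * y."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    b_dash = s_xy / s_yy
    a_dash = sum(xs) / n - b_dash * sum(ys) / n
    return a_dash, b_dash


def estimate_x(y_value, xs, ys):
    """Estimate x for a given y, and report whether y is within the data range."""
    a_dash, b_dash = x_on_y_line(xs, ys)
    within_range = min(ys) <= y_value <= max(ys)
    return a_dash + b_dash * y_value, within_range


estimate, within_range = estimate_x(1.5, weights, lengths)
print(f"Estimated weight: {estimate:.2f} g")
if not within_range:
    print("Warning: 1.5 cm is outside the observed lengths, so this is extrapolation.")
```

Returning the flag alongside the number keeps the reliability comment from part (c) attached to the estimate from part (b).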

  13. SUMMARY There are 2 regression lines:
     • The y on x regression line is used to estimate y for a given x.
     • The x on y regression line is used to estimate x for a given y.
     • If the data have a high degree of scatter, the regression lines are further apart than for closely clustered data.
     • For data lying entirely on a line, the 2 regression lines coincide.
     • Both regression lines pass through the mean point $(\bar{x}, \bar{y})$.
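
The last three summary points can be checked numerically. The sketch below uses two small hypothetical data sets (not from the presentation): one lying exactly on a line and one with scatter. To compare the lines on the same axes, the x on y gradient $b'$ is converted to a gradient in the usual y-against-x sense, $1/b'$.

```python
def fit(us, vs):
    """Least squares line v = a + b * u; returns (a, b)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b = s_uv / s_uu
    return v_bar - b * u_bar, b


def gradients_on_same_axes(xs, ys):
    """Gradients of both lines when drawn with x horizontal and y vertical."""
    _, b = fit(xs, ys)        # y on x: y = a + b * x
    _, b_dash = fit(ys, xs)   # x on y: x = a' + b' * y, gradient 1 / b' on these axes
    return b, 1 / b_dash


# Hypothetical data sets for illustration only.
xs = [1, 2, 3, 4, 5]
exact_ys = [3, 5, 7, 9, 11]       # lies exactly on y = 1 + 2x
scattered_ys = [3, 6, 6, 10, 10]  # same trend with scatter

exact_gradients = gradients_on_same_axes(xs, exact_ys)
scattered_gradients = gradients_on_same_axes(xs, scattered_ys)
print("exact data:    ", exact_gradients)
print("scattered data:", scattered_gradients)

# Both lines pass through the mean point, even for the scattered data.
x_bar = sum(xs) / len(xs)
y_bar = sum(scattered_ys) / len(scattered_ys)
a, b = fit(xs, scattered_ys)
a_dash, b_dash = fit(scattered_ys, xs)
y_at_mean = a + b * x_bar
x_at_mean = a_dash + b_dash * y_bar
print(y_at_mean, y_bar, x_at_mean, x_bar)
```

With the exact data the two gradients agree, so the lines coincide; with the scattered data they differ, and both lines still meet at the mean point.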

  14. Exercise 1. The following summary data relate to the population of woodland birds (x) and farmland birds (y) between 1970 and 2002 (33 years). The index for both was taken as 100 in 1970. Source: Social Trends (from Br. Trust for Ornithology and RSPB). Summary data: [values not reproduced in the transcript]. Find the equation of the x on y regression line.

  15. Solution: The x on y regression line is given by [equation not reproduced in the transcript].

  16. The full data set together with the x on y regression line looks like this: [Figure: scatter diagram of the full data set with the x on y regression line; the points for 1970 and 2002 are labelled.] What do you notice about the data? ANS: Low levels of farmland species occur with low levels of woodland species. (This doesn’t mean that one causes the other. They could both, for example, be linked to availability of food.) Only 2 dates are shown but they suggest that both types have declined.
