Starting point Overall visual non-numerical comparisons • Overlap • Shift • Unusual features
Starting point Overlap: I notice there is a lot of overlap between the boys’ and girls’ foot lengths.
Starting point Shift: The boys’ foot lengths are shifted further up the scale than the girls’ foot lengths.
Starting point Unusual features: There is an unusually low foot size of 15 cm in the girls’ data. I suspect that this value is a mistake as it seems too low in comparison with the other data for girls. OR: One of the girls has a recorded foot length far shorter than that of any other girl.
After the initial overall visual non-numerical comparisons: • Make more detailed comparative descriptions of the features including use of summary statistics and specific observation values where appropriate. • Reflect and perhaps comment on some of the features using “I wonder . . .” and “I expect . . .” type statements, i.e., comment on any inferential thoughts.
Comparative descriptions of the features including use of summary statistics: The median foot length of the boys (25cm) is 3cm longer than the median foot length of the girls (22cm). The mean foot length of the boys (25.5cm) is 2.1cm longer than the mean foot length of the girls (23.4cm).
Comparative descriptions of the features including use of summary statistics: The range of foot lengths for the boys (9cm) is the same as the range of foot lengths for the girls (9cm) if we ignore the unusual value. Also, the interquartile range of the foot lengths for the boys (3cm) is the same as for the girls (3cm).
Comparative descriptions of the features including use of summary statistics: The most common foot length for the boys was 25cm, but for the girls it was 22cm and 23cm. In all these cases, the boys seem to have longer foot lengths than the girls by about 2 to 3cm.
Comparative descriptions of the features including use of summary statistics: The median foot length for the boys is the same as the UQ value for the girls (25cm).
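The comparisons of centre above are simple differences, and a short Python sketch can make the arithmetic explicit. It uses only the summary values quoted in the text; the variable names are illustrative and not part of the original material.

```python
# Summary values quoted in the text, in cm (no raw data is assumed).
boys = {"median": 25.0, "mean": 25.5}
girls = {"median": 22.0, "mean": 23.4, "uq": 25.0}

# Difference in centres: boys minus girls.
median_diff = boys["median"] - girls["median"]
# Rounded to 1 decimal place to hide floating-point noise.
mean_diff = round(boys["mean"] - girls["mean"], 1)

# The boys' median sits at the girls' upper quartile.
boys_median_at_girls_uq = boys["median"] == girls["uq"]

print(f"Median difference: {median_diff} cm")
print(f"Mean difference: {mean_diff} cm")
print(f"Boys' median equals girls' UQ: {boys_median_at_girls_uq}")
```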
Make Comparisons • Between the groups (e.g., overlap, shift, spread and shape statements) • Within each group (e.g., unusual observations)
Overlap Be aware of sampling variation: sampling alone can produce shifts. These shifts are small in large samples; they can be large in small samples.
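The point about sampling variation can be explored with a small simulation. The sketch below draws two samples from the same population many times and records how far apart their medians land. The population (normal, mean 24 cm, standard deviation 2 cm), the sample sizes and the function name `typical_median_gap` are all assumptions chosen for illustration; they do not come from the foot-length data.

```python
import random
import statistics


def typical_median_gap(sample_size, repeats=500, seed=1):
    """Average absolute gap between the medians of two samples
    drawn from the SAME hypothetical population."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(repeats):
        # Assumed population: normal, mean 24 cm, sd 2 cm (illustrative only).
        sample_a = [rng.gauss(24.0, 2.0) for _ in range(sample_size)]
        sample_b = [rng.gauss(24.0, 2.0) for _ in range(sample_size)]
        gaps.append(abs(statistics.median(sample_a) - statistics.median(sample_b)))
    return statistics.mean(gaps)


small_sample_gap = typical_median_gap(sample_size=10)
large_sample_gap = typical_median_gap(sample_size=400)

print(f"Typical gap with samples of 10:  {small_sample_gap:.2f} cm")
print(f"Typical gap with samples of 400: {large_sample_gap:.2f} cm")
```

Because both samples come from one population, any gap between the medians is produced by sampling alone. The gap should come out noticeably larger for the small samples than for the large ones.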
Overlap There is some overlap of the boxes but the median of the girls’ foot length is outside the boys’ box and the median of the boys’ foot length is the same as the UQ of the girls’ foot length.
Overlap OR There is some overlap for the middle 50% of the boys’ right foot lengths and the middle 50% of the girls’ right foot lengths.
Shift The boys’ values are shifted to the right of the girls’ values for the maximum, minimum, median, UQ and LQ foot lengths.
Shift The middle 50% of the boys’ foot lengths (the box) is shifted much further along the scale than the middle 50% of the girls’ foot lengths.
Spread The spreads of the boys’ foot lengths and the girls’ foot lengths are the same, i.e., the range is 9cm in both cases (ignoring the unusual value) and the IQR is 3cm for both.
Spread The middle 50% of boys have a right foot measuring between 24cm and 27cm (IQR = 3cm) whereas the middle 50% of the girls are between 22 and 25cm (IQR = 3cm). This means that the foot lengths for these boys vary by about the same amount as these girls’ do.
Spread I expect that the boys’ and girls’ foot length distributions back in the two populations have similar variability.
Note: The range should not be used because it tends to be an unstable estimate of the population spread. The range is highly likely to vary greatly from sample to sample for samples of these sizes. It is also prone to be severely affected by the occasional extreme observation. This is why we use more resistant measures of spread such as the IQR. The IQR is not disturbed by the presence of a few very large or very small observations.
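A toy example can show why the IQR is called resistant. The foot lengths below are invented for illustration and are not the source data. The quartiles are taken as the medians of the lower and upper halves, which is one common school convention and an assumption here; other conventions give slightly different values.

```python
import statistics


def quartiles(values):
    """Lower and upper quartiles as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]
    return statistics.median(lower_half), statistics.median(upper_half)


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Upper quartile minus lower quartile."""
    lq, uq = quartiles(values)
    return uq - lq


# Hypothetical foot lengths in cm, invented for illustration.
typical = [21, 22, 22, 23, 23, 24, 25, 25, 26, 27]
with_extreme = typical + [15]  # add one extreme low value

print("Range:", value_range(typical), "->", value_range(with_extreme))
print("IQR:  ", iqr(typical), "->", iqr(with_extreme))
```

In this toy example, adding the single value of 15 doubles the range from 6 to 12 while the IQR stays at 3.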
From the dot plot: Some of the boys have bigger right foot lengths than some of the girls and vice versa.
Shape The shape of the distributions is not clear from the dot plots but appears to be unimodal as would be expected and maybe slightly skewed to the right as indicated by the box plots. To get a more accurate view, we would need to increase the sample size.
Shape OR The sample distribution for the boys’ foot lengths is roughly symmetrical with a mound around 24 to 27cm, i.e., unimodal. The sample distribution for the girls’ foot lengths shows a large mound around 22 to 24cm.
Shape I wonder if boys’ and girls’ foot length distributions back in the two populations are roughly symmetric and unimodal. I expect so for a body measurement such as foot length for both girls and boys.
Unusual value I notice one of the girls has a foot length (15cm) far smaller than that of any other girl. I worry that this may be a mistake. It could be a measurement or recording mistake, or perhaps this girl is much younger than 13 years. I wouldn’t expect a 13-year-old girl to have a foot size this small. I need to check her other measurements, such as age and height, to further investigate this extreme value.
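The text judges the 15cm value by eye and by context. As a rough numerical cross-check, the sketch below applies the common 1.5 × IQR rule of thumb to the girls’ quartiles quoted earlier (LQ 22cm, UQ 25cm). This rule is not part of the original material; it is brought in here as an assumption and does not replace checking the girl’s other measurements.

```python
# Girls' quartiles quoted in the text, in cm.
girls_lq, girls_uq = 22.0, 25.0
unusual_value = 15.0

girls_iqr = girls_uq - girls_lq

# Assumed rule of thumb (not from the source): flag values lying more than
# 1.5 * IQR below the lower quartile or above the upper quartile.
lower_fence = girls_lq - 1.5 * girls_iqr
upper_fence = girls_uq + 1.5 * girls_iqr

is_flagged = unusual_value < lower_fence or unusual_value > upper_fence

print(f"Fences: {lower_fence} cm to {upper_fence} cm")
print(f"15 cm flagged as unusual: {is_flagged}")
```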
Gaps and clusters I notice the dots are stacked on whole numbers. This is because the foot lengths are measured to the nearest cm.
Gaps and clusters There is a gap in the girls’ group at 28cm and gaps in the boys’ group at 22 and 29cm.
Gaps and clusters Boys’ and girls’ foot length distributions back in the two populations would not have gaps at these same values. The gaps are in the sample due to the small size of the sample.
Sampling If a new random sample of 24 13-year-old boys and a new random sample of 22 13-year-old girls were taken, I would expect the plots to look different because of sampling variability. With these sample sizes, I would expect each IQR spread to change slightly and that each box would be slightly further down or up the scale.
I wonder: • whether, if I repeated this sampling process many times, the boys’ foot lengths would just about always be shifted further up the scale than the girls’ • if boys tend to have a greater foot length than girls back in the two populations • if the median foot length of boys really is greater than that of girls back in the two populations
Conclusion I notice that the distance between the medians is greater than 1/3 of the “overall visible spread”.
Conclusion I am going to claim that the right foot lengths of 13 year-old New Zealand boys tend to be longer than the right foot lengths of 13 year-old New Zealand girls back in the two populations. I am prepared to make this call because, in my data, the distance between the boys’ and the girls’ median foot lengths is big relative to the overall visible spread. To make this call, with sample sizes of around 30, the difference between the two foot length medians needs to be more than about 1/3 of the overall visible spread. This is true for my data.
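The 1/3 guideline above can be checked with the quartiles and medians quoted earlier in the text. The sketch assumes that the “overall visible spread” is the distance from the lowest box edge (the smaller LQ) to the highest box edge (the larger UQ) across the two box plots; the function name `can_make_the_call` is hypothetical.

```python
def can_make_the_call(box_a, box_b, threshold=1 / 3):
    """Compare the distance between medians with the overall visible spread.

    Assumption: overall visible spread = highest UQ minus lowest LQ
    across the two boxes.
    """
    overall_visible_spread = max(box_a["uq"], box_b["uq"]) - min(box_a["lq"], box_b["lq"])
    median_gap = abs(box_a["median"] - box_b["median"])
    ratio = median_gap / overall_visible_spread
    return ratio > threshold, ratio


# Box plot values quoted in the text, in cm.
boys_box = {"lq": 24.0, "median": 25.0, "uq": 27.0}
girls_box = {"lq": 22.0, "median": 22.0, "uq": 25.0}

make_call, ratio = can_make_the_call(boys_box, girls_box)
print(f"Median gap / overall visible spread = {ratio:.2f}")
print(f"Greater than 1/3, so the call can be made: {make_call}")
```

With these values the distance between the medians is 3cm and the overall visible spread is 5cm, so the ratio of 0.6 is well above the 1/3 threshold that the text gives for sample sizes of around 30.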
Conclusion I don’t believe that the pattern in my data of the boys tending to have longer foot lengths than the girls is just due to who happened to be randomly selected in the girls’ group and who happened to be randomly selected in the boys’ group, i.e., I don’t believe this data pattern has just happened by chance. I am prepared to claim that this pattern in the data is real, i.e., that this pattern persists back in the two populations.
Notes: We use ‘… right foot length …’ because the investigative question asks about the right foot length.
Notes: When using statistics, there is always the possibility that the calls (decisions) we make are wrong, i.e., we are making calls in the face of uncertainty. For example, we want to make a call on who tends to be taller (back in the two populations), 13-year-old boys or 13-year-old girls. We may make the call that it’s 13-year-old boys when in fact it’s girls who tend to be taller. Or, we may not want to make a call even though boys tend to be taller than girls.
Explanatory I expected that boys tend to have bigger feet than girls back in the populations, and the information I collected (my data) supports this belief. I can’t think of any factor other than gender which can explain the difference in foot size.
Notes: In this explanatory element we ask ourselves if our conclusion makes sense with what we know, i.e., whether our contextual knowledge matches our conclusions. We must try to think of other factors which may lead to alternative explanations when measuring foot lengths. These suggestions should also be present in the conclusion.