Analysis of Alpaca Fiber Data. By Ying Luo and Kristen Swinton. Background. Alpacas are Llama-like animals bred in South America from which fibers are collected to make wool. Different properties of the fibers affect the quality of the wool. Response Variables.
Analysis of Alpaca Fiber Data By Ying Luo and Kristen Swinton
Background • Alpacas are Llama-like animals bred in South America from which fibers are collected to make wool. • Different properties of the fibers affect the quality of the wool.
Response Variables • Our client was interested in both the tensile strength and the scale of the fibers. • Tensile strength is a measure of the breaking strength of the fiber. • Scale is a measure of the distance between the “scales,” or cells on the surface of the fiber.
More About Scales • The scales are important as they help the fibers interlock to form felt. • Fibers with more scales—and consequently a shorter distance between them—are more likely to have been damaged. • Hence, a higher scale measure indicates healthier fibers.
Explanatory Variables • The client suspected that the breed of Alpaca used and the diet it is fed might affect the tensile strength and scale of the fibers. • Data was collected for 22 Alpacas of two breeds—Suri and Huacaya.
Explanatory Variables • One of the diets of interest is a diet meant to simulate the diet of an animal in the wild. This diet is referred to as the “low nutrition diet.” • The other diet is a more typical diet of animals raised in captivity. This is the “high nutrition diet.”
Covariates • Data was also collected on the animals’ gender, age at the beginning of the study, and color. • These factors were included in the analysis along with the original explanatory variables.
Complications • Many observations were taken on each animal. To simplify the analysis, we took the mean tensile strength and mean scale for each animal. • Breed and gender were confounded—there were no female Suris in the sample. • Unbalanced data further complicated the analysis.
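The two preparation steps described above (averaging repeated measurements per animal, then checking which breed-by-gender cells are empty) can be sketched as follows. This is only an illustration: the records, animal IDs, and values are hypothetical and are not the study's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (animal_id, breed, gender, tensile_strength).
# Values are invented for illustration; they are NOT the study's data.
observations = [
    ("A1", "Huacaya", "F", 41.0),
    ("A1", "Huacaya", "F", 43.0),
    ("A2", "Huacaya", "M", 38.0),
    ("A2", "Huacaya", "M", 40.0),
    ("A3", "Suri", "M", 30.0),
    ("A3", "Suri", "M", 34.0),
]


def per_animal_means(rows):
    """Collapse repeated measurements to one mean response per animal."""
    grouped = defaultdict(list)
    for animal_id, breed, gender, value in rows:
        grouped[(animal_id, breed, gender)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def cell_counts(animal_keys):
    """Count animals in each breed-by-gender cell."""
    counts = defaultdict(int)
    for _animal_id, breed, gender in animal_keys:
        counts[(breed, gender)] += 1
    return dict(counts)


means = per_animal_means(observations)
counts = cell_counts(means.keys())

# An empty cell means the two factors cannot be fully separated.
empty_cells = [
    (breed, gender)
    for breed in ("Huacaya", "Suri")
    for gender in ("F", "M")
    if counts.get((breed, gender), 0) == 0
]
print(means)
print(empty_cells)
```

In this toy data the Suri-by-female cell is empty, which mirrors the confounding noted on the slide: any Suri versus Huacaya comparison among females is impossible.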
Time Periods • Data was collected one year after the study began. This is called period four. • More data was collected two years into the study. This is period eight. • We analyzed each set of data separately and compared the results.
Strategy of Analysis • We started with the tensile strength data. • After analyzing the complete set of data, we analyzed two subsets—males only and whites only. • The males only analysis was done to eliminate the breed and gender confounding. • The whites only analysis was suggested by the client in an attempt to balance the data.
Tensile Strength Analysis: Complete Data • We started with the data from period four. • We first tried to fit models with only one factor and assess their significance. • Two factors had significant F-tests: breed and gender. • A gender only model does not give any information about the factors of primary interest.
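As a rough sketch of what a single-factor F-test computes, the function below builds the F statistic for a one-factor model from group data. The function name `one_way_f` and the numbers are assumptions made for illustration; the slides do not say which software was used for the analysis.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-factor model.

    `groups` maps each factor level to the list of responses at that level.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Variation explained by the factor (between-group sum of squares).
    ss_between = sum(
        len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Variation left unexplained (within-group sum of squares).
    ss_within = sum(
        (y - mean(ys)) ** 2 for ys in groups.values() for y in ys
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-animal mean tensile strengths (NOT the study's data).
by_breed = {
    "Huacaya": [40.0, 42.0, 44.0],
    "Suri": [30.0, 32.0, 34.0],
}
f_stat, df1, df2 = one_way_f(by_breed)
print(f_stat, df1, df2)
```

The p-value is the upper tail of the F distribution with `df1` and `df2` degrees of freedom. The standard library has no F distribution, so this sketch stops at the statistic; if SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` gives the tail probability.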
Complete Data • Diet was not significant alone, but did have a significant interaction with gender. • It appears that the low nutrition diet produced higher tensile strengths for both genders.
Complete Data • As with diet, color did not have a significant effect on tensile strength by itself. • It did, however, interact with breed.
Possible Models • Model 1: $Y_{ij} = \mu + B_i + \varepsilon_{ij}$, F = 21.59, p-value = 0.0002 • Model 2: F = 6.42, p-value = 0.0024 • Model 3: F = 8.02, p-value = 0.0006
Period Eight • The period eight data produced the same three models. • The p-values were a bit larger, but still significant at the 0.05 level. • This suggests no significant effect of the passage of one year.
Males Only—Period Four • Model 1 (breed only) had a p-value of 0.0036. • Because this data only has one gender, there is no longer any confounding of breed and gender. • Since this model is significant, there may be a true breed effect.
Other Males Only Models • Model 2 is no longer appropriate since it includes the gender effect. • In examining Model 3, we noticed that the breed by color interaction was no longer significant. • This is possibly because nearly all the Huacaya males were white, while there was only one white Suri male.
Period Eight • As in period four, only Model 1 fit the data reasonably well. • Its p-value was 0.0755, which is greater than 0.05, but the sample size was rather small (14).
Whites Only—Period Four • Model 1 was again highly significant with a p-value of 0.0009. • In fitting Model 2, we discovered that the diet by gender interaction was no longer significant. • The reason for this is unclear, but could be due to confounding of color with either diet or gender. • Model 3 is clearly not applicable here as it contains a color main effect.
Whites Only—Period Eight • Results were similar to those of period four. • The p-value for Model 1 was 0.0085.
Tensile Strength Conclusions • Breed appeared to be an important factor in all the models. • A boxplot of the distribution of tensile strength for each breed illustrates this apparent effect.
Breed or Gender? • Because of the confounding, we cannot be sure that the apparent effect is truly due to breed. • A plot of the tensile strength for each breed of the males only data can clarify this issue.
Summary of Tensile Strength Conclusions • Breed seems to have a significant effect on tensile strength, but it is confounded with gender. • In looking at the males only, this effect is still present. • This effect is the same when white animals are examined alone. • Huacayas produce fibers with higher tensile strength on average than do Suris.
Analysis of Scale Data • Analysis was more difficult for the scale data, with different subsets producing different models. • We began our analysis as before by looking for single factor models for the period four data first. • The only single factor with a significant F-test was age.
Scale Data—Complete • As with the tensile strength data, there was a significant interaction between diet and gender. • The low nutrition diet produces higher scale for males, but lower scale for females.
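A diet by gender interaction of this kind means the diet effect differs between males and females. One way to see it is to compare the diet effect within each gender, as in the sketch below. The cell values are hypothetical, chosen only to reproduce the pattern described (low diet higher for males, lower for females).

```python
from statistics import mean

# Hypothetical per-animal mean scale values by (diet, gender) cell.
# These numbers are invented for illustration, NOT the study's data.
cells = {
    ("low", "M"): [12.0, 13.0],
    ("high", "M"): [10.0, 11.0],
    ("low", "F"): [9.0, 10.0],
    ("high", "F"): [11.0, 12.0],
}


def diet_effect(cell_data, gender):
    """Mean scale on the low diet minus mean scale on the high diet."""
    low = mean(cell_data[("low", gender)])
    high = mean(cell_data[("high", gender)])
    return low - high


effect_m = diet_effect(cells, "M")  # positive: low diet gives higher scale
effect_f = diet_effect(cells, "F")  # negative: low diet gives lower scale

# With no interaction the two effects would be equal and this would be 0.
interaction_contrast = effect_m - effect_f
print(effect_m, effect_f, interaction_contrast)
```

When the two within-gender effects have opposite signs, as here, averaging over gender can hide the diet effect entirely, which is consistent with diet not being significant as a single factor.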
Scale Data—Complete • Color was not significant by itself, but it was significant when added to the age only model. • There was no significant interaction between color and age.
Possible Models • Model 4: F = 5.35, p-value = 0.0315 • Model 5: F = 3.48, p-value = 0.0300 • Model 6: F = 4.44, p-value = 0.0085
Period Eight • Model 4 (age only) fit well with a p-value of 0.0018. • Model 5 no longer fit because there was no longer a significant diet by gender interaction. • Gender, however, still had a significant effect when added to the age only model, but there was no age by gender interaction. • This may be another side effect of the unbalanced data.
Period Eight • Although it did not fit well for period four, the breed only model (Model 1) fit the period eight data quite well. • Another significant model contained the main effects for age and breed, but not their interaction.
Period Eight Models • Model 4: F = 12.99, p-value = 0.0018 • Model 7: F = 9.39, p-value = 0.0015
Period Eight Models • Model 1: F = 6.67, p-value = 0.0163 • Model 8: F = 11.19, p-value = 0.0006
Data Problems • There were a couple of rather large observations, which upon investigation appeared to be mistakes. • The client agreed that they were probably recording mistakes, so we threw them out.
Other Problems • The standard deviation of the measurements in period eight was half that of period four (after outliers were discarded). • This could be due to the data collector becoming more skilled in operating the machinery.
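The comparison of spread between the two periods amounts to a ratio of sample standard deviations. A minimal sketch follows, using made-up measurements chosen so that the ratio comes out at one half.

```python
from statistics import stdev

# Hypothetical scale measurements after removing suspected recording errors.
# These numbers are invented for illustration, NOT the study's data.
period_four = [10.0, 14.0, 18.0, 22.0]
period_eight = [13.0, 15.0, 17.0, 19.0]

# Ratio of sample standard deviations (period eight relative to period four).
sd_ratio = stdev(period_eight) / stdev(period_four)
print(sd_ratio)
```

A ratio well below 1 indicates less measurement spread in the later period, which matters when results from the two periods are compared side by side.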
Males Only—Period Four • Neither Model 4 nor 6 fit the data well. • A diet only model became significant for the males only. • A diet by breed interaction was also significant for the males only.
Males Only Models—Period Four • Model 9: F = 8.05, p-value = 0.0150 • Model 10: F = 8.06, p-value = 0.0048
Males Only—Period Eight • In period eight, the diet only model (Model 9) was no longer significant. Model 10 was not significant either. • The age only model (Model 4) had a p-value of 0.0276. • Model 1 (breed only) had a p-value of 0.0954, which is not significant at the 0.05 level but is worth noting.
Whites Only—Period Four • The only model that had a significant F-test for the white animals only was a model containing the main effects of age and diet along with their interaction.
Whites Only—Period Eight • Curiously, the model that fit the period four data did not fit the period eight data very well. • The only model that had a reasonable fit was the breed only model (Model 1) with a p-value of 0.0513.
Whites Only Models • Model 11 (Period 4): F = 6.88, p-value = 0.0132 • Model 1 (Period 8): F = 4.9, p-value = 0.0513
Scale Conclusions • It is difficult to make overall conclusions about the factors affecting scale. • One factor that appeared in most models was age. • This suggests that age does have some effect on scale. • How age interacts with the other factors is not clear.
Effect of Age • It appears that older animals have lower scale measurements than younger animals. • This suggests that older animals have more damaged fibers.
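The direction of an age effect like this one can be read from the slope of a least-squares line of scale on age. The sketch below assumes age enters as a single numeric predictor; the ages, scale values, and the helper name `least_squares_line` are hypothetical.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical ages and per-animal mean scale values (NOT the study's data).
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
scales = [12.0, 11.5, 11.0, 10.0, 9.5]

slope, intercept = least_squares_line(ages, scales)
print(slope, intercept)
```

A negative slope corresponds to the pattern described on the slide: scale decreasing as age increases.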
Scale Conclusions • The other effect worth mentioning is the breed effect. • It was not significant in period four, but had moderate significance in period eight. • Suris seem to have higher scale measurements than Huacayas, on average.
Diagnostics • It would be impractical to do residual analyses for all eleven models fit to all six subsets of data. • Since the client was primarily interested in the white animals, we only checked the residuals for the four models fit to the whites only data.
Model 1: Tensile Strength, Period Four • A normal probability plot of the residuals had one unusual observation with a residual of –3.59. • The animal (W-35) that generated this observation had an unusually small tensile strength. • There was no reason to eliminate the point. • Our assumption of normally distributed errors, therefore, may not be valid. Conclusions should be accepted with caution.
Model 1: Tensile Strength, Period Four • A plot of residuals vs. fits also indicated this unusual observation. • If that point is ignored, the variance appears constant.
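A residual check of this kind can be sketched for a one-factor (breed only) model as below. The sketch assumes the reported residuals are standardized, which the slides do not state; the cutoff of 2 and the data are also assumptions chosen for illustration.

```python
from math import sqrt
from statistics import mean


def standardized_residuals(groups):
    """Internally studentized residuals for a one-factor model.

    Fitted values are the group means, so an observation in a group of
    size n has leverage 1/n. Each group needs at least two observations.
    """
    n_total = sum(len(ys) for ys in groups.values())
    ss_within = sum(
        (y - mean(ys)) ** 2 for ys in groups.values() for y in ys
    )
    mse = ss_within / (n_total - len(groups))
    result = {}
    for level, ys in groups.items():
        fitted = mean(ys)
        spread = sqrt(mse * (1 - 1 / len(ys)))
        result[level] = [(y - fitted) / spread for y in ys]
    return result


# Hypothetical tensile strengths with one unusually small value
# (invented for illustration, NOT the study's data).
by_breed = {
    "Huacaya": [40.0, 41.0, 42.0, 43.0, 24.0],
    "Suri": [30.0, 31.0, 32.0, 33.0, 34.0],
}
residuals = standardized_residuals(by_breed)

# Flag observations whose standardized residual exceeds 2 in absolute value.
flagged = [
    (level, y)
    for level, ys in by_breed.items()
    for y, r in zip(ys, residuals[level])
    if abs(r) > 2
]
print(flagged)
```

Note that a single extreme value inflates the error variance estimate, so the other residuals in the same fit look smaller than they otherwise would. That is one reason to re-examine the plot with the point set aside, as the slide does.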
Model 1: Tensile Strength, Period Eight • The normal probability plot for period eight looked much better than the one for period four. • The residual for W-35 was –2.98. • The scale measurement for this observation was similar for both time periods. • The residuals vs. fits plot also looked better.
Model 11: Scale, Period Four • The normal probability plot had a possible outlier. • The residual for this outlier was only –1.67, which is not too worrisome. • The residuals vs. fits plot looked okay.
Model 1: Scale, Period Eight • The normal probability plot was not quite linear, but had no obvious outliers. • Since the F-test is robust to violations of the normality assumption, we feel that this is not a problem. • The residuals vs. fits plot showed no evidence of non-constant variance.
Final Conclusions • It appears that breed is a significant predictor of tensile strength, despite its confounding with gender. • Huacayas generally produce fibers of higher tensile strength than Suris.
Final Conclusions • Conclusions for the scale data are not definite. • Age seemed to be a significant predictor, with older animals producing fibers with lower scale. • Breed was somewhat significant for period eight, but not for period four. • For period eight, Suris have higher scale measurements than Huacayas.