## Distribution and Outliers

**Screening** (Significant Effects)

**Hadlum vs Hadlum** A univariate example that illustrates deviation from a normal pattern.

**Normal duration** (Figure: percentage, n = 13634, versus duration of pregnancy.) Barnett (1978) Appl. Statist. 27, 242-250

**Comparison of Hadlum Jr. to normal pattern** (Figure: normal duration, percentage, n = 13634, with Hadlum Jr. marked.)

**Deviation = observed value - predicted value** The residual is $y - \hat{y}$, where $y$ is the measurement and $\hat{y}$ is the model prediction. Residuals are used for model validation.

**Normal Population - Cumulative plots** (Figure panels: traditional graph paper; normal distribution paper.)

**Normal plot**

1) Sort the observations in increasing order.
2) Let each observation represent an equal percent interval of the normal distribution (100/n % for n observations).

If the observations are normally distributed, they plot as a straight line in the normal plot! Deviation from a straight line implies outlying observations or a non-normal distribution.

**Skulls from a cemetery** (Figure: skull capacities, with the maximum marked.) Karl Pearson (1931) Tables for Statisticians and Biometricians, Biometric Lab., London

**Is the largest skull from a Maori?** Hypothesis: the Maoris have smaller skull capacity than the whites. Is the largest skull a contaminant, from a shipwrecked sailor or missionary?

**Probability plot** (Figure: probability plot of skull capacity.)

**Example** P. Garrigues, R. De Sury, M. L. Angelin, J. Bellocq, J. L. Oudin, M. Ewald, Geochimica et Cosmochimica Acta, 52, (1988) 375-384

**Data** (Figure: two points marked "?".)

**Robust regression?** Two outliers. A useful tool to avoid thinking? A sloppy data analyst can find relief in robust regression.

**Result of "pooled" regression** r = 0.995

**Observation** r = 0.865. Two phenomena influence the ratio (predictor). No prediction possible!

**Parallel displacement** - a perfect result for the one who wants to be "straight-lined"
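
**Normal plot, sketched in code** The two normal-plot steps given earlier in this section can be tried out with a few lines of Python. This is a minimal sketch, not the method used for the slides' figures. It assumes that each of the n sorted observations is placed at the midpoint of its interval, (i - 0.5)/n, which is one common convention among several. The function names and the data values are invented purely for illustration. The correlation between the sorted observations and the normal quantiles serves as a rough measure of how straight the plot is.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_plot_points(observations):
    """Return (z, x) pairs for a normal plot.

    Step 1: sort the observations in increasing order.
    Step 2: give each observation an interval of width 1/n of the normal
    distribution and convert the interval midpoint (i - 0.5)/n to a
    z-score. The midpoint convention is an assumption of this sketch.
    """
    xs = sorted(observations)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))


def straightness(points):
    """Pearson correlation between z-scores and sorted observations.

    Values close to 1 mean the points lie close to a straight line.
    """
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    z_bar, x_bar = mean(zs), mean(xs)
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in points)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_zx / sqrt(s_zz * s_xx)


# Invented illustration data: eight well-behaved values, then the same
# values with one suspiciously large observation appended.
clean = [9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.4]
contaminated = clean + [14.0]

r_clean = straightness(normal_plot_points(clean))
r_contaminated = straightness(normal_plot_points(contaminated))
print(f"clean: r = {r_clean:.3f}")
print(f"with one large value: r = {r_contaminated:.3f}")
```

The single large value bends the upper end of the plot away from the line, so the correlation drops noticeably compared with the clean data. A low value alone does not say whether the cause is an outlier or a non-normal distribution; the plot itself still has to be inspected.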