Analysis of Chromium Emissions Data. Nagaraj Neerchal and Justin Newcomer, UMBC and OIAA/OEI and Mohamed Seregeldin, Office of Air Quality Planning and Standards, EPA, RTP.
Objective • To develop a protocol (methodology) for obtaining confidence bounds for the “Mean Chromium Emissions” for each welding process and rod type combination. • Incorporate all the data, including the averages, to the best of our ability.
About The Data • Three Welding Processes • GMAW, SMAW, FCAW • Three Rod Types • E308, E309, and E316 • Multiple Sources of Data • Some report individual measurements • Some report only averages without the original observations. • Units of reporting vary—all are converted to g/kg
Summary Statistics Note: Summary Statistics based only on observations with single measurement.
Combining Rod Types • Combine E308+E316 because of the similar technology and small sample size • Sample Sizes:
Summary Statistics After Combining Data for Rod Types Note: Summary Statistics based only on observations with single measurement.
Traditional Approaches • Assume Normality? • Normality is not a good assumption for this data set at all • Sample sizes are very small for certain combinations • Bounds obtained assuming normality give meaningless results (e.g. negative bounds) when the data are not normally distributed • 95% Confidence Intervals for the Mean: Note: Summary Statistics based only on observations with single measurement.
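To make the "negative bounds" point concrete, here is a minimal sketch of a normal-theory (t-based) 95% confidence interval on a small, right-skewed sample. The five values are hypothetical and chosen only for illustration; they are not taken from the chromium data, and the critical value is hard-coded for this sample size.

```python
import math
import statistics

# Hypothetical right-skewed emission values in g/kg (illustrative only,
# not from the chromium emissions data set).
sample = [0.02, 0.03, 0.05, 0.08, 1.50]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Two-sided 95% t critical value for n - 1 = 4 degrees of freedom,
# hard-coded here to keep the sketch free of third-party dependencies.
t_crit = 2.776

half_width = t_crit * sd / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean:.3f} g/kg, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With one large value among a handful of small ones, the lower limit falls below zero, which is not a possible emission rate.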
Traditional Approaches • Transform the data to normality • Optimal transformation for Total Chromium data is different from optimal for Chrom6 data. • It is hard to transform the confidence bounds back to the original scale (the mean of the log is not the same as the log of the mean!) • Box-Cox Log-Likelihood Plots:
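The back-transformation difficulty can be stated in one line. The lognormal case below is an illustrative assumption (the slides do not claim the data are lognormal); the symbols X, mu and sigma are introduced here only for this sketch.

```latex
\[
\mathrm{E}[\log X] \;\le\; \log \mathrm{E}[X]
\quad \text{(Jensen's inequality)},
\qquad
\log X \sim N(\mu, \sigma^2)
\;\Rightarrow\;
\mathrm{E}[X] = \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right) > \exp(\mu)
\quad \text{for } \sigma^2 > 0 .
\]
```

So exponentiating a confidence bound for the mean of the logs gives a bound for exp(mu), the median under this assumption, not for the mean emission.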
Traditional Approaches • Weighted regression to incorporate the averages
Traditional Approaches • Weighted Regression • Estimates have good properties (such as BLUE) in general—not only for normal data • But the confidence bounds are sensitive to the normality assumption, especially when the sample sizes are small as in our case.
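As a rough illustration of how reported averages can enter the estimate, the sketch below fits the simplest weighted regression, an intercept-only model, in which each reported value is weighted by the number of tests behind it. This weighting assumes every individual test has the same variance, so an average of n tests has variance proportional to 1/n. The records and the function name are hypothetical, and the model actually used in the analysis may include more terms.

```python
# Hypothetical records: (reported value in g/kg, number of tests behind it).
# A count of 1 is a single measurement; larger counts are reported averages.
records = [(0.10, 1), (0.30, 1), (0.20, 4), (0.40, 2)]


def weighted_mean(records):
    """Intercept-only weighted least squares estimate.

    Minimising sum(n_i * (y_i - m)**2) over m gives
    m = sum(n_i * y_i) / sum(n_i).
    """
    total_weight = sum(n_tests for _, n_tests in records)
    return sum(value * n_tests for value, n_tests in records) / total_weight


estimate = weighted_mean(records)
print(f"weighted mean = {estimate:.3f} g/kg")
```

The point estimate does not rely on normality, but an interval built around it from t or normal quantiles would.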
Traditional Approaches • Nonparametric Approaches? • Nonparametric approaches usually use ranks. When only averages are reported we completely lose the information regarding ranks. Therefore, means cannot be incorporated into nonparametric approaches. • Bootstrapping? • Made popular by Bradley Efron in the 1980s • Efron and Tibshirani (1993) • Millard, S. P. and Neerchal, N. K. (2000)
Bootstrapping • What is Bootstrapping? • Resampling the observed data • It is a simulation type of method where the observed data (not a mathematical model) is repeatedly sampled for generating representative data sets • The only indispensable assumption is that “observations are a random sample from a single population” • There are some fixes available when the single population assumption is violated, as in our case. • Can be implemented in quite a few software packages, e.g. S-Plus, SAS • Millard and Neerchal (2000) give S-Plus code
Bootstrapping - The Details Bootstrapping inference is based on the distribution of the replicated values of the statistic T: T*_1, T*_2, ..., T*_B. For example, the bootstrap 95% upper confidence bound based on T is given by the 95th percentile of the distribution of the T* values.
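The percentile rule described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the S-Plus code from Millard and Neerchal (2000); the function name, the default of 2000 replicates, the fixed seed and the sample values are all assumptions made for the example.

```python
import math
import random
import statistics


def bootstrap_upper_bound(data, stat=statistics.mean, n_boot=2000,
                          level=0.95, seed=0):
    """Percentile-bootstrap upper confidence bound for `stat`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # T*_1, ..., T*_B: the statistic recomputed on B resamples of size n,
    # each drawn with replacement from the observed data.
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Smallest replicate with at least `level` of the replicates at or below it.
    idx = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    return replicates[idx]


# Hypothetical single-test values in g/kg (illustrative only).
sample = [0.02, 0.03, 0.05, 0.08, 1.50]
bound = bootstrap_upper_bound(sample)
print(f"bootstrap 95% upper bound for the mean: {bound:.3f} g/kg")
```

Because every replicate is an average of observed values, the bound always stays within the range of the data and cannot go negative for nonnegative measurements.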
Bootstrapping Single Tests Data Note: Columns in yellow represent the 95% upper confidence bound
Bootstrapping the Combined Data • Group the data points according to the number of tests used in reporting the average, within each welding process and rod type combination. Then bootstrap within each such group. • i.e. for GMAW and E316: Note: Each color represents a separate group
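A minimal sketch of the grouped resampling follows. Records are grouped by the number of tests behind each reported value, and resampling happens only within a group. The slides do not say how the groups are recombined; here each replicate is a test-count-weighted mean across groups, which is an assumption of this sketch, as are the function name and the example records.

```python
import math
import random
from collections import defaultdict


def grouped_bootstrap_upper_bound(records, n_boot=2000, level=0.95, seed=0):
    """Upper bound for the mean, resampling within groups of equal test count.

    `records` is a list of (reported value, number of tests) pairs.
    """
    groups = defaultdict(list)
    for value, n_tests in records:
        groups[n_tests].append(value)

    total_tests = sum(n_tests * len(values) for n_tests, values in groups.items())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    replicates = []
    for _ in range(n_boot):
        weighted_sum = 0.0
        for n_tests, values in groups.items():
            # Resample with replacement inside this group only.
            resample = rng.choices(values, k=len(values))
            # Assumed recombination: weight each value by its test count.
            weighted_sum += n_tests * sum(resample)
        replicates.append(weighted_sum / total_tests)

    replicates.sort()
    idx = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    return replicates[idx]


# Hypothetical (value in g/kg, number of tests) records, illustrative only.
records = [(0.10, 1), (0.30, 1), (0.05, 1), (0.20, 4), (0.25, 4), (0.40, 2)]
bound = grouped_bootstrap_upper_bound(records)
print(f"grouped bootstrap 95% upper bound: {bound:.3f} g/kg")
```

A group containing a single record, such as the two-test average here, is returned unchanged in every resample, so it contributes no variability to the bound.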
Bootstrapping - Results Note: Columns in yellow represent the 95% upper confidence bound
Final Remarks • Normality assumption is not appropriate for either Total Chromium or Chromium6 data. • Weighted regression model can accommodate the averages into the estimates. • Bootstrapping the data seems to be a way to ensure that meaningful confidence bounds are obtained • More work is needed to study the robustness of Bootstrapping results with respect to some extreme values in the data