Combining DOD Test Information from Disparate Test Events

Mark London
May 12, 2012
NAVAIR Public Release YY-2012-530

Contents
• Introduction
• Preliminary Comments
• Problem Statement
• Proposed Solution: Meta-Analysis
• Test Setup and Data Set
• Data Analysis
• Data Results
• Conclusions
• Summary
• References

Introduction
• Declining DOD budgets require improvements in DOD T&E acquisition processes
• DT&E, IT&E, and OT&E need to provide system performance results more efficiently
• Design of Experiments is useful but doesn’t solve all problems
• Methods of combining information from multiple test sources must be developed
  • Meta-Analysis
  • Bayesian Analysis
  • Bayesian Meta-Analysis (combination of both)

Preliminary Comments
Cost Influence of T&E: Early detection of system issues can dramatically influence total program expenditures.
(Image courtesy of DAU Test and Evaluation Management Guide, 2005)

Preliminary Comments
• SE Realization Processes (right side of the “V”): Testers verify that products at each level meet their requirements before integration at the next higher level
• SE Design Processes (left side of the “V”): Testers are involved in writing the verification procedures for requirements at each level
(Image adapted from US Dept. of Transportation Federal Highway Administration, URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm)

Preliminary Comments
Product Lifecycle Test Phases (Image adapted from Systems Engineering Guide (2011), Mitre Corporation.)
• Multiple test phases
• Test phases provide different sets of test data
• Different sets of data answer different questions about the system

Problem Statement
• We need to find ways of combining test data from disparate test events and test phases
• Two “normative” inferential statistical methods are available
  • Meta-Analysis
  • Bayesian Estimation
• We will focus on Meta-Analysis
• Purpose of Study: Determine the utility of Meta-Analysis for simple analysis of multiple flight test data sets.

Problem Statement
What’s our goal? To integrate multiple test data sets into (hopefully) a more statistically significant set of data results

Proposed Solution: Meta-Analysis
• What IS Meta-Analysis?
  • Combines results from several studies to address a set of related research hypotheses
  • The statistical synthesis of results from a series of studies (Borenstein, 2009)
• Where is Meta-Analysis used?
  • Health (Sandelowski, 2000), Medicine, Pharmacology, Education, Psychology, Business, Finance, Computer Simulations (Reese, 1996)
  • Almost anywhere there is a need to assemble a summary of research studies on a given topic

Proposed Solution: Meta-Analysis
• What can Meta-Analysis provide?
  • A way to combine results from multiple studies
  • A way to broadly cover large numbers of studies/tests
• Limitations of Meta-Analysis? (Aguinis, 2011)
  • Viewed with suspicion in technical fields
  • File drawer problem
  • Mixing apples & oranges
  • Some studies may be ignored

Proposed Solution: Meta-Analysis
• Different “flavors” of Meta-Analysis
  • Fixed Effects Model
  • Random Effects Model
• “Effect Sizes” measure the strength of the relationship between variables and are the summary statistic in Meta-Analysis (Shelby, 2008)
• Effect sizes may use different models:
  • d-family (e.g., Hedges’ g): compares mean differences
  • r-family: compares correlation coefficients

Proposed Solution: Meta-Analysis
(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)

Test Setup and Data Set
• Laser spot tracking “miss” distance
• Airborne FLIR/Laser system
• Tested at NAS Patuxent River
• 5 separate data sets collected
• “Effect” is the impact of the flight environment on the mean of laser spot placement
• Fixed Effects model in accordance with (Ruzni, 2010)
(Image: Improved Mobile IR Signature Target System)

Test Setup and Data Set
Improved Mobile IR Signature Target System (IMISTS)
Table: IMISTS parameters

Test Setup and Data Set
• Ground Test: Measures static laser system boresight error (“Control”)
• Flight Test: Measures pointing accuracy under flight conditions (“Experiment”)
Figures: Flight Test Approach; Video of Laser Spot on IMISTS; Sample Data Points

Test Setup and Data Set
• Each of the 5 ground data sets is a “Control” group
  • Ground data simulated in Matlab
  • Radial offset distance simulated as N(0,0.1)
  • Polar angle simulated as U(0,2π)
  • The number of simulated points in each set matched the number of measured data points in the corresponding flight set
• Each of the 5 flight data sets is an “Experimental” group used to consider the effect of the flight environment
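The ground-data simulation described on this slide can be sketched roughly as follows. The original work used Matlab; this Python version is only an illustration. The function name `simulate_ground_points` is hypothetical, and the sketch assumes that the 0.1 in N(0,0.1) is a standard deviation (the slide does not say whether it is a standard deviation or a variance). The set size of 300 is the per-set sample count stated in the Data Analysis slides.

```python
import math
import random

def simulate_ground_points(n_points, sigma=0.1, seed=None):
    """Return n_points simulated (x, y) laser-spot offsets for one ground set.

    Assumption: sigma is the standard deviation of the radial offset.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        r = rng.gauss(0.0, sigma)                # radial offset ~ N(0, sigma)
        theta = rng.uniform(0.0, 2.0 * math.pi)  # polar angle ~ U(0, 2*pi)
        # A negative r simply places the point on the opposite side of the
        # origin; its distance from the aim point is |r|.
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

# One simulated "Control" set matched to a flight set of 300 points.
ground_set = simulate_ground_points(300, seed=1)
radii = [math.hypot(x, y) for x, y in ground_set]
mean_radius = sum(radii) / len(radii)
```

Repeating the call with different seeds and averaging the resulting statistics would mirror the "averaged over 1000 simulations" step described in the Data Analysis slides.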

Data Analysis
PLOTS OF SIMULATED GROUND TEST DATA
• All simulated data modeled as N(0,0.1) in radius and U(0,2π) in polar angle.
• Results shown are averaged over 1000 simulations.

Data Analysis
PLOTS OF FLIGHT TEST DATA
• Note the difference in grouping for each separate flight test event
• Most data contained within 2 radius “units”

Data Analysis
• Descriptive Statistics of Simulated data sets
  Table: Simulated data set statistics for average of 1000 simulations.
• Descriptive Statistics of Flight data sets
  Table: Flight test data statistics

Data Analysis
Combining data sets into a Summary Table
Table: Summary table of all data sets.

Data Analysis
Calculating the Cohen effect size, d, using the Direct Calculation Method: for each of the data sets we use the standard formulas (Borenstein, 2009):
$d_i = \frac{\bar{X}_F - \bar{X}_S}{S_{within}}, \quad S_{within} = \sqrt{\frac{(n_F - 1) s_F^2 + (n_S - 1) s_S^2}{n_F + n_S - 2}}$
The variance, $V_d$, and Standard Error, $SE_d$, are given by
$V_d = \frac{n_F + n_S}{n_F n_S} + \frac{d^2}{2 (n_F + n_S)}, \quad SE_d = \sqrt{V_d}$
where: i = data set number (i = 1, 2, …, 5); $\bar{X}_F, \bar{X}_S$ = sample means of the flight and simulated ground data; $s_F, s_S$ = sample SDs; $n_F, n_S$ = 300 = number of samples for each set.
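As a rough illustration of this step, the sketch below computes Cohen's d with its variance and standard error for one flight/simulated pair, assuming the standard pooled-standard-deviation formulas from Borenstein et al. (2009). The function name `cohen_d` and the input means and SDs are hypothetical placeholders, not values from the study; only the sample size of 300 comes from the slide.

```python
import math

def cohen_d(mean_f, mean_s, sd_f, sd_s, n_f, n_s):
    """Return (d, V_d, SE_d) for flight (F) versus simulated ground (S) data."""
    # Pooled within-group standard deviation.
    s_within = math.sqrt(
        ((n_f - 1) * sd_f ** 2 + (n_s - 1) * sd_s ** 2) / (n_f + n_s - 2)
    )
    d = (mean_f - mean_s) / s_within
    # Variance of d for two independent groups.
    v_d = (n_f + n_s) / (n_f * n_s) + d ** 2 / (2 * (n_f + n_s))
    return d, v_d, math.sqrt(v_d)

# Hypothetical inputs for a single data set (not the measured values).
d, v_d, se_d = cohen_d(mean_f=0.55, mean_s=0.50, sd_f=1.0, sd_s=1.0,
                       n_f=300, n_s=300)
```

With equal group sizes of 300, the first term of the variance dominates, so the standard error is nearly the same for every data set regardless of the effect size.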

Data Analysis
Our resulting table of Cohen effect sizes becomes:
Table: Calculated Cohen d effect size parameter.
These effect sizes d are VERY small! (d < 0.1)

Data Analysis
But the Cohen d effect size parameter tends to overestimate the magnitude of the effect size, particularly in small samples, so we apply the Hedges J correction (Borenstein, 2009) using:
$J = 1 - \frac{3}{4\,df - 1}, \quad df = n_F + n_S - 2$
Table: Bias conversion using Hedges J parameter.

Data Analysis
So we calculate our Hedges g effect size parameter using the following formula:
$g = J \cdot d$
And the variance, $V_g$, and Standard Error, $SE_g$, are given by
$V_g = J^2 V_d, \quad SE_g = \sqrt{V_g}$
Table: Calculation of Hedges g parameter using J conversion.
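The bias correction and the resulting Hedges g can be sketched as below, assuming the usual approximation J = 1 - 3/(4 df - 1) with df = n_F + n_S - 2 from Borenstein et al. (2009). The function name `hedges_g` and the example d and V_d values are hypothetical.

```python
import math

def hedges_g(d, v_d, n_f, n_s):
    """Return (J, g, V_g, SE_g) from Cohen's d and its variance."""
    df = n_f + n_s - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    g = j * d                         # bias-corrected effect size
    v_g = j ** 2 * v_d                # variance scales with J squared
    return j, g, v_g, math.sqrt(v_g)

# Hypothetical d and V_d for one data set with 300 points per group.
j, g, v_g, se_g = hedges_g(d=0.05, v_d=0.00666875, n_f=300, n_s=300)
```

For groups of 300 points each, J is very close to 1, so g differs from d only slightly; the correction matters far more when the groups are small.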

Data Results
• But for our Fixed Effects model we also need the respective weighting of each data set (Borenstein, 2009), using:
  • The weighting factor, $W_i = 1 / V_{g,i}$
  • The relative weighting factor, $W_{r,i} = W_i / \sum_i W_i$
  • The product of $W_i$ and the effect size parameter, $W_i g_i$
  • The sums $\sum_i W_i$ and $\sum_i W_i g_i$
Table: Calculation of Weighting Factors.
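A minimal sketch of the weighting step follows, assuming inverse-variance weights as used in fixed-effect models (Borenstein et al., 2009) and a relative weight defined as each weight's share of the total. The function name `fixed_effect_weights` and the example g and V_g values are hypothetical.

```python
def fixed_effect_weights(g_values, v_g_values):
    """Return (W, W_r, W*g) lists for a fixed-effect meta-analysis."""
    weights = [1.0 / v for v in v_g_values]          # W_i = 1 / V_g,i
    total = sum(weights)
    relative = [w / total for w in weights]          # W_r,i = W_i / sum(W)
    weighted = [w * g for w, g in zip(weights, g_values)]
    return weights, relative, weighted

# Hypothetical effect sizes and variances for three data sets.
g_example = [0.02, 0.05, -0.01]
v_example = [0.005, 0.01, 0.02]
w, w_rel, w_g = fixed_effect_weights(g_example, v_example)
sum_w, sum_w_g = sum(w), sum(w_g)
```

Because the weight is the reciprocal of the variance, the most precisely estimated data set contributes the most to the summary effect.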

Data Results
• Finally, to compute our Summary Effect statistics (Borenstein, 2009) we use the following:
  $M = \frac{\sum_i W_i g_i}{\sum_i W_i}, \quad V_M = \frac{1}{\sum_i W_i}, \quad SE_M = \sqrt{V_M}$
• And calculate our lower and upper 95% confidence limits as:
  $LL_M = M - 1.96\,SE_M, \quad UL_M = M + 1.96\,SE_M$
• Producing the summary effects of the flight vs. simulated data
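The summary effect and its 95% confidence limits can be sketched as follows, assuming the standard fixed-effect formulas (weighted mean, variance equal to the reciprocal of the summed weights, and a normal-approximation interval of plus or minus 1.96 standard errors). The function name `fixed_effect_summary` and the input values are hypothetical.

```python
import math

def fixed_effect_summary(g_values, v_g_values, z=1.96):
    """Return (M, SE_M, lower, upper) for a fixed-effect summary effect."""
    weights = [1.0 / v for v in v_g_values]
    sum_w = sum(weights)
    m = sum(w * g for w, g in zip(weights, g_values)) / sum_w
    se_m = math.sqrt(1.0 / sum_w)  # V_M = 1 / sum(W)
    return m, se_m, m - z * se_m, m + z * se_m

# Hypothetical effect sizes and variances for three data sets.
m, se_m, lower, upper = fixed_effect_summary([0.02, 0.05, -0.01],
                                             [0.005, 0.01, 0.02])
includes_zero = lower <= 0.0 <= upper
```

An interval that contains zero, as it does for these placeholder inputs, would indicate no statistically significant summary effect at the 95% level.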

Data Results
• Forest plot of our g effect sizes and the Summary effect M (figure)

Data Results
• But… we need to confirm homogeneity of the data sets using Cochran’s Q statistic (Borenstein, 2009):
  $Q = \sum_i W_i (g_i - M)^2$
• This produces a Q value of Q = 3002!!
• Our Chi-Square critical value (p = 0.05, df = 5 - 1 = 4) is 9.488
• Since our Q exceeds the critical value (3002 > 9.488), we reject the null hypothesis that the variability between data sets is due to sampling error alone
• Homogeneity is NOT confirmed!
• To continue we would consider Meta-Regression analysis.
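The homogeneity check can be sketched as below, assuming Q is the weighted sum of squared deviations of each effect size from the fixed-effect summary M, compared against the chi-square critical value of 9.488 quoted on the slide for df = 4. The function name `cochran_q` and the five example effect sizes are hypothetical and are not the study's data.

```python
def cochran_q(g_values, v_g_values):
    """Return (Q, df) for a set of effect sizes and their variances."""
    weights = [1.0 / v for v in v_g_values]
    m = sum(w * g for w, g in zip(weights, g_values)) / sum(weights)
    q = sum(w * (g - m) ** 2 for w, g in zip(weights, g_values))
    return q, len(g_values) - 1

# Critical value for p = 0.05, df = 4, as quoted on the slide.
CHI2_CRIT_DF4 = 9.488

# Hypothetical, deliberately scattered effect sizes for five data sets.
g_example = [0.00, 0.30, -0.25, 0.10, 0.45]
v_example = [0.0067] * 5
q, df = cochran_q(g_example, v_example)
reject_homogeneity = q > CHI2_CRIT_DF4  # True means heterogeneity is indicated
```

When Q exceeds the critical value, the spread among the data sets is larger than sampling error alone would explain, which is the situation the slide reports.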

Data Results
(Adapted from Shelby & Vaske, 2008 and Lipsey & Wilson, 2001)

Conclusions
• Results of preliminary Meta-Analysis:
  • Very small effect sizes (d, g, M all < 0.1)
  • Flight data does not produce a statistically significant difference from Ground data
  • Large data sets of same dimensions
  • Significant overlap between Flight vs. Ground
  • Homogeneity NOT confirmed via Q test
  • Random Effects models probably more accurate
  • Meta-Regression probably needed
  • Application to flight test data problematic
• Future Work should include:
  • Complete full Meta-Regression for Random Effects model
  • Explore analysis of other flight test regimes
  • Compare & contrast with Bayesian methods

Summary
Purpose of Study: Determine the utility of Meta-Analysis for simple analysis of multiple flight test data sets.
Did our study succeed? Not as originally planned!
Additional Observations:
• Need to ensure a sufficient number of data sets
• Meta-Analysis is more complicated than initially thought
• Homogeneity of data sets is of primary importance
• Advanced methods (e.g. Meta-Regression) start to look more like conventional ANOVA or Multiple-Regression
• Application to Flight Test Data still unclear

References
Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking Myths and Urban Legends about Meta-Analysis. Organizational Research Methods, 14(2), 306-331.
Anderson-Cook, C. M. (2009). Opportunities and Issues in Multiple Data Type Meta-Analyses. Quality Engineering, 21, 243-253.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to Meta-Analysis. John Wiley & Sons, West Sussex, UK.
Defense Acquisition University. (2005). Test and Evaluation Management Guide. The Defense Acquisition University Press, Ft. Belvoir, VA.
Lipsey, M. W. & Wilson, D. B. (2001). Practical Meta-Analysis. Sage, Thousand Oaks, CA.
Mitre Corporation. (2011). Systems Engineering Guide: Test and Evaluation. Web. URL: http://www.mitre.org/work/systems_engineering/guide/se_lifecycle_building_blocks/test_evaluation/.
Reese, C. S., Wilson, A. G., Hamada, M. S., Martz, H. F., & Ryan, K. J. (1996). Integrated Analysis of Computer and Physical Experiments. Los Alamos National Labs, Report No. LA-UR-00-2915.
Ruzni, N., Idris, N., & Saidin, N. (2010). The Effects of the Choice of Meta-Analysis Model in the Overall Estimates for Continuous Data with Missing Standard Deviations. 2nd International Conference on Computer Engineering and Applications, 369-373.
Sandelowski, M. (2000). Focus on Research Methods: Combining Qualitative and Quantitative Sampling, Data Collection, and Analysis Techniques in Mixed-Method Studies. Research in Nursing & Health, 23, 246-258.
Shelby, L. B. & Vaske, J. J. (2008). Understanding Meta-Analysis: A Review of the Methodological Literature. Leisure Sciences, 30, 96-110.
US Dept. of Transportation Federal Highway Administration. Web. URL: http://ops.fhwa.dot.gov/publications/seitsguide/section3.htm
