This document addresses the complexities of conducting sensitivity analysis on high-dimensional probabilistic environmental models. It emphasizes decision modeling, utility functions, and the importance of understanding scenario and model uncertainties. The article discusses various sensitivity analysis methods, including One-At-A-Time (OAT) and global approaches using statistical techniques like Sobol and FAST. It also highlights the significance of data mining techniques, such as Multivariate Adaptive Regression Splines (MARS) and Multiple Additive Regression Trees (MART), in improving the efficiency and effectiveness of sensitivity analysis.
Data Mining of Environmental Models for Sensitivity Analysis re Knowledge Discovery • Tom Stockton • Paul Black, Andy Schuh, Kate Catlett, John Tauxe • Neptune and Company, Inc. • www.neptuneandco.com
Issue How to conduct a sensitivity analysis of a complex high dimensional probabilistic environmental model?
Decision Modeling • Decision Model, build and solve • Decision Actions and Outcomes • Utility (costs, liabilities, desires) • Probabilistic model • Scenario • Model • Parameter • Sensitivity analysis (knowledge re-discovery) • Value of information analysis (OUT-path) • Data collection • Update model (Bayesian or ad hoc)
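Decision Modeling
\[
U(d \mid I) = \sup_d \int_{\Theta \times S \times M \times Y} U(d \mid y, S, M, \theta_M)\; p(S)\; p(M \mid S)\; p(\theta_M \mid S)\; p(I \mid \theta_M, M, S)\; p(y \mid \theta_M, M, S)\; dy\, dS\, dM\, d\theta_M
\]
Terms: U(d | y, S, M, \theta_M) utility function • p(S) scenario uncertainty • p(M | S) model uncertainty • p(\theta_M | S) parameter uncertainty • p(I | \theta_M, M, S) data likelihood • p(y | \theta_M, M, S) risk predictive distribution
where: U = utility, loss, cost • M = model structure • d = decision • \theta_M = model parameters • I = information/data • S = scenario • y = risk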
Sensitivity Analysis Given a model: Y = f (X) [Y = GoldSim(X)] Sensitivity analysis is aimed at describing the influence of each input variable Xi on the model response Y
Sensitivity Measures • One-At-A-Time (OAT) • Differential Analysis • Global • Statistical • scatter plots, correlation, regression, rank transformations • Data mining • Sobol, FAST, MARS, MART
Desirable Properties of a SA Measure • Efficiency • account for all effects while being computationally affordable • Simplicity • implementable and interpretable • Model Independent • The method can handle non-linearity, non-monotonicity (across time and space) K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Sensitivity Measures • OAT and Differential Analysis, for complex probabilistic models, often are • not efficient, and • not model independent
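A minimal sketch of why OAT is not model independent, assuming a hypothetical three-input test model (not one from the presentation) in which two inputs act only through their product. Moving each input alone from the baseline reports no effect for either of them, while moving both together changes the response.

```python
def model(x1, x2, x3):
    # Hypothetical test model: x1 and x2 act only through their product.
    return x1 * x2 + 0.5 * x3

def oat_effects(f, baseline, delta):
    """Change in f when each input is moved by delta from the baseline, one at a time."""
    y0 = f(*baseline)
    effects = []
    for i in range(len(baseline)):
        x = list(baseline)
        x[i] += delta
        effects.append(f(*x) - y0)
    return effects

baseline = (0.0, 0.0, 0.0)
effects = oat_effects(model, baseline, delta=1.0)

# Moving x1 and x2 together reveals the interaction that OAT missed.
joint = model(1.0, 1.0, 0.0) - model(*baseline)

print(effects)  # [0.0, 0.0, 0.5]
print(joint)    # 1.0
```

The result depends on the chosen baseline: at a different baseline point the same OAT procedure would report nonzero effects for x1 and x2, which is the sense in which the measure is local.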
Global Sensitivity Measures • Sensitivity Measure • Build a statistical model of the model response and the model inputs using the Monte Carlo simulation results • Decompose variance of the output and attribute to input variables
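The variance decomposition can be illustrated with a small Monte Carlo sketch. It estimates first-order indices S_i = V[E(Y | X_i)] / V(Y) with a two-matrix ("pick and freeze") sampling scheme, which is one common estimator and not necessarily the one used in the presentation. The function names, the sample size, and the linear test model are assumptions made for illustration; inputs are assumed independent and uniform on (0, 1).

```python
import random

def first_order_indices(f, k, n, seed=0):
    """Estimate first-order indices S_i = V[E(Y | X_i)] / V(Y) for
    k independent U(0, 1) inputs from two n-row sample matrices."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    yA = [f(row) for row in A]
    yB = [f(row) for row in B]
    both = yA + yB
    mean = sum(both) / len(both)
    var = sum((y - mean) ** 2 for y in both) / (len(both) - 1)
    indices = []
    for i in range(k):
        # Rows of A with only column i replaced by the value from B.
        yABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # f(B) and f(A_B^i) share only X_i, so this average estimates V_i.
        # Centering f(B) by the sample mean is a variance-reduction choice.
        v_i = sum((yb - mean) * (yab - ya)
                  for yb, yab, ya in zip(yB, yABi, yA)) / n
        indices.append(v_i / var)
    return indices

def linear_model(x):
    # Hypothetical test model; its analytic indices are 0.2, 0.8 and 0.
    return x[0] + 2.0 * x[1] + 0.0 * x[2]

S = first_order_indices(linear_model, k=3, n=20000)
print([round(s, 2) for s in S])
```

Note the cost: this scheme needs n(k + 2) model runs, which is the efficiency concern for high-dimensional models.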
Standardized Rank Regression SRR • Rank Y and Xi, and scale the ranks to mean 0 and variance 1 for convenience • The regression is based on the ranks of Y and Xi • Assumes the Xi are independent
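The SRR recipe can be sketched in plain Python as follows. The helper names and both test models are hypothetical, chosen only to show the behavior: a monotonic nonlinear response is captured well after the rank transform, while a non-monotonic response is not. Ties are not handled, which is acceptable for continuous random inputs.

```python
import math
import random

def standardized_ranks(values):
    """Ranks 1..n (no tie handling), scaled to mean 0 and variance 1."""
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])
    ranks = [0.0] * n
    for r, j in enumerate(order):
        ranks[j] = r + 1.0
    mean = (n + 1) / 2.0
    sd = math.sqrt((n * n - 1) / 12.0)  # population sd of 1..n
    return [(r - mean) / sd for r in ranks]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            fac = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= fac * m[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (m[c][k] - sum(m[c][j] * x[j] for j in range(c + 1, k))) / m[c][c]
    return x

def corr(u, v):
    # Both arguments are already standardized, so this is their correlation.
    return sum(a * b for a, b in zip(u, v)) / len(u)

def srr(X_cols, y):
    """Standardized rank regression coefficients and R^2 of the rank fit."""
    zs = [standardized_ranks(col) for col in X_cols]
    zy = standardized_ranks(y)
    R = [[corr(zi, zj) for zj in zs] for zi in zs]
    r = [corr(zi, zy) for zi in zs]
    coef = solve(R, r)
    r2 = sum(c * ri for c, ri in zip(coef, r))
    return coef, r2

rng = random.Random(1)
n = 1000
X = [[rng.random() for _ in range(n)] for _ in range(3)]  # three input columns
y_mono = [math.exp(2 * a) + 2 * b for a, b, _ in zip(*X)]  # monotonic, nonlinear
y_bowl = [(a - 0.5) ** 2 for a in X[0]]                    # non-monotonic in x1

coef_mono, r2_mono = srr(X, y_mono)
coef_bowl, r2_bowl = srr(X, y_bowl)
print([round(c, 2) for c in coef_mono], round(r2_mono, 2))
print([round(c, 2) for c in coef_bowl], round(r2_bowl, 2))
```

In the second case the response depends entirely on x1, yet the rank regression explains almost none of its variation, because the relationship has no monotonic trend for the ranks to pick up.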
Fourier Amplitude Sensitivity Test FAST • Explores the multidimensional space of the input factors along a search curve and analyzes the model response with a Fourier transform. • Handles main and interaction effects K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
Issues • Differential Analysis • not feasible: requires derivatives of complex models • SRR and OAT • not model independent: trouble with nonmonotonic, nonlinear models • not efficient: trouble with interaction effects in high-dimensional models • FAST • not efficient: requires separate model runs
Possible Solutions • Data mine the probabilistic model output • Multivariate Adaptive Regression Splines (MARS) • Multiple Additive Regression Trees (MART)
Data Mining • MARS • Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables. • MART • Explores the multidimensional input space of the input factors using gradient boosting of additive regression models. • Advantages • Search for interactions between variables, allowing any degree of interaction to be considered. • Tracks very complex data structures in high-dimensional data.
Sensitivity Indices via ANOVA decomposition • Sensitivity indices are calculated using basis functions not including x_s
Analytical Example Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
Example: Sobol’ g-function Saltelli A., Tarantola S., and Chan K.P.-S. (1999), “A Quantitative Model-Independent Method for Global Sensitivity Analysis of Model Output,” Technometrics, 41, 39-55.
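For reference, the Sobol' g-function is g(x) = prod_i (|4 x_i - 2| + a_i) / (1 + a_i) with x_i independent and uniform on (0, 1). Its first-order indices are available in closed form, which is what makes it a convenient test case. The sketch below evaluates the function and its analytic indices; the coefficient vector `a` is a hypothetical choice for illustration, not necessarily the one used in the presentation.

```python
import random

def g_function(x, a):
    """Sobol' g-function; a smaller a_i makes input i more influential."""
    prod = 1.0
    for xi, ai in zip(x, a):
        prod *= (abs(4.0 * xi - 2.0) + ai) / (1.0 + ai)
    return prod

def analytic_first_order(a):
    """Closed-form first-order indices S_i = V_i / V."""
    partial = [1.0 / (3.0 * (1.0 + ai) ** 2) for ai in a]  # V_i
    total = 1.0
    for v in partial:
        total *= 1.0 + v
    total -= 1.0                                           # V = prod(1 + V_i) - 1
    return [v / total for v in partial]

a = [0.0, 1.0, 4.5, 9.0, 99.0]  # hypothetical coefficients
S = analytic_first_order(a)
print([round(s, 3) for s in S])

# The first-order indices sum to less than 1; the remainder is interaction.
print(round(1.0 - sum(S), 3))

# Monte Carlo check that the function has mean 1 for any choice of a.
rng = random.Random(0)
n = 20000
mc_mean = sum(g_function([rng.random() for _ in a], a) for _ in range(n)) / n
print(round(mc_mean, 2))
```

Because the function is non-monotonic in every input and contains interactions of all orders, it exercises exactly the properties listed earlier as desirable in a sensitivity measure.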
[Figure: decision-process flowchart with numbered steps 1 to 8; layout lost in extraction. Recoverable labels: Management Options (Institutional Controls, Site Maintenance, Waste Acceptance, Closure, Monitoring/Surveillance) • Future Inventory • Existing Inventory • Fate & Transport • Research, Monitoring, Information & Data Collection • Occupational • MOP & IHI • Cumulative (CA) • Ecosystem • Maintenance Review • Periodic Review • Waste Acceptance Decision • Closure Decision • Value of Information • Contamination • Risk • Budgets • Management • Disposal Costs • Closure Costs • Potential Liabilities • Monitoring Costs • ALARA Costs • Disposal Fees • Analysis Costs • Public Benefit • Regulations & Guidance • Choose Management Options & Update Management Plan. Decision node (YES/NO): Can the risk be managed to regulatory thresholds at an acceptable cost with an acceptable level of uncertainty? Legend: Uncertainty analysis, Sensitivity analysis, Sequence number, Iteration loop.]
Simulation Results • Model Inputs ( X ) • Inventory • Fate and transport • Upward advection • Biotic transport • Model response ( Y ) • “EPA-SUM”
Variation Explained
Site    Time      SRR     MART/MARS
GCD     10,000    0.91    0.99
LANL    50        0.87    0.94
        100       0.86    0.96
        500       0.75    0.91
        1,000     0.71    0.95
        10,000    0.71    0.93
Summary • MART and MARS appear to provide an • Efficient • Simple (?) • Model Independent approach to data mining probabilistic model results for sensitivity analysis
Finally… • The decision context: • Is the uncertainty in the model response too high? • Is there value in reducing input uncertainty? • SA and cost used to estimate the value of collecting additional information.
MARS • Non-parametric recursive partitioning approach that fits separate splines to distinct intervals of the predictor variables. • Both the variables to use and the knot locations are found by an exhaustive (brute-force) search, selected together by evaluating a "lack of fit" criterion. • Searches for interactions between variables, allowing any degree of interaction to be considered. • Tracks very complex data structures in high-dimensional data. J.H. Friedman, (1991), “Multivariate Adaptive Regression Splines,” The Annals of Statistics, 19, 1-67. Software: Trevor Hastie and Robert Tibshirani, MDA Library for R (‘GNU S’). Ross Ihaka and Robert Gentleman, (1996), “R: A Language for Data Analysis and Graphics,” Journal of Computational and Graphical Statistics, 5, 3, 299-314. www.r-project.org.
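The knot search at the core of MARS can be sketched for a single predictor. This is a simplified illustration, not the MDA library's implementation: it fits an intercept plus one mirrored pair of hinge functions, max(0, x - t) and max(0, t - x), and tries every interior data value as the knot t, keeping the one with the smallest residual sum of squares. All function names, the `margin` parameter, and the test data are hypothetical. Full MARS repeats this search across variables, adds products of hinges for interactions, and then prunes terms.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            fac = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= fac * m[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (m[c][k] - sum(m[c][j] * x[j] for j in range(c + 1, k))) / m[c][c]
    return x

def hinge_basis(x, t):
    """Intercept plus the mirrored hinge pair at knot t."""
    return (1.0, max(0.0, x - t), max(0.0, t - x))

def fit_with_knot(xs, ys, t):
    """Least-squares fit on the hinge basis; returns (SSE, coefficients)."""
    rows = [hinge_basis(x, t) for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve(ata, aty)
    sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys))
    return sse, coef

def search_knot(xs, ys, margin=5):
    """Exhaustive search over interior data values used as candidate knots."""
    best = None
    for t in sorted(xs)[margin:-margin]:
        sse, coef = fit_with_knot(xs, ys, t)
        if best is None or sse < best[0]:
            best = (sse, t, coef)
    return best

rng = random.Random(3)
xs = [rng.random() for _ in range(200)]
ys = [abs(x - 0.3) for x in xs]  # hypothetical response with a kink at 0.3

sse, knot, coef = search_knot(xs, ys)
print(round(knot, 2), [round(c, 2) for c in coef])
```

The recovered knot lands near the true kink and both hinge coefficients are close to 1, so the non-monotonic response is represented with three terms.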
MART • Multiple Additive Regression Trees • Explores the multidimensional input space of the input factors using gradient boosting of additive regression models. • Handles main and interaction effects. • Fast K. Chan, S. Tarantola and A. Saltelli, 2000, Variance-Based Methods, in Sensitivity Analysis, A. Saltelli, K. Chan, E.M. Scott (eds.), John Wiley and Sons.
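A rough sketch of the boosting idea behind MART, under simplifying assumptions: squared-error loss, single-split trees ("stumps"), and a fixed learning rate. Stumps can only represent main effects, so this version does not capture interactions; deeper trees would be needed for that. The relative influence of each input is taken here as its share of the total squared-error reduction over all splits. The function names, tuning values, and test model are hypothetical and do not come from the presentation.

```python
import random

def best_stump(orders, X_cols, resid):
    """Exhaustive search for the split that most reduces the squared error."""
    n = len(resid)
    total = sum(resid)
    best = None  # (gain, feature, threshold, left mean, right mean)
    for f, order in enumerate(orders):
        col = X_cols[f]
        left_sum = 0.0
        for pos in range(n - 1):
            left_sum += resid[order[pos]]
            n_left, n_right = pos + 1, n - pos - 1
            right_sum = total - left_sum
            gain = (left_sum ** 2 / n_left + right_sum ** 2 / n_right
                    - total ** 2 / n)
            if best is None or gain > best[0]:
                thr = 0.5 * (col[order[pos]] + col[order[pos + 1]])
                best = (gain, f, thr, left_sum / n_left, right_sum / n_right)
    return best

def boost(X_cols, y, rounds=200, rate=0.2):
    """Gradient boosting of stumps; returns (relative influence, training R^2)."""
    n = len(y)
    # Sort indices once per input; the inputs do not change between rounds.
    orders = [sorted(range(n), key=lambda j, c=col: c[j]) for col in X_cols]
    y_mean = sum(y) / n
    pred = [y_mean] * n
    influence = [0.0] * len(X_cols)
    for _ in range(rounds):
        # For squared-error loss the negative gradient is the residual.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        gain, f, thr, left_mean, right_mean = best_stump(orders, X_cols, resid)
        influence[f] += gain
        col = X_cols[f]
        pred = [p + rate * (left_mean if col[j] <= thr else right_mean)
                for j, p in enumerate(pred)]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    total = sum(influence)
    return [v / total for v in influence], 1.0 - sse / sst

rng = random.Random(2)
n = 400
X = [[rng.random() for _ in range(n)] for _ in range(3)]
# Hypothetical model: non-monotonic in x1, weakly linear in x2, x3 is inert.
y = [4.0 * (a - 0.5) ** 2 + 0.3 * b for a, b, _ in zip(*X)]

influence, r2 = boost(X, y)
print([round(v, 2) for v in influence], round(r2, 2))
```

Unlike a rank regression, the boosted fit attributes most of the variation to x1 even though the response is not monotonic in it, which is the behavior the Summary slide describes as model independent.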