
SIMULATION MODELLING IN NATURAL RESOURCE MANAGEMENT






Presentation Transcript


  1. Photo: Kendra Holt. SIMULATION MODELLING IN NATURAL RESOURCE MANAGEMENT

  Excel VBA (Solver) excerpt:

  For i = minE To maxE Step Estep
      Cells(Erow, Ecol) = i
      SolverOk SetCell:="$B$26", MaxMinVal:=2
      SolverSolve True
      neg_log_L(i) = Cells(L_row, L_col)
      If neg_log_L(i) <= min_L Then
          min_L = neg_log_L(i)
          Ebest = i
      End If
  Next i

  AD Model Builder (C++) excerpt:

  for (i = 1; i <= nobs; i++) {
      ad_begin_funnel();
      x = S(i);
      y = R(i);
      lim1 = x * mfexp(-6 * sx);
      lim2 = x * mfexp(8 * sx);
      p = adromb(&model_parameters::fz, lim1, lim2, nsteps);
      resid += log(p + 1.e-300);
  }

  2. Statistical Simulation
  • Statistics
  • Bootstrapping
  • Power analysis
  • Randomization
  • Fitting models to data
  • Simulation/Estimation for testing statistical methods

  3. Statistics. "The ability to simplify means to eliminate the unnecessary so that the necessary may speak." (Hans Hofmann) Unfortunately, acquisition of statistical knowledge is blocked by a formidable wall of mathematics. It is equally unfortunate that much of this mathematical basis relies on relatively strong assumptions about such things as large sample sizes and asymptotic normality. On the positive side, modern computing power allows us to relax many of these assumptions, particularly those that commonly arise in natural resource applications.

  4. Statistics. Three main objectives of Statistics:
  1. Data Collection: how should data be collected in order to (a) test hypotheses? (b) estimate parameters for a model? (c) make decisions?
  2. Analysis: how should data be summarized and analyzed?
  3. Inference: how accurate are the summaries and analyses? Do they adequately reflect the "truth"?

  5. Statistical Inference A sample of 15 (X,Y) pairs from the universe of 100 possible pairs

  6. Statistical Inference. A sample of 15 (X,Y) pairs from the universe of 100 possible pairs (left); the complete universe of 100 possible pairs, "The Truth" (right). Statistics tries to infer the picture on the right from the picture on the left.

  7. Statistical Inference: the mean. X = {76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9}; mean(X) = 83.1, sd(X) = 10.2. How accurate is the mean of X? There is a simple accuracy formula for the mean (and only the "mean"): the standard error, SE = sd(X)/√n = 10.2/√9 = 3.4.
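The arithmetic above can be verified in a few lines. This is an illustrative sketch in Python (the deck's own examples use R and VBA), using only the nine X values shown on the slide.

```python
# Sketch: the sample mean, sample sd, and the accuracy (standard error)
# of the mean, SE = sd / sqrt(n), for the nine X values on the slide.
import math

X = [76.3, 74.2, 90.2, 83.9, 89.8, 89.1, 94.3, 62.3, 87.9]
n = len(X)
mean = sum(X) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in X) / (n - 1))  # sample standard deviation
se = sd / math.sqrt(n)                                     # standard error of the mean

print(round(mean, 1), round(sd, 1), round(se, 1))          # 83.1 10.2 3.4
```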

  8. Bootstrapping. That was easy! But try that for the correlation coefficient. A statistical formula might be available, but it will require assumptions about the distribution of the data.
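A sketch of the point being made: bootstrapping the correlation coefficient needs no distributional formula, only resampling. The (X, Y) data below are invented for illustration, and plain Python stands in for the deck's R code.

```python
# Sketch: bootstrapping the correlation coefficient, for which no simple
# distribution-free accuracy formula exists. Data are hypothetical.
import random
import statistics

def corr(xs, ys):
    # Pearson correlation coefficient, computed from sums of squares
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [2.1, 3.9, 6.2, 7.8, 10.3, 11.9, 14.2, 15.8, 18.1, 20.2]

boots = []
for _ in range(1000):
    # resample index values with replacement, then recompute the statistic
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    boots.append(corr([X[i] for i in idx], [Y[i] for i in idx]))

boots.sort()
print(boots[24], boots[974])   # approximate 95% percentile interval
```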

  9. Bootstrapping: what's the big idea? [Diagram: the original dataset X = (x1, x2, …, xn) is resampled into bootstrap samples X*1, X*2, X*3, …, each of which yields a bootstrap replication of the function: s(X*1), s(X*2), s(X*3), …] There is no real limit to what the function s() represents!!

  10. Bootstrapping. There is no real limit to what the function s() represents!! For example, s() could be the mean or another summary statistic. It could be a correlation coefficient, a regression estimate, or the parameter estimates for a non-linear model. The data could be multivariate, in which case more than one response variable is observed for each sample; thus, s() could be a Principal Component statistic. Most, if not all, statistical functions are computed automatically in software like R, SAS, Systat, S-plus, etc. Therefore, you can make inferences just by (1) resampling the data, (2) repeating the analysis, (3) summarizing your estimates, and (4) drawing inferences, all without much mathematical or statistical training.

  11. Bootstrapping: Example. Suppose we wish to infer the regression slope for our hypothetical universe of possible data using only the sample:
  1. Resample the 12 data points 1000 times
  2. Compute the regression slope for each
  3. Compute the mean
  4. Compute 95% confidence limits by (a) sorting the slope estimates; (b) L95 is the 25th estimate; (c) U95 is the 975th estimate

  12. Bootstrapping: R code for regression (bootstrap reps of X*, Y*; the function s(X,Y); the distribution F*; the inference).

  # assumes X, Y, and nobs = 12 are already defined
  # initialise vector to hold slope estimates
  slope.boot <- numeric(1000)
  # generate 1000 bootstrap replications of X, Y, slope
  for (i in 1:1000) {
    # randomly choose nobs = 12 index values, with replacement
    boot <- sample(seq(1, nobs, 1), size = nobs, replace = TRUE)
    # create X, Y bootstrap pair
    Xboot <- X[boot]
    Yboot <- Y[boot]
    # perform regression using lm() function
    reg.boot <- lm(Yboot ~ Xboot)
    # extract slope from resulting list
    slope.boot[i] <- reg.boot$coeff[2]
  }
  # generate histogram
  hist(slope.boot, main = "")
  # generate summary stats
  print(mean(slope.boot))
  # percentile confidence limits must come from the SORTED estimates
  slope.sorted <- sort(slope.boot)
  L95 <- slope.sorted[25]; U95 <- slope.sorted[975]
  print(c(L95, U95))

  13. Bootstrapping: Example. [Figure: the original data and the bootstrap distribution F* of the slope estimates.]

  14. Model fitting
  • Curve fitting and Model fitting, including some important definitions
  • Estimation: linear vs. non-linear estimation
  • Probability models and Likelihood Functions
  • Likelihood functions that obey constraints
  • "Safe" parameterization of non-linear models

  15. Curve fitting. A toxicologist has performed an experiment on accumulation of a chemical pollutant in tissue, where she obtained the following data. She then wishes to summarize this data into a more comprehensible and reduced form. She selects from a class of functions, e.g., polynomials.

  16. Curve fitting. A toxicologist has performed an experiment on accumulation of a chemical pollutant in tissue, where she obtained the following data. She then wishes to summarize this data into a more comprehensible and reduced form. She selects, from a class of functions, the curve and parameters that best fit her data. This is called CURVE FITTING.

  17. Curve fitting. Curve fitting typically involves polynomials, e.g., y = θ0 + θ1x + θ2x² + … + θmx^m. Values for the parameters θ0, θ1, …, θm are chosen so as to get the best possible fit to the data. In curve fitting, the most common objective function used to judge the fit between curve (predicted) and data (observed) is a least-squares criterion: SS = Σi (yi,observed − yi,predicted)².
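The least-squares criterion can be made concrete with a small sketch. The quadratic form and the data below are hypothetical, and Python is used purely for illustration.

```python
# Sketch: the least-squares criterion SS(theta) = sum_i (y_i - yhat_i)^2
# for a quadratic yhat = theta0 + theta1*x + theta2*x^2. Data hypothetical.
def ss(theta, xs, ys):
    t0, t1, t2 = theta
    return sum((y - (t0 + t1 * x + t2 * x * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.3, 6.0, 12.1, 20.6]          # generated exactly from 1 + 0.1x + 1.2x^2

print(ss((1.0, 0.1, 1.2), xs, ys))        # the generating parameters give SS near zero
print(ss((0.0, 1.0, 1.0), xs, ys))        # a worse candidate gives a larger SS
```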

  18. Curve fitting. Curve fitting involves two levels of arbitrariness: 1. The function used to predict the data is arbitrary, being dictated only to a minor extent by the process from which the data came. 2. The best-fit criterion is arbitrary, being independent of statistical considerations (e.g., a sum-of-squares criterion is not derived from any probability distribution).

  19. Curve fitting. Curve fitting as described is also easy, because equations like y = θ0 + θ1x + θ2x² + … + θmx^m are linear functions of the parameters and can be solved analytically. To see why this is linear, examine the independent variables in tabular (experimental design) form: each column 1, x, x², …, x^m is just another predictor.

  20. Curve fitting. The original equation can now be rewritten, by defining new variables xj = x^j, from y = θ0 + θ1x + θ2x² + … + θmx^m to y = θ0 + θ1x1 + θ2x2 + … + θmxm, which is just a linear model (e.g., multiple linear regression). It can be solved by setting all derivatives ∂SS/∂θj = 0 and solving the resulting set of simultaneous linear equations (the normal equations) for θ.
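A sketch of the normal-equations solution just described: build a design matrix with columns 1, x, x², form XᵀXθ = Xᵀy, and solve the simultaneous linear equations. The data are hypothetical, and the tiny Gaussian-elimination helper is hand-rolled only to keep the example self-contained.

```python
# Sketch: solving the normal equations X^T X theta = X^T y for a quadratic fit,
# using a small hand-rolled Gauss-Jordan elimination (no external libraries).
def solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.3, 6.0, 12.1, 20.6]                   # from y = 1 + 0.1x + 1.2x^2
X = [[1.0, x, x * x] for x in xs]                  # design matrix: columns 1, x, x^2

XtX = [[sum(X[i][r] * X[i][c] for i in range(len(xs))) for c in range(3)] for r in range(3)]
Xty = [sum(X[i][r] * ys[i] for i in range(len(xs))) for r in range(3)]
theta = solve(XtX, Xty)                            # recovers theta0, theta1, theta2
print([round(t, 6) for t in theta])
```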

  21. Model fitting. Suppose now that our toxicologist friend is familiar with the biophysical laws governing accumulation of contaminants in tissue. She may then choose to derive an equation that obeys these laws. The parameters θ1 and θ2 of such an equation now have a biophysical interpretation, and the function also obeys the biological constraint C ≥ 0.

  22. Model fitting. When an equation is derived based on theoretical considerations, the procedure of finding the best-fitting parameters is called MODEL FITTING. She may choose similar goodness-of-fit criteria, but the form of the equation is no longer guided by computational convenience. In this case, the model is no longer linear in the parameters, so computing the "best fit" must involve some numerical procedure.
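To illustrate why fitting a non-linear model needs a numerical procedure, the sketch below fits an assumed saturating form C = θ1(1 − e^(−θ2·B)) to invented data by a crude grid search. The model form is an assumption for illustration (the slide's actual equation is not preserved in the transcript), and a real analysis would use a proper optimizer rather than a grid.

```python
# Sketch: fitting a non-linear model numerically. The saturating form
# C = theta1 * (1 - exp(-theta2 * B)) and the data are hypothetical.
import math

B = [1.0, 2.0, 4.0, 8.0, 16.0]
C = [1.81, 3.30, 5.51, 7.98, 9.60]          # roughly theta1 = 10, theta2 = 0.2

def ss(t1, t2):
    # least-squares criterion for this non-linear model
    return sum((c - t1 * (1 - math.exp(-t2 * b))) ** 2 for b, c in zip(B, C))

# crude grid search: theta1 in [5.0, 15.0] by 0.1, theta2 in [0.05, 0.50] by 0.01
best = min(((ss(t1 / 10, t2 / 100), t1 / 10, t2 / 100)
            for t1 in range(50, 151) for t2 in range(5, 51)), key=lambda t: t[0])
print(best)   # (smallest SS, best theta1, best theta2)
```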

  23. Model estimation. Because parameters from model fitting have some natural interpretation, we may wish to ask: "What is the true value of θ2 in nature?" The imprecise nature of the measurements means that we can never answer this question with absolute certainty. Also, if she performed this experiment on a new set of subjects, she might get a different best-fitting θ2 value. The process of finding parameter values that
  • fit the data well,
  • come close on average to the true values, and
  • do not vary excessively from one experiment to the next
  is called MODEL ESTIMATION. Model estimation is a critical component of Simulation Modelling!

  24. Statistical estimation. Determining the parameters of a probability distribution is called STATISTICAL ESTIMATION. For example, the observed value of a random variable h may be the height of trees from an even-aged stand of lodgepole pine. If we assume that h has a normal distribution with mean h0 and standard deviation σ, then the probability density function (pdf) of h is f(h) = (1/(σ√(2π))) exp(−(h − h0)²/(2σ²)), with the usual estimates (the sample mean and standard deviation). Statistical estimation is also a critical component of Simulation Modelling!

  25. Parameter estimation. Model estimation can be combined with statistical estimation in the following way. Suppose the measured concentration of pollutant Ci taken at body size Bi is a random variable whose mean is given by the model prediction Ĉ(Bi; θ1, θ2). If many measurements were taken at body size Bi, we would expect the C values to fluctuate around this mean with standard deviation σ. If we assume that these variations have a normal distribution, then the probability density function for Ci is f(Ci) = (1/(σ√(2π))) exp(−(Ci − Ĉ(Bi; θ1, θ2))²/(2σ²)).

  26. Parameter estimation. [Figure: Ci plotted against Bi.] Each value of Ci will have a mean that depends on Bi, and the spread of Ci values depends on the standard deviation σ.

  27. Parameter estimation. For several measurements (Ci, Bi), where i = {1, 2, …, n}, we have a pdf for each: f(Ci) = (1/(σ√(2π))) exp(−(Ci − Ĉ(Bi; θ1, θ2))²/(2σ²)), i = 1, …, n.

  28. Parameter estimation. Although each value of Ci will have a mean that depends on Bi, the set of parameters (θ1, θ2, σ) is common to all measurements! Estimating parameters that are common to both the model and the probability density function of the observations is called PARAMETER ESTIMATION. Because parameter estimation encompasses all other forms of estimation, it is the most critical component of Simulation Modelling!
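A sketch of this combination of model and probability model: a normal negative log-likelihood whose mean for each observation comes from an assumed saturating model. Both the model form and the data are hypothetical illustrations.

```python
# Sketch: parameter estimation combines the model (predicting the mean of each
# observation) with a probability model for the scatter. The negative
# log-likelihood below assumes normal errors; model form and data are invented.
import math

B = [1.0, 2.0, 4.0, 8.0, 16.0]
C = [1.81, 3.30, 5.51, 7.98, 9.60]

def neg_log_L(t1, t2, sigma):
    nll = 0.0
    for b, c in zip(B, C):
        mu = t1 * (1 - math.exp(-t2 * b))   # model mean for this body size B
        # normal log-density, negated: shared sigma and (t1, t2) across all i
        nll += 0.5 * math.log(2 * math.pi * sigma ** 2) + (c - mu) ** 2 / (2 * sigma ** 2)
    return nll

# Parameters near the truth should be more likely (lower NLL) than bad ones
print(neg_log_L(10.0, 0.2, 0.1), neg_log_L(5.0, 0.5, 0.1))
```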

  29. Simulation vs. Estimation. [Flowchart: set initial conditions at t = 0; input parameters and function equations; echo parameter input; output initial conditions. Then, for each year t: spring juvenile production; summer juvenile survival; update fall abundance; apply hunting effort for year t and calculate harvest; winter survival; adult spring abundance; output results for year t; set t = t + 1 and repeat until t = T.] Simulation generates predicted values of state variables (observations) from a known set of parameter values.

  30. Simulation vs. Estimation. The key elements of simulation models are Parameters, State Variables, and Controls:
  • Parameters: survival, production rates, error variances, etc.
  • State Variables*: duck abundance, or an index of it
  • Controls: hunting effort, harvest rates
  We can use the notation: State Variables Y = f(Parameters, Controls). The Simulation Problem is to go from known Parameters and Controls to unknown State Variables. *"State Variable" in this context includes a measurement of state, such as an index; in that case, the parameter set includes parameters of the measurement system.
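The Simulation Problem can be sketched as a toy version of the annual loop in the flowchart: known parameters and a known control (hunting effort) produce predicted state variables (spring abundance). All rates and numbers below are invented for illustration.

```python
# Sketch: simulation goes from known parameters and controls to predicted
# state variables. A toy annual duck-population loop; all values invented.
def simulate(years, n0, prod, s_summer, s_winter, q, effort):
    spring = n0                                   # adult spring abundance (state variable)
    traj = []
    for t in range(years):
        juveniles = spring * prod                 # spring juvenile production
        fall = spring + s_summer * juveniles      # summer juvenile survival -> fall abundance
        harvest = q * effort[t] * fall            # harvest driven by hunting effort (control)
        spring = s_winter * (fall - harvest)      # winter survival -> next spring abundance
        traj.append(spring)
    return traj

traj = simulate(years=5, n0=1000.0, prod=1.5, s_summer=0.5,
                s_winter=0.7, q=0.01, effort=[10, 10, 20, 20, 30])
print([round(n) for n in traj])
```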

  31. Simulation vs. Estimation. [Flowchart repeated from slide 29.] Simulation generates predicted values of state variables (observations) from a known set of parameter values.

  32. Simulation vs. Estimation. The Estimation Problem is to go from known (observed) State Variables and Controls to unknown Parameters. Estimation is concerned with finding the models and parameter values that generate the observed data; in this case, the "known" state variables Y represent the data or observations.

  33. Simulation vs. Estimation. [Diagram contrasting the two directions: simulation runs from parameters to state variables, while estimation runs from state variables back to parameters.]

  34. Yeah, but how does estimation work? [Diagram: the Model maps Parameters and Controls to State Variables.] For the contaminant problem, we have: state variable C (the measured concentration), control B (body size), and model parameters θ1, θ2, and σ.

  35. Yeah, but how does estimation work? Estimation:
  1. Guess initial parameters at iteration i = 0
  2. Use a simulation step to "predict" state variables given the parameters at the current iteration
  3. Calculate the likelihood that the data would have arisen if these parameters were true
  4. Repeat 1-3 after adjusting parameters in the "best-fit" direction
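The four steps above can be sketched as follows. A crude hill-climbing search with shrinking step sizes stands in for a real optimizer; the saturating model, the data, and the assumed σ = 0.1 are all hypothetical.

```python
# Sketch of the estimation loop: guess parameters, simulate predictions,
# score them with a likelihood, adjust in the best-fit direction, repeat.
import math

B = [1.0, 2.0, 4.0, 8.0, 16.0]
C = [1.81, 3.30, 5.51, 7.98, 9.60]

def predict(theta):                                  # step 2: simulate predictions
    t1, t2 = theta
    return [t1 * (1 - math.exp(-t2 * b)) for b in B]

def neg_log_L(theta, sigma=0.1):                     # step 3: likelihood of the data
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2) +
               (c - p) ** 2 / (2 * sigma ** 2)
               for c, p in zip(C, predict(theta)))

theta = [5.0, 0.5]                                   # step 1: initial guess
step = [1.0, 0.1]
for _ in range(200):                                 # step 4: repeat, adjusting parameters
    for j in range(2):
        for d in (step[j], -step[j]):
            trial = theta[:]
            trial[j] += d
            if neg_log_L(trial) < neg_log_L(theta):  # keep moves in the best-fit direction
                theta = trial
    step = [s * 0.9 for s in step]                   # shrink the step sizes

print([round(t, 2) for t in theta])
```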

  36.-40. The simulated predictions and likelihood function. [A sequence of five figure slides; the images are not preserved in the transcript.]

  41. The total likelihood function. The total likelihood is the product of the individual pdfs, L(θ1, θ2, σ) = Πi f(Ci); in practice we work with the log-likelihood. An optimization procedure is used to find the value θ2* that maximizes the likelihood.

  42. Confrontation between Model and Data. [Diagram: Parameters and Controls feed the Simulation Model, which produces Predicted Data Ypred; the Likelihood Function confronts Ypred with the Observed Data Yobs.] The "Model" can be anything used to predict data.

  43. The Likelihood Function. The likelihood function used depends on the type of data/observations. Most statistics books have appendices listing distributions for particular data types, their pdfs, expected values (means, variances), and random-number generation. The likelihood can only tell you which hypotheses/parameters are more likely than others; it cannot tell you the probability that a given hypothesis is true. That requires a Bayesian approach.
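One way to see how the likelihood depends on the data type: count data call for a Poisson likelihood, continuous measurements for a normal one. The log-pdfs below are the standard textbook forms; the counts are invented, and the comparison shows only relative support, as the slide says.

```python
# Sketch: matching the likelihood to the data type. Poisson for counts,
# normal for continuous measurements; data invented for illustration.
import math

def log_lik_poisson(counts, lam):
    # log of prod_k lam^k e^(-lam) / k!  (lgamma(k+1) = log k!)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def log_lik_normal(xs, mu, sigma):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

counts = [3, 5, 4, 6, 2]
# Relative support only: lambda = 4 is more likely than lambda = 10
print(log_lik_poisson(counts, 4.0) > log_lik_poisson(counts, 10.0))   # True
```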
