
Minnesota AD Model Builder Short Course October 22-24, 2007

Thanks to Jim Bence, Brian Linton, and Brian Irwin for providing materials used in previous courses. QFC Supporting Partners: MSU, GLFC, Michigan DNR, Minnesota DNR, Ohio DNR, New York DEC, Illinois DNR, Ontario MNR.


Presentation Transcript


  1. Minnesota AD Model Builder Short Course, October 22-24, 2007 • Thanks to Jim Bence, Brian Linton, and Brian Irwin for providing materials used in previous courses • QFC Supporting Partners – MSU, GLFC, Michigan DNR, Minnesota DNR, Ohio DNR, New York DEC, Illinois DNR, Ontario MNR

  2. Quantitative Fisheries Center (QFC) • Created July 2005 • Co-directors: Jim Bence and Mike Jones • Staffing: • Associate Director • Computer Programmer • Post-Docs (2) • Graduate students (3 - PhD; 3 - MS)

  3. Quantitative Fisheries Center (QFC) • Provide research, outreach, and educational services to supporting partners • Outreach examples • Computer programming support to Michigan DNR inland creel database • SCAA consultation for Lake Erie percid assessments • River classifications in MI, WI, NY, PA • Power analysis for OhDNR Lake Erie gill net surveys

  4. Quantitative Fisheries Center (QFC) • Education • AD Model Builder short courses taught in East Lansing (2006, 2007) and Cornell Biological Field Station (2007) • Online Maximum Likelihood Estimation course (launched October 16, 2007) • Introduction to R short course (currently being converted to an online format) • Online Resampling Approaches to Data Analysis course (planned for summer 2008)

  5. How this course will differ from previous offerings • More emphasis on straightforward applications • More hands on programming (coding the whole program rather than only bits and pieces) • Less emphasis on coding efficiency (comes with practice)

  6. What is AD Model Builder and why should you use it? • Automatic Differentiation Model Builder • Software for creating computer programs to estimate parameters of statistical models

  7. What are the advantages of using it? • Fast and accurate • Flexible • Designed for general maximum likelihood problems • Libraries for Bayesian and robust estimation methods • Includes many advanced programming options (estimation in phases) • Multi-dimensional arrays

  8. How fast is it? • Evaluation by Schnute and Olsen • 100 parameter catch-at-age model from Schnute and Richards (2005)

  9. Why is it so fast? • Automatic differentiation – a method for calculating derivatives accurately, to within machine precision • Most other computer programs approximate the derivative with respect to every parameter numerically (finite differences) • Newton-Raphson – requires first and second order derivatives • Levenberg-Marquardt – requires first order derivatives
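
To make the contrast on this slide concrete, here is a minimal C++ sketch. It is not ADMB's own implementation: it uses forward-mode dual numbers for brevity, whereas ADMB's AUTODIFF library is generally described as working in reverse mode. The objective f(k) = (1 - exp(-2k))^2, the `Dual` type, and every function name below are hypothetical choices made only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical dual number: a value together with its derivative
// with respect to one chosen parameter.
struct Dual {
    double val;
    double der;
    Dual(double v, double d = 0.0) : val(v), der(d) {}
};

// Each operation propagates the derivative by the usual calculus rules.
Dual operator-(const Dual& a, const Dual& b) { return Dual(a.val - b.val, a.der - b.der); }
Dual operator*(const Dual& a, const Dual& b) {
    return Dual(a.val * b.val, a.der * b.val + a.val * b.der);  // product rule
}
Dual exp(const Dual& a) {
    const double e = std::exp(a.val);
    return Dual(e, e * a.der);  // chain rule
}

// Illustrative objective f(k) = (1 - exp(-2k))^2, written once for any number type.
template <typename T>
T objective(T k) {
    using std::exp;  // std::exp for double; exp(Dual) is found by argument-dependent lookup
    T r = T(1.0) - exp(T(-2.0) * k);
    return r * r;
}

// Derivative by automatic differentiation: seed dk/dk = 1 and read off the result.
double derivative_ad(double k) { return objective(Dual(k, 1.0)).der; }

// Derivative by forward finite difference with step h (an approximation).
double derivative_fd(double k, double h) {
    assert(h > 0.0);
    return (objective(k + h) - objective(k)) / h;
}

// Hand-derived reference: f'(k) = 4 exp(-2k) (1 - exp(-2k)).
double derivative_exact(double k) {
    const double e = std::exp(-2.0 * k);
    return 4.0 * e * (1.0 - e);
}
```

Under these assumptions, `derivative_ad(1.0)` should agree with `derivative_exact(1.0)` to rounding error, while `derivative_fd(1.0, 1e-3)` is off at roughly the fourth decimal place and needs an extra function evaluation per parameter.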

  10. What are some of the most noticeable differences with other software packages? • Users must specify the objective function to be minimized (Note: ADMB only does minimization)

  11. [Figure: objective function plotted against parameter value]

  12. ADMB Differences with SAS data lenweight; input length weight; datalines; 358 212 360 242 382 402 388 285 394 325 . . . • 12542 • 15909 ; Run;

  13. ADMB Differences with SAS
      proc nlin data=lenweight;
        parameters a=0 b=3;
        model weight=a*length**b;
      run;
      Proc NLIN estimates parameters by (weighted) least squares, i.e., it minimizes the sum of squared errors

  14. ADMB Differences with SAS
      proc nlmixed data=lenweight;
        ypred=alpha*length**beta;
        parms alpha=0.001, beta=3, sigma=1;
        model weight~normal(ypred,sigma);
      run;
      Proc NLMIXED estimates parameters by maximum likelihood

  15. ADMB Differences with SAS
      proc nlp data=lenweight tech=newrap inest=par1 outest=opar1 maxiter=1000;
        parms a, b, sigma;
        ypred=a*Length**b;
        nlogl = log(sigma)+0.5*((weight-ypred)/sigma)**2;
        min nlogl;
      run;
      Proc NLP (NonLinear Programming) in SAS/OR is an estimation method similar in vein to that of ADMB in that analysts must specify their objective function
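
The two kinds of objective function in the SAS examples (sum of squared errors for Proc NLIN, negative log-likelihood for Proc NLP) can be written out explicitly. The following C++ sketch only evaluates the objectives for given parameter values; it does not perform the minimization. The function names are hypothetical, and the negative log-likelihood drops the constant term exactly as the `nlogl` expression on slide 15 does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares objective for the model weight = a * length^b.
double sum_squared_errors(const std::vector<double>& length,
                          const std::vector<double>& weight,
                          double a, double b) {
    assert(length.size() == weight.size());
    double sse = 0.0;
    for (std::size_t i = 0; i < length.size(); ++i) {
        const double resid = weight[i] - a * std::pow(length[i], b);
        sse += resid * resid;
    }
    return sse;
}

// Normal negative log-likelihood summed over observations:
// log(sigma) + 0.5 * ((weight - ypred) / sigma)^2, constant term omitted.
double neg_log_likelihood(const std::vector<double>& length,
                          const std::vector<double>& weight,
                          double a, double b, double sigma) {
    assert(length.size() == weight.size());
    assert(sigma > 0.0);
    double nll = 0.0;
    for (std::size_t i = 0; i < length.size(); ++i) {
        const double ypred = a * std::pow(length[i], b);
        const double z = (weight[i] - ypred) / sigma;
        nll += std::log(sigma) + 0.5 * z * z;
    }
    return nll;
}
```

Because ADMB only minimizes, a maximum likelihood fit is set up by minimizing a quantity like `neg_log_likelihood` rather than maximizing the likelihood itself.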

  16. What are the most striking differences with other packages? • Users specify the objective function to be minimized • Steps to running • Create an ADMB template • Convert template to C++ code • Compile – convert from programming code to machine code (creates an executable file) • Link the executable file to C++ libraries • Run your executable file • Resulting executable can be run on similar datasets on any computer

  17. What are the difficulties associated with using ADMB? • Requires a more intimate knowledge of statistical theory (probability distributions, likelihoods, Hessians) • Some knowledge of C++ is required • Code can be a little quirky (as you will soon see)

  18. ADMB Files Input • .tpl – make the model • .dat – input data • .pin – initial values (optional; need to specify for all parameters) Output • .par – parameter estimates • .cor – correlation of parameters • .std – parameter estimates with std. deviations • .rep – user-defined outputs (optional)

  19. ADMB Files Input ADMB will expect .dat and .pin files to have same name as .tpl e.g., MilleLacs.tpl, MilleLacs.dat (this can be overridden) Output • By default, output files will have same file name e.g., MilleLacs.rep, MilleLacs.par (this can be overridden) • Note: In the project folder, • ignore the files with the extra ~ on the extension… • e.g., Oneida.tpl~ • they are temporary files (so be sure you open the right file).

  20. .dat file • Simply contains the data you will use when fitting your model
      Simple.dat
      #Simple linear regression example
      #For ADMB Short Course 1, August 2007
      #Created by D. Fournier, modified by B. Linton
      #Any text after "#" is ignored
      # number of observations
      10
      # observed Y values
      1.4 4.7 5.1 8.3 9.0 14.5 14.0 13.4 19.2 18
      # observed x values
      -1 0 1 2 3 4 5 6 7 8
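
The comment rule in the .dat file (everything after "#" is ignored) can be illustrated with a small reader. This is a simplified C++ sketch, not ADMB's actual input code: it assumes "#" starts a comment that runs to the end of the line and that all remaining tokens are numbers. The function name `read_dat_values` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect every numeric token from a .dat-style stream, in file order,
// discarding everything from '#' to the end of each line.
std::vector<double> read_dat_values(std::istream& in) {
    std::vector<double> values;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);  // strip the comment
        std::istringstream tokens(line);
        double v;
        while (tokens >> v) values.push_back(v);
    }
    assert(!in.bad());  // end-of-file is expected; a hard I/O error is not
    return values;
}
```

The result is one flat sequence of numbers, which is why the order of values in the .dat file has to match the order of the declarations that read them.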

  21. .tpl Sections (each must be written exactly as shown) • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • PROCEDURE_SECTION • REPORT_SECTION • Other commonly used sections • PRELIMINARY_CALCS_SECTION • LOCAL_CALCS

  22. Keep in mind • Different sections use different programming languages • Data, Parameter, Initialization sections use ADMB code • Procedure, Report, Local Calcs, Preliminary Calcs sections use C++ code • Lines typically must end with ; • Not as absolute a rule as in SAS (exceptions: loops, conditional statements)

  23. Keep in mind • Comments in .dat file are specified with '#' • Comments in .tpl are specified with '//'

  24. Keep in mind • Section heads (DATA_SECTION, PARAMETER_SECTION) must be left justified • Exception: LOCAL_CALCS requires one space before it • All other lines should have two spaces before the text

  25. .tpl Sections • DATA_SECTION • Identify values that will be read in from .dat file • Need to consider the order of numbers in your .dat file • Can read your data in as integers, real numbers, matrices, arrays,…
      DATA_SECTION
        init_int first_year
        init_int last_year
        init_int first_age
        init_int last_age
        init_number lambda
        init_matrix obs_length(first_year,last_year,first_age,last_age)

  26. .tpl Sections • DATA_SECTION • Also where you can declare your looping variables; valid throughout your entire code
      DATA_SECTION
        init_int first_year
        init_int last_year
        init_int first_age
        init_int last_age
        init_number lambda
        init_matrix obs_length(first_year,last_year,first_age,last_age)
        int i
        int j

  27. .tpl Sections • DATA_SECTION • If .dat doesn't have the same name as .tpl • Assume program is MyModel.tpl • Then, default search is for MyModel.dat • Code below will read in a file named ControlFile.dat:
      !!ad_comm::change_datafile_name("ControlFile.dat");
      • Can also go back:
      !!ad_comm::change_datafile_name("MyModel.dat");
      • !! – tells ADMB that what follows is C++ code

  28. .tpl Sections • DATA_SECTION • Always a good idea to verify that your data have been read in correctly • In .dat file, have -8888 as your last entry • In DATA_SECTION, specify init_int test as the last read-in variable and type
      !!cout << test << endl;
      !!exit(99);

  29. .tpl Sections • DATA_SECTION
      DATA_SECTION
        //Read data in from simple.dat
        init_int nobs //number of observations
        init_vector Y(1,nobs) //observed Y values
        init_vector x(1,nobs) //observed x values
        init_int test //test variable
        !!cout << test << endl;
        !!exit(99);
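
The logic behind the -8888 check can be sketched outside ADMB. The C++ below assumes the numbers from the .dat file have already been collected into one flat sequence and unpacks them in the declared order (nobs, Y, x, test). The `SimpleData` struct and `unpack_simple_data` function are hypothetical names; ADMB does this reading internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SimpleData {
    int nobs = 0;
    std::vector<double> Y;
    std::vector<double> x;
    int test = 0;
};

const int kSentinel = -8888;  // last entry placed in the .dat file

// Unpack values sequentially: nobs, Y(1..nobs), x(1..nobs), test.
// Returns true only when the value that lands in `test` is the sentinel.
bool unpack_simple_data(const std::vector<double>& values, SimpleData& out) {
    if (values.empty() || values[0] < 0.0) return false;
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values[0]);
    if (values.size() < static_cast<std::size_t>(2 * n + 2)) return false;  // too few numbers
    out.nobs = static_cast<int>(n);
    out.Y.assign(values.begin() + 1, values.begin() + 1 + n);
    out.x.assign(values.begin() + 1 + n, values.begin() + 1 + 2 * n);
    assert(out.Y.size() == out.x.size());
    out.test = static_cast<int>(values[static_cast<std::size_t>(2 * n + 1)]);
    return out.test == kSentinel;
}
```

If a stray or missing number shifts the data, `test` picks up some other value instead of -8888, which is the symptom the `cout << test` line is meant to expose.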

  30. .tpl Sections • DATA_SECTION • PARAMETER_SECTION • Define Parameters – the values to be estimated (must have at least 1) • use the log_e (natural log) scale if only interested in non-negative parameter space • Identified by the prefix init_ • Intermediary Variables - quantities that will change as a result of parameter estimation • Can also declare index variables here • Also, if "containers" are needed just for output and not for calculations, then put those here too • Name your Objective Function – the quantity to be minimized

  31. .tpl Sections • DATA_SECTION • PARAMETER_SECTION
      PARAMETER_SECTION
        //Parameters to be estimated
        init_number a //slope parameter
        init_number b //intercept parameter
        //Quantities calculated from parameters
        vector pred_Y(1,nobs) //predicted Y values
        //Value to be minimized by ADMB
        objective_function_value rss //residual sum of squares

  32. Keep in mind • init_ in DATA_SECTION indicates a value that will be read in from the .dat file • init_ in PARAMETER_SECTION specifies a parameter that will be estimated

  33. .tpl Sections • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • Set initial values for parameters - use in place of .pin file
        log_F -1.0
        log_M -1.6

  34. .tpl Sections • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • PROCEDURE_SECTION Back transform parameters for use in functions (if needed) e.g., F = exp(log_F) Construct Functions Specify the equation for your Objective function Must have a PROCEDURE_SECTION for model to compile
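
A short C++ sketch of the back transformation: the parameter is estimated on the log scale, where the minimizer may move anywhere on the real line, and is exponentiated before use so that it stays positive. The relationships Z = F + M and S = exp(-Z) are assumed here as the usual fisheries definitions of total mortality and survival; the slides only show `F = exp(log_F)` and `exp(-Z)`. Function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Back-transform a log-scale parameter; the result is positive for any finite input
// that does not underflow.
double back_transform(double log_value) { return std::exp(log_value); }

// Survival from log-scale mortality parameters.
// Assumption: total mortality Z = F + M and survival S = exp(-Z).
double survival(double log_F, double log_M) {
    const double F = back_transform(log_F);
    const double M = back_transform(log_M);
    const double Z = F + M;
    assert(Z >= 0.0);
    return std::exp(-Z);
}
```

With the starting values from the INITIALIZATION_SECTION slide (log_F = -1.0, log_M = -1.6), `survival(-1.0, -1.6)` comes out at roughly 0.57.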

  35. .tpl Sections • PROCEDURE_SECTION
      DATA_SECTION
        init_int nobs //number of observations
        init_vector Y(1,nobs) //observed Y values
        init_vector x(1,nobs) //observed x values
      PARAMETER_SECTION
        init_number a //slope parameter
        init_number b //intercept parameter
        vector pred_Y(1,nobs) //predicted Y values
        objective_function_value rss //residual sum of squares
      PROCEDURE_SECTION
        //Simple linear model gives predicted Y values
        pred_Y=a*x+b;
        //Parameter estimates obtained by minimizing
        //objective function value (residual sum of squares)
        rss=norm2(Y-pred_Y); //norm2(x)=x1^2+x2^2+...+xn^2
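
For checking results from this template, the same objective can be evaluated in plain C++. The sketch below computes the residual sum of squares for pred_Y = a*x + b and, as a swapped-in technique, obtains the minimizing a and b from the closed-form least-squares formulas rather than from an iterative minimizer as ADMB does. `LineFit`, `rss`, and `least_squares_line` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Residual sum of squares for pred_Y = a*x + b, i.e. norm2(Y - pred_Y).
double rss(const std::vector<double>& x, const std::vector<double>& Y, double a, double b) {
    assert(x.size() == Y.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double resid = Y[i] - (a * x[i] + b);
        total += resid * resid;
    }
    return total;
}

struct LineFit {
    double a;  // slope
    double b;  // intercept
};

// Closed-form least squares: a = Sxy / Sxx, b = mean(Y) - a * mean(x).
LineFit least_squares_line(const std::vector<double>& x, const std::vector<double>& Y) {
    assert(x.size() == Y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sum_x = 0.0, sum_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_x += x[i];
        sum_y += Y[i];
    }
    const double mean_x = sum_x / n;
    const double mean_y = sum_y / n;
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (Y[i] - mean_y);
    }
    assert(sxx > 0.0);  // x values must not all be identical
    LineFit fit;
    fit.a = sxy / sxx;
    fit.b = mean_y - fit.a * mean_x;
    return fit;
}
```

Applied to the ten observations in Simple.dat, the closed form gives a slope near 1.909 and an intercept near 4.078, which is what the ADMB estimates in the .par file can be compared against.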

  36. .tpl Sections • DATA_SECTION • PARAMETER_SECTION • INITIALIZATION_SECTION • PROCEDURE_SECTION • REPORT_SECTION Specify output to go to .rep file Be sure to end .tpl with an empty line (hard return)

  37. .tpl Sections • Report section useful for reporting values not otherwise needed in the model • Can be organized in many ways • Can still do calculations in REPORT_SECTION • e.g., report << "S: " << exp(-Z) << endl; • Results (.rep file) can be read into other programs

  38. Create an Output .dat file • Use an output file stream
      ofstream ofs("MyOutput.dat",ios::app); //ios::app = append to file
      {
        ofs << "Output variable x: " << x << endl;
        ofs << "Output variable y: " << y << endl;
      }
      • Also can delete a file
      system("del MyOutput.dat");
      Note: different system command for Linux
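
The append-and-delete pattern on this slide can be tried as standalone C++. This sketch keeps the `ios::app` idea from the slide but swaps `system("del ...")` for `std::remove`, which is part of the C++ standard library and does not depend on the operating system's shell. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Append two labelled values to the end of a text file, creating it if needed.
bool append_output(const std::string& path, double x, double y) {
    assert(!path.empty());
    std::ofstream ofs(path.c_str(), std::ios::app);  // ios::app keeps existing contents
    if (!ofs) return false;
    ofs << "Output variable x: " << x << '\n';
    ofs << "Output variable y: " << y << '\n';
    return ofs.good();
}

// Delete the file; returns false if it could not be removed (e.g. it does not exist).
bool delete_output(const std::string& path) { return std::remove(path.c_str()) == 0; }

// Count lines, used here only to confirm that appending accumulates output.
int count_lines(const std::string& path) {
    std::ifstream ifs(path.c_str());
    int n = 0;
    std::string line;
    while (std::getline(ifs, line)) ++n;
    return n;
}
```

Because appending accumulates output across runs, deleting the file at the start of a run is one way to avoid mixing results from different fits.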

  39. Other .tpl Sections • PRELIMINARY_CALCS_SECTION Uses C++ code Can do some preliminary calculations and manipulations with the data before getting into the model proper e.g., pi = 3.14; • RUNTIME_SECTION • Change behavior of function minimizer • TOP_OF_MAIN_SECTION • Change AUTODIFF global variables

  40. Compare with SAS code
      DATA GROWTH;
        INPUT AGE LENGTH GENDER;
        DATALINES;
      PROC NLIN DATA=GROWTH METHOD=MARQUARDT;
        PARAMETERS LINF1 = 1100 K1=0.4 T01=0.0;
        YPRED= LINF1*(1-EXP(-K1*(AGE-T01)));
        MODEL LENGTH = YPRED;
        OUTPUT OUT=DATA_OUT PRED=PP RES=RR;
      RUN;
      Slide callouts (corresponding ADMB sections): Data Section, Runtime Section, Initialization Section, Prelim Calcs Section, Procedure Section, Report Section

  41. .tpl File • General rule: make .tpl file as general as possible (try to avoid hard coding) – will allow you to analyze future datasets • Must be “compiled” into C++ code 1) tpl2cpp (makes .cpp file) 2) compile (makes .exe file) 3) link (connects libraries) • We’ll use Emacs (more later)

  42. Compiling your .tpl • Need a C++ compiler to build your model • After it is compiled, model will be a .exe • (so can be run on machines without ADMB) • If you change the .tpl file, it must be recompiled… • If you change and save data (values, sometimes dimensions), the existing model will still be ready to go… • So, there is an advantage to putting starting values, etc., into .dat or .pin files.

  43. How should I build my tpl Suggestions • Keep projects in separate folder • Name, describe, and date each file at the top • Start with a simple working program • Be sure data get read in correctly • Use unique names for files and parameters (don’t use “catch” as a variable name) • Avoid “hard coding” … make it flexible • Build it one step at a time • COMMENT, COMMENT, COMMENT

  44. About Emacs • For this class, you will use Emacs to construct your .tpl file • A highly customizable text editor • We have modified Emacs so that an ADMB .tpl file is automatically linked to a C++ compiler • MINGW32 is a freeware C++ compiler – don't need to buy both ADMB and Visual Studio

  45. Using Emacs • Refer to Emacs Basics handout • Hotkeys are different • e.g., “control-v” will not paste • Highlighting text will automatically copy it • Remember to save files and recompile .tpl

  46. Let’s Try an Example Simple linear regression model Estimation by least squares

  47. Let’s Try an Example • Start Emacs by double clicking the Emacs icon on the desktop

  48. Let’s Try an Example • Open the simple.tpl and simple.dat files in the MNADMB folder located on your desktop
