
Development of large-scale applications with Stata

Michael Lokshin, Sergiy Radyakin and Zurab Sajaia, World Bank



  1. Development of large-scale applications with Stata
     Michael Lokshin, Sergiy Radyakin and Zurab Sajaia, World Bank

  2. Analytical work at the World Bank
     • Each year the World Bank produces:
       • 10-15 poverty assessments
       • 5-10 labor market studies
       • 10 education and health assessments
       • Gender studies
       • Nutritional studies
       • Reports on social protection and benefit-incidence analysis, etc.
     • Most analytical work for these reports is done in Stata
     • The Research Department (DECRG) of the World Bank develops new methods and tools that are used in these reports and need to be made accessible to a wide audience of practitioners of applied economic analysis

  3. Stata in the World Bank
     • Stata is the main statistical package used in the Bank
     • Hundreds of users both at HQ and in regional offices
     • Many users are short-term consultants with limited skills in Stata programming
     • Consultants are hired for a project and leave the Bank after the project is completed
     • It is difficult to impose rules on programming style, code documentation and archiving
     • Many Stata programs are lost or undocumented and are difficult to reuse
     • There is a need to automate the analytical work conducted in the Bank

  4. Stata routines developed in DECRG
     • Poverty analysis toolkit:
       • Growth-inequality decomposition (gedecomposition.ado)
       • Sectoral poverty decomposition (sedecomposition.ado)
       • Growth-incidence curves (gicurves.ado)
       • Stochastic dominance analysis (pov_robust.ado)
       • egen extension for inequality and poverty measures
       • Fast algorithm for calculation of Gini coefficients (fastgini.ado)
     • Applied economic research:
       • FIML algorithm for two-equation ordered probit models with endogeneity
       • FIML estimation of the endogenous switching regression model
       • Selection models based on ordered probit
       • Semi-parametric difference-based estimation of partial linear regression models
       • Selecting a subset of variables providing the model's best fit
       • Efficient estimation of regressions based on pseudo-panel data
     • LOOKFOR_ALL: an extension of the Stata program lookfor
     • xml_tab.ado: saving the output of Stata estimation procedures in Microsoft Excel
     • usespss.ado, use10.ado: read SPSS files into Stata; read Stata 10 files in Stata 9
     • Many other Stata routines
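To illustrate why a dedicated Gini routine can be much faster than a naive one, here is a minimal Python sketch. It is not the algorithm in fastgini.ado (the slides do not describe it); it only contrasts the textbook pairwise definition, which needs O(n^2) comparisons, with the standard rank-based formula that needs a single sort. The function names are hypothetical and sampling weights are ignored.

```python
def gini_pairwise(values):
    """Gini from the definition: mean absolute difference over twice the mean.

    Compares every pair of observations, so the cost grows as n squared.
    """
    n = len(values)
    mean = sum(values) / n
    abs_diff = sum(abs(a - b) for a in values for b in values)
    return abs_diff / (2.0 * n * n * mean)


def gini_sorted(values):
    """Gini from ranks: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n.

    x_i are the values in ascending order and i runs from 1 to n,
    so the cost is dominated by one sort.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one observation and a non-zero total")
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n
```

Both functions return the same number on the same data; only the amount of work differs, which matters on survey files with hundreds of thousands of observations.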

  5. Automated Economic Analysis
     • Speed up production of basic (required) results
     • Minimize human errors
     • Free resources for more meaningful and interesting tasks
     • Easily introduce new techniques and methods
     • Allow easy replication of previous results
     • Generate standard results that are comparable across countries and years
     • A tool for simulations
     • A tool for sensitivity analysis and training
     • Helpful in situations of limited data access
     • Simple checking of previous reports and results
     • Minimize training time and skill requirements

  6. ADePT: Software platform for automated economic analysis
     [Architecture diagram; labels recovered from the slide]
     • ADePT User Interface (Version 3: customized Stata dialogs, classes; Version 4: user interface in C#)
     • Request for computations
     • Stata Computation Kernel: set of Stata and Mata routines; plug-ins
     • Output in XLS or PDF format (xml_tab.ado)
     • ~100,000 lines of code
     • Multiple version support
     • Team development

  7. ADePT Solutions
     • ADePT offers users a solution to a particular problem
     • ADePT modules: a set of analytical results (tables, graphs) sufficient to answer a particular question
     • Combination of software tools and substantive contributions from experts in the field:
       • Gary Fields (Cornell): Labor
       • Martin Ravallion (WB): Poverty
       • Adam Wagstaff (WB): Health
     • Two main directions of ADePT:
       • Assessments of the current situation
       • Projections and simulations

  8. ADePT V4.0
     • Accepts individual-level and household-level data in Stata and SPSS formats; uses Stata for computations
     • Possibility of remote computing
     • No prior knowledge of Stata is required
     • Minimal data preparation
     • Extensive checks for possible problems with the data
     • Control for influential outliers
     • Tested on datasets from more than 50 countries: LSMS, HBS, DHS
     • Estimated 500 users in the WB, international research institutions, universities and government agencies
     • The number of users is expected to increase when new modules are released

  9. ADePT V4.0: The roadmap
     • ADePT Poverty: public release, June 2007
     • ADePT MAPS: public release, October 2007
     • ADePT Labor: public release, November 2007
     • ADePT Gender: public release, November 2008
     • ADePT Social Protection: public release, June 2009
     • ADePT Education: public release, June 2009
     • ADePT Targeting: planned release, August 2009
     • ADePT PLINES: development stage
     • ADePT HEALTH: planned release, August 2009
     • ADePT Inequality: planned release, August 2009

  10. ADePT: Website
     www.worldbank.org/adept
     Download: installation and updates, documentation, examples.

  11. Practical issues
     • Interface
     • Performance (-ftabstat2-)
     • Interaction/communication with other programs (IniFile.class, -smtp-)
     • Graphics (-twoway parea-, -amap-)
     • Custom file formats (-usespss-, -use10-)
     • Installation and updates (-pkg2script-)
     • Certification

  12. Practical issues: Interface
     • Dialogs in Stata can be created to facilitate the use of custom-written commands. But they are oriented toward building a command line (a command with parameters and options), not a full application interface.
     • Some features were added in Stata 10 to expand what dialogs can do, but they are still very limited, and we were constrained to remain compatible with Stata 9.2.
     • After exhausting the standard dialog features of Stata, we decided to move the interface into an external application written in C# (Microsoft Visual Studio).
     [Screenshot caption] The released version 3.0 of ADePT used Stata dialogs

  13. Practical Issues: Interface
     [Screenshot caption] The current version 4.0 of ADePT uses Windows Forms for dialogs

  14. Practical Issues: Performance
     • Stata's built-in routines seem to be very efficient, but code implemented in *.ado files is often quite slow.
     • In particular, -tabstat- performed inadequately for our tasks despite its simple nature.
     • It was rewritten as a plugin, -ftabstat2-, in C++ (Microsoft Visual Studio) and modified to suit our particular needs: it now returns matrices of means, totals, counts and various proportions for each specified variable, with support for by()-rows and by()-cols.
     • Trade-off: no gain from Stata/MP, because plugins are (currently?) single-threaded.
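The kind of computation described above, counts, totals and means for every row-by-column cell, can be sketched in Python as a single pass over the data. This is only an illustration of the idea; it does not reproduce the interface or internals of -ftabstat2-, and the function name and the handling of missing values (skipped, here represented by None) are assumptions.

```python
from collections import defaultdict


def cross_tab_stats(row_keys, col_keys, values):
    """Return count, total and mean of `values` for each (row, column) cell.

    One pass over the data: each observation updates the running count and
    total of its cell. Missing values (None) are skipped.
    """
    cells = defaultdict(lambda: [0, 0.0])  # (row, col) -> [count, total]
    for row, col, value in zip(row_keys, col_keys, values):
        if value is None:
            continue
        cell = cells[(row, col)]
        cell[0] += 1
        cell[1] += value
    return {
        key: {"count": count, "total": total, "mean": total / count}
        for key, (count, total) in cells.items()
    }
```

For example, calling cross_tab_stats with a region variable as rows, an urban/rural indicator as columns and household income as values yields one dictionary of statistics per region-by-area cell.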

  15. Practical Issues: Communication
     Interaction/communication with other programs: we needed to solve two problems.
     • Provide an easy-to-handle job file containing the description of all the parameters and options for a large project (it is not possible to fit everything on a command line). We moved from txt files to ini files, handled by IniFiles.class.
     • Provide communication between Stata and another program: while the computations are performed in Stata, the external interface needs to be kept up to date on the status of the calculations. We solved this by writing a C++ plugin, -smtp- (SendMessageToPipe), which uses Windows pipes for IPC.
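The two mechanisms above can be sketched together in Python. This is an illustration under stated assumptions, not ADePT's actual job-file layout or the -smtp- plugin: the section names, keys and message format are all hypothetical, and an anonymous pipe from os.pipe stands in for the Windows pipes used between Stata and the interface.

```python
import configparser
import os

# Hypothetical job file: every section name and key is made up for illustration.
JOB_TEXT = """
[job]
module = poverty
output = report.xls

[data]
file = survey2005.dta
welfare = pcexp
weight = hhweight

[tables]
requested = headcount, gini, profile
"""


def load_job(text):
    """Parse an ini-style job description into nested dictionaries."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    job = {section: dict(parser[section]) for section in parser.sections()}
    job["tables"]["requested"] = [
        name.strip() for name in job["tables"]["requested"].split(",")
    ]
    return job


def run_job(job, status_fd):
    """Pretend to compute each table and report progress through the pipe."""
    tables = job["tables"]["requested"]
    for done, name in enumerate(tables, start=1):
        # The real computation would happen here.
        message = f"{done}/{len(tables)} {name}\n"
        os.write(status_fd, message.encode("utf-8"))


def demo():
    """Run the job and return it with the status lines the listener received."""
    job = load_job(JOB_TEXT)
    read_fd, write_fd = os.pipe()
    run_job(job, write_fd)
    os.close(write_fd)  # signals end of messages to the reader
    with os.fdopen(read_fd, "r", encoding="utf-8") as reader:
        return job, reader.read().splitlines()
```

Keeping all options in a sectioned file means a job can be saved, edited and rerun without rebuilding a long command line, while the pipe lets the interface show progress without polling output files.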

  16. Practical Issues: Graphics
     • We ran into some limitations of Stata graphics. Some of them were circumvented with custom graphics commands or adaptations of existing commands (-twoway parea-).
     • We did not find any way to interact with the mouse in Stata graphics (version 9.2).
     • We decided to move our mapping program -amap- out of Stata into an external program and to communicate with it seamlessly via ini files.
     [Map caption] Demonstration only, not actual data

  17. Practical Issues: File Formats
     • We needed support for SPSS files in ADePT.
     • We developed the -usespss- plugin to import SPSS data into Stata.
     • -usespss- was presented at SNASUG 2008 in Chicago and made available to the public immediately afterwards.
     • We needed to give Stata 9 users the ability to process datasets saved in Stata 10 format.
     • We developed (using Mata) a new command, -use10-, for this purpose. Available at SSC.
     http://repec.org/snasug08/radyakin_usespss.ppt
     findit usespss
     findit use10

  18. Practical Issues: Installation and Updates
     • We experienced problems with installing and updating packages from our web site into Stata.
     • The problem was not due to Stata, but we received a number of very helpful responses from StataCorp's Tech Support team on this issue.
     • Effectively, this problem ruled out -net install-.
     • We developed a tool, -pkg2script-, to create standalone installers from one or more Stata packages with the help of the NSIS installation system.
     • The tool works in Windows only; an empty path means the package is taken from SSC.
     • In theory, all of SSC could be packed into one distribution like the one shown here:

  19. Practical Issues: Certification
     • We faced the problem of verifying results. Checking the numbers by hand is slow and unreliable.
     • We have included a test mode in ADePT, in which it:
       • is launched from an external application (the tests manager),
       • runs the requested jobs, and
       • verifies the output against a predefined set of benchmarks, which were confirmed by non-team members.
     • We monitor whether the test succeeds (results are produced), whether the results are correct, and how long it takes to produce them. If the benchmark for the current test does not exist, ADePT generates it from the current results and verifies against this saved output next time.
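The test-mode logic described above can be sketched in Python as follows. This is a minimal illustration, not ADePT's tests manager: the function name, the JSON benchmark files, the flat name-to-number result format and the tolerance are all assumptions made for the example.

```python
import json
import math
import time
from pathlib import Path


def certify(name, run_job, benchmark_dir, rel_tol=1e-9):
    """Run one job and check its results against a saved benchmark.

    `run_job` is any callable returning a flat dict of result name -> number.
    The report records the three things being monitored: whether results were
    produced, whether they match the benchmark, and how long the job took.
    """
    start = time.perf_counter()
    try:
        results = run_job()
    except Exception as exc:  # the job failed to produce results
        return {"status": "error", "detail": str(exc),
                "seconds": time.perf_counter() - start}
    seconds = time.perf_counter() - start

    path = Path(benchmark_dir) / f"{name}.json"
    if not path.exists():
        # No benchmark yet: save the current results for the next run.
        path.write_text(json.dumps(results, sort_keys=True, indent=2))
        return {"status": "benchmark_created", "mismatches": [], "seconds": seconds}

    expected = json.loads(path.read_text())
    mismatches = sorted(
        key for key in set(expected) | set(results)
        if key not in expected or key not in results
        or not math.isclose(results[key], expected[key], rel_tol=rel_tol)
    )
    status = "passed" if not mismatches else "failed"
    return {"status": status, "mismatches": mismatches, "seconds": seconds}
```

Note that a benchmark created automatically on the first run only guards against later changes; it still has to be confirmed independently before it can certify that the numbers are correct.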

  20. Practical Issues: Wishes for Stata 12
     • Access to the registry (at least read-only) to detect the presence of other programs, their versions and locations (currently solved with a plugin).
     • IPC via pipes (currently solved with a plugin).
     • Preserve/restore to RAM (currently solved with a RAM drive).
     • Extended plugin capabilities: allow plugins to execute Stata commands, as Mata can with stata("command").
     • Support for Cyrillic and other local fonts
     • Unicode??
