On-line Classrooms with Gateway R Interfaces, Open-access Data, and Sharing Models. Doug White. Argonne Lab contributors: Tom Uram, Lukasz Lacinski, and Rachana Ananthakrishnan; R code implemented by Anthon Eff and mathematical modeling by Malcolm Dow
Models exemplified for Binford foragers (LRB), the comparison of variables from two separate Moral Gods models (SCCS), and Reincarnation beliefs (SCCS)
Later version, with a new project YouTube video giving instructions: clicking the bolded option “DEf01D Dow Eff” brings up the menu for modeling.
Access to CoSSci
• We urge conferees to watch the YouTube video on sharing results, linked at http://SocSciCompute.ss.uci.edu/, by
• LACINSKI, Łukasz (ARGONNE), How to share CoSSci histories,
• and to see http://socscicompute.ss.uci.edu/history/list_published for the histories shared by the fall 2014 class taught by Ren Feng, Xiamen University,
• … and the YouTube talks for the SASci session at http://SocSciCompute.ss.uci.edu/ by
• URAM Thomas, LACINSKI Łukasz, ANANTHAKRISHNAN Rachana (ARGONNE) and WILKINS-DIEHR Nancy (SDSC), Present and Future of High Performance Computing in Anthropology and the Social Sciences (this talk will be available by March 21)
• ROBERTS Wesley, Evolution of Religion
Access to Data and Software
• The Dow-Eff functions are written in the R language and are available in an R workspace that can be loaded in an R GUI such as RStudio. Based on ideas developed by Malcolm M. Dow and E. Anthon Eff, the functions estimate OLS, logit, and multinomial logit models, using multiple imputation to handle missing data and network lag terms to handle Galton’s Problem.
• Using R scripts with the SCCS
• Using R scripts with the LRB
• Using R scripts with the EA
• Using R scripts with the WNAI
• Using R scripts with the XC (merged 371-society dataset)
• Dow’s initial work on network lag models:
  • Dow, M. M., Burton, M. L., White, D. R., & Reitz, K. (1984). Galton’s Problem as network autocorrelation. American Ethnologist, 11, 754-770.
  • Dow, M. M. (2007). Galton’s Problem as multiple network autocorrelation effects. Cross-Cultural Research, 41, 336-363.
• Dow and Eff on the prevalence of autocorrelation in cross-cultural data:
  • Eff, E. A. (2004). Does Mr. Galton Still Have a Problem?: Autocorrelation in the Standard Cross-Cultural Sample. World Cultures, 15(2), 153-170.
  • Eff, E. A. (2004, February). Spatial and Cultural Autocorrelation in International Datasets. MTSU Department of Economics and Finance Working Papers.
  • Eff, E. A. (2004, September). Spatial, Cultural, and Ecological Autocorrelation in U.S. Regional Data. MTSU Department of Economics and Finance Working Papers.
  • Dow, M. M., & Eff, E. A. (2008). Global, Regional, and Local Network Autocorrelation in the Standard Cross-Cultural Sample. Cross-Cultural Research, 42(2), 148-171.
• Etc. The CoSSci Gateway also accesses the R code, data, variable labels, etc., for classroom use.
• The intersciwiki is also a repository for all these materials and for sharing results.
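The network lag terms mentioned above can be illustrated with a small base-R sketch. It assumes a made-up matrix of distances among four hypothetical societies and inverse-squared-distance weights, which is one common choice and not necessarily the weighting used in the DEf workspace. The weights are row-standardized, so each lagged value is a weighted average of the other societies' values; that lagged variable can then enter a regression as an extra regressor.

```r
# Toy distances among four hypothetical societies (made-up values)
D <- matrix(c(0, 1, 4, 5,
              1, 0, 3, 4,
              4, 3, 0, 2,
              5, 4, 2, 0), nrow = 4, byrow = TRUE)

# Inverse-squared-distance weights (an assumed weighting scheme)
W <- 1 / D^2
diag(W) <- 0          # a society is not its own neighbour
W <- W / rowSums(W)   # row-standardize: every row sums to 1

# Toy dependent variable for the four societies
y <- c(10, 12, 3, 5)

# Network lag: for each society, the weighted mean of the other societies' y
Wy <- as.vector(W %*% y)
print(round(Wy, 2))
```

A language-based or ecological proximity matrix could stand in for the inverse-distance weights in the same way.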
abbreviation   dataset                             codebook
WNAI           Western North American Indians      codebook
SCCS           Standard Cross-Cultural Sample      codebook
EA             Ethnographic Atlas                  codebook
LRB            Lewis R. Binford's forager data     codebook
XC             Merged 371 society data             codebook
Access to Variables and Codebook: http://capone.mtsu.edu/eaeff/downloads/LRBcodebook.html
Example: Lewis R. Binford (LRB) foragers sample with the DEf software at CoSSci (or in a home R GUI)
load(url("http://dl.dropbox.com/u/9256203/DEf01d.Rdata"), envir = .GlobalEnv)  # load the DEf01d workspace
setDS("LRB")   # select the LRB dataset
names(dx)      # list the variable names
… e.g., a portion of the list of variables for the Lewis R. Binford data:
[429] "s_bulk_density" "s_oc" "s_ph_h2o" "s_cec_clay"
[433] "s_cec_soil" "s_bs" "s_teb" "s_caco3"
[437] "s_caso4" "s_esp" "s_ece" "su_symbol"
[441] "su_value" "sq1" "sq2" "sq3"
[445] "sq4" "sq5" "sq6" "sq7"
[449] "dicgsh1a" "dicgsh1a.flag" "etmnts2a" "etmnts2a.flag"
[453] "g12igb3a" "g12igb3a.flag" "twisre3a" "twisre3a.flag"
[457] "l3pobi3b" "l3pobi3b.flag" "l3pobi3b.navn" "opisre2a"
[461] "opisre2a.flag" "geaisg3a" "geaisg3a.flag" "geaisg3a.navn"
[465] "glcjrc3a" "glcjrc3a.flag" "glcjrc3a.navn" "inmsre3a"
[469] "inmsre3a.flag" "inssre2a" "inssre2a.flag" "evmmod2a"
[473] "evmmod2a.flag" "lammod3a" "lammod3a.flag" "anntotprecip"
[477] "anntotprecip.flag" "avgannrh" "avgannrh.flag" "avgannrunoff"
[481] "avgannrunoff.flag" "evapotrans" "evapotrans.flag" "gdd"
[485] "gdd.flag" "npp" "npp.flag" "pevapotrans"
[489] "pevapotrans.flag" "potentialveg" "potentialveg.flag" "potentialveg"
[493] "snowdepth" "snowdepth.flag" "soilmoisture" "soilmoisture"
[497] "suit" "suit.flag" "eaid" "lrbid"
[501] "sccsid" "wnaiid" "xcid" "awc"
[505] "society" "dxid"
CoSSci Access Window for LRB at http://socscicompute.ss.uci.edu
• Choose DEf01d, then LRB (Binford Hunters and Gatherers)
• Dummy Variables (such as dx$v279.d1)
• Dep Variable: dens1 (Cases: 142 forager subsample)
• Indep Variables: fishing, gathering, rlow, temp, lbio5
• Unrestricted Vars: dspmov, numfam, numg3 (helped to start with covariates)
• New variables and their Definitions: none
• Screen annotations: 73, 74, 75 = run numbers in the history; Maps; h[ ] = sections in the CSV output; diskette icon = CSV download; (i) = errors/R code; green circle = Repeat with changes (numbering, e.g., 70-73-76, marks successive runs)
CoSSci Access Window for LRB at http://socscicompute.ss.uci.edu: this screen appears after pressing the green circle (Repeat with changes) in the preceding window.
Defining variables in CoSSci windows
• Making variables into dichotomies (at top of previous screen): enter DUMMY VARIABLES v279.d1,v213.d3,v279.d5,v1127.d2
• New variables (at bottom of previous screen):
  VARIABLE dx$v2013pos (drops the -1 value in Wallace/Roberts/EvoReligion)
  COMPUTE df <- dx$v2013 ; df[df == -1] <- NA ; dx$v2013pos <- df
  VARIABLE dx$plow (v243: 2 = at period of observation, 3 = aboriginal)
  COMPUTE dx$plow <- (dx$v243>1)*1
• Defining variables by interactions (at bottom of previous screen):
  VARIABLE dx$AnimXbwealth (product of two variables)
  COMPUTE dx$v206*(dx$v208==1)*1
• Squared variables:
  VARIABLE dx$Sqv206 (square of one variable)
  COMPUTE dx$v206^2
• Subsetting a sample (e.g., LRB):
  dx$fish1<-dx$fishing ; z<-which(dx$hg142!=1) ; is.na(dx$fish1[z])<-TRUE
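These COMPUTE expressions can be tried on a small stand-in for dx before submitting them. The sketch below is hypothetical: the values are made up, and it assumes that a dummy such as v279.d1 means "v279 equals 1", which the naming suggests but the slide does not state. The 142-case indicator hg142 is likewise assumed to be coded 1 for cases in the subsample.

```r
# Hypothetical stand-in for dx; in CoSSci, dx is built by setDS()
dx <- data.frame(
  v2013   = c(-1, 0, 2, 3, -1),
  v243    = c(1, 2, 3, 1, 3),   # plow: 2 = at period of observation, 3 = aboriginal; 1 assumed = absent
  v206    = c(0, 2, 4, 1, 3),
  v208    = c(1, 1, 2, 1, 2),
  v279    = c(1, 5, 1, 3, 5),
  fishing = c(3, 5, 1, 4, 2),
  hg142   = c(1, 1, 0, 1, 0)    # assumed: 1 = in the 142-case subsample
)

# Dichotomy, assuming v279.d1 means "v279 == 1"
dx$v279.d1 <- (dx$v279 == 1) * 1

# Drop the -1 code by setting it to missing
df <- dx$v2013
df[df == -1] <- NA
dx$v2013pos <- df

# Plow present (code 2 or 3)
dx$plow <- (dx$v243 > 1) * 1

# Interaction: v206 counted only where v208 == 1
dx$AnimXbwealth <- dx$v206 * (dx$v208 == 1) * 1

# Squared term
dx$Sqv206 <- dx$v206^2

# Subsetting: keep fishing only for cases in the subsample
dx$fish1 <- dx$fishing
z <- which(dx$hg142 != 1)
is.na(dx$fish1[z]) <- TRUE

print(dx)
```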
UCI VM: quick turnaround and myOutput (constructing the code in the R GUI takes much longer than in CoSSci)
Names for access to specific sets of output at CoSSci and in the DEf laptop R GUI:
• h[1] depvar
• h[2] UR: unrestricted model variables (may be needed for covariates in regression)
• h[3] UR model.varbs
• h[4] R: restricted model
• h[5] endogeneity tests for the specified endogenous variables
• h[6] diagnostics: heteroskedasticity, Hausman, and other tests
• h[7] autocorrelation percentages (distance, language, ecology) and R-squareds
• h[8] descriptive statistics
• h[9] totry (possibly add1 to model); important for improving the model
• h[10] didwell
• h[11] used these
• h[12] dfbetas for analysis of outliers
• h[13] imputed data used in second-level analysis (e.g., moral gods model)
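The unrestricted (h[2]) and restricted (h[4]) models can be compared with an F-test of whether the dropped covariates are jointly zero. The sketch below is a simplified illustration, not the DEf procedure: it uses plain OLS without network lags or multiple imputation, simulated data, and variable names borrowed from the LRB screen only as labels.

```r
set.seed(42)
n <- 142  # size of the LRB forager subsample on the slide; data are simulated

d <- data.frame(fishing = rnorm(n), gathering = rnorm(n), temp = rnorm(n),
                dspmov = rnorm(n), numfam = rnorm(n))
# Simulated outcome that depends only on fishing and temp (an assumption)
d$dens1 <- 0.8 * d$fishing - 0.5 * d$temp + rnorm(n)

# Unrestricted model: all candidate covariates
unrestricted <- lm(dens1 ~ fishing + gathering + temp + dspmov + numfam, data = d)
# Restricted model: only the retained covariates
restricted <- lm(dens1 ~ fishing + temp, data = d)

# F-test of H0: the dropped coefficients are jointly zero
cmp <- anova(restricted, unrestricted)
print(cmp)
p_dropped <- cmp[["Pr(>F)"]][2]
```

A large p-value for the dropped block is consistent with keeping the restricted model.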
The DEf window on the VM at http://SocSciGate.oit.uci.edu runs in 2 minutes; SDSC is slower but handles complex operations. Both return myOutput.csv. For large or small online or in-class courses, instructors receive a history of each student’s runs on CoSSci. (Yellow shows that the student has clicked the blue EXECUTE button, in white letters, for this model and that the computer is starting to run the model; if it turns red, there is an error.)
Using “To Try” h[10] iteratively
• http://socscicompute.ss.uci.edu/history/list_published shows how the fall 2014 Xiamen University students taught by Ren Feng used “To Try”. Each iteration chooses which variables in the output “To Try” list are likely candidates to test on the way to a finished model.
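One “To Try” iteration resembles a forward step with R's add1(): score each candidate variable against the current model, add the most promising one, and repeat. The sketch below assumes plain OLS on simulated data with hypothetical values; the DEf “To Try” list is computed inside the workspace and may use a different criterion.

```r
set.seed(7)
n <- 142
d <- data.frame(fishing = rnorm(n), temp = rnorm(n),
                rlow = rnorm(n), lbio5 = rnorm(n))
d$dens1 <- 0.7 * d$fishing + 0.4 * d$rlow + rnorm(n)   # simulated outcome

current <- lm(dens1 ~ fishing, data = d)

# Score the candidate terms "to try"
step1 <- add1(current, scope = ~ fishing + temp + rlow + lbio5, test = "F")
print(step1)

# Rows after "<none>" are the candidates; choose the smallest F-test p-value
pvals <- setNames(step1[["Pr(>F)"]][-1], rownames(step1)[-1])
best <- names(which.min(pvals))

# Add the chosen term and refit; the next iteration would call add1() again
updated <- update(current, as.formula(paste(". ~ . +", best)))
print(coef(updated))
```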
Bayesian Network Learning Results using library(bnlearn) in comparing two Moral Gods models
Bayesian Network Learning Results p.13

     AnimXbwealth
HiGod  0  1  2  3  4  5  7  8  9
    1 54  7  6  1  0  0  0  1  0
    2 40  6  5  0  0  0  0  0  0
    3 13  1  4  3  1  0  1  0  0
    4 21  2  0  9  3  1  0  3  4

           HiGod
FxCmtyWages  1  2   3  4
          0 69 51  10 43
          1  0  0 13*  0
HiGod: 3 = *unconcerned with humans (neither Islam nor Xianity); 4 = supportive of morality
White, Oztan & Snarey (2014)   Brown & Eff (2010)

     Writing & Records
HiGod  1  2  3  4  5
    1 35 16 10  0  8
    2 25 17  6  0  3
    3  7  9  3  2  2
    4  6  7  2 10 18
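The three cross-tabulations share the same HiGod totals (69, 51, 23, and 43 societies, 186 in all), which gives a quick consistency check on the transcribed counts. The base-R sketch below re-enters the HiGod by Writing & Records counts, checks those totals, and adds a Pearson chi-square test with a simulated p-value. This is a simple bivariate test swapped in for illustration; it is not the bnlearn structure learning shown on the slide.

```r
wr <- matrix(c(35, 16, 10,  0,  8,
               25, 17,  6,  0,  3,
                7,  9,  3,  2,  2,
                6,  7,  2, 10, 18),
             nrow = 4, byrow = TRUE,
             dimnames = list(HiGod = as.character(1:4),
                             WritingRecords = as.character(1:5)))
wr_tab <- as.table(wr)

# Row totals should equal the HiGod counts in the other two tables
print(margin.table(wr_tab, 1))
print(sum(wr_tab))

# Monte Carlo p-value, since several expected counts are small
set.seed(11)
chi <- chisq.test(wr_tab, simulate.p.value = TRUE, B = 2000)
print(chi)
```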
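Paul Rodriguez, SDSC: Bioconductor
install.packages("BiocManager")     # biocLite.R has been replaced by BiocManager
BiocManager::install("Rgraphviz")   # Rgraphviz from Bioconductor
library(bootstrap)
library(Rgraphviz)                  # also attaches graph, which provides randomGraph()
V <- letters[1:10]                  # 10 node names
M <- 1:4                            # 4 attributes; nodes sharing an attribute are linked
g1 <- randomGraph(V, M, 0.2)        # each node gets each attribute with probability 0.2
plot(g1)
Probabilities generated by a bootstrap run on the SDSC Trestles supercomputer (1695 = No Scarification, 270 = Class stratification)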
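Bootstrap probabilities of this kind are typically the share of resamples in which a given link shows up. The sketch below is a deliberately simplified stand-in: it resamples simulated data on two hypothetical binary traits and records how often a Fisher exact test detects an association between them. bnlearn's own bootstrap (boot.strength) instead repeats full structure learning on each resample, so treat this only as an illustration of the idea.

```r
set.seed(1)
n <- 186                                     # SCCS sample size; data are simulated
x <- rbinom(n, 1, 0.4)                       # hypothetical binary trait
y <- rbinom(n, 1, ifelse(x == 1, 0.6, 0.3))  # hypothetical trait that depends on x

# Is an "edge" between a and b detected in this sample? (Fisher exact test)
edge_present <- function(a, b, alpha = 0.05) {
  fisher.test(table(a, b))$p.value < alpha
}

B <- 500
hits <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)    # resample societies with replacement
  edge_present(x[idx], y[idx])
})

# Bootstrap probability: share of resamples in which the edge appears
edge_prob <- mean(hits)
print(edge_prob)
```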
CoSSci CSV output = R GUI output: h[4], h[6], h[7] for the reincarnation model
CoSSci Maps, e.g., Reincarnation beliefs (make with the new CoSSci so that the name will show)
CoSSci Maps, e.g., Reincarnation beliefs: classical Karmic reincarnation in yellow; red: 5, 6
Geographic maps of background variables with White-Murdock alignments (intersciwiki): Moral Gods (HiGod4), SCCS numbering, Female Equality (v626), Plow (v243)
Other features in CoSSci, e.g., library(psych); Matrix Algebra in R, pp. 14-15, https://personality-project.org/r/sem.appendix.1.pdf; pairs.panels(vars). These variables are the ones used by Amber Johnson in her analysis of the LRB data on 399 hunters and gatherers (Lew Binford 2001).
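The correlations that pairs.panels() reports can be reproduced with the kind of matrix algebra covered in the linked appendix: standardize each column, then form Z'Z / (n - 1). The base-R sketch below uses three hypothetical simulated variables rather than the LRB data, and checks the result against cor().

```r
set.seed(3)
X <- matrix(rnorm(50 * 3), ncol = 3,
            dimnames = list(NULL, c("v1", "v2", "v3")))   # hypothetical variables
n <- nrow(X)

Z <- scale(X)                   # centre each column and divide by its SD
R <- crossprod(Z) / (n - 1)     # t(Z) %*% Z / (n - 1)

print(round(R, 3))
```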