Collaborative Data Management for Longitudinal Studies Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted] University of Chicago (Supported by National Institute on Aging Grant P01 AG18911-01A1)
Agenda 1. Background on Study 2. Problem – Data Management Deficiencies 3. Solution – Collaborative Data Management 4. Stata Programs – maketest & makedata
Background on Study • NIH-funded Longitudinal Study • Loneliness & Health • Thousands of Measures …Loneliness …Depression • 230 Subjects • Repeated Yearly
Problem – Data Management Deficiencies • Code Not Modular …Difficult to manage the data cleaning code …Limited code reuse from year to year …Difficult to collaborate among interns • No Established Set of Data Cleaning Steps …Difficult for research assistants (turnover) …Inconsistent data cleaning techniques …Data cleaning code difficult to read
Problem – Data Management Deficiencies [Diagram: Core File Set; Research Assistant (×5)]
Solution – Collaborative Data Management • Process …Established Steps …File System Layout …Automated Tests …Collaboration • Concepts …Module (e.g., loneliness) …Batch (e.g., yr1, yr2, yr3) …“Data Certification” • Stata Programs …maketest …makedata
Solution – Collaborative Data Management: Set of Files for Each Module • acquire-[module].do & fix-[module].do • test-[module].do • derive-[module].do • label-[module].do [Diagram labels: Year-Specific; 60% Code Reuse – Files Shared Between Years; steps Acquire & Fix, Test, Derive, Label]
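The per-module naming convention on this slide can be sketched in a few lines. This is an illustration in Python, not part of the authors' Stata tooling. The names `STEPS`, `YEAR_SPECIFIC`, `module_files`, and `shared_fraction` are hypothetical. The sketch assumes that the acquire and fix files are the year-specific ones and that test, derive, and label are shared between years. The flattened diagram does not state this split outright, but it is consistent with the 60% reuse figure (3 of 5 files).

```python
# Step names taken from the slide, in the order listed there.
STEPS = ("acquire", "fix", "test", "derive", "label")

# Assumption: acquire and fix are rewritten each year; the rest are shared.
YEAR_SPECIFIC = {"acquire", "fix"}


def module_files(module):
    """Return the do-file names for one module, following [step]-[module].do."""
    return [f"{step}-{module}.do" for step in STEPS]


def shared_fraction():
    """Fraction of a module's files that are reused across years."""
    shared = [step for step in STEPS if step not in YEAR_SPECIFIC]
    return len(shared) / len(STEPS)


if __name__ == "__main__":
    print(module_files("loneliness"))
    print(shared_fraction())  # prints 0.6
```

Under that assumption, only two files per module need to be written for each new year of data.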
Stata Program – maketest • Purpose: …Auto-generation of Data Certifying Tests • Functionality: …Tests Variable Type …Checks Consistency of Value Labels …Verifies Existence of Variable
Stata Program – maketest • Syntax: …maketest [varlist] using filename [, REQuire(varlist) append replace] • Example: …maketest using filename.do, replace • Options: …using: specifies file to write …REQuire: requires presence of variables in list …append: add to existing test .do file …replace: overwrite existing .do file
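The idea behind auto-generated certifying tests can be illustrated without Stata. The Python sketch below records what a certified dataset looks like (which variables exist, their types, their value labels) and checks a later batch against that record. It is not the maketest implementation, whose generated output is not shown in the slides. The dictionary layout, the function names `make_tests` and `run_tests`, and the sample variables are all hypothetical, and the REQuire option is not modeled.

```python
def make_tests(reference):
    """Snapshot existence, type, and value labels for every reference variable."""
    return [
        {"var": name, "type": meta["type"], "labels": dict(meta.get("labels", {}))}
        for name, meta in reference.items()
    ]


def run_tests(tests, data):
    """Check a dataset description against the snapshot; return failure messages."""
    failures = []
    for t in tests:
        meta = data.get(t["var"])
        if meta is None:
            failures.append(f"{t['var']}: variable missing")
            continue
        if meta["type"] != t["type"]:
            failures.append(f"{t['var']}: expected {t['type']}, found {meta['type']}")
        if dict(meta.get("labels", {})) != t["labels"]:
            failures.append(f"{t['var']}: value labels differ")
    return failures


if __name__ == "__main__":
    # Hypothetical variable descriptions for two yearly batches.
    yr1 = {
        "lonely1": {"type": "numeric", "labels": {1: "never", 2: "rarely"}},
        "subject_id": {"type": "string"},
    }
    yr2 = {"lonely1": {"type": "string"}}

    tests = make_tests(yr1)
    print(run_tests(tests, yr1))  # prints []
    for message in run_tests(tests, yr2):
        print(message)
```

Generating the checks from a certified batch, rather than writing them by hand, is what lets the same test file be reused when the next year's data arrive.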
Stata Program – makedata “Bringing it all together”
Stata Program – makedata • Syntax: …makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly] • Example: …makedata ats, p("acquire-*.do") b(yr1) clear replace • Options: …p (Pattern): file naming convention …replace: overwrite existing data file …clear: clear current data in memory …Noisily: full output (default = summary) …b (Batch): year, wave, center …TESTonly: only run tests step
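One plausible reading of the example call is that the `*` in the pattern stands for each module name, so `acquire-*.do` with `ats` resolves to `acquire-ats.do`. The Python sketch below shows that expansion only. It is a guess at the convention, not the makedata implementation. The function name `plan` is hypothetical, and placing each batch's files in a subdirectory named after the batch is an assumption the slides do not confirm.

```python
from pathlib import PurePosixPath


def plan(names, pattern, batches=()):
    """Expand a file-name pattern into one do-file path per (batch, module)."""
    if pattern.count("*") != 1:
        raise ValueError("pattern must contain exactly one '*'")
    files = []
    for batch in (batches or [None]):
        for name in names:
            file_name = pattern.replace("*", name)
            # Assumption: each batch keeps its files in a directory named after it.
            files.append(str(PurePosixPath(batch, file_name)) if batch else file_name)
    return files


if __name__ == "__main__":
    print(plan(["ats"], "acquire-*.do", ["yr1"]))  # prints ['yr1/acquire-ats.do']
```

Resolving files by convention means a new module or a new year needs no change to the driver, only files that follow the naming pattern.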
Other Applications • Beyond Longitudinal Data • Teaching Data Cleaning with Stata • Contact Information …Stephen Brehm: sbrehm@uchicago.edu …L. Philip Schumm: pschumm@uchicago.edu …Ronald A. Thisted: thisted@health.bsd.uchicago.edu • Supported by National Institute on Aging Grant P01 AG18911-01A1