Use of Large Databases for Research: Reaping the Benefits of Your Tax Dollars • Jeff Coben, MD • December 12, 2007
Learning Objectives • Understand the strengths and limitations of using existing large databases for research • Gain exposure to the Healthcare Cost and Utilization Project databases through a review of several examples of prior research • Understand the process of accessing, obtaining, and analyzing existing large databases
Federal Investment in Research Databases • National or regional in scope • Rigorous and well-defined sampling and/or data collection methodologies • Often longitudinal, with ongoing data collection using standardized instruments • Public domain, with “wrap-around” services • Surveys, administrative data, census of providers
Common Examples • Behavioral Risk Factor Surveillance System • National Survey on Drug Use and Health • Healthcare Cost and Utilization Project • National Survey of Child and Adolescent Well-Being • National Health Interview Survey • National Hospital Discharge Survey • National Health and Nutrition Examination Survey • National Vital Statistics System
Federal Investment • Federal intramural research staff are devoted to maintaining these databases • Intramural staff also increasingly involved in database dissemination activities
Reasons for Using Large Federal Databases for Research • National scope • Can study trends over time • Large sample sizes permit sub-analyses and multivariate analyses • Can obtain population-based estimates of disease • COST & EFFICIENCY
Research Process • Choosing the research question • Developing the protocol • Pre-testing and revising the protocol • Carrying out the study • Analyzing the findings • Drawing and disseminating the conclusions
Research Question • Do children with traumatic brain injury (TBI) benefit from “aggressive” intensive care management?
Management of Pediatric TBI • TBI is a leading cause of death among children • Variation in the management of critically ill TBI patients • Concerns over costs of aggressive management
Research Question: Do children with traumatic brain injury (TBI) benefit from “aggressive” intensive care management? Develop the Protocol • Operationalize terms • Study Design • Subjects • Variables (Predictor/Outcome) • Statistical Issues
Protocol Development • TBI case definition = severe brain injury requiring endotracheal intubation and mechanical ventilation • Aggressive management = insertion of intracranial pressure (ICP) monitor • Study Design = randomized trial
Study Design = RCT • Child meets inclusion criteria • Randomized to ICU management or ICU management + ICP monitor • Outcomes: Mortality, Morbidity, Costs
Problems with RCT Design • Ethical? • Number of cases needed for prospective study (multi-site) • Time required to enroll sufficient sample • Cost of the study
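To make the "number of cases needed" concern concrete, here is a minimal sketch of the usual normal-approximation sample size calculation for comparing two proportions. The mortality rates in the example call are hypothetical values chosen only for illustration; they do not come from the talk or from any study cited in it.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided test of two proportions."""
    if p_control == p_treated:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treated) / 2
    # Term under the null hypothesis (pooled proportion).
    pooled = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    # Term under the alternative hypothesis (separate proportions).
    unpooled = z_beta * math.sqrt(
        p_control * (1 - p_control) + p_treated * (1 - p_treated)
    )
    return math.ceil((pooled + unpooled) ** 2 / (p_control - p_treated) ** 2)


# Hypothetical mortality: 30% with standard ICU care vs. 25% with ICP monitoring.
print(n_per_arm(0.30, 0.25))
```

Small expected differences in outcome drive the required enrollment up quickly, which is why a multi-site design and a long enrollment period would likely be needed.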
Using Secondary Analysis • Secondary analysis is the reanalysis of data collected by another researcher or organization • The shortcut
Research Process • Choosing the research question • Developing the protocol • Pre-testing & revising the protocol • Carrying out the study • Analyzing the findings • Drawing and disseminating the conclusions • Secondary data analysis (between developing the protocol and analyzing the findings)
Variation in therapy and outcome for pediatric head trauma patients. Tilford JM, et al. Crit Care Med 2005 • Study examined the incidence, use of procedures, and outcomes of critically ill children with TBI between 1988 and 1999 to describe the benefits of improved treatment • Hypothesis: more aggressive treatment (ICP monitoring) over time is associated with improved survival
Methods • Used the Nationwide Inpatient Sample database to identify all children 0-21 with TBI requiring endotracheal intubation • Used ICD-9-CM codes to identify use of ICP monitoring, calculate injury severity scores, and describe consciousness level
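The case-selection step described above can be sketched roughly as follows. This is an illustration only: the `Discharge` record layout is hypothetical, and the code lists are assumed example values that would need to be checked against the ICD-9-CM code book and the published methods. They are not the definitions used by Tilford et al.

```python
from dataclasses import dataclass

# Assumed, illustrative code sets (stored without decimal points).
TBI_DX_PREFIXES = ("800", "801", "803", "804", "850", "851", "852", "853", "854")
INTUBATION_PROCS = {"9604"}    # assumed code for endotracheal intubation
ICP_MONITOR_PROCS = {"0110"}   # placeholder; verify for the study years


@dataclass
class Discharge:
    """Hypothetical, simplified discharge abstract."""
    age: int
    diagnoses: list[str]
    procedures: list[str]
    died: bool


def is_study_case(rec, max_age=21):
    """True for a patient aged 0-21 with a TBI diagnosis who was intubated."""
    has_tbi = any(dx.startswith(TBI_DX_PREFIXES) for dx in rec.diagnoses)
    intubated = any(p in INTUBATION_PROCS for p in rec.procedures)
    return 0 <= rec.age <= max_age and has_tbi and intubated


def had_icp_monitor(rec):
    """True if any procedure code is in the ICP monitoring set."""
    return any(p in ICP_MONITOR_PROCS for p in rec.procedures)


# Made-up records for demonstration.
records = [
    Discharge(age=9, diagnoses=["85201"], procedures=["9604", "0110"], died=False),
    Discharge(age=15, diagnoses=["8540"], procedures=["9604"], died=True),
    Discharge(age=40, diagnoses=["85201"], procedures=["9604"], died=False),
    Discharge(age=7, diagnoses=["4939"], procedures=[], died=False),
]
cases = [r for r in records if is_study_case(r)]
monitored = [r for r in cases if had_icp_monitor(r)]
print(len(cases), len(monitored))
```

Keeping the code lists as named constants makes the operational definitions easy to audit and to change if the case definition is revised.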
Changes in ICP Monitoring and Outcome: 1988-1999 [Figure: ICP monitoring, mortality, and injury severity score by year, 1988 through 1999]
Secondary Analysis • Advantages: Speed and economy • Disadvantages: • No control over data variables • Compatibility between the available data and the research question
Compatibility Challenge • Because the data are already collected, you can’t specify what you want • May require some modification of the original research question, or • May need to work backwards, from the data to the question • Compatible with the researcher?
Primary Data Collection: Research Question → Develop Protocol • Design • Subjects • Measures • Instruments. Secondary Data Analysis: Data Source • Design • Subjects • Measures • Instruments → Research Questions: What questions could these data answer?
Finding Research Questions to Fit an Existing Database • Become familiar with the data content • Identify pairs or groups of variables whose association may be of interest • Review the literature to determine if these research questions are novel and important • Formulate specific hypotheses and statistical methods • Analyze the data
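One simple way to start looking for variable pairs worth studying is a cross-tabulation. The sketch below uses only the Python standard library; the variable names (`payer`, `died`) and the records are made up for illustration and are not taken from any database documentation.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count records for each (row value, column value) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


def row_percentages(table):
    """Convert cell counts to percentages of their row total."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {
        (row, col): 100 * n / row_totals[row]
        for (row, col), n in table.items()
    }


# Made-up records for demonstration.
records = [
    {"payer": "private", "died": False},
    {"payer": "private", "died": False},
    {"payer": "uninsured", "died": True},
    {"payer": "uninsured", "died": False},
]
table = crosstab(records, "payer", "died")
for cell, pct in sorted(row_percentages(table).items()):
    print(cell, round(pct, 1))
```

A table like this is only a screening step; any association it suggests still needs a literature review and a formally specified hypothesis before analysis.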
HEALTHCARE COST AND UTILIZATION PROJECT A Family of Databases, Tools and Products
Understanding Hospital Discharge Data • Hospitals create “discharge abstracts” on every patient seen • Original purpose was billing/reimbursement • Includes valuable information (>100 variables) • Patient demographics • Diagnoses, procedures, complications • Charges, length of stay, ICU days
Hospital Discharge Data • Individual discharge abstracts are computerized • State regulatory agencies require all hospitals to submit all discharge abstracts on a regular basis • Edit checks routinely performed, quality assurance, penalties for non-compliance
HEALTHCARE COST AND UTILIZATION PROJECT Partners Providing Data
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Process HCUP Uniform Data
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Uniform Data • State Inpatient Databases (SID): uniform, comprehensive hospital discharge data
State Inpatient Database (SID) • Complete data from 37 states • 90% of all hospital discharges in U.S. (N>30 million) • Example of research using the SID • Characteristics of motorcycle-related hospitalizations: Comparing states with different helmet laws • Coben, Steiner, and Miller. Accident Analysis & Prevention, 2007
Abstract This study compares U.S. motorcycle-related hospitalizations across states with differing helmet laws. Cross-sectional analyses of hospital discharge data from 33 states participating in the Healthcare Cost and Utilization Project in 2001 were conducted. Results revealed that motorcyclists hospitalized from states without universal helmet laws are more likely to die during the hospitalization, sustain severe traumatic brain injury, be discharged to long-term care facilities, and lack private health insurance. This study further illustrates and substantiates the increased burden of hospitalization and long-term care seen in states that lack universal motorcycle helmet use laws.
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Uniform Data • State Inpatient Databases (SID): uniform, comprehensive hospital discharge data • Nationwide Inpatient Sample (NIS): sample of community hospitals from SID; approximates a 20% sample of community hospitals in the U.S.
Nationwide Inpatient Sample (NIS) • Stratified sample of 994 hospitals from the 37 states contributing data to HCUP (N>7 million) • Designed for national and regional estimates • Example of research using the NIS • Rural-urban Differences in Injury Hospitalizations • Coben, Tiesman, Bossarte, and Furbee (in progress)
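Because the NIS is a sample, national estimates come from weighting each sampled discharge by the number of discharges it represents. The following is a minimal sketch of that idea. The `weight` field and the records are hypothetical, and the sketch produces point estimates only; standard errors for a stratified sample require survey-aware methods that are not shown here.

```python
def weighted_total(records, is_case, weight_key="weight"):
    """Estimate the national count of discharges matching is_case."""
    return sum(r[weight_key] for r in records if is_case(r))


def weighted_proportion(records, is_case, weight_key="weight"):
    """Estimate the share of all discharges that match is_case."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight == 0:
        raise ValueError("records carry no weight")
    return weighted_total(records, is_case, weight_key) / total_weight


# Made-up sampled discharges; each weight is the number of
# discharges nationally that the sampled record stands for.
sample = [
    {"injury": True, "rural": True, "weight": 5.0},
    {"injury": False, "rural": True, "weight": 4.0},
    {"injury": True, "rural": False, "weight": 6.0},
]
print(weighted_total(sample, lambda r: r["injury"]))
print(weighted_proportion(sample, lambda r: r["injury"]))
```

Passing the case definition as a function keeps the weighting logic separate from the clinical definition, so the same helpers can serve different research questions.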
Unadjusted Injury Hospitalization for Selected Causes of Injury by Urbanicity, U.S. 2004
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Uniform Data • State Inpatient Databases (SID): uniform, comprehensive hospital discharge data • Nationwide Inpatient Sample (NIS): sample of community hospitals from SID; approximates a 20% sample of community hospitals in the U.S. • Kids’ Inpatient Database (KID): sample of pediatric discharges from community hospitals in the SID
Kids’ Inpatient Database (KID) • Stratified sample of pediatric discharges from the SID (N=3 million) • Allows national and regional studies of inpatient hospital utilization and charges for children and adolescents • Example of research using the KID • National estimates of ATV injury hospitalizations in Children • Killingsworth JB, et al. Pediatrics, 2005
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Uniform Data • State Inpatient Databases (SID): comprehensive hospital discharge data from states • Nationwide Inpatient Sample (NIS): sample of community hospitals from SID; approximates a 20% sample of community hospitals in the U.S. • Kids’ Inpatient Database (KID): sample of pediatric discharges from community hospitals in the SID • State Outpatient Databases (SOD): State Ambulatory Surgery Data (SASD) and State Emergency Department Data (SEDD)
State Ambulatory Surgery Databases (SASD) • Ambulatory surgery data provided by 19 states • Example of research using SASD • The Impact of Endometrial Ablation on Hysterectomy Rates in Women with Benign Uterine Conditions in the United States • Farquhar CM, et al. 2002
State Emergency Department Databases (SEDD) • Statewide ED data from 17 states • Example of research using SEDD • Hospital and Demographic Influences on the Disposition of Transient Ischemic Attack • Coben, Owens, Steiner, and Crocco. Academic Emergency Medicine, in press.
Objective: Determine factors responsible for the variation in Emergency Department disposition of TIA cases. Methods: All ED-treated TIA cases from hospitals in 11 states were identified from the Healthcare Cost and Utilization Project. Descriptive analyses compared admitted and discharged cases. Based on the results of the bivariate analyses, logistic regression models of the likelihood of hospital admission were derived, using a stepwise selection process. Adjusted risk ratios and 95% confidence intervals were calculated from the logistic regression models. Results: A total of 34,843 cases were identified in the 11 states, with 53% of cases admitted to the hospital. In logistic regression models, differences in admission status were found to be strongly associated with clinical characteristics such as age and co-morbidities. After controlling for co-morbidities, differences in admission status were also found to be associated with hospital type and with socio-demographic characteristics, including county of residence and insurance status. Conclusions: While clinical factors predictably and appropriately impact the ED disposition of patients diagnosed with TIA, several non-clinical factors are also associated with differences in disposition.
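As a rough illustration of the bivariate step that precedes the regression modeling, the sketch below computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. Note the substitution: the abstract reports adjusted risk ratios from logistic regression, while this is a simpler, unadjusted odds ratio. The counts are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio and Wald confidence interval.

    a: exposed and admitted      b: exposed and discharged
    c: unexposed and admitted    d: unexposed and discharged
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: admission by insurance status.
estimate, lower, upper = odds_ratio_ci(60, 40, 45, 55)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 would flag the variable as a candidate for the multivariable model, where the association can be re-examined after adjusting for clinical characteristics.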
HEALTHCARE COST AND UTILIZATION PROJECT • Source data: State Inpatient Databases (SID), State Ambulatory Surgery Data (SASD) • AHRQ Central Distributor • Data Use Agreement • Available to public researchers: NIS, KID, CD-SID, CD-SASD
HEALTHCARE COST AND UTILIZATION PROJECT HCUP Tools • HCUPnet: An interactive, on-line query tool for HCUP data • Clinical Classifications Software (CCS): Clinical grouper of ICD-9-CM and ICD-10 codes • AHRQ Quality Indicators: Measures of health care quality based on hospital inpatient data • Comorbidity Software: Identifies comorbidities in hospital discharge records using ICD-9-CM codes and DRGs HCUP Research Products • Products include: Research Studies, Statistics and Fact Books on HCUP Data
Secondary Analysis of Large Research Databases Can… • Be used to test specific hypotheses • Improved outcomes with ICP monitoring • Be used for descriptive, epidemiological studies • Large (faculty): Firearm-related hospitalizations • Small (students): Rotavirus admissions • Generate pilot data for future investigations • ED prospective study on TIA disposition
Steps in the Process • Determine interest area • Search for existing databases • Learn the database • Data documentation manuals, CDs, web • Derive research question(s) • Conduct analyses • Statistical consultation, programming
Additional Tips • Contact intramural staff for advice • Be thorough with literature searches • Understand the limitations of the database • Find other publications using the database