Introduction to Statistical Computing in Clinical Research
Learn data management, basic statistical analysis, and data presentation in clinical research using STATA and Excel. Course includes lectures, labs, final project, and overview of statistical software packages.
Introduction to Statistical Computing in Clinical Research Biostatistics 212 Lecture 1
Today... • Course overview • Course objectives • Course details: grading, homework, etc • Schedule, lecture overview • Where does Stata fit in? • Basic data analysis with Stata • Stata demos • Lab
Course Objectives • Introduce you to using STATA and Excel for • Data management • Basic statistical and epidemiologic analysis • Turning raw data into presentable tables, figures and other research products • Prepare you for Fall courses • Start analyzing your own data
Course details • Introduction to Statistical Computing - 1 unit • Schedule: 7 lectures, 7 lab sessions, on 7 Tuesdays in a row • Dates: August 3 to September 14 • Lectures 1:15-2:45 • Labs 3:00-4:00 • All in China Basin, CBL 6702 (+ 6704 for lab) • Final Project due 9/21/10
Course details • Introduction to Statistical Computing • Grading: Satisfactory/Unsatisfactory • Requirements: hand in all six Labs (even if late); Satisfactory Final Project; 80% of total points • Reading: Optional
Course details, cont • Course Director: Mark Pletcher • Teaching Assistants: Elizabeth Mileti (Section 1), Raman Khanna (Section 1), David Moskowitz (Section 2), Yvette Wild (Section 2) • Lecturers: Andy Choi, Jennifer Cocohoba • Lab Instructor: Alan Bostrom • Mandana Khalili
Course details, cont • Lecture: extra-full this year! • Labs: PC vs. Mac (Section 1 and Section 2) • All of Section 2 won’t fit into 6704…
Overview of lecture topics • 1- Introduction to STATA • 2- Do files, log files, and workflow in STATA • 3- Generating variables and manipulating data with STATA • 4- Using Excel • 5- Basic epidemiologic analysis with STATA • 6- Making a figure with STATA/Advanced Programming Topics • 7- Organizing a project, making a table
Overview of labs • Lab 1 – Load a dataset and analyze it • Lab 2 – Learn how to use do and log files • Lab 3* – Import data from Excel, generate new variables and manipulate data, document everything with do and log files • Lab 4 – Using and creating Excel spreadsheets • Lab 5* – Epidemiologic analysis using Stata • Lab 6 – Making a figure with Stata • Last lab session will be dedicated to working on the Final Project • * Labs 3 and 5 are significantly longer and harder than the others
Overview of labs, cont • Official Lab time is 3:00-4:00, but we will start right after lecture, and you can leave when you are done.
Overview of labs, cont • Labs are due the following week prior to lecture. Labs turned in late (less than 1 week) will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded). • Lab 1 is paper • Labs 2-6 are electronic files, and should be emailed to your section leader’s course email address: biostat212_section1@yahoo.com (Elizabeth/Raman) or biostat212_section2@yahoo.com (David/Yvette)
Final Project • Create a Table and a Figure using your own data, document analysis using Stata. • Due 1 week after last lab session, 20 points docked for each 1 day late.
Course Materials • Online Syllabus (http://rds.epi-ucsf.org/ticr/syllabus/display.asp?academic_year=2010-2011&courseid=38) • Course Overview • Final Project • Miscellaneous handouts • Lectures and Labs/Datasets (“just in time”)
Getting started with STATA Session 1
Types of software packages used in clinical research • Statistical analysis packages • Spreadsheets • Database programs • Custom applications • Cost-effectiveness analysis (TreeAge, etc) • Survey analysis (SUDAAN, etc)
Software packages for analyzing data • STATA • SAS • S-plus and R • SPSS • SUDAAN • Epi-Info • JMP • MATLAB • StatXact
Why use STATA? • Quick start, user friendly • Immediate results, response • You can look at the data • Menu-driven option • Good graphics • Log and do files • Good manuals, help menu
Why NOT use STATA? • SAS is used more often? • SAS does some things STATA does not • Programming easier with S-plus and R? • R is free • Complicated data structure and manipulation easier with SAS? • Epi-Info (free) is even easier than STATA?
STATA – Basic functionality • Holds data for you • Stata holds 1 “flat” file dataset only (.dta file) • Listens to what you want • Type a command, press enter • Does stuff • Statistics, data manipulation, etc • Shows you the results • Results window
Demo #1 • Open the program • Entering vs. loading data • Look at data • Run a command • Orient to windows and buttons
STATA - Windows • Two basic windows: Command, Results • Optional windows: Variable list, History of commands • Other functions: Data browser/editor, Do file editor, Viewer (for log, help files, etc)
STATA - Buttons • The usual – open, save, print • Log-file open/suspend/close • Do-file editor • Browse and Edit • Break
STATA - Menus • Almost every command can be accessed via menu
Menu vs. Command line • Menu advantages: look for commands you don’t know about; see the options for each command; complex commands are easier (and you learn the syntax) • Command line advantages: faster (if you know the command!); “closer” to the program; only way to write “do” files; document and repeat analyses
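The "document and repeat analyses" point can be sketched as a minimal do-file. This is an illustration only: the file names demo.do and demo_log are hypothetical, and Stata's bundled auto example dataset (loaded with sysuse) stands in for a research dataset.

```stata
* demo.do: a minimal, repeatable analysis (file names are hypothetical)

* Close any log left open by an earlier run; capture ignores the error if none is open
capture log close

* Record every command and its results in demo_log.log
log using demo_log, replace text

* Load Stata's built-in example dataset (assumed stand-in for your own data)
sysuse auto, clear

* The same commands you would otherwise type in the Command window
summarize mpg weight

* Stop recording
log close
```

Running the file with "do demo" repeats the whole analysis and rewrites the log, which is what makes the command-line route easier to document than menu clicks.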
Demo #2 • Load a STATA dataset • Explore the data • Describe the data • Answer some simple research questions • Gender, BMI, blood pressure
STATA commands: Describing your data • describe [varlist] • Displays variable names, types, labels • list [varlist] • Displays the values of all observations • codebook [varlist] • Displays labels and codes for all variables
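A short sketch of these three commands, assuming Stata's bundled auto example dataset rather than the course dataset; the variable names (make, price, mpg, foreign) belong to that example file.

```stata
* Load Stata's built-in example dataset (an assumption; any .dta file works)
sysuse auto, clear

* Variable names, storage types, and labels for the whole dataset
describe

* Values of three variables, restricted to the first 10 observations
list make price mpg in 1/10

* Codes, value labels, and missing-value count for one variable
codebook foreign
```

Leaving off "in 1/10" lists every observation, which is rarely what you want on a large dataset.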
STATA commands: Descriptive statistics – continuous data • summarize [varlist] [, detail] • # obs, mean, SD, range • “, detail” gets you more detail (median, etc) • ci [varlist] • Mean, standard error of mean, and confidence intervals • Actually works for dichotomous variables, too.
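A brief sketch of summarize and ci, again assuming the bundled auto dataset. One caveat: the "ci [varlist]" form on the slide is the syntax of the Stata release used for this course; Stata 14 and later document the command as "ci means" and "ci proportions", which is the form used here.

```stata
* Load Stata's built-in example dataset (an assumption)
sysuse auto, clear

* Number of observations, mean, SD, minimum, and maximum
summarize price mpg

* Adds percentiles (including the median), variance, skewness, and kurtosis
summarize price, detail

* Mean, standard error, and 95% confidence interval (Stata 14+ syntax)
ci means mpg

* Proportion with an exact binomial confidence interval for a 0/1 variable
ci proportions foreign
```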
STATA commands: Graphical exploration – continuous data • histogram varname • Simple histogram of your variable • graph box varlist • Box plot of your variable • qnorm varname • Quantile plot of your variable to check normality
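The graph commands can be tried as follows, assuming the bundled auto dataset. Each command redraws the Graph window, so run them one at a time to look at each plot.

```stata
* Load Stata's built-in example dataset (an assumption)
sysuse auto, clear

* Histogram of miles per gallon
histogram mpg

* Box plots of mpg, one box per category of foreign
graph box mpg, over(foreign)

* Quantiles of mpg against quantiles of a normal distribution;
* points far from the reference line suggest non-normality
qnorm mpg
```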
STATA commands: Descriptive statistics – categorical data • tabulate [varname] • Counts and percentages • (see also, table - this is very different!)
STATA commands: Analytic statistics – 2 categorical variables • tabulate [var1] [var2] • “Cross-tab” • Descriptive options: , row (row percentages) , col (column percentages) • Statistics options: , chi2 (chi-squared test) , exact (Fisher’s exact test)
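A sketch of one-way and two-way tabulate, assuming the bundled auto dataset, where rep78 is a 5-level repair record with a few missing values and foreign is a 0/1 indicator.

```stata
* Load Stata's built-in example dataset (an assumption)
sysuse auto, clear

* One-way table: counts and percentages
tabulate rep78

* Same table, with missing values shown as their own category
tabulate rep78, missing

* Cross-tab with row percentages and a chi-squared test
tabulate foreign rep78, row chi2

* Fisher's exact test, often preferred when expected cell counts are small
tabulate foreign rep78, exact
```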
Getting help • Try to find the command on the pull-down menus • Help menu • If you don’t know the command - Search... • If you know the command - Stata command... • Try the manuals • more detail, theoretical underpinnings, etc
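The two Help menu entries have command-line equivalents, sketched here; the search keywords are only an example.

```stata
* You know the command name: open its help file
help tabulate

* You do not know the command name: search by keyword
search fisher exact
```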
STATA commands: Analytic statistics – 1 categorical, 1 continuous • bysort catvar: summarize [contvar] • Mean, SD, range of contvar within each subgroup • ttest [contvar], by(catvar) • t-test • oneway [contvar] [catvar] • ANOVA • table [catvar] [, contents(mean [contvar] …)] • Table of statistics
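A sketch of the subgroup commands, assuming the bundled auto dataset with mpg as the continuous variable. The table command is left out because its syntax changed in Stata 17 (the contents() option shown on the slide is the older form); tabulate with the summarize() option is swapped in as a version-stable way to get group means, and is not the command named on the slide.

```stata
* Load Stata's built-in example dataset (an assumption)
sysuse auto, clear

* Mean, SD, and range of mpg within each level of foreign
bysort foreign: summarize mpg

* Two-sample t-test comparing mean mpg between the two groups
ttest mpg, by(foreign)

* One-way ANOVA of mpg across repair-record categories, with group summaries
oneway mpg rep78, tabulate

* Group means, SDs, and counts in table form (substitute for table, contents())
tabulate foreign, summarize(mpg)
```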
STATA commands: Analytic statistics – 2 continuous • scatter [var1] [var2] • Scatterplot of the two variables • pwcorr [varlist] [, sig] • Pairwise correlations between variables • “sig” option gives p-values • spearman [varlist] [, stats(rho p)] • Spearman rank correlations, with p-values
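A sketch for two continuous variables, assuming the bundled auto dataset. In scatter, the first variable goes on the y-axis and the second on the x-axis.

```stata
* Load Stata's built-in example dataset (an assumption)
sysuse auto, clear

* Scatterplot of mpg (y-axis) against weight (x-axis)
scatter mpg weight

* Pairwise Pearson correlations with p-values
pwcorr mpg weight price, sig

* Spearman rank correlation and its p-value
spearman mpg weight, stats(rho p)
```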
In Lab Today… • Expect some chaos! • IT will be here to help with wireless, logins, etc • Familiarize yourself with Stata • Load a dataset • Use Stata commands to analyze data and fill in the blanks
Next week • Do files, log files, and workflow in Stata • Find a dataset!
Website addresses • Course website • http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html • Computing information • http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing • Download RDP for Macs (for Stata Server) • http://www.microsoft.com/mac/otherproducts/otherproducts.aspx?pid=remotedesktopclient • Citrix Web Server • http://apps.epi-ucsf.org/