Introduction to Statistical Computing in Clinical Research Biostatistics 212 Lecture 1
Today... • Course overview • Course objectives • Course details: grading, homework, etc • Schedule, lecture overview • Where does Stata fit in? • Basic data analysis with Stata • Stata demos • Lab
Course Objectives • Introduce you to using STATA and Excel for • Data management • Basic statistical and epidemiologic analysis • Turning raw data into presentable tables, figures and other research products • Prepare you for Fall courses • Start analyzing your own data
Course details Introduction to Statistical Computing - 1 unit Schedule – 7 lectures, 7 lab sessions, on 7 Tuesdays in a row Dates: August 3 – September 14 Lectures 1:15-2:45 Labs 3:00-4:00 All in China Basin, CBL 6702 (+ 6704 for lab) Final Project Due 9/21/10
Course details Introduction to Statistical Computing Grading: Satisfactory/Unsatisfactory Requirements: -Hand in all six Labs (even if late) -Satisfactory Final Project -80% of total points Reading: Optional
Course details, cont • Course Director: Mark Pletcher • Teaching Assistants: Elizabeth Mileti (Section 1), Raman Khanna (Section 1), David Moskowitz (Section 2), Yvette Wild (Section 2) • Lecturers: Andy Choi, Jennifer Cocohoba • Lab Instructor: Alan Bostrom, Mandana Khalili
Course details, cont • Lecture • Extra-full this year! • Labs • PC vs. Mac (Section 1 and Section 2) • All of Section 2 won’t fit into 6704…
Overview of lecture topics • 1- Introduction to STATA • 2- Do files, log files, and workflow in STATA • 3- Generating variables and manipulating data with STATA • 4- Using Excel • 5- Basic epidemiologic analysis with STATA • 6- Making a figure with STATA/Advanced Programming Topics • 7- Organizing a project, making a table
Overview of labs • Lab 1 – Load a dataset and analyze it • Lab 2 – Learn how to use do and log files • Lab 3* – Import data from Excel, generate new variables and manipulate data, document everything with do and log files • Lab 4 – Using and creating Excel spreadsheets • Lab 5* – Epidemiologic analysis using Stata • Lab 6 – Making a figure with Stata • Last lab session will be dedicated to working on the Final Project • * Labs 3 and 5 are significantly longer and harder than the others
Overview of labs, cont • Official Lab time is 3:00-4:00, but we will start right after lecture, and you can leave when you are done.
Overview of labs, cont • Labs are due the following week prior to lecture. Labs turned in late (less than 1 week) will receive only half credit; after that, no points will be awarded. However, ALL labs must be turned in to pass the class (even if no points are awarded). • Lab 1 is paper • Labs 2-6 are electronic files, and should be emailed to your section leader’s course email address: email@example.com (Elizabeth/Raman) or firstname.lastname@example.org (David/Yvette)
Final Project • Create a Table and a Figure using your own data, and document the analysis using Stata. • Due 1 week after the last lab session; 20 points deducted for each day late.
Course Materials • Online Syllabus (http://rds.epi-ucsf.org/ticr/syllabus/display.asp?academic_year=2010-2011&courseid=38) • Course Overview • Final Project • Miscellaneous handouts • Lectures and Labs/Datasets (“just in time”)
Getting started with STATA Session 1
Types of software packages used in clinical research • Statistical analysis packages • Spreadsheets • Database programs • Custom applications • Cost-effectiveness analysis (TreeAge, etc) • Survey analysis (SUDAAN, etc)
Software packages for analyzing data • STATA • SAS • S-PLUS and R • SPSS • SUDAAN • Epi-Info • JMP • MATLAB • StatXact
Why use STATA? • Quick start, user friendly • Immediate results, response • You can look at the data • Menu-driven option • Good graphics • Log and do files • Good manuals, help menu
Why NOT use STATA? • SAS is used more often? • SAS does some things STATA does not • Programming easier with S-PLUS and R? • R is free • Complicated data structure and manipulation easier with SAS? • Epi-Info (free) is even easier than STATA?
STATA – Basic functionality • Holds data for you • Stata holds 1 “flat” file dataset only (.dta file) • Listens to what you want • Type a command, press enter • Does stuff • Statistics, data manipulation, etc • Shows you the results • Results window
Demo #1 • Open the program • Entering vs. loading data • Look at data • Run a command • Orient to windows and buttons
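A minimal sketch of the demo steps as typed commands. It assumes the built-in example dataset auto (shipped with Stata) rather than the dataset used in class, so the variable names here are illustrative only.

```stata
* Load a dataset that ships with Stata (assumption: not the course dataset)
sysuse auto, clear

* Look at the data in the Data Browser (read-only view)
browse

* Run a command; the output appears in the Results window
summarize mpg
```

Each line is typed in the Command window and run by pressing Enter.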
STATA - Windows • Two basic windows • Command • Results • Optional windows • Variable list • History of commands • Other functions • Data browser/editor • Do file editor • Viewer (for log, help files, etc)
STATA - Buttons • The usual – open, save, print • Log-file open/suspend/close • Do-file editor • Browse and Edit • Break
STATA - Menus • Almost every command can be accessed via menu
Menu vs. Command line • Menu advantages • Look for commands you don’t know about • See the options for each command • Complex commands easier – learn syntax • Command line advantages • Faster (if you know the command!) • “Closer” to the program • Only way to write “do” files • Document and repeat analyses
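To illustrate why the command line matters for documenting and repeating analyses, here is a hedged sketch of a small do file. The log file name example_log and the auto dataset are assumptions chosen for illustration, not course materials.

```stata
* A hypothetical do file: every step is recorded and can be rerun

* Close any log left open by an earlier run (no error if none is open)
capture log close

* Start a plain-text log of everything that follows
log using example_log, text replace

* Load a built-in example dataset and run two commands
sysuse auto, clear
summarize price mpg
tabulate foreign

* Stop logging
log close
```

Running the same file again reproduces the same results and overwrites the log.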
Demo #2 • Load a STATA dataset • Explore the data • Describe the data • Answer some simple research questions • Gender, BMI, blood pressure
STATA commands: Describing your data • describe [varlist] • Displays variable names, types, labels • list [varlist] • Displays the values of all observations • codebook [varlist] • Displays labels and codes for all variables
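A short sketch of these three commands in use, assuming the built-in auto dataset (its variable names make, price, mpg, foreign and rep78 are not from the course data).

```stata
sysuse auto, clear

* Names, storage types and labels for every variable, then for two of them
describe
describe price mpg

* Values of selected variables for the first five observations only
list make price mpg in 1/5

* Labels, codes, range and missing counts for two variables
codebook foreign rep78
```

Leaving out the varlist applies the command to all variables; the in 1/5 qualifier keeps list from printing the whole dataset.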
STATA commands: Descriptive statistics – continuous data • summarize [varlist] [, detail] • # obs, mean, SD, range • “, detail” gets you more detail (median, etc) • ci [varlist] • Mean, standard error of mean, and confidence intervals • Actually works for dichotomous variables, too.
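A hedged example of these commands on the built-in auto dataset (an assumption, not the course data). Note that Stata 14 and later expect a subcommand after ci, so the slide's ci [varlist] form is shown only as a comment for older versions.

```stata
sysuse auto, clear

* Number of observations, mean, SD, min and max
summarize price mpg

* Adds percentiles (including the median), skewness and kurtosis
summarize price, detail

* Mean, standard error and 95% confidence interval (Stata 14 and later)
ci means price mpg

* Older versions use the form on the slide:
* ci price mpg

* For a 0/1 variable, a confidence interval for the proportion (Stata 14 and later)
ci proportions foreign
```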
STATA commands: Graphical exploration – continuous data • histogram varname • Simple histogram of your variable • graph box varlist • Box plot of your variable(s) • qnorm varname • Normal quantile plot of your variable to check normality
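These plots can be tried on the built-in auto dataset, used here as an assumed stand-in for your own data.

```stata
sysuse auto, clear

* Histogram of a continuous variable
histogram mpg

* Box plot, overall and then split by a categorical variable
graph box mpg
graph box mpg, over(foreign)

* Quantiles of mpg against quantiles of a normal distribution;
* points near the reference line suggest approximate normality
qnorm mpg
```

Each graph command replaces the previous graph in the Graph window unless you name the graph.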
STATA commands: Descriptive statistics – categorical data • tabulate [varname] • Counts and percentages • (see also, table - this is very different!)
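A minimal sketch of one-way tabulation, again assuming the built-in auto dataset. The missing and nolabel options are additions beyond the slide.

```stata
sysuse auto, clear

* Counts, percentages and cumulative percentages
tabulate rep78

* Same, but missing values get their own row
tabulate rep78, missing

* Show the underlying numeric codes instead of value labels
tabulate foreign, nolabel
```

By default tabulate leaves out observations with missing values, so the total can be smaller than the number of observations in the dataset.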
STATA commands: Analytic statistics – 2 categorical variables • tabulate [var1] [var2] • “Cross-tab” • Descriptive options • , row (row percentages) • , col (column percentages) • Statistics options • , chi2 (chi-squared test) • , exact (Fisher’s exact test)
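The options above can be combined in a single command. The sketch below assumes the built-in auto dataset, with foreign and rep78 as the two categorical variables.

```stata
sysuse auto, clear

* Cross-tab with row percentages
tabulate foreign rep78, row

* Column percentages plus a chi-squared test
tabulate foreign rep78, col chi2

* Fisher's exact test, often preferred when some cell counts are small
tabulate foreign rep78, exact
```

The first variable defines the rows and the second the columns, which determines how the row and col percentages read.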
Getting help • Try to find the command on the pull-down menus • Help menu • If you don’t know the command - Search... • If you know the command - Stata command... • Try the manuals • more detail, theoretical underpinnings, etc
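The Help menu items also have typed equivalents. A brief sketch (the search keywords are an arbitrary example):

```stata
* You know the command: open its help file in the Viewer
help summarize

* You don't know the command: search by keyword
search fisher exact
```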
STATA commands: Analytic statistics – 1 categorical, 1 continuous • bysort catvar: summarize [contvar] • Mean, SD, range of the continuous variable within each subgroup • ttest [contvar], by(catvar) • t-test • oneway [contvar] [catvar] • ANOVA • table [catvar] [, contents(mean [contvar] …)] • Table of statistics
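A hedged sketch of these comparisons on the built-in auto dataset (an assumption; mpg is the continuous variable and foreign or rep78 the grouping variable). Because the option syntax of table differs across Stata versions, the last line swaps in tabstat, a different command that produces a similar table of statistics.

```stata
sysuse auto, clear

* Summary statistics of mpg within each level of foreign
bysort foreign: summarize mpg

* Two-sample t-test comparing mean mpg between the two groups
ttest mpg, by(foreign)

* One-way ANOVA across more than two groups, with a table of group means
oneway mpg rep78, tabulate

* Table of statistics by group (tabstat used here in place of table)
tabstat mpg, by(foreign) statistics(mean sd n)
```

ttest with by() requires exactly two groups; use oneway when the categorical variable has more.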
STATA commands: Analytic statistics – 2 continuous • scatter [var1] [var2] • Scatterplot of the two variables • pwcorr [varlist] [, sig] • Pairwise correlations between variables • “sig” option gives p-values • spearman [varlist] [, stats(rho p)] • Spearman rank correlation
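A short example with two continuous variables, assuming the built-in auto dataset (mpg, weight and price are its variable names, not the course data).

```stata
sysuse auto, clear

* Scatterplot: the first variable goes on the y-axis, the second on the x-axis
scatter mpg weight

* Pairwise Pearson correlations with p-values
pwcorr mpg weight price, sig

* Spearman rank correlation with its p-value
spearman mpg weight, stats(rho p)
```

Spearman is based on ranks, so it is less sensitive to outliers and to nonlinear but monotonic relationships than the Pearson correlation from pwcorr.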
In Lab Today… • Expect some chaos! • IT will be here to help with wireless, logins, etc • Familiarize yourself with Stata • Load a dataset • Use Stata commands to analyze data and fill in the blanks
Next week • Do files, log files, and workflow in Stata • Find a dataset!
Website addresses • Course website • http://www.epibiostat.ucsf.edu/courses/schedule/biostat212.html • Computing information • http://www.epibiostat.ucsf.edu/courses/ChinaBasinLocation.html#computing • Download RDP for Macs (for Stata Server) • http://www.microsoft.com/mac/otherproducts/otherproducts.aspx?pid=remotedesktopclient • Citrix Web Server • http://apps.epi-ucsf.org/