Stat E-150 Statistical Methods Section 1 Feb 12, 2014
Who Am I? • I got my PhD in Molecular and Cellular Biology in 2011 • Research Interest: Neuroscience • Master’s student in biostatistics at Brown University
Outline for today • Brief Lecture Review • Introduction to SPSS • Hide will hold another section tonight immediately after lecture and will continue to discuss SPSS • My section reviews the lecture from last week • Hide’s section review mainly focuses on this week’s lecture • Homework Q & A • Post your questions on the course discussion board • If you have more questions, go to his section
Statistical Models • What is a statistical model? • Population versus sample • DATA = MODEL + ERROR • Y = f(X) + ε • Simple linear model: Y = β0 + β1X + ε • Why do we need statistical models? • To simplify reality • How useful are statistical models? • Make predictions • Understand relationships • Assess differences • Can statistical models (or generally, statistics) conclusively tell us what will or will not happen? • No
Basic terminology • Types of variables: • Quantitative • Categorical • Response variables • Explanatory (predictor) variables
Statistical Model Building • Exploratory data analysis • barplots • histograms • The Four-Step Process for statistical modeling: • 1. Choose a form for the model • 2. Fit the model to the data • 3. Assess how well the model fits the data • Verify assumptions • Examine the residuals • Investigate significance, refine model • 4. Use the model to make predictions, explain relationships, assess differences
Simple linear model • Takes the form: Y = β0 + β1X + ε • Y = the response variable • X = the explanatory (predictor) variable • ε = the random error that the model does not account for • β0 = the intercept • β1 = the slope = average change in Y for every unit increase in X • predicted Y = β0 + β1X • ε = observed Y – predicted Y
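The prediction and error formulas above can be sketched in a few lines of Python. The coefficient values and the observation used here are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from the course data.

```python
# Hypothetical coefficients, chosen only for illustration.
BETA0 = 2.0  # intercept
BETA1 = 0.5  # slope: average change in Y per unit increase in X


def predict(x, beta0=BETA0, beta1=BETA1):
    """Predicted Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x


def residual(x, observed_y, beta0=BETA0, beta1=BETA1):
    """Observed Y minus predicted Y."""
    return observed_y - predict(x, beta0, beta1)


# A made-up observation: X = 4.0 with an observed Y of 4.6.
predicted = predict(4.0)
resid = residual(4.0, 4.6)
print("predicted Y:", predicted)
print("residual:", round(resid, 3))
```

A positive residual means the observed value lies above the line; a negative one means it lies below.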
Fitting a simple linear model • Least squares finds the line that gives the best estimates of β0 and β1, i.e. the line with the smallest sum of squared residuals • Residual = observed – predicted • The sum of the squared errors (SSE) provides a measure of how well the line predicts the actual response for a sample
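As a rough sketch of what a least squares fit computes, the snippet below uses the standard closed-form estimates for a simple linear model and then evaluates the SSE. The data values and the function names `fit_line` and `sse` are made up for illustration; in this course the fitting itself is done in SPSS.

```python
def fit_line(xs, ys):
    """Return least squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared residuals (observed - predicted) for a given line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data, not from the course.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best_sse = sse(xs, ys, b0, b1)
other_sse = sse(xs, ys, 0.0, 2.1)  # an arbitrary competing line

print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("SSE of fitted line:", round(best_sse, 4))
print("SSE of competing line:", round(other_sse, 4))
```

Any other line, such as the arbitrary competitor above, gives an SSE at least as large as that of the fitted line for the same sample.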
Assess the model: verify assumptions • Linearity • The distribution of ERRORS • Zero mean: the distribution of the errors is centered at zero (the residuals always average to zero in a least squares fit that includes an intercept) • Constant variance: the variability of the errors is the same for all values of the predictor variable • Independence: the errors are independent of each other • Normality: we often need to assume the random errors follow a normal distribution.
Residual plot: look for a uniform scatter of the points around zero • You don’t want a systematic pattern
[Figure: example plots labeled Bad, Good, and Long tails]
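Two of these checks can be illustrated numerically, though in practice they are usually judged from plots. The sketch below, on made-up data, confirms that least squares residuals average to zero and compares the residual spread for low and high values of the predictor as a crude stand-in for inspecting a residual plot. The half-and-half split is an assumption made for illustration, not a formal test.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Return least squares estimates (b0, b1) for Y = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up sample data, already sorted by the predictor.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.1]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Zero mean: holds automatically when the model includes an intercept.
mean_resid = mean(residuals)

# Constant variance (crude check): compare residual spread for the
# lower half and the upper half of the predictor values.
half = len(xs) // 2
spread_low = pstdev(residuals[:half])
spread_high = pstdev(residuals[half:])

print("mean residual:", round(mean_resid, 10))
print("spread (low X):", round(spread_low, 3))
print("spread (high X):", round(spread_high, 3))
```

Very different spreads in the two halves would suggest non-constant variance, which would show up as a fan shape in a residual plot.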
Simple linear regression: a summary of the basics! • Data requirements: 2 quantitative variables • Assumptions of model: linearity, constant variance, mean of zero, normality, independence of errors, and randomness of sample • Check these with residual plot, histogram of errors, and a normal probability plot (NPP) • How to write regression equation • How to interpret the betas, calculate residuals
Outliers, Influential Points and Transformations • Outlier: a point that doesn’t fit vertically with the other points • Influential point: doesn’t fit both vertically and horizontally; can ‘pull’ the regression line • Square root and log transformations
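To illustrate why a log transformation can help, the sketch below builds a made-up response that grows exponentially in X, fits a line to the raw values, and then fits a line to the logged values. The data are generated without noise purely to make the effect obvious; real data would not line up this cleanly.

```python
import math


def fit_line(xs, ys):
    """Return least squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sse(xs, ys, b0, b1):
    """Sum of squared residuals for a given line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical curved relationship: Y = exp(1 + 0.5 * X), no noise.
xs = [0, 1, 2, 3, 4, 5]
ys = [math.exp(1.0 + 0.5 * x) for x in xs]

# A straight line fits the raw response poorly.
raw_b0, raw_b1 = fit_line(xs, ys)
raw_sse = sse(xs, ys, raw_b0, raw_b1)

# After a log transformation the relationship is exactly linear.
log_ys = [math.log(y) for y in ys]
log_b0, log_b1 = fit_line(xs, log_ys)
log_sse = sse(xs, log_ys, log_b0, log_b1)

print("raw fit SSE:", round(raw_sse, 3))
print("log fit intercept and slope:", round(log_b0, 3), round(log_b1, 3))
print("log fit SSE:", round(log_sse, 12))
```

The two SSE values are on different scales (Y versus log Y), so they are shown only to indicate how well each line follows its own data, not as a direct comparison.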
Chapter 0, Question 9 • An article in the Journal of the American Medical Association reported on a study in which 160 subjects were randomly assigned to one of four popular diet plans: Atkins, Ornish, Weight Watchers, and Zone. Among the variables measured were: • Which diet the subject was assigned to • Whether or not the subject completed the 12-month study • The subject’s weight loss after 2 months, 6 months, and 12 months (in kilograms, with a negative value indicating weight gain) • The degree to which the subject adhered to the assigned diet, taken as the average of 12 monthly ratings, each on a 1-10 scale (with 1 indicating complete nonadherence and 10 indicating full adherence)
Chapter 0, Question 9 • Classify each of these variables as quantitative or categorical • The primary goal of the study was to investigate whether weight loss tends to differ significantly among the four diets. Identify the explanatory and response variables for investigating this question • A secondary goal of the study was to investigate whether weight loss is affected by the adherence level. Identify the explanatory and response variables for investigating this question • Is this an observational study or a controlled experiment? Explain. • If the researchers’ analysis of the data leads them to conclude that there is a significant difference in weight loss among the four diets, can they legitimately conclude that the difference is because of the diet? Explain.
Chapter 0, Question 9 • If the researchers’ analysis of the data leads them to conclude that there is a significant association between weight loss and adherence level, can they legitimately conclude that a cause-and-effect association exists between them? Explain.