Download
stats 330 lecture 1 n.
Skip this Video
Loading SlideShow in 5 Seconds..
STATS 330: Lecture 1 PowerPoint Presentation
Download Presentation
STATS 330: Lecture 1

STATS 330: Lecture 1

270 Vues Download Presentation
Télécharger la présentation

STATS 330: Lecture 1

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. STATS 330: Lecture 1 Introductory Stuff 330 Lecture 1

  2. Today’s agenda: • Introductory Comments: • Housekeeping • Computer details • Plan of the course • Statistical Modelling: an overview • Our Analysis strategy • Goals for the course • Role of Graphics • Data cleaning 330 Lecture 1

  3. Housekeeping Contact details…. Plus much else on course web page www.stat.auckland.ac.nz/~lee/330/ Or via Cecil 330 Lecture 1

  4. 330 Lecture 1

  5. Assignments • There will be five assignments • (20% of total grade) • The due dates are listed in the course summary • Assignment 1 is due on August 2. • No paper will be issued in class – download assignments and data from the Web or Cecil 330 Lecture 1

  6. Tutorials • These will cover computing details • Held in basement tutorial lab, 303S Extension (Rm 303S-B75) • Tutorial 1: 11-12 Wed • Tutorial 2: 10-11 Fri • Tutorial 3: 2-3 Fri • Start second week of semester 330 Lecture 1

  7. Computer details • All analyses will be done using R • I require homework to be typed, use Word, cut and paste R text and graphics into Word • Use home/laptop computer or basement lab (303S) • See web page for info on downloading R330 package • URL is www.stat.auckland.ac.nz/~lee/330 330 Lecture 1

  8. Course Plan • There will be 33 lectures, divided into chapters as follows • Chapter 1: Introduction (1 lecture, this one!) • Chapter 2: Graphics (3 lectures) • Chapter 3: Multiple Regression Model (12 lectures) • Chapter 4: Factors (4 lectures) • Chapter 5: Models for categorical and count responses (12 lectures) • Revision (1 lecture) 330 Lecture 1

  9. Course book • This covers most of the material we will discuss in the lectures • Has 5 chapters corresponding to the division on the previous slide • On-line version available on the course web site • Paper copies available from the Statistics Department, Commerce A 330 Lecture 1

  10. Overview of Statistical Modelling • Statistical models summarize relationships between variables • Regression models focus on one variable (the response) and how its distribution can be modelled by one or more explanatory variables 330 Lecture 1

  11. Example: Response is BPD BPD: Bi-Parietal Diameter For an unborn baby, depends on several factors, including Gestational Age. 330 Lecture 1

  12. Ultrasound image 330 Lecture 1

  13. Points to take into account: • Not all babies of the same age have the same BPD. • In general, the older the baby, the bigger the BPD. 330 Lecture 1

  14. Model relating BPD and GA Line is BPD =a + b GA BPD Each bell curve has sd 10 mm 30 40 Gestational Age Mean BPD for 30 weeks = a + b 30 Mean BPD for 40 weeks = a + b 40 Mean BPD for GA weeks = a + b GA 330 Lecture 1

  15. Regression model features • Model specifies the distribution of the response • Also says how the distribution of the response is affected by the covariates (explanatory variables) • In this case the covariate GA determines the mean response: mean BPD = a + b GA ie a straight-line relationship. 330 Lecture 1

  16. In general…. • Response has a distribution depending on (conditional on) the explanatory variables • Mean of the distribution given by some function of the explanatory variables • Need to • describe the function • describe the variability • Balance accuracy with simplicity. 330 Lecture 1

  17. Our Analysis strategy: • Explore the data using graphics and summary statistics • Construct a useful model (trial and error) • Use the model to gain knowledge about the system under study (How big should a 40 week baby’s BPD be?) • Communicate findings (be able to write a report!!) 330 Lecture 1

  18. Goals for 330 • Get more practice in exploring data • Expand knowledge of regression • Get better at fitting models • Improve your diagnostic skills (how to recognize when a model doesn’t describe the data properly) • Improve your interpretation and communication skills, in a more flexible way. 330 Lecture 1

  19. Role of Graphics • Data Cleaning: Are there gross errors, outliers, special codings (eg 999 for missing), missing data, absolute rubbish in the data? • Exploratory analysis: what sort of model might be appropriate? • Diagnostics: having tentatively selected a model, is it any good? (Residual plots, etc) 330 Lecture 1

  20. Exploratory analysis for BPD data Linear relationship (straight line) appropriate? Outlier 330 Lecture 1

  21. Diagnostic Graphics: residuals versus fitted values Smoothing enhances interpretation. Conclusion? Outlier stands out 330 Lecture 1

  22. Data Cleaning • All real-life data sets are likely to contain errors • These will usually be revealed by suitable plots, such as the one on the previous slide • Before starting any analysis, data should always be carefully checked • To encourage this practice, I will occasionally introduce errors into the assignment data supplied on the web. 330 Lecture 1

  23. Data Cleaning (2) It is your responsibility to check all data supplied on the web, against the assignment sheets . Failure to so will cost marks. YOU HAVE BEEN WARNED!!! 330 Lecture 1