Develop your ability to understand and solve practical statistical problems, choose appropriate techniques, analyze data, and present results through lectures and practical work.
Introductory Data Analysis F73DA2
Contact Times (Spring Term 2008)
• Monday 4.15 - 5.15: Lecture in LT3
• Tuesday 2.15 - 3.15: Lecture in LT3
• Wednesday 10.15 - 11.15: Lecture in LT3
• Group 1: Tuesday 1.15 - 2.15: Practical in SRG12/13
• Group 2: Tuesday 4.15 - 5.15: Practical in SRG12/13
The web pages for this module are linked from John Phillips' home page: http://www.macs.hw.ac.uk/~jphillips
John Phillips
Office: CM S06
Email: j.phillips@hw.ac.uk
Aims
This module aims to develop students' abilities in understanding and solving practical statistical problems, and to teach them how to choose appropriate techniques, analyse data and present results.
The module will consist of a mixture of lectures and practical work. Lectures will focus on statistical modelling, including the selection of appropriate models, the analysis and interpretation of results, and diagnostics. Exploratory and graphical techniques will be considered, as well as more formal statistical procedures.
Both parametric procedures (e.g. linear and generalized linear models) and nonparametric methods will be discussed, as will modern robust techniques. There will be considerable emphasis on examples, applications, and case studies, especially for continuous response variables. Computing facilities, especially R, will be used extensively.
Assessments
The module will be assessed through two practical assignments, to be handed in by specified times during the term.
Installing R: PC Caledonia
Simply double-click on the “Installer”, then select the “R” icon. This will produce a shortcut to R which should be available every time you log on.
Installing R: on your own PC
Download R free from the Comprehensive R Archive Network: http://cran.r-project.org
R screen
The arrow keys on the keyboard are very useful. Pressing the up arrow repeatedly retrieves commands entered previously.
Many keys and function names are very much as you would expect.
> 6+4
[1] 10
> 18*3
[1] 54
> log(100)
[1] 4.60517
> pi
[1] 3.141593
> sin(pi)
[1] 1.224606e-16
> cos(pi)
[1] -1
> x=7
> y=10
> x+y
[1] 17
> sqrt(x*x+7*x*y-2*y*y)
[1] 18.41195
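Two results in the session above can surprise newcomers: log(100) is the natural logarithm, and sin(pi) is tiny but not exactly zero. The following is a minimal sketch of both points; the variable names are illustrative and not part of the original slides.

```r
# `<-` is the usual R assignment operator; `=` as used on the slides also works.

# log() is the natural logarithm; base-10 logs need log10() or the base argument.
natural_log <- log(100)
base10_log  <- log10(100)
also_base10 <- log(100, base = 10)
print(c(natural_log, base10_log, also_base10))

# pi is stored to finite precision, so sin(pi) is a very small number, not 0.
exactly_zero <- sin(pi) == 0                    # exact comparison
near_zero    <- isTRUE(all.equal(sin(pi), 0))   # comparison with a tolerance
print(c(exactly_zero = exactly_zero, near_zero = near_zero))
```

When comparing computed values, all.equal() is usually a safer choice than ==.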
Example: A survey produced the following 200 results of individuals' salaries:
23454 20622 19314 19882 22467 16611 17790 17613 19892 17397
22340 17731 20058 22083 18055 18212 24114 20396 20394 20521
17643 19692 24214 16876 22545 17608 24631 21333 21797 20734
17836 20930 16709 18319 19097 20512 17693 23130 20316 19209
21220 17315 22102 21472 19974 22764 18183 20918 19358 20685
21261 21394 22333 21732 19734 19280 18696 21055 25762 18258
20255 19762 17016 20326 19479 18699 18686 17483 20843 20395
19734 19911 18990 19220 17313 21357 17514 17455 21932 21523
21606 23169 21461 19624 18931 18785 20225 25406 21376 20141
18541 23768 19024 21353 19802 19216 19442 19450 19385 20995
21162 21399 18805 18217 17847 19992 17105 14488 20522 21032
19191 20268 19996 17428 21877 19433 20625 19453 19081 21502
21890 21844 20116 17601 22296 21751 19513 19300 21031 19784
19767 16619 24021 22686 17818 22233 17774 20918 17180 19279
21029 19983 19703 23421 18140 20845 22054 17858 21523 20041
19968 20537 17755 19872 19005 19835 19717 20134 21757 19093
19692 21445 19219 19669 20769 22049 20561 20810 22525 21458
21618 16973 19093 18551 20841 17032 20549 18219 19224 19999
21367 22332 19235 22697 23620 22420 16811 20250 21124 19267
20400 18743 22448 20443 19634 21185 18448 21236 24047 20621
Graphical Representation
• Histogram
• Stem-and-Leaf
• Boxplot
• Frequency Polygon
> stem(salaries)

  The decimal point is 3 digit(s) to the right of the |

  14 | 5
  15 |
  16 | 66789
  17 | 0001233445556666778888889
  18 | 112222334567777889
  19 | 000111122222223333344445555667777777888889999
  20 | 00000001111233333444445555566667788888999
  21 | 00001122223344444445555556678888999
  22 | 01112333344555778
  23 | 124568
  24 | 00126
  25 | 48
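The other displays listed under Graphical Representation can be produced in a similar way. The sketch below is illustrative only: to stay short it uses just the first 20 salaries from the survey, stored in a hypothetical vector called salaries_subset, and the plot titles and axis labels are assumptions.

```r
# First 20 survey values only (an assumption made to keep the example short).
salaries_subset <- c(23454, 20622, 19314, 19882, 22467, 16611, 17790, 17613,
                     19892, 17397, 22340, 17731, 20058, 22083, 18055, 18212,
                     24114, 20396, 20394, 20521)

# Histogram; the returned object holds the bin midpoints and counts.
h <- hist(salaries_subset, main = "Salaries (first 20 values)", xlab = "Salary")

# Boxplot, drawn horizontally so the axis matches the histogram.
boxplot(salaries_subset, horizontal = TRUE, xlab = "Salary")

# Frequency polygon: join the histogram counts at the bin midpoints.
plot(h$mids, h$counts, type = "b", xlab = "Salary", ylab = "Frequency",
     main = "Frequency polygon")
```

With the full data, replacing salaries_subset by salaries gives the corresponding plots for all 200 values.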
> mean(salaries)
[1] 20123.01
> median(salaries)
[1] 20020
> sd(salaries)
[1] 1878.09
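To see what these three functions compute, the sketch below reproduces them from their definitions. It uses only the first ten salaries from the survey (an assumption, to keep the example short), and the variable names are illustrative. Note that sd() divides by n - 1, not n.

```r
# First ten survey values only (an assumption made to keep the example short).
x <- c(23454, 20622, 19314, 19882, 22467, 16611, 17790, 17613, 19892, 17397)
n <- length(x)

# Mean: the sum divided by the number of observations.
xbar <- sum(x) / n

# Median: for even n, the average of the two middle ordered values.
sorted <- sort(x)
med <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# Sample standard deviation: note the n - 1 divisor.
s <- sqrt(sum((x - xbar)^2) / (n - 1))

# Compare the hand calculations with the built-in functions.
print(c(xbar, mean(x)))
print(c(med, median(x)))
print(c(s, sd(x)))
```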
 x     y
 5   6.2
 7   9.3
 3   6.0
 4   6.1
11  12.8
 7   8.1
 6   8.1
15  16.7
20  23.4
 3   4.7
 8  10.5
 7   7.7
12  14.0
15  16.6
22  24.2
> plot(y~x)
> abline(lm(y~x))
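The fitted line drawn by abline(lm(y~x)) is the least-squares line. As a minimal sketch, the block below enters the fifteen (x, y) pairs from the table with c() (the slides do not show how the data were entered, so this is an assumption), fits the model, and checks the coefficients against the usual least-squares formulas. The names fit, slope and intercept are illustrative.

```r
# The fifteen (x, y) pairs from the table.
x <- c(5, 7, 3, 4, 11, 7, 6, 15, 20, 3, 8, 7, 12, 15, 22)
y <- c(6.2, 9.3, 6.0, 6.1, 12.8, 8.1, 8.1, 16.7, 23.4, 4.7, 10.5, 7.7,
       14.0, 16.6, 24.2)

# Fit the simple linear regression of y on x.
fit <- lm(y ~ x)
print(coef(fit))

# The same coefficients from the least-squares formulas.
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)
print(c(intercept = intercept, slope = slope))

# Scatterplot with the fitted line, as on the slide.
plot(y ~ x)
abline(fit)
```

Storing the fit in an object first makes it available for later use, for example summary(fit) or residuals(fit).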
> television=scan( )
1: 1 1 2 2 1 4 3 3 5 5 1 1 1 2 1 3 3 3 3 3 4 1 2 1 3 4
27:
Read 26 items
> barplot(table(television))
> television.counts=table(television)
> names(television.counts)=c("BBC1","BBC2","ITV1","CH4","Other")
> pie(television.counts,col=c("purple","green2","cyan","yellow","white"))
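It can help to look at the counts behind the bar chart and pie chart. The sketch below enters the same 26 responses with c() instead of scan() so that it runs non-interactively, and assumes, as the names() call on the slide implies, that codes 1 to 5 stand for BBC1, BBC2, ITV1, CH4 and Other in that order. The name television.percent is illustrative.

```r
# The 26 responses from the slide, entered with c() instead of scan().
television <- c(1, 1, 2, 2, 1, 4, 3, 3, 5, 5, 1, 1, 1, 2, 1, 3, 3, 3, 3, 3,
                4, 1, 2, 1, 3, 4)

# Frequency table, labelled with the channel names used on the slide.
television.counts <- table(television)
names(television.counts) <- c("BBC1", "BBC2", "ITV1", "CH4", "Other")
print(television.counts)

# Percentages of the 26 responses, rounded to one decimal place.
television.percent <- round(100 * prop.table(television.counts), 1)
print(television.percent)
```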
Binomial Distribution
Calculating a series of binomial probabilities by hand takes a long time.
If n = 5, a = 0.2 and x runs from 0 to 5:
$p(0) = \frac{5!}{0!\,5!}\,0.2^{0}\,0.8^{5} = 0.32768$
$p(1) = \frac{5!}{1!\,4!}\,0.2^{1}\,0.8^{4} = 0.4096$
$p(2) = \frac{5!}{2!\,3!}\,0.2^{2}\,0.8^{3} = 0.2048$
... and so on
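Each of these calculations follows the same pattern. As a sketch, writing a for the success probability as on the slide (the general form is not stated on the slide itself), the probability of x successes in n trials is:

```latex
p(x) = \frac{n!}{x!\,(n-x)!}\, a^{x}\,(1-a)^{n-x}, \qquad x = 0, 1, \dots, n
```

For example, with n = 5 and a = 0.2, the next term is p(3) = 10 × 0.008 × 0.64 = 0.0512.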
Using R
> dbinom(0:5,5,0.2)
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
> pf=dbinom(0:5,5,0.2)
> pf
[1] 0.32768 0.40960 0.20480 0.05120 0.00640 0.00032
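As a check on the hand calculations, the sketch below computes the same probabilities from the factorial formula using choose() and compares them with dbinom(). It also shows that the probabilities sum to 1 and that their running totals match pbinom(). The names by_formula and cumulative are illustrative; p here plays the role of a = 0.2 on the earlier slide.

```r
n <- 5
p <- 0.2        # success probability (written a on the earlier slide)
x <- 0:n

# Probabilities from the formula: choose(n, x) is n! / (x! (n - x)!).
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probabilities from the built-in binomial function.
pf <- dbinom(x, n, p)
print(rbind(by_formula, pf))

# The probabilities over x = 0, ..., n sum to 1.
print(sum(pf))

# Running totals give P(X <= x), which is what pbinom() returns.
cumulative <- cumsum(pf)
print(rbind(cumulative, pbinom(x, n, p)))
```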