Statistical Analysis.

January 2013 Statistical Analysis. With apologies to all maths students! Any errors or short cuts are down to an ‘Old Geographer’s’ interpretations. Useful: http://www.statsoft.com/textbook/distribution-tables/

Overview of Statistical Analysis:
• Descriptive statistics: Mean (Average), Mode, Median, Distribution (standard deviation, skew, kurtosis), Graphs
• Differences: Chi Square, Student’s T, Mann Whitney U
• Relationships (similarity): Correlation, Regression, Spearman’s Rank (MINIMUM 15 pairs)

Key Idea
• We can rarely ask everyone or measure everything, so we take a SAMPLE: a selection of all the possible measurements.
• Sampling plans: how do we plan our sampling system?
• Firstly, we must decide what sample size is practical.
• Remember, the more categories you try to research (e.g. different reactions by people of different age groups), the more people you will have to ask.
• Remember, some statistical tests have a minimum sample size, or must even have the same sample size in each category you research.
• Types of sample: Regular (a grid or every 5th person), Stratified regular, Random, Stratified random.
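The sampling plans listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the list of 30 made-up people and the two age groups are all assumptions, not part of the original slides.

```python
import random

def regular_sample(population, step):
    """Regular sample: take every `step`-th item (e.g. every 5th person)."""
    return population[step - 1::step]

def random_sample(population, size, seed=None):
    """Random sample: `size` items picked by chance, no repeats."""
    return random.Random(seed).sample(population, size)

def stratified_random_sample(strata, size_per_group, seed=None):
    """Stratified random sample: the same number picked at random from each category."""
    rng = random.Random(seed)
    return {name: rng.sample(members, size_per_group)
            for name, members in strata.items()}

# Hypothetical population of 30 people, split into two age groups.
people = [f"person_{i}" for i in range(1, 31)]
age_groups = {"under_18": people[:12], "adult": people[12:]}

print(regular_sample(people, 5))
print(random_sample(people, 6, seed=1))
print(stratified_random_sample(age_groups, 3, seed=1))
```

Note how the stratified version keeps the sample size equal in each category, which some tests require.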

Key Idea
A research hypothesis: This is a testable statement that you wish to find out is right or not.
The null hypothesis: Technically, all statistical tests involve testing a null hypothesis against an alternative hypothesis. These two terms have specialised meanings, and are not the same as your research question or hypothesis. However, the null hypothesis and alternative hypothesis can be constructed from the same research hypothesis.
A useful idea is 'innocent until proven guilty'. When you carry out a (geographical) investigation, you must assume that the null hypothesis is true, and only change your mind (and reject the null hypothesis) if there is strong enough evidence to show otherwise. So, for an investigation about rivers, you must start by assuming that distance from source is 'innocent' of causing any change in hydraulic radius, and only change your mind if there is strong enough evidence to show that distance from source is 'guilty'.

Key Idea
• Significance testing.
• How do you decide that the evidence is strong enough, that the distance from source is 'guilty'?
• If you roll a dice, the chance of rolling a six is 1 / 6.
• If you roll a dice two times, the chance of rolling two sixes is 1/6 x 1/6 = 1 / 36.
• If you roll a dice three times, the chance of rolling three sixes is 1/6 x 1/6 x 1/6 = 1 / 216.
• If you roll a dice ten times, the chance of rolling ten sixes is 1 / 60 466 176, a very small number! Ten sixes in a row could happen by chance, but it is unlikely. If it did happen you might suspect that the dice has been loaded so that it always falls on one side.
• Survey or experimental results can always be due to chance: a fluke of the maths.
• The significance level is a measure of how strong the evidence needs to be before the null hypothesis is rejected.
• Most science uses the 1% or 5% significance level. If the null hypothesis were true, results this strong would turn up by chance only 1% or 5% of the time, which is loosely described as being 99% or 95% certain that the results are truly significant. Excel STATS functions often generate this probability directly.
• Many surveys use a bigger significance level, but then there is a greater chance of you saying the results are useful when in fact they are a fluke of the numbers.
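The dice arithmetic above can be checked with a short Python sketch. The function name is an assumption made for illustration; the comparison with 5% simply applies the significance level idea to the dice example.

```python
from fractions import Fraction

def chance_of_all_sixes(rolls):
    """Chance of rolling a six on every one of `rolls` throws of a fair dice."""
    return Fraction(1, 6) ** rolls

SIGNIFICANCE_LEVEL = Fraction(5, 100)  # the 5% level

for rolls in (1, 2, 3, 10):
    p = chance_of_all_sixes(rolls)
    rarer_than_5_percent = p < SIGNIFICANCE_LEVEL
    print(f"{rolls} roll(s): 1 / {p.denominator}, rarer than 5%: {rarer_than_5_percent}")
```

Using exact fractions rather than decimals keeps the answers in the same "1 in N" form as the slide.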

With apologies to all maths students! Any errors or short cuts are mine.
[Figure: box and whisker plots of Height (m) on the Y axis (DEPENDENT VARIABLE) for Year 9 and Year 12 on the X axis (INDEPENDENT VARIABLE, Age in years). Each plot marks the Min, 25%, Mean, 75% and Max values.]
Are these two sets of height data different enough to say the result is significant? (Research hypothesis: Year 12s are more mature than Year 9s.)
A box and whisker plot shows distribution. Excel can calculate these values, and maybe the graph as well, directly from the raw data table.
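As a rough sketch of how the box and whisker values could be calculated outside Excel, the Python below uses the standard `statistics` module. The height figures are invented for illustration, and the "inclusive" quartile method is an assumption: Excel and other tools may use slightly different quartile rules.

```python
import statistics

def box_plot_values(data):
    """Return the Min, 25%, Mean, 75% and Max values used in the plot."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "25%": q1,
        "mean": statistics.mean(data),
        "75%": q3,
        "max": max(data),
    }

# Hypothetical heights in metres (made-up numbers, not survey data).
year_9 = [1.48, 1.52, 1.55, 1.57, 1.60, 1.62, 1.66]
year_12 = [1.60, 1.65, 1.68, 1.70, 1.73, 1.77, 1.82]

print("Year 9: ", box_plot_values(year_9))
print("Year 12:", box_plot_values(year_12))
```

If the two boxes (25% to 75%) overlap heavily, that is a first visual hint that the difference may not be significant.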

1 Types of Data: There are basically two types of random variables, and they yield two types of data: numerical and categorical. Categorical variables yield data in categories and numerical variables yield data in number format. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no". In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous.
A chi square (χ²) statistic is used to investigate whether distributions of categorical variables differ from one another.
Chi Square (test of difference): This test is suitable for category data. The categories can be nominal (no order), like the number (frequency count) of farms on limestone, granite and waterlogged geology. The Fisher probability test can also handle this type of data.
If you want the maths (!): http://math.hws.edu/javamath/ryan/ChiSquare.html
Video advice: http://www.youtube.com/watch?v=WXPBoFDqNVk
Interactive: http://www.quantpsy.org/chisq/chisq.htm
Calculator: http://graphpad.com/quickcalcs/chisquared1.cfm
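For those who do want the maths, here is a minimal Python sketch of the chi square calculation on frequency counts. The farm counts are invented, and the expected values assume the null hypothesis that farms are spread equally across the three geologies; a real study might need different expected values (for example, in proportion to the area of each rock type).

```python
def chi_square(observed, expected):
    """Chi square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical frequency counts of farms on each geology.
farms = {"limestone": 10, "granite": 20, "waterlogged": 30}

total = sum(farms.values())
# Assumed null hypothesis: no difference, so each category expects an equal share.
expected = [total / len(farms)] * len(farms)

statistic = chi_square(list(farms.values()), expected)
degrees_of_freedom = len(farms) - 1

# Critical value from a standard chi square table: 5% level, 2 degrees of freedom.
CRITICAL_5_PERCENT_DF2 = 5.991

print("chi square =", statistic, "with", degrees_of_freedom, "degrees of freedom")
print("reject null hypothesis at 5%:", statistic > CRITICAL_5_PERCENT_DF2)
```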

2 Student T-Test of …………..
This is a test of DIFFERENCE between two sets of data. Simply (!) put, it is a value (index) that shows a ratio:

t = (difference between the means (averages) of the two data sets) / (standard error of the difference between the two data sets)

You can then use this to say how sure you are that the two sets of data really are different (significant), rather than possibly (probability) just being a little bit different by accident or, worse, just looking a bit strange in your graphs!
You can graph the two data sets and calculate the mean and the upper quartile and lower quartile values. Graph this as a box & whisker plot. Inspect visually for overlap.
Critical values graph here
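The ratio can be worked through in Python as a rough sketch. This version assumes the classic unpaired Student t-test with a pooled variance (both groups assumed to have a similar spread); the function name and the sample numbers are made up for illustration.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Unpaired Student t: difference between means / standard error of that difference."""
    n_a, n_b = len(sample_a), len(sample_b)
    difference = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Pooled variance: a weighted average of the two sample variances.
    pooled_variance = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return difference / standard_error

# Hypothetical measurements from two sites.
site_a = [2, 4, 6]
site_b = [1, 2, 3]
print("t =", round(pooled_t(site_a, site_b), 3))
```

A bigger t (positive or negative) means the means are far apart relative to the scatter in the data.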

Student T-Test of …………..
• This is a test of DIFFERENCE between two sets of data.
• The Student T index is then compared to a graph of significant values for each:
  - Level of confidence (e.g. 95% sure it is real, still 5% chance of error)
  - Degrees of freedom (linked to sample size … more data = less risk)
n = sample size of each data set
Degrees of freedom = (n1 - 1) + (n2 - 1)
Luckily…
• … you don’t have to do all of this by tapping on a calculator! Both Excel and various on-line options are available.
• EXCEL: Click on the FORMULA menu and click INSERT formula. Select the STATISTICAL TEST option and scroll down. Use the HELP function!
• On-line test: http://www.graphpad.com/quickcalcs/ttest1.cfm
• Look out for paired (before and after data on the same sample) and unpaired data (two different groups or places).
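A small Python sketch of the degrees of freedom rule and the comparison against critical values. The three critical values are the usual two-tailed 95% figures from a standard t table; the table here is deliberately tiny and the function names are assumptions, so treat it as an illustration rather than a full lookup.

```python
# Two-tailed critical values of t at the 95% level of confidence,
# copied from a standard t table for a few degrees of freedom only.
CRITICAL_T_95 = {10: 2.228, 20: 2.086, 30: 2.042}

def degrees_of_freedom(n1, n2):
    """Degrees of freedom = (n1 - 1) + (n2 - 1)."""
    return (n1 - 1) + (n2 - 1)

def is_significant(t, df, table=CRITICAL_T_95):
    """True if t (ignoring its sign) is beyond the critical value for df."""
    return abs(t) > table[df]

df = degrees_of_freedom(6, 6)  # two hypothetical samples of 6 measurements
print("degrees of freedom:", df)
print("t = 2.5 significant at 95%?", is_significant(2.5, df))
print("t = 1.5 significant at 95%?", is_significant(1.5, df))
```

Notice that the critical value falls as the degrees of freedom rise: more data means a smaller t is enough.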

3 Mann-Whitney U test …
This is also a test of difference, but it works on data that can be put into rank order.
For example: you wish to find out if bus users and car drivers travel into town with a different frequency. We will try to REJECT our NULL hypothesis that there is "no difference between the number of city journeys by bus or car users".
TAILS: do you wish to ask whether there is any difference (two-tailed test), or are you trying to predict that there are more bus journeys than car journeys (directional test, one-tailed)?
If you do this test on Excel, just type in your two data sets. If you do it by hand, then rank all of the data from biggest to smallest, but note which data set each item belongs to (car or bus). The statistic essentially tries to measure how many records of the other data set (bus) are bigger than each record of the control set (cars).
As usual we check our test statistic (U) against a probability table, knowing our sample sizes (n1 and n2) and the level of confidence (e.g. 99%).
For help look at: http://www.geography-fieldwork.org/geographical_enquiry/stage4.htm

Student t test on Excel.
Note this is for an older version of Excel (newer versions call the function T.TEST), but it is still a useful guide.
• T tests can be used to compare two groups or treatments.
• Click on any empty cell. Hit the = sign in the bar at the top of the spreadsheet.
• Use the INSERT FUNCTION button on the FORMULAS menu.
• If "TTEST" is not on the list, click "more functions", choose "statistical", then "TTEST".
• A dialog box will appear. Click in the box next to "Array 1". Drag the dialog box out of the way, then highlight your first column of numbers.
• Click in the box next to "Array 2" and highlight your second column of numbers.
• To answer the "tails" question, remember your prediction about the direction of the difference between the groups. If you predicted group A would be lower than group B, pick 1 tail. If you predicted group B would be lower than group A, also pick 1 tail. If you didn’t predict which would be higher, use 2 tails. You can’t change your mind after the data are gathered.
• There are three types of T test you can use on Excel. Let’s say you wanted to test whether heart rate increased after drinking a cup of hot sauce (don’t actually try this!) or whether plant growth would increase after adding fertilizer to pots of soil. In these cases you would be comparing the heart rate of the same people, or the growth of the same pot of plants, before and after the treatment. This would require a "paired" or "dependent" T test. Excel calls this a "type 1" test.
• Let’s look at another situation. Say you want to know whether nursing students consume more coffee than biology students do. You would then have two groups of test subjects rather than taking 2 measurements on each person. Now you would use an "unpaired" or "independent" T-test. Excel calls these "type 2" (equal variances) or "type 3" (unequal variances) tests. Now the tricky part is to decide which of these to use. Are the standard deviations (how spread out the data are) about the same for both groups, or are they different? You can test this statistically, but let’s just work with how they seem. If in doubt, go with "type 3" for unequal variances.
• Now hit "OK" and see what the number is. This is your P-value. Remember that a P-value below 0.05 is generally considered statistically significant, while one of 0.05 or greater means you have not shown a significant difference between the groups (it does not prove the groups are the same).
• If your number looks like this: 2.03188E-7, Excel is giving you the number in its version of scientific notation. This number is actually 2.03 x 10^-7, or 0.000000203.
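Reading the P-value can be sketched in Python. This only covers the last step (interpreting the number Excel returns), not the t-test itself; the function name and the default 0.05 level are assumptions for illustration.

```python
def interpret_p_value(p_text, significance_level=0.05):
    """Turn Excel-style text such as '2.03188E-7' into a number and a decision.

    Returns (value, value written out in full to 9 decimal places, significant?).
    """
    p = float(p_text)            # Python reads the E notation directly
    written_out = f"{p:.9f}"     # ordinary decimal form
    significant = p < significance_level
    return p, written_out, significant

for text in ("2.03188E-7", "0.03", "0.05", "0.4"):
    p, written_out, significant = interpret_p_value(text)
    verdict = "significant" if significant else "not significant"
    print(f"{text} = {written_out}: {verdict} at the 5% level")
```

A value of exactly 0.05 counts as not significant here, matching the "below 0.05" rule in the guide.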
