Research Statistics 2 Dr.S.Nishan Silva (MBBS)
My weight. Plot as a function of the time the data was acquired:
Comments: background is white (less ink); font size is larger than the Excel default (use 14 or 16). Do not use curved lines to connect data points; that assumes you know more about the relationship of the data than you really do.
Assume my weight is a single, random set of similar data. Make a frequency chart (histogram) of the data. Create a “model” of my weight and determine the average weight and how consistent my weight is.
Average = 143.11. Inflection point at s = 1.4 lbs, where s = standard deviation = a measure of the consistency, or similarity, of the weights.
Width is measured at the inflection point (= s) or at half height (W1/2). Triangulated peak: base width W is 2s < W < 4s.
pp = peak to peak, or the largest separation of measurements. Area within ±1s = 68.3%; area within ±2s = 95.4%; area within ±3s = 99.73%. Peak to peak is sometimes easier to “see” on the data vs. time plot.
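The areas quoted for ±1s, ±2s and ±3s can be checked numerically. The sketch below assumes an ideal normal distribution and uses the identity P(|Z| < k) = erf(k / √2); the function name area_within is purely illustrative.

```python
import math

def area_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * area_within(k):.2f}%")
```

Real data are only approximately normal, so observed fractions will differ somewhat from these ideal values.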
Peak to peak: maximum 144.9, minimum 139.5. s ≈ pp/6 = (144.9 − 139.5)/6 ≈ 0.9 (calculated s = 1.4)
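A minimal sketch of the same calculation: the average, the standard deviation, and the peak-to-peak shortcut s ≈ pp/6. The weights listed here are made-up illustrative values, not the data behind the slides; only the maximum and minimum (144.9 and 139.5) are taken from the slide.

```python
import statistics

# Hypothetical weights in lbs, invented for illustration only.
weights = [142.0, 143.5, 144.9, 141.8, 143.2, 139.5, 144.1, 143.0]

mean = statistics.mean(weights)
s = statistics.stdev(weights)        # sample standard deviation (divides by n - 1)
pp = max(weights) - min(weights)     # peak to peak: largest separation of measurements
s_estimate = pp / 6                  # rule of thumb: pp spans roughly +/- 3s

print(f"mean = {mean:.2f}, s = {s:.2f}, pp = {pp:.1f}, pp/6 = {s_estimate:.2f}")
```

The pp/6 shortcut assumes the data span about ±3s, which is only reasonable for large samples; with few points it tends to underestimate s, as the 0.9 versus 1.4 comparison on the slide suggests.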
Inferential Statistics Used to determine the likelihood that a conclusion based on data from a sample is true
Terms p value: the probability that an observed difference could have occurred by chance
Standardised Normal Distribution • Formula: Z = (X − µ) / σ • Z: standard normal deviate (SND) • X: variable • µ: mean • σ: standard deviation (the square root of the variance)
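As a small worked example of the Z formula, the sketch below standardises one value. It borrows the average (143.11) and s (1.4) from the weight slides as stand-ins for µ and σ, which is an assumption made only for illustration; z_score is a hypothetical helper name.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return Z = (X - mu) / sigma, the distance from the mean in standard deviations."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# How many standard deviations is a weight of 144.9 above the average?
z = z_score(144.9, mu=143.11, sigma=1.4)
print(f"Z = {z:.2f}")
```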
Regression and Correlation • Correlation • To analyze the relationship between two variables • Regression • Dependence of variable y (the outcome) on variable x (the predictor) • In this course we consider only two variables. In real life, multiple variable interactions are possible.
Basic Linear Regression Equation • Equation: Y` = a + bx • b is the gradient, slope or regression coefficient • a is the intercept of the line on the Y axis, or regression constant • Y` is the predicted value of the outcome • x is a value of the predictor (real x value)
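The constants a and b in Y` = a + bx are usually found by least squares. The sketch below is one way to do that by hand, using b = Sxy / Sxx and a = mean(y) − b · mean(x); the function name fit_line and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx              # slope (regression coefficient)
    a = mean_y - b * mean_x    # intercept (regression constant)
    return a, b

# Illustrative data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(f"Y` = {a:.2f} + {b:.2f}x")
```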
Correlation Coefficient • Page 100 lower down
Finding the significance of “r” • Simple correlation significance • http://www.biology.ed.ac.uk/archive/jdeacon/statistics/table6.html#Correlation coefficient • Pearson product-moment coefficient • http://www.experiment-resources.com/pearson-product-moment-correlation.html
References • Best: http://www.biology.ed.ac.uk/archive/jdeacon/statistics/tress11.html • In detail: http://www.statsdirect.com/help/regression_and_correlation/rcr.htm
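A minimal sketch of the Pearson product-moment coefficient, together with the usual t statistic for testing whether r differs from zero, t = r · √((n − 2) / (1 − r²)) with n − 2 degrees of freedom. The function names and data are illustrative assumptions; the resulting t (or r itself) would still be compared against a significance table such as the one linked above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equally long lists with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Illustrative data only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
t = t_for_r(r, len(xs))
print(f"r = {r:.2f}, t = {t:.2f}, degrees of freedom = {len(xs) - 2}")
```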
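Inferential Statistics – Page 102 • Sample statistics – “Generalized” to the entire population • Formulate hypothesis • Null hypothesis • Test the hypothesis
Types of Errors (table of truth versus conclusion) • Type I error: the null hypothesis is true but is rejected • Type II error: the null hypothesis is false but is not rejected • Power = 100% − the probability of a Type II error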
confidence interval: The range of values we can be reasonably certain includes the true value.
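One common way to compute such a range for a mean is mean ± 1.96 × standard error, which gives an approximate 95% confidence interval. The sketch below assumes a roughly normal sampling distribution and uses the large-sample multiplier 1.96; for small samples a t-table multiplier would give a wider interval. The data and the name mean_ci_95 are illustrative.

```python
import math
import statistics

def mean_ci_95(data):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    half_width = 1.96 * se                       # assumed large-sample multiplier
    return m - half_width, m + half_width

# Hypothetical weights in lbs, invented for illustration only.
weights = [142.0, 143.5, 144.9, 141.8, 143.2, 139.5, 144.1, 143.0]
low, high = mean_ci_95(weights)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```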
If the value expected under the null hypothesis falls outside the 95% confidence interval (that is, the probability is less than 5%), we reject the null hypothesis
The Use of the Null Hypothesis Is the difference in two sample populations due to chance or a real statistical difference? The null hypothesis assumes that there will be no “difference” or no “change” or no “effect” of the experimental treatment. If treatment A is no better than treatment B then the null hypothesis is supported. If there is a significant difference between A and B then the null hypothesis is rejected...
Parametric tests • T test Page 104
T-test • The t-test assesses the null hypothesis concerning the means of two small samples • It gives the probability of seeing a difference this large if the two samples are representative of a single population (supporting the null hypothesis) rather than two different populations (rejecting the null hypothesis)
Use the t-test to determine whether samples A and B came from the same or different populations. t = (x1 − x2) / s(x1 − x2), where x1 (bar x) = mean of A, x2 (bar x) = mean of B, and s(x1 − x2) = standard error of the difference between the means, obtained from sx1 (std error of A) and sx2 (std error of B). Example: Sample A mean = 8, Sample B mean = 12, std error of the difference = 1, so (12 − 8)/1 = 4 standard error units.
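The slide's example can be reproduced with a few lines. The sketch below shows the summary form (two means and the standard error of their difference) and a version that works from raw samples. The raw-sample version assumes independent samples and combines the two standard errors as √(sx1² + sx2²), the unpooled form; the function names and sample values are illustrative.

```python
import math
import statistics

def t_from_summary(mean_1, mean_2, se_diff):
    """t = difference of means divided by the standard error of the difference."""
    return (mean_1 - mean_2) / se_diff

def t_from_samples(a, b):
    """Unpooled t statistic for two independent samples (assumed form)."""
    se_a_sq = statistics.variance(a) / len(a)   # squared std error of mean A
    se_b_sq = statistics.variance(b) / len(b)   # squared std error of mean B
    se_diff = math.sqrt(se_a_sq + se_b_sq)
    return (statistics.mean(a) - statistics.mean(b)) / se_diff

# The slide's example: means 12 and 8, std error of the difference 1.
print(t_from_summary(12, 8, 1))

# Illustrative raw samples with means 8 and 12.
sample_a = [7, 8, 9]
sample_b = [11, 12, 13]
print(round(t_from_samples(sample_a, sample_b), 3))
```

The sign of t only reflects which mean is subtracted from which; the size of t is what gets compared with the t table.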
Non Parametric test • Chi Squared test – Page 108 • Test for Goodness of fit • Test of independence
Chi square • Used with discrete values (phenotypes, choice chambers, etc.) • Not used with continuous variables (like height; use a t-test for samples smaller than 30 and a z-test for samples larger than 30) • χ² = Σ (O − E)² / E, where O = observed values and E = expected values
Interpreting a chi square • Calculate degrees of freedom: number of events, trials or phenotypes − 1 (example: 2 phenotypes − 1 = 1) • Generally use the column labeled 0.05 (which means there is a 95% chance that any difference between what you expected and what you observed is within accepted random chance) • Any calculated value larger than the table value means you reject your null hypothesis: there is a difference between observed and expected values.
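A minimal sketch of a goodness-of-fit calculation and its interpretation, using χ² = Σ (O − E)² / E. The counts are invented for illustration, and the small lookup of 0.05 critical values (3.841, 5.991, 7.815 for 1, 2 and 3 degrees of freedom) stands in for the printed chi square chart.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical values from the 0.05 column of a standard chi-square table.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}

# Illustrative counts: two phenotypes, expected in a 1:1 ratio.
observed = [60, 40]
expected = [50, 50]

stat = chi_square(observed, expected)
df = len(observed) - 1                  # number of categories - 1
reject_null = stat > CRITICAL_0_05[df]

print(f"chi square = {stat:.2f}, df = {df}, reject null hypothesis: {reject_null}")
```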
How to use a chi square chart http://faculty.southwest.tn.edu/jiwilliams/probab2.gif
T-test or Chi Square? Testing the validity of the null hypothesis • Use the t-test (also called Student’s t-test) if using continuous variables from normally distributed sample populations (e.g. height) • Use the chi square (χ²) if using discrete variables (if you are evaluating the differences between experimental data and expected or hypothetical data). Example: genetics experiments, expected distribution of organisms.
Qualitative Analysis – Pages 113-114 • Phenomenology • Data collected using interviews, tapes etc • Analyzed as the researcher prefers • Describes using descriptive statistics • Ethnography • Data collected using note taking, observation etc • Categorised • Relationships between patterns, identified • Concurrent Analysis • Qualitative data is transformed to numerical data • Qualitative value may be lost
Using Excel (Example)
Microsoft Excel • A spreadsheet application. It features calculation, graphing tools, pivot tables and a macro programming language called VBA (Visual Basic for Applications). • There are many versions of MS Excel. Excel XP, Excel 2003 and Excel 2007 are capable of performing a number of statistical analyses. • Starting MS Excel: double click on the Microsoft Excel icon on the desktop, or click on Start --> Programs --> Microsoft Excel. • Worksheet: consists of a grid of cells with numbered rows down the page and alphabetically titled columns across the page. Each cell is referenced by its coordinates. For example, A3 is used to refer to the cell in column A and row 3. B10:B20 is used to refer to the range of cells in column B and rows 10 through 20.
Microsoft Excel • Opening a document: File --> Open (from an existing workbook). Change the directory area or drive to look for files in other locations. • Creating a new workbook: File --> New --> Blank Document • Saving a file: File --> Save • Selecting more than one cell: click on a cell (e.g. A1), then hold the Shift key and click on another (e.g. D4) to select the cells between A1 and D4, or click on a cell and drag the mouse across the desired range. • Creating formulas: 1. Click the cell in which you want to enter the formula, 2. Type = (an equal sign), 3. Click the Function button, 4. Select the formula you want and step through the on-screen instructions.
Microsoft Excel • Entering date and time: dates are displayed as MM/DD/YYYY. No need to enter them in that format. For example, Excel will recognize jan 9 or jan-9 as 1/9/2007 and jan 9, 1999 as 1/9/1999. To enter today’s date, press Ctrl and ; together. Use a or p to indicate am or pm. For example, 8:30 p is interpreted as 8:30 pm. To enter the current time, press Ctrl and : together. • Copy and paste all cells in a sheet: Ctrl+A for selecting, Ctrl+C for copying and Ctrl+V for pasting. • Sorting: Data --> Sort --> Sort By … • Descriptive statistics and other statistical methods: Tools --> Data Analysis --> Statistical method. If Data Analysis is not available, click on Tools --> Add-Ins and then select Analysis ToolPak and Analysis ToolPak - VBA.
Histograms in Excel 1. Select Tools/Data Analysis
Histograms in Excel (continued) 2. Choose Histogram 3. Input the data range and bin range (bin range is a cell range containing the upper class boundaries for each class grouping), select Chart Output and click “OK”
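To see what the bin range does, the sketch below counts values against a list of upper class boundaries. It assumes that a value equal to a boundary falls in that boundary's bin and that anything above the last boundary goes into a final overflow bin; the data, the boundaries and the name histogram_counts are illustrative.

```python
from bisect import bisect_left

def histogram_counts(data, upper_bounds):
    """Count values per bin, where each bin is defined by its upper class boundary.

    The returned list has one extra entry at the end for values above the last boundary.
    """
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)
    for value in data:
        # Index of the first boundary that is >= value (value <= boundary).
        counts[bisect_left(bounds, value)] += 1
    return counts

# Hypothetical weights in lbs and bin boundaries, invented for illustration only.
weights = [139.5, 141.8, 142.0, 143.0, 143.2, 143.5, 144.1, 144.9]
bins = [140, 142, 144]

counts = histogram_counts(weights, bins)
labels = [f"<= {b}" for b in bins] + ["more"]
for label, count in zip(labels, counts):
    print(f"{label:>7}: {'#' * count}")
```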
Microsoft Excel • Statistical and mathematical functions: start with the ‘=’ sign and then select the function from the function wizard. • Inserting a chart: click on Chart Wizard (or Insert --> Chart), select the chart type, give the input data range, update the chart options, and select the output range/worksheet. • Importing data in Excel: File --> Open --> File Type --> click on the file --> choose option (Delimited/Fixed Width) --> choose options (Tab/Semicolon/Comma/Space/Other) --> Finish. • Limitations: Excel uses algorithms that are vulnerable to rounding and truncation errors and may produce inaccurate results in extreme cases.
Computing the Mean • Sum the xi and divide by n (or N for the population mean) • Excel • =AVERAGE(cellrange)
Computing the Mode • Value that occurs most often in discretized data • Excel • =MODE(cellrange) • Reports first value seen if tie
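For comparison outside Excel, the mean and mode can be computed with Python's standard statistics module. In Python 3.8 and later, statistics.mode returns the first mode encountered when there is a tie, and statistics.multimode lists all tied values. The data are illustrative.

```python
import statistics

# Illustrative discrete data with a tie: 3 and 5 each occur twice.
data = [3, 5, 5, 2, 3, 8]

mean = statistics.mean(data)            # sum of the values divided by n
mode = statistics.mode(data)            # first most-common value encountered
all_modes = statistics.multimode(data)  # every value tied for most common

print(f"mean = {mean:.2f}, mode = {mode}, all modes = {all_modes}")
```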