Lecture 14

Lecture 14 Statistical Example Chapters 10 and 12

Outline • 10.1 Solving Simple Problems • 10.2 Assembling Solution Steps • 10.3 Summary of Operations • 10.4 Solving Larger Problems • 12.1 Behavioral Abstraction • 12.2 Matrix Operations • 12.3 MATLAB Implementation

“Simple” Problems • Basic Character of the Data and Operations • Define the input data • Define the output data • Extract the transformations upon the input that produce the output • Write the transformations as code operations • Debug as necessary

“Not That Simple” Problems • Build solutions to problems keeping in mind operations we know how to perform. • Think how applying one operation might make the problem easier. • Keep doing this, until the problem is broken down into parts that we can do. • Build modular solutions, so that the building blocks for future problems are larger than the operations supplied by the language.

ANOVA • Planned tests are determined before looking at the data, and post hoc tests are performed after looking at the data. Post hoc tests such as Tukey's test most commonly compare every group mean with every other group mean and typically incorporate some method of controlling for Type I errors. • From http://en.wikipedia.org/wiki/Analysis_of_variance

Example, Apply ANOVA • Measure B/C preference • Three groups of data • Tiz • Dax • Zup • Is the B/C preference influenced by use of Tiz, Dax, Zup, or, are the groups all the same?

Getting to Numbers • We run some number of tests, n, and count the number of times, b, that outcome B occurs. • Outcome B occurs in some fraction of the tests. Let this be b/n. • If n = 1, then b/n can only be 1 or 0. • We want to collect enough test results that we can calculate their mean and variance.
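As a minimal sketch of the counting described above, assuming each test outcome is coded 1 for B and 0 otherwise (the outcomes vector and the variable names are made up for illustration, not taken from the lecture data):

```matlab
% Hypothetical outcomes of n = 5 tests: 1 means outcome B occurred, 0 means it did not
outcomes = [1 0 1 1 0];
n = numel(outcomes);   % number of tests
b = sum(outcomes);     % number of times outcome B occurred
fractionB = b / n;     % the fraction b/n
```

With n = 1 the same calculation can only give 0 or 1, which is why several tests are needed before a mean and variance are meaningful.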

Formulae for Mean, Variance • Mean = sum(the results)/number of results • Square of deviation = (one result - Mean)^2 • Variance = sum(squares of deviation)/number of results • Standard deviation = positive square root of variance
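The formulae above could be written in MATLAB roughly as follows. This is a sketch only: the results vector is hypothetical, and the variable names are chosen for illustration.

```matlab
% Hypothetical results, for example b/n fractions from five test sessions
results = [0.2 0.4 0.4 0.6 0.9];
nResults = numel(results);

theMean = sum(results) / nResults;                 % Mean
squaredDeviations = (results - theMean).^2;        % Square of each deviation
theVariance = sum(squaredDeviations) / nResults;   % Variance (divides by nResults)
theStd = sqrt(theVariance);                        % Standard deviation

% Built-in equivalents; the second argument 1 selects division by nResults
builtinVariance = var(results, 1);
builtinStd = std(results, 1);
```

Note that MATLAB's var and std divide by (number of results - 1) by default, so the second argument is needed to match the definition on this slide.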

Information from Data • We can produce these sums over all subjects holding the shape constant. • We can try to find out whether shape matters. • We can produce these sums over all shapes holding the subject constant. • We can try to find out whether the subjects are different from one another.
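A small sketch of the two kinds of sum, assuming the results sit in a two-dimensional array with one row per subject and one column per shape (the array contents and names are hypothetical):

```matlab
% Hypothetical counts of b choices: rows are subjects, columns are shapes
bCounts = [1 0 1; 1 0 0; 1 0 0; 1 1 0];

sumPerShape = sum(bCounts, 1);     % 1-by-3: sum over all subjects, one value per shape
sumPerSubject = sum(bCounts, 2);   % 4-by-1: sum over all shapes, one value per subject
```

The second argument of sum names the dimension that is summed away, so the dimension held constant is the one that survives in the result.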

Arranging Information in an Array • Suppose we have several subjects (A, F, G, J) and several types of test (T, D, Z) and a result (number of b choices per total choices) for each. • We could use the subject as an index on an array. • We could use the type of test as an index on an array. • We could use the index of the test (A’s 12th test session with D, so, 12) as an index on an array. • We would store the ratio b/n as the value in the location given by the indices: biasMeasure(subject, type, index) = # of b choices per total choices
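As a sketch of this layout, assuming the subjects A, F, G, J are numbered 1 to 4 and the types T, D, Z are numbered 1 to 3 (the numbering, the array sizes, and the stored value are all assumptions for illustration):

```matlab
nSubjects = 4;    % A, F, G, J numbered 1..4 (assumed numbering)
nTypes = 3;       % T, D, Z numbered 1..3 (assumed numbering)
nIndices = 12;    % assumed number of test sessions per subject and type

% One location for every combination of subject, type and index
biasMeasure = zeros(nSubjects, nTypes, nIndices);

% Hypothetical entry: A's 12th session with D gave 3 b choices out of 4
biasMeasure(1, 2, 12) = 3 / 4;

arraySize = size(biasMeasure);      % [4 3 12]
nDimensions = ndims(biasMeasure);   % 3 here; could be more or fewer
```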

Generalizing • The example had three dimensions, subject, type, index. • The number of dimensions could be more or fewer.

Mean, Variance of Some Particular Thing • Suppose we wanted the mean and standard deviation of Jack’s data, averaged over all values of index and type. • Suppose Jack’s subject identifier is “7”. • With biasMeasure(subject, type, index): • Mean = (1/(nTypes*nIndices)) * sum(sum(biasMeasure(7,:,:))) • Variance = (1/(nTypes*nIndices)) * sum(sum((biasMeasure(7,:,:) - Mean).^2)) • Standard deviation = sqrt(Variance)

Mean, Variance of Something Else • Suppose we wanted the mean and standard deviation of one type’s data, averaged over all values of index and subject. • Suppose the type’s identifier is “3”. • With biasMeasure(subject, type, index): • Mean = (1/(nSubjects*nIndices)) * sum(sum(biasMeasure(:,3,:))) • Variance = (1/(nSubjects*nIndices)) * sum(sum((biasMeasure(:,3,:) - Mean).^2)) • Standard deviation = sqrt(Variance)
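The two slice calculations can be sketched as below. The array sizes and values are made up so that the example runs on its own; only the indexing pattern is the point. Flattening each slice with (:) is a stylistic choice here that avoids nesting sum calls.

```matlab
nSubjects = 8; nTypes = 3; nIndices = 2;   % assumed sizes
nTotal = nSubjects * nTypes * nIndices;
% Made-up values, just to have something to average
biasMeasure = reshape((1:nTotal) / nTotal, [nSubjects, nTypes, nIndices]);

% Subject 7, over all types and indices
subjectSlice = biasMeasure(7, :, :);
subjectMean = sum(subjectSlice(:)) / (nTypes * nIndices);
subjectVariance = sum((subjectSlice(:) - subjectMean).^2) / (nTypes * nIndices);

% Type 3, over all subjects and indices
typeSlice = biasMeasure(:, 3, :);
typeMean = sum(typeSlice(:)) / (nSubjects * nIndices);
typeVariance = sum((typeSlice(:) - typeMean).^2) / (nSubjects * nIndices);
```

In each case the divisor is the number of elements in the slice: the product of the sizes of the dimensions that were not held constant.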

Analysis of Variance The ANOVA tests the null hypothesis that samples in two or more groups are drawn from the same population. To do this, two estimates are made of the population variance. These estimates rely on various assumptions. The ANOVA produces an F statistic, the ratio of the variance calculated among the means to the variance within the samples. If the group means are drawn from the same population, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from different populations. See http://en.wikipedia.org/wiki/One-way_ANOVA and Howell, David (2002). Statistical Methods for Psychology. Duxbury. pp. 324-325.
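To make the ratio concrete, here is a minimal sketch of a one-way F statistic computed with basic array operations. It assumes equal group sizes and uses the standard between-group and within-group mean squares. The data matrix is hypothetical (one column per group), and all variable names are chosen for illustration.

```matlab
% Hypothetical data: each column is one group, each row one observation
data = [1 4 7; 2 5 8; 3 6 9];
[nPerGroup, nGroups] = size(data);

groupMeans = mean(data, 1);   % one mean per group
grandMean = mean(data(:));    % mean of all observations

% Variation among the group means
ssBetween = nPerGroup * sum((groupMeans - grandMean).^2);
% Variation of the observations around their own group mean
ssWithin = sum(sum((data - repmat(groupMeans, nPerGroup, 1)).^2));

dfBetween = nGroups - 1;
dfWithin = nGroups * (nPerGroup - 1);

% F statistic: between-group mean square over within-group mean square
F = (ssBetween / dfBetween) / (ssWithin / dfWithin);
```

A large F means the group means differ by more than the scatter within the groups would suggest. Turning F into a significance level requires the F distribution, which this sketch does not attempt.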

Knowing Array, Choosing File Design • We would like a multidimensional array, so that we can easily calculate the variances we want. • Let’s look at some sample code for reading from a file into a multidimensional array.

The File All entries are coded as numbers.

Sample Code - 1
[nums text raw] = xlsread('exmple4ANOVA.xls')
nums =
     1     1     1
     1     2     0
     1     3     1
     2     1     1
     2     2     0
     2     3     0
     3     1     1
     3     2     0
     3     3     0
     4     1     1
     4     2     1
     4     3     0
The subjects are numbered 1-4. The types are numbered 1-3. The outcome (b or s) is coded 1 for a b.

Sample Code - 2
function outArray = fillMDArrayFrom2DNums(nums)
%fillMDArrayFrom2DNums(nums) takes an array of numbers that has been
%read in from a file
%and extracts the values (dependent variables) associated with settings of
%independent variables
%for example, subject, type and index might be independent variables
%the number of 'b' choices in n trials might be the dependent variable
%the returned array has a dimension for each of the independent variables
%the file has a column for each independent variable (except index),
%plus one column for the dependent variable
%for example, a column for subject, a column for type, a column for b
%the number of trials with the same independent variables
%is obtained by counting the number of repetitions
%the index of the trial is obtained by the value of the counter
[nRows, nColumns] = size(nums);
%now we can see how many independent variables there are
nIndependentVariables = nColumns - 1;
dependentVariables = nums(:, end); %the last column
numsExceptLast = nums(:, 1:nIndependentVariables);
allMins = min(numsExceptLast);
allMaxs = max(numsExceptLast);
allRanges = allMaxs - allMins + 1;
outArray = zeros(allRanges);
columnValue = ones(1, nIndependentVariables);
for rowIndex = 1:nRows
    if dependentVariables(rowIndex) == 1 %if there is a b to be added on
        for independentVariableIndex = 1:nIndependentVariables
            columnValue(independentVariableIndex) = ...
                nums(rowIndex, independentVariableIndex) ...
                - allMins(independentVariableIndex) + 1;
        end
        %the indexing below assumes exactly two independent variables
        outArray(columnValue(1), columnValue(2)) = ...
            outArray(columnValue(1), columnValue(2)) + 1;
    end
end
end
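The function above indexes outArray with exactly two subscripts, so it covers the case of two independent variables. As a hedged alternative (not the lecture's method), MATLAB's built-in accumarray can do the same counting without the explicit loop. The sketch below assumes at least two independent-variable columns and a last column coded 0 or 1. It types in the nums values shown on the previous slide instead of reading the file, and the variable names are illustrative.

```matlab
% Same values as the nums array read from the file:
% columns are subject, type, outcome (1 for a b)
nums = [1 1 1; 1 2 0; 1 3 1; ...
        2 1 1; 2 2 0; 2 3 0; ...
        3 1 1; 3 2 0; 3 3 0; ...
        4 1 1; 4 2 1; 4 3 0];

subscripts = nums(:, 1:end-1);   % independent variables
outcomes = nums(:, end);         % dependent variable, 0 or 1

% Shift every column so that its smallest value becomes subscript 1
allMins = min(subscripts, [], 1);
subscripts = subscripts - repmat(allMins, size(subscripts, 1), 1) + 1;

% One dimension per independent variable, sized by the largest subscript
arraySize = max(subscripts, [], 1);

% Add up the outcomes that share the same combination of subscripts
countArray = accumarray(subscripts, outcomes, arraySize);
```

Because the outcomes are 0 or 1, summing them per location counts the b choices, which is what the loop does one row at a time.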

File Output
>> outArray = fillMDArrayFrom2DNums(nums)
outArray =
     1     0     1
     1     0     0
     1     0     0
     1     1     0
>> xlswrite('theOutputFile.xls', outArray);
