
PSYCHOMETRICS

This material discusses the concept of validity and its historical evolution, including content validity, construct validity, and criterion validity. It also explores the application of validity in regression models and the interpretation of test results.



  1. PSYCHOMETRICS. Unit 6.1. Evaluation of the measurement instrument: VALIDITY I. Salvador Chacón Moscoso, Susana Sanduvete Chaves. Thanks to Francisco Pablo Holgado Tello for his collaboration in elaborating this material.

  2. UNIT 6.1. Evaluation of the measurement instrument: VALIDITY I.
  1. Introduction to the concept of validity and its historical evolution.
  2. Content validity.
  3. Construct validity. 3.1. Multitrait-multimethod matrix. 3.2. Factor analysis.
  4. Criterion validity. 4.1. The problem of selection and measurement of criteria. 4.2. Statistical procedures used for criterion validity.
  5. Validity for a single predictor and a single indicator of the criterion. 5.1. Validity coefficient. 5.2. Linear regression model. 5.2.1. Regression equations. 5.2.2. Residual variance and standard error of estimate. 5.2.3. Confidence intervals. 5.3. Interpretation of the evidence obtained about the predictive capacity of a test. 5.3.1. Coefficient of determination (C.D.). 5.3.2. Coefficient of alienation (C.A.). 5.3.3. Predictive value coefficient (P.V.C.).
  6. Bibliography.

  3. 1. INTRODUCTION TO THE CONCEPT OF VALIDITY AND ITS HISTORICAL EVOLUTION. • Historical evolution: • According to the "Standards for Educational and Psychological Tests and Manuals" (APA, AERA, NCME, 1966, 1985), validity refers to the degree to which a test measures what it aims to measure, differentiating between content, criterion, and construct validity → the predominant tripartite view. • Messick (1989) broadens the concept: • an open concept, not limited to test scores but including scores from any other assessment tool; • he highlights the importance of considering the usefulness of decisions and the consequences of the measurement procedure, giving rise to the concept of consequential validity; • an integrative approach: validity as a macroconcept organized around construct validity.

  4. 1. INTRODUCTION TO THE CONCEPT OF VALIDITY AND ITS HISTORICAL EVOLUTION. • Historical evolution: • "…The unified view integrates considerations of content validity, criterion validity, and consequences into a construct-validity framework for the empirical evaluation of rational hypotheses about the meaning of the scores and about theoretically relevant relationships, including those of a scientific and applied nature" (Messick, p. 741).

  5. 1. INTRODUCTION TO THE CONCEPT OF VALIDITY AND ITS HISTORICAL EVOLUTION. Messick's most influential contribution (1994): a break with the tripartite view of validity. Construct validity subsumes the relevance and representativeness of the aspects measured (content validity) and the relations with external criteria of interest (criterion validity). We understand validity as a unitary concept in which construct validity is the focus where scientific criteria (representativeness and usefulness) and social values (consequences of applying the measure) converge.

  6. 1. INTRODUCTION TO THE CONCEPT OF VALIDITY AND ITS HISTORICAL EVOLUTION. • In test theory, validity refers to the appropriateness of the inferences made from test scores. • Validation: the process by which the test developer or user gathers the empirical evidence needed to support the inferences to be made. Evidence is understood to include data, observations, and facts, as well as the arguments that can support and sustain those facts. • According to the Standards (1999), we will understand: • validity as a unitary concept; • validation as an ongoing process that allows collecting evidence about the appropriateness of inferences. "Different types of validity" is replaced by "different strategies for obtaining evidence of validity".

  7. 2. CONTENT VALIDITY. CONTENT VALIDATION: analyzes the extent to which the elements of a test are: 1. Relevant: every element belongs to the exhaustively specified domain of behaviors of the construct to be measured (nothing spare). 2. Representative: all possible behaviors of the domain are recorded on the test (nothing missing). The distinction between content validity and construct validity is artificial, since in defining the domain to be measured we are also delimiting the construct.

  8. 2. CONTENT VALIDITY. Process: a) Definition of the domain of the construct → clearly define each of its dimensions. b) Development of the test specifications, that is, the domain and objectives the test will cover. c) Selection of a panel of experts in the domain. d) Establishment of a structured framework to quantify the degree of agreement among the different judges.

  9. 2. CONTENT VALIDITY. Applications. Content validity is important for any test construction process, but it is absolutely necessary in tests of educational and occupational performance, because it allows answering questions such as: 1. Is the test content free of irrelevant variables? 2. Does it cover a representative sample of the specific skills for a job, for example?

  10. 2. CONTENT VALIDITY. Content validity vs. face (apparent) validity: tests also have to appear to measure what they propose to measure; if the content of the test seems largely irrelevant, nonsensical, or childish, it could discourage participants.

  11. 3. CONSTRUCT VALIDITY. Understood as the extent to which the psychological test reflects the theory from which it has been constructed and allows the scores to be interpreted with a theoretical meaning (APA, AERA, NCME, 1999). • Does the test really measure the variable it attempts to measure? • Does such a variable really exist?

  12. 3. CONSTRUCT VALIDITY. Test scores are not the construct. The construct can manifest itself through multiple indicators. Therefore, through construct validation, evidence is accumulated supporting that test scores are one of its possible manifestations.

  13. 3. CONSTRUCT VALIDITY. Phases: 1. Carefully define the construct to be measured; existing theories are often used for this, formulating hypotheses about: latent and observed variables (trait validity); latent variables and other latent variables (nomological validity). 2. Design a measuring instrument containing representative and relevant elements of the construct being measured. 3. Obtain data and evidence about the previously hypothesized relationships. 4. Establish: the internal structure of the test (interrelations between the scores obtained by the participants on the various items); the external structure of the test (relationships between test scores and measures of the same construct obtained with other instruments).

  14. 3. CONSTRUCT VALIDITY. Procedures for assessing construct validity: • Correlations with other measures of the construct, that is, with other previously validated tests; the higher the correlation, the stronger the relation of the test to the construct being measured. • Multitrait-multimethod matrices (Campbell and Fiske, 1959). • Factor analysis.

  15. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. • Multitrait-multimethod matrices (Campbell and Fiske, 1959): the same construct is measured by different methods, and different constructs by the same method. Three types of correlations are obtained: • Reliability coefficients: correlations between the same construct measured by the same method. • Convergent validity (monotrait-heteromethod): correlations of the same construct measured by different methods. • Discriminant validity (heterotrait-monomethod): correlations of different constructs measured by the same procedure.

  16. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. Suppose we want to measure three constructs: Numerical Reasoning (NR), Spatial Factor (SF), and Abstract Reasoning (AR), each measured with tests of different formats: True-False (TF) and Multiple Choice (MC). The original slide's table presented the correlations between the constructs measured with both methods.

  17. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. 1. Reliability refers to the same construct measured by the same method (the diagonal). For example, the reliability of numerical reasoning measured with method 1 is 0.95, while with method 2 it is 0.93.

  18. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. 2. Convergent validity, or the monotrait-heteromethod coefficient, refers to the same construct measured with a different method (shown in green on the slide). In theory, these correlations should be high and significant. For example, numerical reasoning measured with methods 1 and 2 obtains a correlation of 0.90.

  19. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. 3. Discriminant validity refers to correlations between different traits measured with the same method (shown in red on the slide). In theory, they must be low, and lower than the reliability and convergent validity coefficients. For example, the correlation between NR and another trait measured with method 1 is 0.20.
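
A hypothetical illustration in Python (pandas assumed available): the slide's table is not reproduced in this transcript, so the matrix below is invented for illustration, keeping only the values quoted above (0.95, 0.93, 0.90, 0.20); it shows where each of the three types of coefficients sits in an MTMM layout.

```python
import pandas as pd

# Hypothetical MTMM matrix: three traits (NR, SF, AR) x two methods
# (TF = True-False, MC = Multiple Choice). Only 0.95, 0.93, 0.90 and 0.20
# come from the slides; every other entry is invented for illustration.
labels = ["NR-TF", "SF-TF", "AR-TF", "NR-MC", "SF-MC", "AR-MC"]
mtmm = pd.DataFrame([
    [0.95, 0.20, 0.25, 0.90, 0.15, 0.18],
    [0.20, 0.91, 0.22, 0.14, 0.88, 0.16],
    [0.25, 0.22, 0.89, 0.17, 0.19, 0.86],
    [0.90, 0.14, 0.17, 0.93, 0.21, 0.24],
    [0.15, 0.88, 0.19, 0.21, 0.92, 0.23],
    [0.18, 0.16, 0.86, 0.24, 0.23, 0.90],
], index=labels, columns=labels)

# Reliability: same trait, same method (main diagonal).
print(mtmm.loc["NR-TF", "NR-TF"])   # 0.95
# Convergent validity: same trait, different method (monotrait-heteromethod).
print(mtmm.loc["NR-TF", "NR-MC"])   # 0.90
# Discriminant validity: different traits, same method (heterotrait-monomethod).
print(mtmm.loc["NR-TF", "SF-TF"])   # 0.20
```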

  20. 3. CONSTRUCT VALIDITY. 3.1. Multitrait-multimethod matrix. One problem is that there is no statistical criterion to determine whether convergent and discriminant validity hold. Current practice is to investigate this aspect using methods derived from confirmatory factor analysis.

  21. 3. CONSTRUCT VALIDITY. 3.2. Factor analysis. Factor analysis: one of the techniques used to determine the internal and external structure of the test in relation to the construct. Its purpose is to explain a set of observed variables (test items, for example) by a smaller number of unobservable latent variables called factors (theoretical dimensions). It is these factors that allow a theoretical interpretation, based on the way the items group together according to their content, which should coincide with the theoretical dimensions used in the construction of the scale.

  22. 3. CONSTRUCT VALIDITY. 3.2. Factor analysis. • Exploratory versus confirmatory: • EFA: used when we have no firm assumptions about the dimensionality of the scale. After running an EFA, we obtain a solution for the number of factors or dimensions that can summarize the observed variables. • CFA: we have definite hypotheses about the number of dimensions and the way the items group together. Fit indices are reported that help us decide whether this structure is reproduced in the data.

  23. 3. CONSTRUCT VALIDITY. 3.2. Factor analysis. Typical model assumptions: 1. The factors may or may not be correlated with one another. 2. All observed variables are affected by all latent factors. 3. The errors are not correlated with each other.
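
As a minimal sketch of the exploratory approach, the following Python snippet (numpy and scikit-learn assumed available) simulates six items driven by two latent factors and recovers the loadings with an EFA; the data, the loading pattern, and the choice of two factors are assumptions for illustration, not part of the original material.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Two latent factors; items 1-3 load mainly on factor 1, items 4-6 on factor 2.
factors = rng.normal(size=(n, 2))
loadings = np.array([[0.8, 0.1], [0.7, 0.2], [0.9, 0.0],
                     [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
# Observed item scores = factor contributions + uncorrelated errors.
items = factors @ loadings.T + 0.4 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_, 2))  # estimated loadings, one row per factor
```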

  24. 4. CRITERION VALIDITY. It involves obtaining evidence about the extent to which test scores can be used effectively to make inferences about the participant's actual behavior on a criterion that cannot be measured directly. APA (1999): evidence based on the relationship with other variables.

  25. 4. CRITERION VALIDITY. Possible designs: they vary depending on when the test data (e.g., a questionnaire to assess degree of depression) and the criterion data (e.g., depressive symptoms according to the DSM-V) are collected. 1. Predictive validity: the criterion is measured after applying the test. Objective: to predict future scores on the criterion from those obtained on the test. Example: will the participant develop a depressive disorder? → use a predictive test. 2. Concurrent validity: test and criterion are measured simultaneously. Example: does the participant currently meet a diagnosis of depressive disorder? → use the test. 3. Retrospective validity: the criterion is measured before administering the test. Example: did the participant have a depressive disorder two years ago?

  26. 4. CRITERION VALIDITY. Phases: - Clearly define the criterion to be measured. - Identify the indicator or indicators to be used as criterion measures. - Select a representative sample of participants. - Apply the predictor test and obtain a score for each participant. - Obtain a measure of each participant on the criterion. - Determine the degree of relationship between test scores and the criterion.

  27. 4. CRITERION VALIDITY. 4.1. The problem of selection and measurement of criteria. A major problem is how we define and delimit the criterion: Simple criterion (easy indicator definition): selection of encyclopedia salespeople (success criterion = 10 encyclopedias sold per week). Complex criterion (more complicated definition): selection of a professor of Psychometrics (success criterion = subject knowledge? social skills? publications? etc.). All indicators are partial and do not provide a complete understanding of the criterion.

  28. 4. CRITERION VALIDITY. 4.1. The problem of selection and measurement of the criterion. For selecting criteria, Thorndike and Hagen (1989) recommend indicators that are: Relevant: related to the criterion; however, there are no statistical tests that allow us to confirm this, so expert judges can be used. Free of bias: use variables that do not affect different groups differentially. Reliable: use indicators that are stable over time. Accessible.

  29. 4. CRITERION VALIDITY. 4.1. The problem of selection and measurement of the criterion. When we operationalize the validity coefficient as the test-criterion correlation, problems linked to the nature of correlation appear: 1. Reliability of predictor and criterion: low reliability coefficients for test and criterion debase the values of the validity coefficient. 2. Range restriction: the validity coefficient may be reduced due to restrictions on the variability of the sample (e.g., in personnel selection, only participants with high scores are chosen). 3. Dichotomization of the test, the criterion, or both: it reduces the values of the validity coefficient.

  30. 4. CRITERION VALIDITY. 4.2. Statistical procedures used for criterion validity. Validity coefficient → correlation between test scores and the criterion (Crocker and Algina, 1986; Robinson and Stafford, 2006). Depending on the number of tests and criteria, Martínez-Arias (1995) differentiates between: 1. One predictor and one criterion (point 5, unit 6.1): correlation and simple regression. 2. Several predictors and one criterion (point 1, unit 6.2): multiple correlation and regression analysis, discriminant analysis (qualitative criterion), logistic regression (dichotomous criterion). 3. Several predictors and several criteria: multivariate regression analysis, canonical correlation analysis (complex techniques that hardly lead to direct results). 4. Validity and utility of the decision (point 2, unit 6.2): operations research (maximax and minimax techniques) (Van der Linden, 1991) → optimize the decisions taken with the test.

  31. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.1. Validity coefficient. Correlation and simple regression: to what extent can we predict a participant's score on the criterion, given their score on the test? The correlation between test and criterion is defined by the Pearson coefficient: r_xy = Cov(X, Y) / (S_x · S_y). Important: the type of correlation coefficient to use depends on the type of test and criterion variables.

  32. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.1. Validity coefficient. Example: suppose we want to perform a criterion-related validation study of a mechanical aptitude test (X). For this, the test is applied to a sample of 6 participants. These participants are then evaluated by their supervisors, on a scale of 0-10, according to the time taken to repair a car with the same fault (Y). The results appear in the slide's table. Calculate the validity coefficient.

  33. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.1. Validity coefficient. We obtain a value of 0.73. Since the maximum value of the validity coefficient is 1, it can be said that the test has good predictive capacity.
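
Since the slide's data table is not reproduced in the transcript, the sketch below uses hypothetical scores for the six participants, chosen so that the correlation comes out near the reported 0.73; with the course's actual data the same two lines of numpy would reproduce the slide's result.

```python
import numpy as np

# Hypothetical scores (the slide's table is unavailable):
# X = mechanical aptitude test, Y = supervisor rating (0-10).
X = np.array([12.0, 9.0, 11.0, 6.0, 8.0, 10.0])
Y = np.array([9.0, 7.0, 6.0, 4.0, 7.0, 6.0])

# Validity coefficient = Pearson correlation between test and criterion.
r_xy = np.corrcoef(X, Y)[0, 1]
print(f"r_xy = {r_xy:.2f}")  # ~0.73
```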

  34. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. Once the correlation is known, the linear regression model is used to predict scores on the criterion (Y) from test scores (X). The linear function is defined by: Y' = a + bX.

  35. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. [Figure: regression line showing, for a given Xi, the observed point P(Xi, Yi) and the predicted point P(Xi, Y'i); the error is e = Yi - Y'i.] 1. a = the point where the line intersects the Y axis, i.e., the expected value of Y when X is 0. 2. For each Xi we have two points: the observed (Xi, Yi) and the predicted (Xi, Y'i); that is, the function predicts a value of Y that does not correspond exactly to the real one. 3. The difference between (Xi, Yi) and (Xi, Y'i) determines the prediction error → the vertical distance between the two points.

  36. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. Formulas for calculating the parameters (a and b) of the regression equation: b = r_xy · (S_y / S_x) and a = Ȳ - b·X̄.

  37. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. [Figure: the regression line plotted in direct scores and in differential scores; in differential scores the intercept is at (0,0).] In differential scores (x = X - X̄, y = Y - Ȳ), the line passes through the point (0,0), so the equation reduces to: y' = b·x.

  38. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. In standard scores, the slope of the line is the validity coefficient: z'_y = r_xy · z_x. Example: with the data of the previous exercise, calculate the regression equation in direct, differential, and standard scores.

  39. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. 1. Regression equation in direct scores: Y' = a + bX, with a and b computed from the example data.

  40. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. 2. Regression equation in differential scores: y' = b·x. 3. Regression equation in standard scores: z'_y = r_xy · z_x. Example: calculate, in direct scores, the value predicted on the criterion (Y) for each participant and its associated error.

  41. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.1. Regression equations. For a participant who obtained X = 12, the predicted score is Y' = 7.89. If we subtract the predicted value from the actual value, we obtain the prediction error. For example, for that same participant, who actually obtained Y = 9, the associated prediction error is 9 - 7.89 = 1.11.
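
Continuing with the same hypothetical data (the slide's real table is unavailable, so the numbers below only approximate the 7.89 and 1.11 quoted above), this sketch computes the raw-score regression parameters and the prediction error for a participant with X = 12:

```python
import numpy as np

X = np.array([12.0, 9.0, 11.0, 6.0, 8.0, 10.0])  # hypothetical test scores
Y = np.array([9.0, 7.0, 6.0, 4.0, 7.0, 6.0])     # hypothetical criterion scores

# b = Cov(X, Y) / Var(X); a = mean(Y) - b * mean(X)   (raw-score formulas)
b = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
a = Y.mean() - b * X.mean()

y_pred = a + b * 12          # predicted criterion score for X = 12
error = 9 - y_pred           # prediction error e = Y - Y'
print(f"a = {a:.2f}, b = {b:.2f}, Y' = {y_pred:.2f}, e = {error:.2f}")
```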

  42. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.2. Residual variance and standard error of estimate. The estimation error is the difference between the actual value obtained by a participant on the criterion and the value predicted by the linear function: e_i = Y_i - Y'_i. The average of the squared errors of all participants is the residual variance or error variance: S²_y.x = Σ(Y_i - Y'_i)² / N.

  43. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.2. Residual variance and standard error of estimate. The square root of the residual variance or error variance (i.e., its standard deviation) is the standard error of estimate: S_y.x = √(Σ(Y_i - Y'_i)² / N). The greater r_xy, the smaller each individual error and, therefore, the smaller the standard error of estimate (S_y.x). Example: with the above data, calculate the standard error of estimate (using the first formula) and check the following equality: S_y.x = S_y · √(1 - r²_xy).

  44. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.2. Residual variance and standard error of estimate. [The worked computation for the example data appeared on the original slide.]
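
A sketch of that computation with the same hypothetical data, verifying at the same time the equality announced on the previous slide (all formulas use population variances, matching the transcript's definitions):

```python
import numpy as np

X = np.array([12.0, 9.0, 11.0, 6.0, 8.0, 10.0])  # hypothetical test scores
Y = np.array([9.0, 7.0, 6.0, 4.0, 7.0, 6.0])     # hypothetical criterion scores

b = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
a = Y.mean() - b * X.mean()
Y_pred = a + b * X

# Residual variance: mean of the squared prediction errors.
s2_yx = np.mean((Y - Y_pred) ** 2)
s_yx = np.sqrt(s2_yx)                 # standard error of estimate (first formula)

# Check: S_y.x = S_y * sqrt(1 - r_xy^2)  (second formula)
r_xy = np.corrcoef(X, Y)[0, 1]
print(f"{s_yx:.3f} == {Y.std() * np.sqrt(1 - r_xy**2):.3f}")
```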

  45. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.2. Residual variance and standard error of estimate. Standard error of estimate: S_y.x = S_y · √(1 - r²_xy). Verification of the equality: computing S_y.x with both formulas yields the same value.

  46. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.3. Confidence intervals. Because of the estimation errors, interval estimates are more convenient than point estimates: Y' ± z_(α/2) · S_y.x. Example: with the previous data, estimate the interval in which the criterion score (Y) of a person who obtained X = 13 will fall (confidence level = 95%). The standard error of estimate is needed.

  47. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.2. Linear regression model. 5.2.3. Confidence intervals. The standard error of estimate calculated in the previous exercise is used to build the interval around the predicted score.
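
A sketch of the interval computation with the same hypothetical data (scipy assumed available for the normal quantile; normality of the errors is assumed, as in the slides):

```python
import numpy as np
from scipy.stats import norm

X = np.array([12.0, 9.0, 11.0, 6.0, 8.0, 10.0])  # hypothetical test scores
Y = np.array([9.0, 7.0, 6.0, 4.0, 7.0, 6.0])     # hypothetical criterion scores

b = np.cov(X, Y, bias=True)[0, 1] / np.var(X)
a = Y.mean() - b * X.mean()
s_yx = np.sqrt(np.mean((Y - (a + b * X)) ** 2))  # standard error of estimate

# 95% interval for the criterion score of a person with X = 13:
# Y' +/- z_(alpha/2) * S_y.x
y_hat = a + b * 13
z = norm.ppf(0.975)  # 1.96
print(f"Y' = {y_hat:.2f}, 95% CI = [{y_hat - z * s_yx:.2f}, {y_hat + z * s_yx:.2f}]")
```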

  48. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.3. Interpretation of the evidence obtained about the predictive capacity of the test. 5.3.1. Coefficient of determination (C.D.): it equals the validity coefficient squared, C.D. = r²_xy. It represents the proportion of variance in the participants' scores on the criterion (Y) that can be predicted from the test (X). When the error variance is small, the predicted values of Y are close to the actual ones → the standard error of estimate will be small and, therefore, the C.D. will take values close to one. The possible values of C.D. are between 0 and 1. It expresses the proportion (or percentage, when multiplied by 100) of the variation of Y that is linked to X, determined by X, explained by X, or that can be predicted from X.

  49. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.3. Interpretation of the evidence obtained about the predictive capacity of the test. 5.3.2. Coefficient of alienation (C.A.): indicates the proportion that the standard error of estimate represents with respect to the standard deviation of the scores on the criterion: C.A. = S_y.x / S_y = √(1 - r²_xy). It expresses the uncertainty, the randomness, that affects the forecasts. The possible values of C.A. are between 0 and 1. When the error variance is high, the predicted values Y' are far from the real ones → the standard error of estimate will be high and, therefore, the C.A. will take values close to one. C.A.² is the proportion (or percentage, when multiplied by 100) of the variation of Y that is not linked to X, not determined by X, not explained by X, or that cannot be predicted from X.

  50. 5. VALIDITY FOR A SINGLE PREDICTOR AND A SINGLE CRITERION INDICATOR. 5.3. Interpretation of the evidence obtained about the predictive capacity of the test. 5.3.3. Predictive value coefficient (P.V.C.): complementary to the C.A., it is another way of expressing the ability of the test to predict the criterion: P.V.C. = 1 - C.A. The possible values of P.V.C. are between 0 and 1. The higher the C.A., the lower the ability of the test to predict the criterion. Example: calculate the C.D., the C.A., and the P.V.C., and interpret the results.
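
A minimal sketch of the three interpretation coefficients, starting from the validity coefficient r_xy = 0.73 obtained in the earlier example:

```python
import numpy as np

r_xy = 0.73  # validity coefficient from the earlier example

cd = r_xy ** 2               # coefficient of determination
ca = np.sqrt(1 - r_xy ** 2)  # coefficient of alienation = S_y.x / S_y
pvc = 1 - ca                 # predictive value coefficient, complementary to C.A.

print(f"C.D. = {cd:.2f}")    # ~0.53: 53% of criterion variance predictable from X
print(f"C.A. = {ca:.2f}")    # ~0.68: uncertainty affecting the forecasts
print(f"P.V.C. = {pvc:.2f}") # ~0.32
```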
