Assessing the Assessment

Reliability: Am I measuring something?
• Test-retest
• Interobserver agreement
• Parallel forms
• Split-half (internal consistency)
Validity: Am I measuring what I think I am measuring?
• Content
• Criterion
• Construct
Reliability is a necessary (but not sufficient) prerequisite for validity.
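Classical test theory gives a standard way to make the last point concrete. A test's correlation with any criterion cannot exceed the square root of the product of the two reliabilities. The notation here is ours, not the slides': $r_{XY}$ is the observed validity coefficient, and $r_{XX'}$ and $r_{YY'}$ are the reliabilities of the test and of the criterion. The bound follows from the correction for attenuation, because a correlation between true scores cannot exceed 1:

```latex
\rho_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}}, \qquad |\rho_{T_X T_Y}| \le 1
\quad\Longrightarrow\quad |r_{XY}| \le \sqrt{r_{XX'}\, r_{YY'}}
```

Suppose the criterion were perfectly reliable ($r_{YY'} = 1$). A test with reliability .64 could then correlate at most .80 with it, and a test with reliability .25 could correlate at most .50.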

Reliability
Reliability refers to the consistency of a measure across:
• Time
• Versions
• Raters
• And so on
A reliable test has little measurement error.
Observed Score = True Score + Error
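In classical test theory, reliability is the share of observed-score variance that comes from true-score variance, var(True) / var(Observed). The minimal simulation below makes three assumptions: true scores are normally distributed, error is purely random, and error is independent of the true score. The means, SDs, and sample size are arbitrary choices. The simulation shows that this variance ratio also matches the correlation between two parallel forms that share the same true scores.

```python
import numpy as np

def simulate_ctt(n_people=10_000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate Observed = True + Error for two parallel forms of a test."""
    rng = np.random.default_rng(seed)
    true = rng.normal(100.0, true_sd, n_people)          # hypothetical true scores
    form_a = true + rng.normal(0.0, error_sd, n_people)  # observed = true + random error
    form_b = true + rng.normal(0.0, error_sd, n_people)  # same true scores, fresh error
    variance_ratio = true.var() / form_a.var()           # var(True) / var(Observed)
    parallel_r = np.corrcoef(form_a, form_b)[0, 1]       # correlation between parallel forms
    theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
    return variance_ratio, parallel_r, theoretical

variance_ratio, parallel_r, theoretical = simulate_ctt()
print(f"var ratio {variance_ratio:.3f}, parallel-forms r {parallel_r:.3f}, theory {theoretical:.3f}")
```

With these settings the theoretical reliability is 100 / (100 + 25) = .80. Both simulated estimates should land close to that value.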

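Reliability
• True score: the true or perfectly accurate value
• E.g., the actual time (a particular clock's reading is an observed score)
• In psychology, often a hypothetical quantity that cannot be observed directly
• Estimated from multiple measurements
• Aggregation = averaging a number of imprecise measurements to increase reliability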
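The Spearman-Brown prophecy formula, a standard classical-test-theory result that the slide does not state, shows why aggregation helps. Averaging k parallel measurements, each with reliability r, gives a reliability of kr / (1 + (k - 1)r). The sketch below compares the formula with a simulation. It assumes strictly parallel measurements, meaning equal true-score loadings and equal, independent error variances; real items rarely meet this exactly.

```python
import numpy as np

def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of the average of k parallel measurements."""
    return k * r_single / (1 + (k - 1) * r_single)

def simulated_reliability_of_mean(k, n_people=20_000, true_sd=1.0, error_sd=1.0, seed=1):
    """Correlate two independent k-measurement averages of the same true scores."""
    rng = np.random.default_rng(seed)
    true = rng.normal(0.0, true_sd, n_people)
    # Two parallel sets of k imprecise measurements per person
    a = true[:, None] + rng.normal(0.0, error_sd, (n_people, k))
    b = true[:, None] + rng.normal(0.0, error_sd, (n_people, k))
    return np.corrcoef(a.mean(axis=1), b.mean(axis=1))[0, 1]

# With true_sd == error_sd, a single measurement has reliability .50
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.5, k), 3), round(simulated_reliability_of_mean(k), 3))
```

Starting from a single-measurement reliability of .50, the formula gives .67 for two measurements, .80 for four, and about .89 for eight.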

Reliability
• Test-retest: administer the same measure at two points in time
• Interobserver agreement: multiple observers/judges/raters/scorers rate the same target
• Parallel forms: compare alternate forms of the same test
• Split-half reliability: split the test into two halves and compare scores across halves
• Coefficient alpha: the average of all possible split-half reliabilities
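The internal-consistency indices above can be computed directly from a people-by-items score matrix. The minimal NumPy sketch below assumes numeric item scores with no missing data. The odd-even split is only one of many possible splits. Alpha uses the usual variance formula. Cohen's kappa is one common chance-corrected index of interobserver agreement for categorical codes; the slide does not name it.

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Odd-even split-half reliability, stepped up to full length with Spearman-Brown.

    items: array of shape (n_people, n_items).
    """
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)
    r_halves = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r_halves / (1 + r_halves)  # each half is only half as long as the full test

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

def cohens_kappa(rater_a, rater_b) -> float:
    """Agreement between two raters' category codes, corrected for chance agreement.

    Undefined (division by zero) if both raters always use one and the same category.
    """
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_observed = np.mean(a == b)
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return (p_observed - p_chance) / (1 - p_chance)

# Simulated data: 10 hypothetical items sharing one true score plus independent noise
rng = np.random.default_rng(2)
true_score = rng.normal(size=(500, 1))
items = true_score + rng.normal(size=(500, 10))
print(round(split_half_reliability(items), 3), round(cronbach_alpha(items), 3))

# Test-retest and parallel-forms reliability are plain correlations between two score sets,
# e.g. np.corrcoef(scores_time1, scores_time2)[0, 1] (hypothetical variable names).

# Two raters coding 4 hypothetical targets; kappa = 0.5 for these codes
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

For these simulated items, split-half and alpha should both come out near 10/11 ≈ .91. That is the Spearman-Brown value for 10 items that each have reliability .50.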

Validity
Is the test measuring what I think it is? This requires empirical demonstration.
There are three types of validity:
• Content validity
• Criterion validity
• Construct validity

Validity
Content Validity
• A test has content validity if it adequately covers the content area it is supposed to cover
• Difficult to examine statistically
• Content validity typically must be built in at the beginning, when the test is designed
• Course exams are the classic example: a good exam samples the material the course covered

Validity
Criterion Validity
• Tests are evaluated against some criterion (an outcome the test should relate to)
• Often called predictive validity, although criterion validity also includes concurrent validity
• Most at issue for tests used to make decisions: selection of students, parole decisions, hiring

Criterion Validity - Concurrent
• Concurrent validity: does my measure correlate highly with an established measure?
• Can my measurement instrument predict a criterion that occurs at the same point in time?
• Can my measure (i.e., my operationalization) distinguish between two groups that it should be able to distinguish between?
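Both concurrent-validity questions reduce to simple statistics. The sketch below runs on simulated data; the new measure, the established measure, and the two comparison groups are all hypothetical. The first check is a Pearson correlation. The known-groups check is summarized with Cohen's d, which is our choice of effect size.

```python
import numpy as np

def cohens_d(x, y) -> float:
    """Standardized mean difference (x minus y) using the pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(3)
n = 300
# Hypothetical data: one latent trait drives both the new and the established measure,
# and both are collected at the same time
trait = rng.normal(size=n)
new_measure = trait + rng.normal(scale=0.7, size=n)
established = trait + rng.normal(scale=0.7, size=n)
r_concurrent = np.corrcoef(new_measure, established)[0, 1]   # concurrent validity coefficient

# Known-groups check: new-measure scores in a hypothetical group that should score high
# (e.g., a clinically identified group) versus a comparison group
scores_target_group = rng.normal(loc=1.0, size=100)
scores_comparison_group = rng.normal(loc=0.0, size=100)
d = cohens_d(scores_target_group, scores_comparison_group)
print(round(r_concurrent, 2), round(d, 2))
```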

Criterion Validity - Predictive
• Can my measure predict future behavior?
• If yes, it has predictive validity (a type of criterion validity)
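A predictive validity coefficient is the correlation between scores now and the criterion later. A fitted regression line turns that relation into a forecast. The data below are simulated under an assumed linear relation, and the variable names (`test_t1`, `criterion_t2`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
# Hypothetical: an admissions-style test taken at time 1 and a criterion measured later
test_t1 = rng.normal(50, 10, n)
# Assumed linear relation plus noise, e.g. a later GPA-like outcome
criterion_t2 = 2.0 + 0.03 * test_t1 + rng.normal(0, 0.4, n)

validity = np.corrcoef(test_t1, criterion_t2)[0, 1]           # predictive validity coefficient
slope, intercept = np.polyfit(test_t1, criterion_t2, deg=1)   # least-squares prediction line
forecast_for_60 = intercept + slope * 60                       # predicted criterion for a score of 60
print(round(validity, 2), round(forecast_for_60, 2))
```

Note that `np.polyfit` returns coefficients from the highest power down, so a degree-1 fit unpacks as slope, then intercept.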

Predictive Validity of the GRE
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate school student selection and performance. Psychological Bulletin, 127, 162-181.
Graduate Record Examination (GRE)
• Originally designed to measure “basic developed abilities relevant to performance in graduate studies”
• Verbal measure: analogy, antonym, sentence completion, reading comprehension
• Quantitative measure: quantitative, quantitative comparison, data interpretation
• Analytic measure: analytical and logical reasoning
• Subject tests: acquired knowledge in a particular area
• Used widely, and weighted heavily, in admissions decisions

Predictive validity of the GRE
• Want to establish the predictive validity of the GRE
• What will my criterion of graduate school performance be?
• Use several indicators of “performance” as the criteria: graduate GPA, 1st-year graduate GPA, comprehensive exam scores, publication citation counts, faculty ratings


Summary
• All sections of the GRE were found to be valid predictors of graduate GPA (GGPA), 1st-year GGPA, faculty ratings, and comprehensive exam scores
• GRE subject tests were consistently better predictors of these criteria than the Verbal or Quantitative tests, and also better than undergraduate GPA (UGPA)

Construct Validity
• Most important type of validity
• “If this were a measure of …, what would it look like?”
• Depends heavily on theory: how is this construct related to other constructs?
• Requires broad thinking
• In validating my construct, I am validating my theory

Steps to establish construct validity
• Establish convergent correlations: measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other (that is, you should be able to show a correspondence or convergence between similar constructs)
• Establish divergent (also called discriminant) correlations: measures of constructs that theoretically should not be related to each other are, in fact, observed not to be related to each other (that is, you should be able to discriminate between dissimilar constructs)
• Build a nomological network

Convergent validity
• Measures that should be related are related
• These 4 items are converging on the same thing (although we don’t yet know for sure that it is “self-esteem”)

Divergent Validity
• Self-esteem measures do not correlate with locus of control measures
• These measures seem to be tapping different things
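The convergent/divergent pattern can be checked in an item correlation matrix. Items meant to measure the same construct should correlate highly with each other and only weakly with items from a different construct. The sketch below uses simulated data. It assumes four self-esteem items and four locus-of-control items generated from two independent latent variables. That is an idealization, since real constructs are rarely perfectly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
# Hypothetical latent constructs, generated independently of each other
self_esteem = rng.normal(size=n)
locus_of_control = rng.normal(size=n)

# Four hypothetical items per construct: latent score plus item-specific noise
se_items = self_esteem[:, None] + rng.normal(scale=0.8, size=(n, 4))
loc_items = locus_of_control[:, None] + rng.normal(scale=0.8, size=(n, 4))

# 8 x 8 correlation matrix: columns 0-3 are self-esteem items, 4-7 locus-of-control items
R = np.corrcoef(np.hstack([se_items, loc_items]), rowvar=False)

def mean_offdiag(block: np.ndarray) -> float:
    """Average correlation in a square block, ignoring the diagonal of 1s."""
    mask = ~np.eye(block.shape[0], dtype=bool)
    return block[mask].mean()

convergent = (mean_offdiag(R[:4, :4]) + mean_offdiag(R[4:, 4:])) / 2   # same-construct pairs
divergent = R[:4, 4:].mean()                                           # cross-construct pairs
print(round(convergent, 2), round(divergent, 2))
```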

Establishing convergent and divergent validity

Nomological Network
• Must develop a “lawful network” for your measure in order to establish construct validity
• Includes: theoretical framework, empirical framework, observables

Childhood Psychopathy Scale
Lynam, D. R. (1997). Pursuing the psychopath: Capturing the fledgling psychopath in a nomological net. Journal of Abnormal Psychology, 106, 425-438.
“The construct of psychopathy and attendant personality information might profitably be used at the childhood level to identify a more homogeneous group of antisocial children.”

Psychopathy
• The [psychopath] is unfamiliar with the primary facts or data of what might be called personal values and is altogether incapable of understanding such matters.
• It is impossible for him to take even a slight interest in the tragedy or joy or the striving of humanity as presented in serious literature or art. He is also indifferent to all these matters in life itself. Beauty and ugliness, except in a very superficial sense, goodness, evil, love, horror, and humour have no actual meaning, no power to move him.
• He is, furthermore, lacking in the ability to see that others are moved. It is as though he were colour-blind, despite his sharp intelligence, to this aspect of human existence. It cannot be explained to him because there is nothing in his orbit of awareness that can bridge the gap with comparison. He can repeat the words and say glibly that he understands, and there is no way for him to realize that he does not understand (Cleckley, 1941, p. 90 quoted in Hare, 1993, pp. 27-28).

Developing the Childhood Psychopathy Scale (CPS)
• Followed principles of rational scale construction
• Working from the Psychopathy Checklist-Revised (PCL-R), identified mother-reported items that assessed PCL-R constructs
• Operationalized 13 of the 20 PCL-R constructs as 3- to 4-item scales: glibness, untruthfulness, manipulation, lack of guilt, poverty of affect, callousness, parasitic lifestyle, behavioral dyscontrol, lack of planning, impulsiveness, unreliability, failure to accept responsibility, criminal versatility
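Rationally constructed scales like these are typically scored by combining each scale's items. The minimal sketch below assumes two scoring rules: items are averaged within each scale, and the scale scores are then averaged into a total. The item-to-scale mapping, the 0 to 2 rating range, and the three scale names used here are placeholders, not Lynam's actual items or scoring rules.

```python
import numpy as np

# Hypothetical mapping from scale name to item column indices (3 to 4 items each);
# the real CPS item content and scoring rules are not reproduced here.
SUBSCALES = {
    "glibness": [0, 1, 2],
    "untruthfulness": [3, 4, 5, 6],
    "impulsiveness": [7, 8, 9],
}

def score_scales(responses: np.ndarray) -> dict:
    """Average each scale's items, then average the scale scores into a total (assumed rule)."""
    scores = {name: responses[:, idx].mean(axis=1) for name, idx in SUBSCALES.items()}
    scores["total"] = np.mean([scores[name] for name in SUBSCALES], axis=0)
    return scores

# 2 hypothetical children x 10 items, each rated 0-2 by a parent
responses = np.array([[0, 1, 2, 1, 1, 0, 2, 2, 2, 2],
                      [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]], dtype=float)
scores = score_scales(responses)
print({name: np.round(values, 2) for name, values in scores.items()})
```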

Items on the CPS

Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to serious delinquency.

Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to stable delinquency.

Construct Validity of the CPS
If the CPS is truly assessing psychopathy, scores on the CPS should be positively related to impulsivity.

Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should be positively related to externalizing problems and negatively related to internalizing problems.
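Each hypothesis above predicts the direction of a correlation, so the predictions can be written down before looking at any data and then checked against it. The sketch below uses simulated stand-in data, not Lynam's results, and the outcome variable names are hypothetical.

```python
import numpy as np

# Predicted direction of each correlation with CPS scores (+1 positive, -1 negative),
# taken from the hypotheses listed above. Variable names are hypothetical.
PREDICTIONS = {
    "serious_delinquency": +1,
    "stable_delinquency": +1,
    "impulsivity": +1,
    "externalizing": +1,
    "internalizing": -1,
}

def check_predictions(cps: np.ndarray, outcomes: dict) -> dict:
    """Return each observed correlation and whether its sign matches the prediction."""
    results = {}
    for name, expected_sign in PREDICTIONS.items():
        r = np.corrcoef(cps, outcomes[name])[0, 1]
        results[name] = (round(r, 3), bool(np.sign(r) == expected_sign))
    return results

# Simulated stand-in data, built so that each outcome follows its predicted direction
rng = np.random.default_rng(6)
cps = rng.normal(size=500)
outcomes = {name: sign * 0.4 * cps + rng.normal(size=500) for name, sign in PREDICTIONS.items()}
print(check_predictions(cps, outcomes))
```

Matching signs is only a first check. A fuller evaluation would also look at the size and statistical significance of each correlation.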

Construct Validity of the CPS
If the CPS is assessing psychopathy, scores on the CPS should predict delinquency above and beyond other well-known predictors.
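“Above and beyond” is usually tested with hierarchical regression. First regress the outcome on the established predictors, then add the new measure and look at the increase in R² (ΔR²). The sketch below uses simulated data. The predictor names (`prior_problems`, `family_risk`) are hypothetical placeholders, not the predictors Lynam used.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return 1 - residuals.var() / y.var()

rng = np.random.default_rng(7)
n = 600
# Hypothetical "well-known predictors" (placeholder names)
prior_problems = rng.normal(size=n)
family_risk = rng.normal(size=n)
cps = 0.5 * prior_problems + rng.normal(size=n)   # CPS overlaps with an existing predictor
delinquency = 0.4 * prior_problems + 0.3 * family_risk + 0.3 * cps + rng.normal(size=n)

base = np.column_stack([prior_problems, family_risk])
r2_base = r_squared(base, delinquency)                           # step 1: known predictors only
r2_full = r_squared(np.column_stack([base, cps]), delinquency)   # step 2: add CPS
delta_r2 = r2_full - r2_base                                     # incremental validity of CPS
print(round(r2_base, 3), round(r2_full, 3), round(delta_r2, 3))
```

In practice ΔR² would also be tested for significance, for example with an F test for the change in R².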
