PSY 525
Complications in the field of psychology • Constructs are neither well defined nor directly observable (e.g., intelligence) • Compare to the problem of measuring something like brain size (assuming that it has some relation to intelligence) • How would this be done? • Similar issues arise with every construct defined in the DSM (note the lack of agreement even between different versions of the DSM) and in other diagnostic criteria, such as the International Classification of Diseases-10 (ICD-10)
Why should we assess? • What do we gain? • What is the cost? • How should it be done? • What should be assessed? • First assignment is a 2 page paper on the first organizing question (think about this)
The functional role of assessment for patients • To provide a functional analysis of the patient (“what” they can and cannot do, with less emphasis on “why”) • To direct treatment (one must know what is wrong in order to select an intervention) • What interventions/therapies have you (or are you) learning and what are the indicators for implementing those interventions?
Your assessments count! • See Exhibit 1-1, pp. 2-3: Daniel Hoffman v. Board of Education of the City of New York • Reports are used by mental health workers, administrators, courts, etc. • Words can be misleading (be clear) • IQs can change • Different tests may provide different IQs • Base decisions on multiple tests • Use appropriate tests • Review previous findings/testing
Types of Assessment (pp. 4-5) • Screening • Focused • Diagnostic • Counseling and Rehabilitation • Progress Evaluation • Problem-solving
Four Pillars of Assessment • 1. norm-referenced tests* • 2. interviews • 3. observations • 4. informal assessment procedures • *Tests: a) same item content, b) same administration procedure, and c) same scoring criteria (i.e., must be standardized to be considered tests)
Steps in the assessment process • 1. Review referral • 2. Decide whether to accept it • 3. Obtain relevant background info • 4. Consider influence of relevant others • 5. Observe client in multiple settings • 6. Select/administer appropriate test battery • 7. Interpret the assessment • 8. Develop/select intervention strategies • 9. Write report • 10. Meet with examinee • 11. Follow up and re-evaluate (see Dawes)
Clinical assessment/judgment (Dawes et al., 1989) • Although the literature is replete with criticisms of standardized assessments, how does the more informal version (i.e., clinical judgment) do relative to actuarial models? • How is your clinical judgment? • Do you expect that it will improve with training? • How would you stack up compared to someone with no training who is just given instructions to follow? • See Dawes et al., 1989
Assessment patterns – what is assessed (Lubin et al., 1985) • Most commonly used tests have varied over time and setting • Today, a wide variety of tests are employed, representing many diverse perspectives (from behavioral to psychodynamic) • Compare educational vs. inpatient vs. counseling settings • Tests employed are driven by the referral
Psychometrics • Teach individuals about 5 tests, or teach them how to evaluate tests? • The "feed you a fish, or teach you to fish" idea • Psychometrics allows for an understanding of what makes a test effective, how to evaluate tests, how to create good ones, and how to extend this process to other methods of evaluation • This is where science (the process of doing research) and clinical work overlap
Scaling • Categorical (nominal) – named categories • Ordinal – order, but unequal intervals • Interval – equal intervals but no true 0 point • Ratio – equal intervals and a true 0 point • Interval and ratio scales are the only ones that technically allow for the calculation of a mean, SD, and most parametric statistics • The type of scale will dictate the type of statistics that can be used • E.g., a nominal scale should only use the mode as a measure of central tendency
Tools of the trade • You must be very familiar (for this class and ultimately, the licensing exam) with the following concepts – they will be reviewed in greater detail in the readings • Measures of central tendency – mean, mode, median • How to calculate them, the strengths and weaknesses of each, when to use them • Measures of variability – SD & various ranges • How to calculate them, the strengths and weaknesses of each, when to use them
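As a minimal sketch of how these measures are calculated, the Python snippet below uses the standard `statistics` module on a small set of made-up scores. The scores are hypothetical and chosen only for illustration. The single high score (145) pulls the mean above the median, which is one of the weaknesses of the mean worth noting.

```python
import statistics

# Hypothetical test scores (illustration only, not from any real sample)
scores = [85, 90, 100, 100, 105, 110, 145]

# Measures of central tendency
mean_score = statistics.mean(scores)      # arithmetic average; sensitive to outliers
median_score = statistics.median(scores)  # middle score; robust to outliers
mode_score = statistics.mode(scores)      # most frequent score

# Measures of variability
sd_score = statistics.stdev(scores)       # sample standard deviation (n - 1 denominator)
score_range = max(scores) - min(scores)   # simple range: highest minus lowest

print(f"mean={mean_score}, median={median_score}, mode={mode_score}")
print(f"SD={sd_score:.2f}, range={score_range}")
```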
Tools – continued 1 • How do the measures of central tendency and variability relate to one another? Are there some that shouldn’t be used together? • Understanding the normal curve and probability theory (see overhead of percentages) • This information is necessary in order to interpret any assessment results • Why is variability important? (between, not within, individuals)
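The percentages under the normal curve can be reproduced with a short sketch. It uses Python's `statistics.NormalDist` (available in Python 3.8 and later). The IQ-style scale with a mean of 100 and SD of 15 is an assumption made for the example, not a property of any particular test.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Proportion of cases falling within 1, 2, and 3 SDs of the mean
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_3_sd = standard_normal.cdf(3) - standard_normal.cdf(-3)

# Assumed scale for illustration: mean 100, SD 15
iq_scale = NormalDist(mu=100, sigma=15)
percentile_of_115 = iq_scale.cdf(115)  # proportion scoring at or below 115

print(f"within 1 SD: {within_1_sd:.4f}")
print(f"within 2 SD: {within_2_sd:.4f}")
print(f"within 3 SD: {within_3_sd:.4f}")
print(f"percentile rank of 115: {percentile_of_115:.4f}")
```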
Reliability • Reliability – consistency between raters (see Cohen’s Kappa), between parallel versions of the same test, within a single test (split half, Cronbach’s alpha), & from one administration to another (test-retest) • How does this relate to standardization? How does this relate to variability? How does this relate to validity? How does this relate to the accuracy of measurement? • Standard error of measurement (SEM) = SD × square root of (1 – the test’s reliability) • Possible range of SEM is 0 to the test’s SD • The smaller the SEM the better? Why?
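The SEM formula on this slide can be sketched in a few lines of Python. The function name, the SD of 15, and the reliability of .91 are assumptions for illustration. The 95% band (observed score ± 1.96 × SEM) is a common way of applying the SEM and is not taken from the slide.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), per the formula on the slide."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD = 15, reliability = .91
sem = standard_error_of_measurement(sd=15, reliability=0.91)

# Boundary cases: a perfectly reliable test has SEM 0,
# a completely unreliable test has SEM equal to the test's SD
sem_perfect = standard_error_of_measurement(sd=15, reliability=1.0)
sem_useless = standard_error_of_measurement(sd=15, reliability=0.0)

# Approximate 95% band around a hypothetical observed score of 100
observed = 100
band = (observed - 1.96 * sem, observed + 1.96 * sem)

print(f"SEM = {sem:.2f}")
print(f"95% band: {band[0]:.1f} to {band[1]:.1f}")
```

The boundary cases show why the possible range of the SEM runs from 0 to the test's SD, and why a smaller SEM means a narrower band around the observed score.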
Reliability and error Rating errors • Constant (leniency, severity, tendency to the mean), halo effects, contrast (with previous subject or oneself), proximity (an item’s location on the printed page can result in ratings similar to nearby items), most-recent-performance, and/or inadequate information errors • These can be minimized with more raters, exact instructions, intense training, frequent evaluation and recalibration
p. 2 – reliability and error • Scale calibration • More items are typically needed to achieve high reliability, but there are exceptions • Guttman approach involves ordering items in terms of their level of difficulty (ascending) • How does one determine level of difficulty? • This approach assumes that once a specified number of items are missed, the more difficult items to follow will also be missed, therefore no need to administer them • Cost of this approach? Problems?
p. 3 – reliability and error • Coefficient of determination (R-squared) is the proportion of variance in one variable that can be accounted for by a second variable (the predictor) • Criterion contamination – when one knows information that makes it impossible to do a fair test of criterion validity (e.g., race of the skulls) • Must conduct blind ratings
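A small worked example may make the coefficient of determination concrete. The sketch below computes Pearson's r by hand and squares it. The function name and the five pairs of scores are hypothetical and used only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical predictor and criterion scores
predictor = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

r = pearson_r(predictor, criterion)
r_squared = r ** 2  # proportion of criterion variance accounted for by the predictor

print(f"r = {r:.3f}, R-squared = {r_squared:.2f}")
```

Here r is about .77, so the predictor accounts for 60% of the variance in the criterion and leaves 40% unexplained.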
p. 4 – reliability and error • Base rates represent an extremely important source of information and are often ignored (e.g., Rosenhan’s famous study of pseudopatients who claimed to hear voices and had themselves admitted to psychiatric hospitals) • Why is it easier to predict behaviors or outcomes that occur at a base rate near 50%?
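One way to explore the base-rate question is the sketch below. It assumes a hypothetical test with 90% sensitivity and 90% specificity (values chosen only for illustration) and compares two things at different base rates: the chance that a positive result is correct, and the accuracy of simply always predicting the more common outcome.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)

def always_guess_majority_accuracy(base_rate):
    """Accuracy achieved by ignoring the test and predicting the more common outcome."""
    return max(base_rate, 1.0 - base_rate)

# Hypothetical test characteristics (assumed for illustration)
SENSITIVITY = 0.90
SPECIFICITY = 0.90

ppv_at_50 = positive_predictive_value(0.50, SENSITIVITY, SPECIFICITY)
ppv_at_01 = positive_predictive_value(0.01, SENSITIVITY, SPECIFICITY)

for base_rate in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(base_rate, SENSITIVITY, SPECIFICITY)
    baseline = always_guess_majority_accuracy(base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.3f}, majority-guess accuracy = {baseline:.2f}")
```

At a 50% base rate the test (90% accurate) easily beats guessing (50% accurate). At a 1% base rate, always predicting "no" is already 99% accurate, and most positive results from the test are false positives.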
p. 5 – reliability and error • All measurement represents an estimate of whatever is being assessed. Therefore statistics are needed to help make such estimates (inferences) • Statistical power is crucial to decision making • Alpha = Type I error or the probability of rejecting the null when it is true • Beta = Type II error or the probability of failing to reject the null when it is in fact false. • Parametric vs. nonparametric (few or no distributional assumptions for the data)
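The relation between alpha, beta, and power can be illustrated with a simple case. The sketch below assumes a one-sided z-test with a known SD. The function name, the effect size of 0.5 SD, and the sample sizes are hypothetical. Power is 1 minus beta.

```python
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided z-test: P(reject the null | true effect of `effect_size` SDs)."""
    standard_normal = NormalDist()
    critical_z = standard_normal.inv_cdf(1.0 - alpha)  # cutoff that fixes the Type I error rate
    return 1.0 - standard_normal.cdf(critical_z - effect_size * n ** 0.5)

# When the null is true (effect size 0), the rejection rate equals alpha
type_1_rate = power_one_sided_z(effect_size=0.0, n=25)

# Hypothetical true effect of 0.5 SD at two sample sizes
power_small_n = power_one_sided_z(effect_size=0.5, n=10)
power_large_n = power_one_sided_z(effect_size=0.5, n=25)
beta_large_n = 1.0 - power_large_n  # Type II error rate

print(f"Type I rate under the null: {type_1_rate:.3f}")
print(f"power with n=10: {power_small_n:.3f}")
print(f"power with n=25: {power_large_n:.3f} (beta = {beta_large_n:.3f})")
```

Holding alpha and the effect size fixed, increasing N raises power and lowers beta.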
Minimizing error • Standardization – refers to the consistency in applying methods • Implications for testing • Costs of violating standardization (weigh such a decision very carefully, as there are major costs) • Use of proper norms – when can the norm group deviate from those to whom it is applied? • What constitutes an effective norm group? • Systematic and random error • Difficulties in detection and correction
An applied look at the SEM • In November of 2000, we tried to elect a president. • What happens when the margin of victory is smaller than the margin of error (SEM) in counting (the latter being approx 1 in 7,000)? • Impossible to ever know who really won • Issue of what constitutes a “vote” (removal of chad, depression of chad, intent to vote, etc.) • What margin of victory is a real victory? (significant difference is determined by the standard error of measure)
Validity • Validity – is the test doing what you think it’s doing? • Face validity is important for lay people (for them to believe the test is valid). Other advantages/disadvantages of face validity? • Content, construct, predictive, convergent, discriminant. • Internal/external validity (trade-off?)
Factor analysis: What is it? (validity?) Assignment: Using point form, briefly describe the key events of the television show “The Apprentice” • What themes emerge? How many different themes? Are the different themes related? • Qualitative FA • Factor analysis represents a method of organizing & reducing data into latent (not directly assessed) constructs. • This is a mathematical rather than a conceptual (qualitative) organization of the data • Can be exploratory (no a priori theory) or confirmatory (compares data to a theory or previous data) • FA in APA journals: 70s = 4%, 80s = 9%, 90s = 21%
Factor analysis: Why do it? • Data reduction • conserve df • minimize problems of multicollinearity • Models for computing composite scores & item parcels • Scale construction & revision (improve psychometrics) • empirical validation (or revision) of theoretical models • arrangement of items (pos & neg loadings) • relative importance of different items; how central each is to the latent construct (how to do item selection?) • Factor(s) must be replicable, generalizable & interpretable • Mean comparisons are irrelevant if factor structures differ e.g., typical study comparing males & females, patients to non-patients
Factor analysis: How to do it? Start with multi-item measures (min. 3 items per construct) on a ratio or interval scale • EFA (Exploratory Factor Analysis) • Ns can range from a minimum of 5 subjects per item to the ideal 10:1 ratio, though this depends on loadings and communality (1 – uniqueness). Min. N = 100 • EFA will determine the number of factors to extract • This varies with the number of latent constructs and the number of items (under- vs. over-extraction) • Simulated sets of random data will still result in the emergence of factor(s), so check the scree plot of factors and their eigenvalues to find the descending linear trend (see p. 291). • Eigenvalue? The amount of variance explained by each factor (vector). Standardize items to z-scores (M = 0, Var = 1), so the sum of the variances = # of items. • Item loading = the item’s correlation with the factor on which it loads. • When do factors dip below the eigenvalues for a random data set with the same N? (p. 291)
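The eigenvalue idea can be sketched in plain Python. The correlation matrix below is hypothetical (three standardized items), and the sketch uses power iteration to find only the first, largest eigenvalue, as in an unrotated principal components extraction. This is a simplification for illustration and not how statistical packages carry out a full EFA.

```python
import math

def first_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its unit-length eigenvector via power iteration."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n  # starting guess; fine for positively correlated items
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in product))
        vector = [value / norm for value in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vector[i] * product[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, vector

# Hypothetical correlation matrix for three standardized items (each variance = 1)
correlations = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

total_variance = sum(correlations[i][i] for i in range(3))  # equals the number of items
eigenvalue, eigenvector = first_eigenpair(correlations)

# Component loadings: each item's correlation with the first component
loadings = [weight * math.sqrt(eigenvalue) for weight in eigenvector]
proportion_explained = eigenvalue / total_variance

print(f"total variance = {total_variance}")
print(f"first eigenvalue = {eigenvalue:.3f} ({proportion_explained:.1%} of variance)")
print("loadings:", [round(loading, 3) for loading in loadings])
```

Because each standardized item contributes a variance of 1, the eigenvalues of all components sum to the number of items, so an eigenvalue divided by the number of items gives the proportion of variance that component explains.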
How many latent factors should emerge? • Scree plots, 50% variance rule & your theory (do not use the 1.0 eigenvalue default) • Low item loadings (.35 or <) typically represent error variance and will not replicate in an independent sample, so avoid factors made up of such item loadings • As items are added, more factors emerge, but this is not just an artifact of the number of items. It could be that new factors are emerging… • e.g., I often feel tired, I am rarely sad, I cry often, I never smile, I rarely sleep, Often I am not hungry • 2 possible factors emerge assessing the latent constructs of depression and timeframe
How to do it? Oblique vs. Orthogonal rotations • You must determine the relation between all of the factors (assuming there is more than 1 factor) • This should be based on a theoretical rationale • Orthogonal (statistically independent/unrelated) • Items with high loadings on F1 are near 0 loading on F2 = simple structure • Oblique (stat. dependent); Fs allowed to inter-correlate • Orthogonal rotations – advantage is that it is more easily interpretable, though it may not fit well with the data (if the latent constructs are not independent) • Oblique rotations – advantage is that it can account for more of the data (especially if the latent constructs are not independent), though it is more difficult to interpret
How to interpret an EFA? • Examine the number of factors and the number of iterations needed for the rotated factor structure to converge • Low loadings (< .35) are likely to be error variance • Factors with few items are likely to be spurious factors • Will the factor structure replicate in a second independent sample? • This is essential, especially when the initial EFA was truly exploratory • Emergent factors will capitalize on chance associations in the data set (i.e., the same type I errors observed whenever conducting numerous analyses) • Now that you have a factor structure, what next?
What is CFA? • CFA is a powerful statistical technique that allows one to define a model and then determine how well the data set matches the predicted model (using chi-square and several fit indices) – can test entire model simultaneously • The predicted model can be theoretically derived or empirically derived (see EFA findings), though if it is the latter it MUST be tested on a different sample to allow for cross-validation • With large samples, randomly split the data (using random ID selection) into two equal sections • Minimum N for CFA = 200. Findings are more robust (stable) as N increases
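The random split for cross-validation could look something like the sketch below. The function name, the seed, and the 400 sequential participant IDs are all hypothetical. One half would be used for the EFA and the other held back for the CFA.

```python
import random

def split_half_ids(participant_ids, seed=525):
    """Randomly split IDs into two equal, non-overlapping halves.

    With an odd number of IDs, one randomly chosen ID is left out
    so that the halves stay equal in size.
    """
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split can be reproduced
    half = len(ids) // 2
    return ids[:half], ids[half:2 * half]

# Hypothetical sample of 400 participants with IDs 1..400
exploration_ids, validation_ids = split_half_ids(range(1, 401))

print(len(exploration_ids), len(validation_ids))
```

With 400 participants, each half has N = 200, which just meets the minimum N for the CFA half.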
Assessing the fit of the model in CFA • Compare predicted model to observed data using the chi-square statistic (the smaller the better = no sig. difference between observed and expected) • Nested modeling – compare the fit of different models to each other • Law of parsimony = all multifactor models must fit the data at a level that is sig. better than a one-factor model (calculate the chi-square difference) • Indices of fit are also used to evaluate the fit of all models – based on model chi-square, null model chi-square, and df. • Comparative fit index (CFI) = 1 – [(χ²m – dfm)/(χ²n – dfn)], where m = the tested model and n = the null model • Bentler-Bonett index (BBI) = (χ²n – χ²m)/χ²n • Delta2 (small Ns), TLI (all require fit > .90), RMSR (0-.05).
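The two fit-index formulas above can be checked with a short sketch. All chi-square and df values below are made up for illustration, and the function names are hypothetical. The CFI function follows the slide's formula directly, whereas statistical packages typically also bound the result between 0 and 1.

```python
def comparative_fit_index(chi2_model, df_model, chi2_null, df_null):
    """CFI = 1 - (chi2_model - df_model) / (chi2_null - df_null), as on the slide."""
    return 1.0 - (chi2_model - df_model) / (chi2_null - df_null)

def bentler_bonett_index(chi2_model, chi2_null):
    """BBI = (chi2_null - chi2_model) / chi2_null."""
    return (chi2_null - chi2_model) / chi2_null

# Hypothetical results (illustration only)
chi2_null, df_null = 900.0, 66               # null (independence) model
chi2_two_factor, df_two_factor = 85.0, 50    # tested two-factor model
chi2_one_factor, df_one_factor = 160.0, 54   # nested one-factor model

cfi = comparative_fit_index(chi2_two_factor, df_two_factor, chi2_null, df_null)
bbi = bentler_bonett_index(chi2_two_factor, chi2_null)

# Nested comparison: chi-square difference test against the one-factor model
chi2_difference = chi2_one_factor - chi2_two_factor
df_difference = df_one_factor - df_two_factor

print(f"CFI = {cfi:.3f}, BBI = {bbi:.3f}")
print(f"chi-square difference = {chi2_difference} on {df_difference} df")
```

In this made-up case both indices exceed the .90 threshold, and the chi-square difference would then be compared against the critical chi-square value for 4 df to decide whether the two-factor model fits significantly better than the one-factor model.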
Factor analysis and construct validity • No longer acceptable to publish a scale without considering its factorial structure • If your scale is supposed to assess one construct, then this issue can be empirically evaluated (EFAs & CFAs). • Ultimately, one can only test the number of factors and how they relate to one another, not the actual content of the factors (inferred from item content) • If the theory for the construct and the FA do not correspond, then there are two alternatives: • 1) The underlying theoretical constructs may not be correctly specified (your theory is wrong) • 2) The theory may be adequate, but the scale used to assess it is not • Examples?
Organization of constructs • Factor structures – factor analysis (FA) and confirmatory factor analysis (CFA) • Differences between these procedures • Meaning of eigenvalues (extraction) • Rotations (e.g., oblique vs. orthogonal) • How are the constructs inter-related? • Organizational and explanatory power • Theoretical and (not vs) empirical decisions
What is the construct of intelligence? • How do you define it? • How have others defined it? • This definition will determine how the tests are constructed, administered, and interpreted • What is intelligence and do modern IQ tests measure it? – write paper on this • Various conflicting views on this (e.g., Gould suggests that we can’t measure it and don’t with our current tests whereas Boring suggests that it IS what intelligence tests measure) • See handout of definitions
PSY 525 Intellectual Assessment A few tests that we will focus on are those that are commonly used by psychologists, those that are psychometrically sound, and those that you need to know in order to do your job.
Assessing LD with the WAIS-III/IV • Individuals with LD in reading and math generally exhibit IQ scores in the average range. • Index scores are, however, noteworthy: • VCI scores tend to be 7-13 points higher than WMI scores (e.g., VCI is at least 15 points higher than WMI for almost 42% of those with reading disabilities). • POI is approx. 7 points higher than PSI for all LD individuals (e.g., POI scores are at least 15 points higher than PSI scores for almost 31% of those with LDs).
Intelligence testing in problem populations • The Leiter was developed to evaluate cognitive functioning (i.e., intelligence) in individuals who are deaf, mute, or otherwise nonverbal. • It can also be used with clients from other cultures who do not verbalize (or verbalize only minimally) in English or in their native language
Neuropsychological Evaluation • Head trauma is the primary cause of closed head injuries in the population (adolescents and adults) • Closed head injuries cause more widespread damage and usually result in a period of lost consciousness. • Amnesia usually results (anterograde and retrograde) • Duration of anterograde amnesia is the best predictor of degree of injury and probability of recovery