Issues in Measuring Behaviour: Why do we want to quantify everything? Types of psychological test. Factors affecting test reliability. Factors affecting test validity.

Why quantify? 1. Science involves measurement - because measurements can be objectively obtained, are publicly available, and potentially checkable by sceptical others. 2. Science often (but not invariably) involves experimentation - because the experimental method is good for identifying cause and effect.

Types of psychological test: Purpose of tests: 1. Research. 2. Practical applications (clinical, educational, occupational). Psychological testing is largely a 20th-century phenomenon, dating back to Binet (1900s).

Dangers of testing: 1. Untrained use: tests are easy to administer but hard to interpret. 2. Spurious “precision”, because results are quantitative. 3. Misapplication of findings in a deterministic way. 4. Test results are essentially descriptions of groups; they are less reliable as descriptions of individuals.

Fast woman or slow man? Sex differences in reaction time?
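To illustrate danger 4 (group descriptions vs. individuals), here is a minimal Python sketch (3.8+, standard library only) using two hypothetical reaction-time distributions. The means and SD are invented for illustration and are not real data. A modest difference in group means still leaves the distributions largely overlapping, so many members of the "slower" group are faster than the average member of the "faster" group.

```python
from statistics import NormalDist

# Hypothetical reaction-time distributions in milliseconds (illustrative values only,
# NOT real data): group means differ by 20 ms, with a shared SD of 40 ms.
group_a = NormalDist(mu=250, sigma=40)
group_b = NormalDist(mu=270, sigma=40)

# Overlap coefficient: the proportion of the two distributions that coincides.
overlap = group_a.overlap(group_b)

# Proportion of group B whose RT is lower (faster) than group A's mean.
b_faster_than_a_mean = group_b.cdf(group_a.mean)

print(f"overlap = {overlap:.2f}")
print(f"share of group B faster than group A's mean = {b_faster_than_a_mean:.2f}")
```

With these assumed values the overlap is about 0.80, and about 31% of group B are faster than group A's average, so knowing someone's RT says little about which group they belong to.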

Types of test: 1. Performance tests (e.g. IQ tests). Problems: motivation; standardisation (is the test culture-fair?). 2. Disposition tests (e.g. anxiety, extroversion). Problems: social desirability (criterion-keyed tests); ambiguous items; the need for appropriate norms.

Test reliability: A test is reliable if it gives consistent, reproducible results. An observed score = “true” score + error. Error is due to (a) natural variation in performance; (b) lack of precision in defining and measuring psychological constructs (e.g. what exactly do we mean by terms like “aggression” or “intelligence”?).
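The "true score + error" idea can be sketched as a small simulation (Python 3.8+, standard library). The true-score and error SDs below are illustrative assumptions. In this classical test theory view, reliability is the proportion of observed-score variance that is due to true-score variance.

```python
import random
from statistics import pvariance

random.seed(1)

# Classical test theory: observed score X = true score T + random error E.
# Illustrative assumptions: T ~ N(100, 15), E ~ N(0, 5), E independent of T.
N = 10_000
true_scores = [random.gauss(100, 15) for _ in range(N)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Reliability = var(T) / var(X): the share of observed variance that is "true".
reliability = pvariance(true_scores) / pvariance(observed)

# Theoretical value under these assumptions: 15**2 / (15**2 + 5**2) = 0.9
print(f"estimated reliability = {reliability:.2f}")
```

The estimate should land close to 0.9; increasing the error SD (less precise measurement) pulls it down.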

Measures of reliability: (a) Test-retest (time to time). (b) Alternate forms (version to version). (c) Split-half (item to item). (d) Inter-scorer (person to person).

Factors affecting reliability: 1. The phenomenon itself (traits vs. states). 2. Precision of measurement. 3. Length of test (long > short). 4. Time between tests (short > long). 5. Variability in performance (high > low). 6. Format: with multiple choice of 5 answers per question, 20% of answers are correct by chance; with true/false, 50% are correct by chance. Multiple choice is therefore more reliable than true/false. 7. Inter-individual variability in scores (high > low).
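Two of these factors can be put into numbers with a short Python sketch (3.8+ for `math.comb`). Factor 3 uses the Spearman-Brown prophecy formula for a lengthened test. Factor 6 uses the binomial distribution to show how much room lucky guessing leaves for random error. The starting reliability of 0.6 and the 20-item, 60% cut-off are illustrative assumptions.

```python
from math import comb

def spearman_brown(r, k):
    """Predicted reliability when a test's length is multiplied by k (Spearman-Brown prophecy)."""
    return k * r / (1 + (k - 1) * r)

def p_at_least_by_chance(n_items, n_correct, p_guess):
    """Binomial probability of getting at least n_correct items right by blind guessing."""
    return sum(comb(n_items, j) * p_guess**j * (1 - p_guess)**(n_items - j)
               for j in range(n_correct, n_items + 1))

# Factor 3 (length): doubling a test whose reliability is 0.6 (illustrative value).
doubled = spearman_brown(0.6, 2)  # 1.2 / 1.6 = 0.75

# Factor 6 (format): chance of scoring 12/20 (60%) or more purely by guessing.
p_true_false = p_at_least_by_chance(20, 12, 0.5)       # about 0.25
p_multiple_choice = p_at_least_by_chance(20, 12, 0.2)  # about 0.0001

print(round(doubled, 2), round(p_true_false, 3), round(p_multiple_choice, 5))
```

A pure guesser reaches 60% on the true/false version about one time in four, but almost never on the 5-option version. Guessing therefore contributes far less random error to multiple-choice scores.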

The greater the variability between individuals in test scores, the better the reliability. [Figure: rank orders of individuals A, B, C and D on repeated testing, shown for Group 1 and Group 2.]

Test validity: A test is valid if it measures what it is supposed to be measuring. Important - a test can be reliable without being valid (but not vice versa).

Example of reliable but invalid measurements: Paul Broca (1870s)
• Searched for anthropometric measurements that correlated with the ranking of human races in terms of intelligence and civilisation that he took as known.
• E.g. ratio of forearm to upper arm: more “ape-like” in negroes than whites. (Abandoned once he realised that by this criterion, whites were more ape-like than Eskimos, aborigines and other races he regarded as inferior!)
• Brain weight: men > women > negroes > gorillas.
• Modern brains heavier than mediaeval brains.
• French brains heavier than German brains!
• 292 male brains: mean weight = 1,325 grams.
• 140 female brains: mean weight = 1,144 grams (a 14% difference).
• No account taken of age at death (young men, old women) or of body size.
• "We might ask if the small size of the female brain depends exclusively upon the small size of her body...But we must not forget that women are, on the average, a little less intelligent than men...We are therefore permitted to suppose that the relatively small size of the female brain depends in part upon her physical inferiority and in part upon her intellectual inferiority" (1861, p.153).

Measures of validity: (a) Face validity (the test intuitively looks plausible). (b) Content validity (the test covers material considered relevant, e.g. statistics exams shouldn’t contain history questions!). (c) Criterion validity, either predictive or concurrent. Problem: finding appropriate criteria. (d) Construct validity (does performance correlate well with established measures of the phenomenon?). (e) Ecological / external validity.
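The "finding decent criteria" problem for criterion validity can be quantified with Spearman's correction for attenuation. An unreliable criterion (e.g. noisy supervisor ratings) depresses the observed test-criterion correlation. The correction estimates what the correlation would be if both measures were perfectly reliable. The three input values below are illustrative assumptions, and with sample estimates the corrected value can occasionally exceed 1.

```python
from math import sqrt

def disattenuated_validity(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    r_xy: observed test-criterion correlation
    r_xx: reliability of the test
    r_yy: reliability of the criterion
    """
    return r_xy / sqrt(r_xx * r_yy)

# Illustrative values: observed validity 0.30, test reliability 0.81,
# criterion reliability 0.49 (a fairly unreliable criterion).
corrected = disattenuated_validity(0.30, 0.81, 0.49)
print(round(corrected, 2))  # 0.30 / sqrt(0.81 * 0.49) = 0.30 / 0.63
```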

Factors affecting validity: Norms and standardisation. (a) How well was the test standardised? Stratified random sampling is ideal. Do sub-group norms exist? (b) Are sufficient details given to ensure correct administration? (c) How appropriate is the standardisation group as a baseline against which to compare your sample?
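Point (c) can be made concrete with a short sketch (Python 3.8+). The same raw score is converted to a percentile rank against two hypothetical norm groups. The norm means and SDs are invented, and normally distributed norms are an assumption. The interpretation of a score depends entirely on which baseline it is compared with.

```python
from statistics import NormalDist

def percentile_rank(raw, norm_mean, norm_sd):
    """Percentile rank of a raw score, assuming the norm group's scores are normally distributed."""
    z = (raw - norm_mean) / norm_sd
    return 100 * NormalDist().cdf(z)

raw_score = 30
# Hypothetical norm tables for the same test, standardised on two different groups.
vs_general_population = percentile_rank(raw_score, norm_mean=25, norm_sd=5)  # z = +1
vs_graduate_sample = percentile_rank(raw_score, norm_mean=30, norm_sd=4)     # z = 0

print(f"{vs_general_population:.0f}th vs {vs_graduate_sample:.0f}th percentile")
```

Under these assumptions the same person sits at about the 84th percentile against one norm group and the 50th against the other.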

Ecological validity: To what extent are our results generalisable to the real world? It depends: e.g. driving simulators are good for studying vehicle control, but useless for studying how much risk people are prepared to take when driving. Similarly, L.E.D. brake lights light up faster than conventional bulbs, but do the milliseconds gained make any practical difference to a following driver's braking times?
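One way to frame the brake-light question is to convert milliseconds into metres. A vehicle travelling at constant speed covers speed × time during the head start. The 200 ms head start below is a hypothetical value chosen for illustration, not a figure from these notes. Whether real drivers actually convert that potential gain into earlier braking is exactly the ecological-validity question.

```python
def distance_travelled_m(speed_kmh, delay_ms):
    """Distance (metres) covered at constant speed during a given delay."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s * delay_ms / 1000

# HYPOTHETICAL head start from faster-lighting brake lights (assumption, not lecture data).
HEAD_START_MS = 200
for speed in (50, 100):
    print(f"{speed} km/h: {distance_travelled_m(speed, HEAD_START_MS):.1f} m")
```

Under this assumption the potential gain is about 2.8 m at 50 km/h and 5.6 m at 100 km/h.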
