Educational Research Chapter 5 Selecting Measuring Instruments Gay, Mills, and Airasian
Topics Discussed in this Chapter • Data collection • Measuring instruments • Terminology • Interpreting data • Types of instruments • Technical issues • Validity • Reliability • Selection of a test
Data Collection • Scientific inquiry requires the collection, analysis, and interpretation of data • Data – the pieces of information that are collected to examine the research topic • Issues related to the collection of this information are the focus of this chapter
Data Collection • Terminology related to data • Constructs – abstractions that cannot be observed directly but are helpful when trying to explain behavior • Intelligence • Teacher effectiveness • Self-concept Obj. 1.1 & 1.2
Data Collection • Data terminology (continued) • Operational definition – the ways by which constructs are observed and measured • Wechsler IQ test • Virgilio Teacher Effectiveness Inventory • Tennessee Self-Concept Scale • Variable – a construct that has been operationalized and has two or more values Obj. 1.1 & 1.2
Data Collection • Measurement scales • Nominal – categories • Gender, ethnicity, etc. • Ordinal – ordered categories • Rank in class, order of finish, etc. • Interval – equal intervals • Test scores, attitude scores, etc. • Ratio – absolute zero • Time, height, weight, etc. Obj. 2.1
Data Collection • Types of variables • Categorical or quantitative • Categorical variables reflect nominal scales and measure the presence of different qualities (e.g., gender, ethnicity, etc.) • Quantitative variables reflect ordinal, interval, or ratio scales and measure different quantities of a variable (e.g., test scores, self-esteem scores, etc.) Obj. 2.2
Data Collection • Types of variables • Independent or dependent • Independent variables are purported causes • Dependent variables are purported effects • Two instructional strategies, cooperative groups and traditional lectures, were used during a three-week social studies unit. Students’ exam scores were analyzed for differences between the groups. • The independent variable is the instructional approach (of which there are two levels) • The dependent variable is the students’ achievement Obj. 2.3
Measurement Instruments • Important terms • Instrument – a tool used to collect data • Test – a formal, systematic procedure for gathering information • Assessment – the general process of collecting, synthesizing, and interpreting information • Measurement – the process of quantifying or scoring a subject’s performance Obj. 3.1 & 3.2
Measurement Instruments • Important terms (continued) • Cognitive tests – examining subjects’ thoughts and thought processes • Affective tests – examining subjects’ feelings, interests, attitudes, beliefs, etc. • Standardized tests – tests that are administered, scored, and interpreted in a consistent manner Obj. 3.1
Measurement Instruments • Important terms (continued) • Selected response item format – respondents select answers from a set of alternatives • Multiple choice • True-false • Matching • Supply response item format – respondents construct answers • Short answer • Completion • Essay Obj. 3.3 & 11.3
Measurement Instruments • Important terms (continued) • Individual tests – tests administered on an individual basis • Group tests – tests administered to a group of subjects at the same time • Performance assessments – assessments that focus on processes or products that have been created Obj. 3.6
Measurement Instruments • Interpreting data • Raw scores – the actual score made on a test • Standard scores – statistical transformations of raw scores • Percentiles (1 – 99) • Stanines (1 – 9) • Normal Curve Equivalents (1 – 99) Obj. 3.4
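The transformations above can be sketched in code. This is a minimal illustration, not from the textbook: it assumes scores are roughly normally distributed and uses the sample mean and standard deviation in place of published norm-group statistics; the function name is hypothetical.

```python
import statistics
from math import erf, sqrt

def standard_scores(raw_scores):
    """Convert raw scores to z-scores, percentile ranks, stanines, and NCEs.

    Assumes an approximately normal distribution and treats the sample's
    own mean and standard deviation as the norm-group statistics.
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    results = []
    for x in raw_scores:
        z = (x - mean) / sd
        # Percentile rank from the normal cumulative distribution function
        percentile = 50 * (1 + erf(z / sqrt(2)))
        # Stanines: mean 5, SD 2, truncated to the range 1-9
        stanine = min(9, max(1, round(z * 2 + 5)))
        # Normal Curve Equivalents: mean 50, SD 21.06
        nce = 50 + 21.06 * z
        results.append({"raw": x, "z": z, "percentile": percentile,
                        "stanine": stanine, "nce": nce})
    return results
```

A raw score exactly at the mean maps to the 50th percentile, stanine 5, and an NCE of 50, which is a quick sanity check on any such conversion.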
Measurement Instruments • Interpreting data (continued) • Norm-referenced – scores are interpreted relative to the scores of others taking the test • Criterion-referenced – scores are interpreted relative to a predetermined level of performance • Self-referenced – scores are interpreted relative to changes over time Obj. 3.5
Measurement Instruments • Types of instruments • Cognitive – measuring intellectual processes such as thinking, memorizing, problem solving, analyzing, or reasoning • Achievement – measuring what students already know • Aptitude – measuring general mental ability, usually for predicting future performance Obj. 4.1 & 4.2
Measurement Instruments • Types of instruments (continued) • Affective – assessing individuals’ feelings, values, attitudes, beliefs, etc. • Typical affective characteristics of interest • Values – deeply held beliefs about ideas, persons, or objects • Attitudes – dispositions that are favorable or unfavorable toward things • Interests – inclinations to seek out or participate in particular activities, objects, ideas, etc. • Personality – characteristics that represent a person’s typical behaviors Obj. 4.1 & 4.5
Measurement Instruments • Types of instruments (continued) • Affective (continued) • Scales used for responding to items on affective tests • Likert • Positive or negative statements to which subjects respond on scales such as strongly disagree, disagree, neutral, agree, or strongly agree • Semantic differential • Bipolar adjectives (i.e., two opposite adjectives) with a scale between each adjective • Dislike: ___ ___ ___ ___ ___ :Like • Rating scales – ratings of the trait of interest along a numeric or descriptive scale Obj. 5.1
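A practical detail when scoring a Likert scale is that negatively worded statements must be reverse-coded before totaling, so a high score always means the same direction of attitude. A minimal sketch (the function name and parameters are illustrative, not from the textbook):

```python
def score_likert(responses, reverse_items, points=5):
    """Total score for one respondent on a Likert scale.

    responses: list of item responses, 1 = strongly disagree ... 5 = strongly agree.
    reverse_items: zero-based indices of negatively worded statements.
    """
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= points:
            raise ValueError(f"response {r} outside 1..{points}")
        # Reverse-code negative statements: on a 5-point scale 1 <-> 5, 2 <-> 4, 3 stays 3
        total += (points + 1 - r) if i in reverse_items else r
    return total
```

For example, a respondent answering 5, 1, 4 on three items where the second item is negatively worded scores 5 + 5 + 4 = 14.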
Measurement Instruments • Types of instruments (continued) • Affective (continued) • Scales used for responding to items on affective tests (continued) • Thurstone – statements related to the trait of interest to which subjects agree or disagree • Guttman – statements representing a uni-dimensional trait Obj. 5.1
Measurement Instruments • Issues for cognitive, aptitude, or affective tests • Problems inherent in the use of self-report measures • Bias – distortions of a respondent’s performance or responses based on ethnicity, race, gender, language, etc. • Responses to affective test items • Socially acceptable responses • Accuracy of responses • Response sets • Alternatives include the use of projective tests Obj. 4.3, 4.4
Technical Issues • Two concerns • Validity • Reliability
Technical Issues • Validity – extent to which interpretations made from a test score are appropriate • Characteristics • The most important technical characteristic • Situation specific • Does not refer to the instrument but to the interpretations of scores on the instrument • Best thought of in terms of degree Obj. 6.1 & 7.1
Technical Issues • Validity (continued) • Four types • Content – to what extent does the test measure what it is supposed to measure • Item validity • Sampling validity • Determined by expert judgment Obj. 7.1 & 7.2
Technical Issues • Validity (continued) • Criterion-related • Predictive – to what extent does the test predict a future performance • Concurrent – to what extent does the test predict a performance measured at the same time • Estimated by correlations between two tests • Construct – the extent to which a test measures the construct it represents • Underlying difficulty defining constructs • Estimated in many ways Obj. 7.1, 7.3, & 7.4
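The correlation behind a criterion-related validity coefficient is the ordinary Pearson r. A minimal stdlib sketch (the data and function name are illustrative): for predictive validity, x might be aptitude-test scores and y later course grades; for concurrent validity, y is a criterion measured at the same time.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between predictor scores and criterion scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and each variable's sum of squares
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

A coefficient near +1 indicates the test orders people almost exactly as the criterion does; near 0, the test tells you little about the criterion.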
Technical Issues • Validity (continued) • Consequential – to what extent are the consequences that occur from the test harmful • Estimated by empirical and expert judgment • Factors affecting validity • Unclear test directions • Confusing and ambiguous test items • Vocabulary that is too difficult for test takers Obj. 7.1, 7.5, & 7.7
Technical Issues • Factors affecting validity (continued) • Overly difficult and complex sentence structure • Inconsistent and subjective scoring • Untaught items • Failure to follow standardized administration procedures • Cheating by the participants or someone teaching to the test items Obj. 7.7
Technical Issues • Reliability – the degree to which a test consistently measures whatever it is measuring • Characteristics • Expressed as a coefficient ranging from 0 to 1 • A necessary but not sufficient characteristic of a test Obj. 6.1, 8.1, & 8.7
Technical Issues • Reliability (continued) • Six reliability coefficients • Stability – consistency over time with the same instrument • Test-retest • Estimated by a correlation between the two administrations of the same test • Equivalence – consistency with two parallel tests administered at the same time • Parallel forms • Estimated by a correlation between the parallel tests Obj. 8.1, 8.2, 8.3, & 8.7
Technical Issues • Reliability (continued) • Six reliability coefficients (continued) • Equivalence and stability – consistency over time with parallel forms of the test • Combines attributes of stability and equivalence • Estimated by a correlation between the parallel forms • Internal consistency – consistency among the items within a single administration of the test (e.g., by artificially splitting the test into halves) • Several coefficients – split halves, KR 20, KR 21, Cronbach alpha • All coefficients provide estimates ranging from 0 to 1 Obj. 8.1, 8.4, 8.5, & 8.7
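Of the internal-consistency coefficients listed above, Cronbach's alpha is the most general, and its formula is simple enough to sketch directly: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and data layout below are illustrative.

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a set of test items.

    item_scores[i][j] = score of person j on item i.
    """
    k = len(item_scores)                 # number of items
    n = len(item_scores[0])              # number of examinees
    item_vars = [statistics.variance(item) for item in item_scores]
    # Each person's total score across all items
    totals = [sum(item_scores[i][j] for i in range(k)) for j in range(n)]
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item ranks examinees identically (perfectly consistent items), alpha reaches its maximum of 1; uncorrelated items drive it toward 0.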
Technical Issues • Reliability (continued) • Six reliability coefficients • Scorer/rater – consistency of observations between raters • Inter-judge – two observers • Intra-judge – one judge over two occasions • Estimated by percent agreement between observations Obj. 8.1, 8.6, & 8.7
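Percent agreement between two raters (or one rater on two occasions) is the simplest scorer/rater coefficient, and it reduces to counting matching codes. A minimal sketch with an illustrative function name:

```python
def percent_agreement(rater_a, rater_b):
    """Percent of observations on which two raters assign the same code.

    Works equally for intra-judge reliability by passing the same
    rater's codes from two occasions.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must be the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)
```

Note that simple percent agreement does not correct for agreement expected by chance; coefficients such as Cohen's kappa do, which is why some studies report kappa instead.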
Technical Issues • Reliability (continued) • Six reliability coefficients (continued) • Standard error of measurement (SEM) – an estimate of how much difference there is between a person’s obtained score and his or her true score • Function of the standard deviation of the test and the reliability coefficient (e.g., KR 20, Cronbach alpha, etc.) • Reported by specifying an interval rather than a point estimate of a person’s score Obj. 8.1, 8.7, & 9.1
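The SEM has a standard closed form: SEM = SD * sqrt(1 - r), where SD is the test's standard deviation and r its reliability coefficient. A brief sketch (function names are illustrative) that also builds the interval around an obtained score:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

def score_band(obtained, sd, reliability):
    """Roughly 68% confidence band: obtained score +/- 1 SEM."""
    e = sem(sd, reliability)
    return obtained - e, obtained + e
```

For example, a test with SD = 10 and reliability .91 has SEM = 3, so an obtained score of 75 is best reported as the band 72 to 78 rather than as a single point.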
Selection of a Test • Sources of test information • Mental Measurement Yearbooks (MMY) • The reviews in MMY are most easily accessed through your university library and the services to which they subscribe (e.g., EBSCO) • Provides factual information on all known tests • Provides objective test reviews • Comprehensive bibliography for specific tests • Indices: titles, acronyms, subject, publishers, developers • Buros Institute Obj. 10.1 & 12.1
Selection of a Test • Sources (continued) • Tests in Print • Tests in Print is published by the Buros Institute • The reviews in it are most easily accessed through your university library and the services to which they subscribe (e.g., EBSCO) • Bibliography of all known commercially produced tests currently available • Very useful to determine availability Obj. 10.1 & 12.1
Selection of a Test • Sources (continued) • ETS Test Collection • Published and unpublished tests • Includes test title, author, publication date, target population, publisher, and description of purpose • Annotated bibliographies on achievement, aptitude, attitude and interests, personality, sensory motor, special populations, vocational/occupational, and miscellaneous Obj. 10.1 & 12.1
Selection of a Test • Sources (continued) • Professional journals • Test publishers and distributors • Issues to consider when selecting tests • Psychometric properties • Validity • Reliability • Length of test • Scoring and score interpretation Obj. 10.1, 11.1, & 12.1
Selection of a Test • Issues to consider when selecting tests • Non-psychometric issues • Cost • Administrative time • Objections to content by parents or others • Duplication of testing Obj. 11.1
Selection of a Test • Designing your own tests • Get help from others with experience in developing tests • Item writing guidelines • Avoid ambiguous and confusing wording and sentence structure • Use appropriate vocabulary • Write items that have only one correct answer • Give information about the nature of the desired answer • Do not provide clues to the correct answer • See Writing Multiple Choice Items Obj. 11.2
Selection of a Test • Test administration guidelines • Plan ahead • Be certain that there is consistency across testing sessions • Be familiar with any and all procedures necessary to administer a test Obj. 11.4