Learn the process of creating and implementing measuring tools like psychological tests, scales, questionnaires, and inventories. Understand the characteristics, drafting recommendations, pilot testing, and scoring.
1. Kinds of tools
1. Test:
• Psychological measurement tool, generally used to measure cognitive variables (skills, knowledge, performance, etc.).
• The participant's response to each item is either correct or incorrect.
• The total score of the test is calculated by adding the correct answers (direct or weighted sum).
2. Scale:
• Usually, a tool designed to measure non-cognitive variables: attitudes, interests, preferences, etc.
• On a scale of graduated and ordered categories, participants have to choose the category that best represents their position on what the test is measuring. There are no right or wrong answers.
• The total score is the sum of the scores assigned to the categories chosen by the participant.
3. Questionnaire:
• Usually composed of items that are not necessarily related to each other, whose response options are neither ordered nor graded, which can be interpreted individually, and for which there are no right or wrong answers.
• Usually includes varied questions to obtain more information about the participant and his/her environment (age, profession, education level, opinion on the topic discussed, etc.).
• Typical in survey research.
4. Inventory:
• Usually, a tool to measure personality variables.
• The participants' responses are not right or wrong; they express agreement or disagreement with the statements of the items.
2. Process of construction
2.1. Purpose:
• What we are going to measure
• Who we are going to measure
• Why we are going to measure
2.2. Characteristics of the tool:
• Content
• Format: kind of items
• Length: number of items
• Psychometric characteristics of the items
2.3. Drafting of the items: recommendations
2.4. Critical revision of the items by a group of experts
2.5. Pilot test:
• Administration instructions
• Presentation format
• Answer sheet format
2.6. Implementation of the pilot test:
• Individual or collective
• Paper and pencil, computer, telephone, etc.
2.7. Correction of the pilot test and assignment of scores to the participants:
• In tests composed of choice items
• In tests composed of construction items
2.1. Purpose
2.1.1. Definition of the construct
• Psychological variable (e.g., verbal skills).
• Not directly observable (theoretical).
• It is shown through a group of behaviors that can be observed directly, so they can be measured (e.g., number of words known to name a thing; appropriateness in choosing a word depending on the context).
• These behaviors, in order to be considered manifestations of the construct, have to be more or less uniform and constant over time and across a variety of situations.
• All questions that refer to the construct should be reflected in the items of the test.
2.1.2. Population to which it is aimed
Tests aimed at different populations differ in:
• Content.
• Vocabulary.
• Length.
• Conditions of application.
• Etc.
E.g., child vs. adult population.
2.1.3. Planned use
• We have to consider the decisions that are going to be taken based on the scores obtained.
• E.g., to detect gifted children, use especially difficult items; to detect handicapped children, especially easy ones.
• Possible uses: selection, classification, diagnosis, certification, guidance/advice, description/information.
2.2. Characteristics of the test
2.2.1. Content
• Set of behaviors considered manifestations of the construct.
• To delimit these behaviors:
• Content analysis.
• Bibliographic review.
• Direct observation.
• Expert opinion.
2.2.2. Format of the items
A. Choice items:
• Closed response.
• The participant chooses one or more alternatives.
A1. Two alternatives (true/false; yes/no; correct/incorrect).
• Often used to measure cognitive variables (skills, abilities, knowledge, and performance tests).
• Advantage: quick and easy to use.
• Disadvantage: participants who do not know the answer and respond randomly have a 50% chance of choosing the correct one.
A2. Multiple choice:
• More than two alternatives (usually 3-5, to reduce the possibility of choosing the correct alternative by chance).
• One alternative is the correct or the most appropriate one.
• Used to measure cognitive variables, mainly in knowledge and performance tests.
• Advantage: easy to administer and mark.
• Disadvantage: more difficult to construct than two-alternative items.
A3. Matching or pairing:
• The participant matches concepts arranged in two columns.
• Used to measure cognitive variables, especially knowledge.
A4. 'Cloze' or incomplete format:
• E.g., the participant has to fill gaps in a sentence with words from a list.
A5. Rating scale:
• One statement and graduated alternative categories ordered along a continuum.
• The participant chooses the alternative that best reflects his/her personal attitude.
• Used to measure non-cognitive variables (attitudes, interests, personality, etc.).
• Advantage: participants can express their position accurately.
• Disadvantage: answer bias; some participants tend to choose the central options (moderate), others tend toward the extremes (radical).
A6. Checklists:
• Evaluative list in which participants show their opinion on the facts presented in the statement.
• The listed options are not ordered; they are independent.
• Sometimes it is possible to choose more than one option.
• Typical format of the questionnaire.
• E.g., Your favourite sport is: a) soccer; b) tennis; c) golf.
B. Construction items (open response): the participant has to develop his/her own response.
• We can evaluate not only the participants' level of knowledge, but also the way they structure and express content, or their originality.
B1. Short answer.
B2. Long answer or essay.
2.2.3. Test length
• There is no single solution, because many factors have to be taken into account (target population, time constraints, objective of the test, etc.).
• It is recommended that the pilot study include a larger number of items than will be used in the final version.
2.2.4. Psychometric characteristics of the items
• In classical test theory (CTT), an item is easy or difficult for a given population depending on the probability that participants answer it correctly. If this probability is high, the item is easy, and vice versa.
• An item has a high degree of homogeneity with the rest of the items that compose the test when they all measure the same thing.
• An item is discriminative to the extent that it is useful to differentiate between participants who have obtained extreme scores on the test.
Regarding the difficulty of the items:
• Speed tests:
• Items should be very easy to solve.
• The limited time to answer is the factor that allows us to discriminate between participants.
• Maximum performance tests:
• Mainly used in the evaluation of academic performance, abilities, and skills.
• The items have different levels of difficulty.
• Typical performance tests:
• Mainly used to measure personality, attitudes, etc.
• It makes no sense to talk about difficulty because there are no right and wrong answers.
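The two CTT item statistics described above (difficulty as the proportion of correct answers, discrimination as the contrast between extreme scoring groups) can be sketched as follows. The function names, the 27% extreme-group rule, and the sample data are illustrative assumptions, not taken from the original slides:

```python
# Sketch of two classical test theory (CTT) item statistics.
# Names and data are illustrative, not from the original material.

def difficulty(item_responses):
    """Proportion of participants answering the item correctly (near 1 = easy)."""
    return sum(item_responses) / len(item_responses)

def discrimination(item_responses, total_scores, fraction=0.27):
    """Extreme-groups discrimination: difficulty in the top-scoring group
    minus difficulty in the bottom-scoring group (a common 27% rule)."""
    ranked = sorted(zip(total_scores, item_responses), key=lambda t: t[0])
    k = max(1, int(len(ranked) * fraction))
    low = [resp for _, resp in ranked[:k]]      # lowest total scores
    high = [resp for _, resp in ranked[-k:]]    # highest total scores
    return difficulty(high) - difficulty(low)

# 1 = correct, 0 = incorrect, for one item across 10 participants
item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
totals = [18, 17, 5, 15, 6, 14, 16, 4, 13, 19]
print(difficulty(item))              # 0.7 -> a fairly easy item
print(discrimination(item, totals))  # 1.0 -> separates extreme groups well
```

A discrimination index near 0 (or negative) would flag the item for revision during the pilot analysis.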
2.3. Drafting of the items
2.3.1. General recommendations
• Avoid ambiguous statements.
• Use short, direct, and accurate statements.
• Avoid statements that cause response bias, i.e., where one option is more likely to be chosen regardless of the participant's opinion; e.g., when the participant has to admit a socially unacceptable behavior.
• Express a single idea in each statement (avoid double questions). E.g., Are you in favor of reducing alcohol consumption among young people and raising taxes on alcoholic drinks?
• Avoid double negations. E.g., Do you think it is not possible that man never landed on the moon?
2.3.2. Recommendations for two-alternative items
• Be sure that the item is undoubtedly true or false. E.g., avoid: Dalí was the greatest painter of the twentieth century.
• Avoid words in the statement which might lead to the right answer even if participants don't know it (such as 'always' or 'never').
• Distribute the items whose statement is true randomly along the test, to avoid response patterns recognizable by participants.
Recommendations for multiple-choice items
• Ensure that the statement of the item formulates the problem clearly.
• Include most of the text in the statement to avoid unnecessary repetition in the response options.
• Ensure that the distractors (incorrect alternatives) are plausible.
• Avoid response options such as 'None of the above' or 'All of the above'.
• Include only one correct option unless you clearly indicate otherwise.
• Make all alternatives uniform in length and similar in grammatical construction.
• Randomize the location of the correct alternative.
• Make all options seem equally attractive.
• Ensure that each alternative is grammatically consistent with the item statement.
2.3.3. Response bias
• When drafting items, take into account the possibility of response bias, especially in affective tests (personality, interests, attitudes, etc.):
• Acquiescence: tendency to systematically agree (or disagree) with the statement of the item regardless of its content.
• Social desirability: tendency to respond to the item in a socially acceptable way.
• Indecision: tendency to select the neutral option.
• Extreme response: tendency to choose the extreme categories.
2.4. Critical revision of the items by a group of experts
• It is preferable that the experts have not been involved in developing the items, so they can judge not only whether the items fit the content, but also the clarity of the writing, whether they meet the format standards, etc.
• After reviewing the items and removing or correcting those which are not suitable, you can construct the preliminary (pilot) test.
• To determine whether items are useful to measure a concrete dimension of the construct, the Osterlind index of congruence can be calculated:
I_ik = (N · Σ_j X_ijk − Σ_k Σ_j X_ijk) / (2 · (N − 1) · n)
where X_ijk = rating of item i in dimension k by judge j (−1, 0 or +1); N = number of dimensions in the instrument; n = number of judges.
• An Osterlind index is calculated for each item.
• The possible results range between ±1, depending on the degree of congruence in the experts' answers:
• −1 would imply that all the experts agree that the item does not fit its dimension at all.
• +1 would imply that all the experts assigned the highest degree of item-dimension fit.
• 0 would be the lowest degree of agreement between the experts' opinions.
• Items which obtain a score of 0.5 or higher are usually included in the proposed test.
Example: We are constructing a scale to measure the quality of psychological interventions, with items referring to 3 dimensions: "extrinsic characteristics", "substantive characteristics" and "methodological characteristics". We would like to know whether the items "Moments of measurement: a. after intervention; b. before and after intervention" (item 1) and "Period of intervention: a. <6 months; b. ≥6 months" (item 2) are useful to measure the methodological dimension of the construct quality. The number of experts that chose each option (−1: useless; 0: neither useless nor useful; +1: useful) is presented in the table below:
• Calculate the Osterlind index of congruence for items 1 and 2.
• Should any item be removed?
• Item 2 obtains an index of 0.2 < 0.5 → item 2 should be removed.
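The index can also be computed programmatically from the judges' ratings. The sketch below assumes the standard Rovinelli-Hambleton formulation, I_k = (N / (2N − 2)) · (mean rating on dimension k − mean rating across all dimensions), with ratings in {−1, 0, +1}; the judge data are hypothetical, not the table from the example:

```python
# Sketch of the Osterlind (Rovinelli-Hambleton) index of item-dimension
# congruence. The formulation and the judge data are assumptions for
# illustration, not taken from the original example's table.

def osterlind_index(ratings, k):
    """Congruence of one item with dimension k.

    ratings: one list per judge, each with a rating in {-1, 0, +1}
             for every one of the N dimensions of the instrument.
    """
    n_judges = len(ratings)
    N = len(ratings[0])                                   # number of dimensions
    mean_k = sum(judge[k] for judge in ratings) / n_judges
    grand_mean = sum(sum(judge) for judge in ratings) / (n_judges * N)
    return (N / (2 * N - 2)) * (mean_k - grand_mean)

# Hypothetical data: 5 judges, 3 dimensions; index 2 = "methodological"
judges = [
    [-1,  0, 1],
    [-1, -1, 1],
    [ 0, -1, 1],
    [-1,  0, 1],
    [-1, -1, 1],
]
idx = osterlind_index(judges, k=2)
print(round(idx, 2))   # 0.85 -> above 0.5, so the item would be kept
```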
2.5. Pilot test
2.5.1. Administration instructions
• Each type of test requires certain instructions, but some are common:
• Do not use threatening language. E.g., avoid: This test will allow us to know how smart you are.
• In maximum performance tests (e.g., skill tests), explain that the items are of different difficulty. This will reduce anxiety.
• In speed tests, explain that time is limited and that only very few people will be able to complete the test.
• You must provide one or more items as an example.
• Instructions should explain how to allocate time and what to do when the participant does not know the answer to an item.
• Instructions should encourage participants to answer all the questions, because participants' scores tend to decrease when many answers are left blank.
• Explain how to mark the choices.
2.5.2. Presentation format / 2.5.3. Answer sheet format
• The presentation format should be clear and readable by all participants, to prevent inadvertent mistakes like confusing the answer box.
• Request identification data from the participants at the beginning of the test.
• Then, present the instructions.
• Then, present the items:
• In tests that measure cognitive variables (knowledge, skills, etc.), sort the items by level of difficulty. Do not put hard questions first.
• In tests that measure non-cognitive variables, be careful not to include delicate questions at the beginning.
• When a test includes items of various formats, they should appear grouped by format.
• Group items that refer to the same topic.
2.6. Implementation of the pilot test
• Decide on the method of administration and select a sample of participants belonging to the same population as the one for which the test was designed.
• Method of administration:
• Individual or collective.
• Oral (in person or by phone).
• Paper and pencil.
• By computer (lower time cost).
• By mail.
2.7. Correction of the pilot test and assignment of scores to the participants
2.7.1. In tests composed of choice items
• In a cognitive test:
• Since answers may be correct or incorrect, we check whether the responses of the participants match the correct template. Usually, one point is given for each correct answer.
• Final score: usually, the sum of correct answers.
• Given the influence of answering by chance and of personality patterns (e.g., more or less risk-taking), either emphasize that participants should not leave any item unanswered, or use a procedure to control the effect of chance on the final score.
• It is preferable to use a correction for chance to carry out this control.
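Scoring a choice-item test against the correct template, as described above, can be sketched in a few lines; the function name and data are illustrative:

```python
# Minimal sketch of scoring a choice-item test against an answer key
# (one point per correct answer). Names and data are illustrative.

def raw_score(responses, key):
    """Count of answers matching the correct template; None = omitted item."""
    return sum(1 for r, c in zip(responses, key) if r == c)

key       = ["b", "a", "d", "c", "a"]
responses = ["b", "a", "c", None, "a"]   # one wrong, one omitted
print(raw_score(responses, key))         # 3
```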
• We can apply the correction formula with two procedures:
1. Penalizing mistakes. It is assumed that the participant does not know the right answer and that all the item alternatives are equally attractive:
Xc = R − W / (K − 1)
where Xc = corrected score; R = number of items answered right; W = number of items answered wrong; K = number of alternatives per item. The term W / (K − 1) estimates Rr, the number of items answered right at random.
2. Reclaiming unanswered items. It is assumed that the participant answered only the questions he/she knew and, therefore, made no mistakes:
Xc = R + O / K
where Xc = corrected score; R = number of items answered right; O = number of omissions; K = number of alternatives per item. The term O / K estimates Rr, the number of omitted items that would have been answered right at random.
• It is advisable to use the first procedure, because with the second one scores would be overestimated.
• E.g.: Two students each knew 10 of the 20 questions in a true-false exam. Student A answered only the 10 he/she knew; student B answered all the questions and, responding randomly, hit 5 more. Correct their scores using procedures 1 and 2.
Correction by procedure 1:
• Participant A: Xc = 10 − 0/1 = 10
• Participant B: Xc = 15 − 5/1 = 10
Correction by procedure 2:
• Participant A: Xc = 10 + 10/2 = 15
• Participant B: Xc = 15 + 0/2 = 15 → overestimation (both really knew only 10 answers)
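A minimal sketch of the two correction procedures, assuming the standard formulas Xc = R − W/(K − 1) (procedure 1) and Xc = R + O/K (procedure 2), applied to the two-student example above; function names are illustrative:

```python
# Sketch of the two correction-for-chance procedures, assuming the standard
# formulas Xc = R - W/(K-1) and Xc = R + O/K. Names are illustrative.

def penalize_mistakes(right, wrong, k):
    """Procedure 1: subtract the estimated number of lucky hits."""
    return right - wrong / (k - 1)

def reclaim_omissions(right, omitted, k):
    """Procedure 2: credit omitted items at the chance rate."""
    return right + omitted / k

# True-false exam (K = 2), 20 items; both students really know 10 answers.
# Student A omits what they don't know; student B guesses and hits 5 extra.
print(penalize_mistakes(right=10, wrong=0, k=2))     # A: 10.0
print(penalize_mistakes(right=15, wrong=5, k=2))     # B: 10.0
print(reclaim_omissions(right=10, omitted=10, k=2))  # A: 15.0 (overestimated)
print(reclaim_omissions(right=15, omitted=0, k=2))   # B: 15.0 (overestimated)
```

Procedure 1 recovers the true score of 10 for both students, while procedure 2 inflates both to 15, matching the slide's warning about overestimation.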
• When the same test is formed by items with different numbers of alternatives, we apply the correction for chance in parts to obtain the final score of each participant.
• Items are grouped according to their number of alternatives, and we calculate the participant's score in each group.
• Final score: sum of the partial scores.
Example: A test is formed by 100 items: 25 items have 2 alternatives; 25 have 3 alternatives; and 50 have 4 alternatives. (a) Using procedure 1, calculate the score corrected for chance obtained by a participant who answered all 100 items and hit 14 of the 2-alternative items, 21 of the 3-alternative items and 29 of the 4-alternative items. (b) Calculate the score obtained without making any correction for chance.
Example (solution):
(a) 2-alternative items: Xc = 14 − 11/1 = 3; 3-alternative items: Xc = 21 − 4/2 = 19; 4-alternative items: Xc = 29 − 21/3 = 22. Corrected final score: 3 + 19 + 22 = 44.
(b) Without correction: 14 + 21 + 29 = 64.
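Applying the correction in parts, one group per number of alternatives, might look like this (procedure 1 within each group, using the figures from the example above; names are illustrative):

```python
# Sketch of correcting a mixed-format test in parts: procedure 1
# (Xc = R - W/(K-1)) applied per group of items sharing the same
# number of alternatives. Names are illustrative.

def penalize_mistakes(right, wrong, k):
    return right - wrong / (k - 1)

# (n_items, n_alternatives, n_right) per group; every item was answered,
# so wrong = n_items - n_right in each group
groups = [(25, 2, 14), (25, 3, 21), (50, 4, 29)]

corrected = sum(penalize_mistakes(r, n - r, k) for n, k, r in groups)
uncorrected = sum(r for _, _, r in groups)
print(corrected)     # 44.0
print(uncorrected)   # 64
```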
• In non-cognitive tests: in the absence of correct and incorrect answers, each response alternative of an item has a different numerical value assigned to it. It is necessary that the numerical assignment to each response category is well done.
• Correction of the test and assignment of scores: add the numerical values assigned to the response alternatives chosen by the participant.
• When a rating-scale format is used, we must be very clear about the direction of the continuum of the variable being measured.
• If it is an attitudinal variable, we must know which ends of the continuum mark a favorable and an unfavorable attitude. E.g., depression: which end marks the absence of depression and which marks its maximum extent.
• Then, decide which end of the continuum is assigned the highest numerical value and make sure that all items follow the same allocation rule.
2.7.2. In tests composed of construction items
• Methods:
• Analytic marking:
• Establish the dimensions to be considered.
• Define the correct answer.
• Final mark: two values (pass/fail).
• Non-experts can evaluate.
• Holistic marking:
• Global evaluation.
• Final mark: a range of possible values (e.g., 0-10).
• Previously trained experts have to evaluate.
2.8. Summary of steps (Crocker & Algina, 1986)
• Process to develop a measurement tool:
• Definition of the target.
• Definition of the construct: inductive or deductive process.
• Description of the construct components: they can range from very specific or one-dimensional to very general or multidimensional.
• Instrument design.
• Drafting of items: clarity, no ambiguity, concise wording.
• Analysis of item quality: descriptive and statistical information.
• Reliability: stability of test scores and internal consistency.
• Validity: adequacy of inferences made from scores on the test.
• Development of implementation rules, interpretation and norm-referencing.