
Test Design & Construction


cearley

Presentation Transcript


  1. Test Design & Construction
     RSCH 6109: Assessment & Evaluation Methods
     • Purpose & Framework
     • Test Specifications or Blueprint
     • Item Construction
     • Field Testing
     • Evaluation & Revision
     Classification of Items; Response Formats; Scoring Procedures; Select Item Writing Guidelines: MCQs, Likert Rating Scales

  2. Test Design & Construction

  3. Test Design & Construction
     Step 1: Delineate the Purpose & Framework. The purpose and framework delineate what the test is intended to measure.
     Step 2: Prepare the Table of Specifications. The table of specifications typically describes the specific format of the items, the response format, and the type of scoring procedures.
     Step 3: Develop Test Items or Tasks

  4. Defining the Purpose
     • Like a mission statement for the test
     • Define the construct to be measured
     • Define the population with whom the test is to be used
     • Determine the target audience for the information the test provides, the test users
     • Define the nature of the decisions to be made based on the information the test provides

  5. What is a Construct?
     • A construct is an unobservable quality, ability, or attribute
     • We believe from theory that each person possesses some “amount” of the construct
     • We can’t directly observe or measure the “amount” or level
     • We rely on outward behaviors as indicators of the latent, or underlying, construct
     • Contrast Blood Pressure and Depression

  6. Defining the Content Domain
     • Theory
     • Literature
     • Expert opinion
     • Qualitative research
     • The goal is to include all aspects of the construct you intend to measure

  7. Test Design & Construction
     Step 1: Delineate the Purpose & Framework
     Example (Optimal): The purpose of the Counselor Achievement Test (CAT) is to assess counseling students’ knowledge, skills, and abilities for effective counseling services. The framework of the CAT is modeled after the National Counselor Examination (NCE) and includes eight content areas. The CAT will consist of 24-32 selected- and constructed-response items, as well as performance tasks. The CAT will be a criterion-referenced measure.
     Example (Typical): The purpose of the Study Habits Scale (SHS) is to assess college students’ habits of study. The SHS includes between 18 and 30 items. The framework of the SHS is based on the work of Blai (1993). The SHS is a self-report measure designed to identify students’ study attitudes and behaviors.

  8. Test Design & Construction
     Step 2: Develop the test specifications or blueprint
     The table of specifications or test blueprint typically describes the number of items, the specific classification of the items and response format, and the type of scoring procedures.
     Sample Table of Specifications for CAT
     TTL#  Content Area                         Item Classification*  Format (# of Items)
                                                (K C AP AN S E)
     3     Human growth and development         1 1 1                 MCQ (2), Constructed Response (1)
     3     Social and cultural foundations      1 1 1                 MCQ (2), Constructed Response (1)
     3     Helping relationships                1 1 1                 MCQ (2), Constructed Response (1)
     3     Group Work                           1 1 1                 MCQ (2), Constructed Response (1)
     3     Career and lifestyle development     1 1 1                 MCQ (2), Constructed Response (1)
     3     Appraisal                            1 1 1                 MCQ (2), Constructed Response (1)
     3     Research and program evaluation      1 1 1                 MCQ (2), Constructed Response (1)
     3     Professional orientation & ethics    1 1 1                 MCQ (2), Constructed Response (1)
     24    Total                                6 6 6 2 2 2
     *Refers to Bloom’s Taxonomy of Educational Objectives (1956). K=knowledge, C=comprehension, AP=application, AN=analysis, S=synthesis, and E=evaluation
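A blueprint like the sample above can be checked mechanically before items are written. The sketch below is only an illustration: the dictionary layout and the function names are assumptions, not part of the course material. It encodes the eight content areas with 2 MCQ and 1 constructed-response item each and confirms that the totals match the 24 items shown in the table.

```python
# Hypothetical encoding of the sample CAT blueprint: every content area
# is assigned 2 multiple-choice items and 1 constructed-response item.
CONTENT_AREAS = [
    "Human growth and development",
    "Social and cultural foundations",
    "Helping relationships",
    "Group Work",
    "Career and lifestyle development",
    "Appraisal",
    "Research and program evaluation",
    "Professional orientation & ethics",
]

blueprint = {area: {"MCQ": 2, "Constructed Response": 1} for area in CONTENT_AREAS}


def items_per_area(bp):
    """Number of items planned for each content area."""
    return {area: sum(formats.values()) for area, formats in bp.items()}


def items_per_format(bp):
    """Number of items planned for each response format across all areas."""
    totals = {}
    for formats in bp.values():
        for fmt, count in formats.items():
            totals[fmt] = totals.get(fmt, 0) + count
    return totals


def total_items(bp):
    """Total test length implied by the blueprint."""
    return sum(items_per_area(bp).values())


print(total_items(blueprint))
print(items_per_format(blueprint))
```

Keeping the blueprint as data makes it easy to notice when a revision leaves one content area under-sampled.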

  9. Developing Items
     • Determine the target length in time to administer and number of items
     • Consider intended use and practical constraints: cost, complexity of scoring, etc.
     • Consider the purpose and the stakes involved in decision making
     • Initially write at least twice as many items as needed
     • Contrast a screening test with a diagnostic test

  10. Screening Tests
     • Short
     • Easy to administer
     • Inexpensive
     • Easy to score
     • Maximizes Sensitivity
     • Makes the correct decision when the condition of interest is present; minimizes false negatives.

  11. Diagnostic Tests
     • Longer
     • More complex to administer
     • More expensive
     • Harder to score
     • Maximizes Specificity
     • Makes the correct decision when the condition of interest is not present; minimizes false positives.
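The contrast between screening and diagnostic tests rests on two standard proportions: sensitivity, TP / (TP + FN), and specificity, TN / (TN + FP). A minimal sketch follows; the function names and the counts are made up for illustration and do not come from the slides.

```python
def sensitivity(true_pos, false_neg):
    """Share of people who HAVE the condition that the test correctly flags.

    High sensitivity means few false negatives.
    """
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of people who do NOT have the condition that the test correctly clears.

    High specificity means few false positives.
    """
    return true_neg / (true_neg + false_pos)


# Invented counts: 50 people with the condition, 100 without it.
screen_sensitivity = sensitivity(true_pos=45, false_neg=5)
diagnostic_specificity = specificity(true_neg=95, false_pos=5)

print(screen_sensitivity, diagnostic_specificity)
```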

  12. Test Design & Construction Step 2: Develop the test specifications or blueprint The table of specifications typically describes the specific classification of the items, the response format, and the type of scoring procedures. Item Classifications: Bloom and Krathwohl (1956) Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation

  13. Bloom et al.’s Taxonomy of Educational Objectives (Cognitive Domain)
     • Knowledge: Remembering previously learned material. Requires recall of facts, procedures, rules or events. Sample verbs: Define, Recall, Identify, List, Name.
     • Comprehension: Grasping the meaning of material. Requires reformulation, restatement, translation, or interpretation of content or identification of relationships. Sample verbs: Convert, Explain, Summarize.
     • Application: Using information in concrete situations. Requires use of information in a setting or context other than where it was learned. Sample verbs: Compute, Demonstrate, Solve.
     • Analysis: Breaking down material into parts. Requires recognition of logical errors, comparison of components, or differentiation between components. Sample verbs: Analyze, Infer, Differentiate, Relate.
     • Synthesis: Putting parts together into a whole. Requires production of something original, solution to an unfamiliar problem, or combination of parts in an unusual way. Sample verbs: Design, Construct, Combine, Formulate.
     • Evaluation: Judging the value of a thing for a given purpose using definitive criteria. Requires formation of judgements about the worth or value of ideas, products, or procedures that have a specific purpose. Sample verbs: Discriminate, Critique, Evaluate, Judge.

  14. Test Design & Construction
     Response Formats:
     • Selected-Response: Response sets are provided and the user is forced to select among the choices. Examples include: MCQ, T/F, Yes/No, Matching, and Likert Ratings.
     • Constructed-Response: No response sets are provided and the user is forced to provide a unique response. Examples include: Short Answer & Extended Answer.
     • Performance Tasks: No response sets are provided and the user is required to develop a product or perform some task or set of tasks. Examples include: Restricted and Extended Performance Tasks.

  15. Selected-Response Formats
     1. Multiple Choice Questions (MCQ): Multiple choice items include a question or STEM followed by a number of possible responses or OPTIONS. These options make up the RESPONSE SET of the item.
     2. True-False Questions: True-false items include a stem and two discrete options. These options can be “True-False”, “Yes-No”, “Always-Never”, etc.
     3. Matching Items: Matching exercises consist of two columns of information. The student is required to select the item in the second column which best reflects the item in the first column.
     4. Likert Rating Scale Items: Likert ratings include a scale ranging from one extreme to another. The anchors of the scale vary depending on the nature of the statement.

  16. Constructed-Response Formats (Optimal Performance)
     1. Short Answer Questions: Completion or short answer formats consist of questions that can be answered with a word or short phrase, or a statement having one or more omitted words.
     2. Limited Essay Questions: Limited essay questions consist of tasks or items requiring students to give brief, concise responses.
     3. Extended Essay Questions: Extended essay questions consist of tasks or items that allow students freedom to choose the form and scope of their responses.

  17. Format: Advantages and Disadvantages
     • MCQ
       Advantages: Assesses broad range of skills in a limited amount of time. Scoring can be done quickly and objectively.
       Disadvantages: Difficult and time consuming to write higher order cognitive items. Most items assess knowledge through comprehension. Guessing reduces validity of scores.
     • True-False
       Advantages: Numerous items can be administered in a brief amount of time. Easy to write and objective to score.
       Disadvantages: Limited in complexity. Guessing reduces validity of scores. Not appropriate for optimal performance measures.
     • Matching
       Advantages: Assesses a broad range of skills in a limited time. Scoring can be done quickly and objectively.
       Disadvantages: Higher order cognitive skills are difficult to assess. Guessing reduces validity of scores.
     • Short Answer
       Advantages: Numerous items can be administered in a short time. Moderately easy to write and score items. Guessing is difficult.
       Disadvantages: Limited to items that require very few words. Spelling errors can make scoring difficult.
     • Essay
       Advantages: Assesses broad range of skills, particularly higher order cognitive skills. Guessing is difficult.
       Disadvantages: Time consuming to administer and score. Limited content can be sampled during a test period. Scoring can be subjective.

  18. Test Design & Construction
     Step 2: Develop the test specifications or blueprint
     Scoring Procedures: Selected-Response
     Typically, selected-response items have one correct answer and are scored as right or wrong (a.k.a. dichotomous scoring). However, some tests may weight responses differently. Rating scale items are typically added together for a total score. For example, ten 5-point Likert rating scale items would yield a score range from 10 to 50. Typically, a higher score denotes stronger agreement, satisfaction, etc. with the overall construct.
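The two selected-response scoring rules described above, right/wrong scoring against a key and summing rating-scale items, can be sketched in a few lines. The function names and the sample responses are assumptions made for illustration.

```python
def dichotomous_score(responses, answer_key):
    """Count items answered correctly: 1 point per match with the key, else 0."""
    return sum(1 for given, keyed in zip(responses, answer_key) if given == keyed)


def likert_total(ratings, points=5):
    """Sum rating-scale responses after checking each lies on the 1..points scale."""
    for rating in ratings:
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
    return sum(ratings)


def likert_score_range(n_items, points=5):
    """Lowest and highest possible totals for n_items rating-scale items."""
    return n_items, n_items * points


# Ten 5-point items give the 10-to-50 range mentioned on the slide.
print(likert_score_range(10))
print(likert_total([4, 5, 3, 4, 2, 5, 4, 3, 4, 5]))
print(dichotomous_score(["B", "A", "D"], ["B", "C", "D"]))
```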

  19. Test Design & Construction
     Step 2: Develop the test specifications or blueprint
     Scoring Procedures (continued): Constructed-Response
     These formats are relatively more subjective, time consuming, and expensive to score. Short-answer items require a list of acceptable answers. Extended response items typically require a scoring rubric. A scoring rubric is a table describing the criteria for scoring, including detailed descriptions for varying degrees of performance.
     The scoring rubric may yield a holistic or analytic score. Holistic scores refer to the overall impression of the response (or behavior) and analytic scores refer to the discrete dimensions of the response (or behavior). Holistic scores yield one overall score and analytic scores typically yield sub-scores as well as an overall score.
     Performance tasks vary depending on the nature and complexity of the tasks. Scoring procedures may require a checklist, Likert rating scale, or rubric.
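The difference between holistic and analytic rubric scores can be shown with a small sketch. Everything specific here is hypothetical: the rubric dimensions, the 1-4 rating scale, and the choice to form the analytic overall score by summing the sub-scores (the slide says only that analytic scoring yields sub-scores and an overall score).

```python
def holistic_score(overall_rating):
    """A holistic rubric records a single overall impression."""
    return {"overall": overall_rating}


def analytic_score(dimension_ratings):
    """An analytic rubric keeps one sub-score per dimension plus an overall score.

    Assumption: the overall score is the simple sum of the sub-scores.
    """
    return {
        "sub_scores": dict(dimension_ratings),
        "overall": sum(dimension_ratings.values()),
    }


# Hypothetical essay rubric with three dimensions, each rated 1-4.
essay = analytic_score({"content": 3, "organization": 4, "mechanics": 2})
print(essay["sub_scores"])
print(essay["overall"])
print(holistic_score(3))
```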

  20. Supplemental Information: MCQ Alternatives

  21. Supplemental Information: MCQ Alternatives
     3. Elimination & Inclusion Scoring: Student is asked to either cross out all the alternatives that are incorrect (elimination) or circle the alternatives that are most likely correct (inclusion).
     4. Multiple-Answer Format: Student is told that any number of the options might be correct. Each item is scored by subtracting the number of incorrect answers from the number of correct answers.
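The multiple-answer rule (correct answers minus incorrect answers) can be sketched as follows. The function name is an assumption, and the sketch reads "correct" and "incorrect" answers as the options the student actually selected; the slide does not say whether unselected correct options are penalized.

```python
def multiple_answer_score(selected, keyed_correct):
    """Score one multiple-answer item as (correct selections) - (incorrect selections)."""
    selected = set(selected)
    keyed_correct = set(keyed_correct)
    n_correct = len(selected & keyed_correct)    # selected options that are keyed correct
    n_incorrect = len(selected - keyed_correct)  # selected options that are not keyed correct
    return n_correct - n_incorrect


# Hypothetical item where A and B are keyed correct and the student marks A, B, and D.
print(multiple_answer_score({"A", "B", "D"}, {"A", "B"}))
```

Under this rule, marking every option is not a winning strategy, because each wrong selection cancels a right one.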

  22. Sample Item: Confidence Weighting
     Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please rate your level of confidence with your response to each item by circling the corresponding confidence level.
     What is the main advantage of using a table of specifications when preparing an achievement test?
     A. It reduces the amount of time required. (+0)
     B. It improves the sampling of content. (+1)
     C. It makes the construction of test items easier. (+0)
     D. It increases the objectivity of the test. (+0)
     Please circle the number that corresponds to the best descriptor for your level of confidence with the answer chosen:
     5 = Extremely Confident, 4 = Fairly Confident, 3 = Neutral, 2 = Fairly Unconfident, 1 = Extremely Unconfident
     Scoring Guide: Multiply the points for the chosen answer by the level of confidence. For this example, a student who chooses B and circles 4 would receive 4 out of a possible 5 points.
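The confidence-weighting rule multiplies the points attached to the chosen option by the confidence rating. A minimal sketch, assuming the option points shown in the sample item and a 1-5 confidence scale; the function name is hypothetical, and the 4-point result assumes the student chose B with a confidence of 4.

```python
# Points attached to each option in the sample item (only B earns credit).
ITEM_POINTS = {"A": 0, "B": 1, "C": 0, "D": 0}


def confidence_weighted_score(choice, confidence, points=ITEM_POINTS, max_confidence=5):
    """Multiply the points for the chosen option by the circled confidence level."""
    if not 1 <= confidence <= max_confidence:
        raise ValueError(f"confidence must be between 1 and {max_confidence}")
    return points[choice] * confidence


# Correct answer with "Fairly Confident" (4) gives 4 of a possible 5 points.
print(confidence_weighted_score("B", 4))
# A wrong answer earns nothing, however confident the student was.
print(confidence_weighted_score("A", 5))
```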

  23. Sample Item: AUC
     Please answer the following items by removing the overlay that corresponds to your response. If the answer chosen reveals an “INCORRECT” response, continue selecting until you reveal the “CORRECT” response. Once you have identified the “CORRECT” response you have completed the item and should move on to the next question.
     What is the main advantage of using a table of specifications when preparing an achievement test?
     A. It reduces the amount of time required.
     B. It improves the sampling of content.
     C. It makes the construction of test items easier.
     D. It increases the objectivity of the test.
     Scoring Guide: 1st Attempt = 100%, 2nd Attempt = 66%, 3rd Attempt = 33%, 4th Attempt = 0%
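The scoring guide for this item maps the attempt on which the correct option is revealed to a percentage of credit. A minimal sketch for a four-option item, using the percentages from the scoring guide; the function name and the way selections are recorded (an ordered list of revealed options) are assumptions.

```python
# Credit by attempt number, taken from the slide's scoring guide.
AUC_CREDIT = {1: 100, 2: 66, 3: 33, 4: 0}


def auc_score(selections, correct_option):
    """Return percent credit based on which attempt revealed the correct option.

    `selections` lists the options in the order the student uncovered them.
    Raises ValueError if the correct option was never selected.
    """
    attempt = selections.index(correct_option) + 1
    return AUC_CREDIT[attempt]


# The student first uncovers A (incorrect), then B (correct): second attempt.
print(auc_score(["A", "B"], "B"))
```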

  24. Sample Item: Elimination Scoring
     Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please draw a line through those items that you confidently believe are incorrect.
     What is the main advantage of using a table of specifications when preparing an achievement test?
     Scoring Guide
     A. It reduces the amount of time required. (+05)
     B. It improves the sampling of content. (+85%)
     C. It makes the construction of test items easier.
     D. It increases the objectivity of the test.

  25. Test Design & Construction
     • Purpose & Framework
     • Test Specifications or Blueprint
     • Item Construction
     • Field Testing
     • Evaluation & Revision
     Classification of Items; Response Formats; Scoring Procedures; Select Item Writing Guidelines: MCQs, Likert Rating Scales
