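China-UK Southwest Basic Education Project (Guangxi): Developing Test Items for the Student Learning Progress Test. Wei Yiping, College of Education Science, Guangxi Normal University. firstname.lastname@example.org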
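Contents: Project background; Content and outputs; Project vision; Main content of the workshop; Agenda; International approaches to assessing student progress; TIMSS/PIRLS; PISA; PISA language assessment content; PISA mathematics assessment content; Multi-cohort longitudinal evaluation model; Questions we hope to answer; Problems to be solved; Key points of the response; Standardized testing requirements; Item-writing tasks for each province; Item standards and item types; Mathematics (grades 3, 5, 7, 9); Number of items; Chinese language (grades 3, 5, 7, 9); Item-writing techniques and requirements; Allocation of item types; Modules; Anchor items; Deliverables; Item-writing groups; Schedule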
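Project Background
The China-UK Gansu Basic Education Project produced many successful experiences (improving school conditions; promoting educational equity; developing training materials; training all staff; improving the education management system; strengthening the capacity of teacher-training institutions; carrying out related research).
New areas for cooperation:
• extend the experience of the Gansu project to the southwest region;
• complement government programmes, strengthen "software" development, and emphasize capacity building for staff and institutions;
• focus on the most disadvantaged groups (in the selection of project provinces and counties).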
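Cooperation Priorities and Features of the New Project
Project positioning: give full play to the leading role of government and implement the project in line with China's current education policies and development priorities. Centred on the National Plan for Tackling the "Two Basics" in the Western Region, and drawing on the relevant parts of the 2003-2007 Action Plan for Invigorating Education and current priorities in rural education, the project takes the southwest as its target region and carries out activities to help the areas concerned reach their "Two Basics" targets, raise education quality, and improve basic education.
Project features: complement government programmes, strengthen "software" development, and emphasize capacity building for staff and institutions; focus on the most disadvantaged groups (in the selection of project provinces and counties).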
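Project Target Areas
Project provinces (regions): Yunnan Province, Sichuan Province, Guizhou Province, and the Guangxi Zhuang Autonomous Region.
Project counties (27 counties): nationally designated poverty counties that had not yet achieved universal nine-year compulsory education by the end of 2002.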
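Main Project Content (Outputs)
• Support for poor students: fund poor boarding students in junior secondary schools, giving priority to disadvantaged groups such as poor children, girls, ethnic minority children, and children with disabilities.
• Teacher training: including training for education administrators, development of training resources, and building the teacher training system.
• School development planning: head teacher training to improve school management and promote community participation.
• Monitoring and evaluation: of this project; of compulsory education.
• Social and institutional development: awareness of equity, strengthening institutions, and so on.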
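Project Vision
Through "software" development, complement government programmes effectively, strengthen the capacity of staff and institutions, and raise the government's ability to implement basic education projects, so that they better reach the most disadvantaged children, including children from poor families, girls, ethnic minority children, and children with disabilities.
(Make the soft side firm, and the hard side firmer.)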
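Support for Poor Students
Output 1: Disadvantaged children, especially girls at junior secondary level, have more opportunities to receive nine-year compulsory education on an equitable basis.
Vision:
• Equal educational opportunity for all children.
• More attention to ethnic minority and other disadvantaged children.
• More effective implementation of the relevant government policies.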
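Teacher Training
Output 2: Teaching and learning improve through a stronger teacher development system, so as to reach the most disadvantaged children.
Vision: not one child left out; every child learning well; better mechanisms; greater capacity.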
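School Development Planning
Output 3: The school management system is improved, raising the welfare of disadvantaged children; the focus is on better school-level management and improved school management standards.
Vision: make the school a place children enjoy and a place for learning and exchange within the community.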
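Monitoring and Evaluation
Output 4: The capacity of the monitoring and evaluation system is raised, so that policy and practice are steered towards supporting the most disadvantaged children.
Vision: "light a fire, add a dish" (building on the work of the Chinese government, the project helps the government do a better job of universalizing nine-year compulsory education).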
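Social and Institutional Development
Output 5: The capacity of the education system is raised so that it better meets the needs of the most disadvantaged children.
Vision:
• Let the project's sunshine reach every corner of the project area.
• Have all the people and institutions concerned serve the children.
• Let every child enjoy a quality education.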
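Project Management Framework (diagram): Senior Project Management Group; Ministry Working Group; Project Technical Support Group; National Project Office; Cross-Provincial Management Group; four Provincial Project Offices; project counties (27 poverty counties).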
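County Project Office
Staff: at least 6 people, responsible for the project's five output areas and for county-level financial management.
Responsibilities:
• coordinate all project activities within the county;
• draw up management rules and detailed procedures suited to how the project is implemented in the county, and file them with the Provincial Project Office;
• draft the county's annual plan and budget for project activities;
• implement all project activities in the county and manage their finances;
• review and evaluate the quality and results of project activities in a timely way;
• coordinate the participation of other county-level departments in project implementation;
• monitor the progress of the project in the county and submit a written report to the Provincial Project Office every six months;
• other project matters.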
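Main Content of This Workshop
Main objective: prepare for developing the test items for the SBEP (Guangxi) student achievement test.
Main content: the guiding principles, testing approach, content selection, item structure, and linking between stages for the China-UK Southwest Basic Education Project student achievement test; agreeing the detailed schedule for item development.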
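Workshop Agenda
8 February
8:30-9:50 Models for assessing student achievement: TIMSS and PISA
9:50-10:10 Break
10:10-11:30 Discussion: guiding principles for SBEP student achievement assessment
14:30-15:50 Implementation methods and techniques for the multi-cohort longitudinal evaluation model
15:50-16:00 Break
16:00-17:30 Forming item-writing groups; group discussion of test plans
9 February
8:30-9:50 Groups report their test plans and exchange views with the subject groups
9:50-10:10 Break
10:10-11:30 Discussion of sample items
14:30-15:50 Groups draft trial test items
15:50-16:00 Break
16:00-17:00 Discussion and exchange on the groups' draft items
17:00-17:30 Schedule for the student achievement test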
Models for International Assessment of Student Achievement: TIMSS/PIRLS and PISA
What is TIMSS/PIRLS?
TIMSS: Trends in International Mathematics and Science Study. PIRLS: Progress in International Reading Literacy Study.
• Internationally standardised sample-based assessments of grade 4 (TIMSS and PIRLS) and grade 8 (TIMSS only) students.
• Administered to at least 4,500 students in each country.
• About 420 minutes of testing at grade 8, and 330 minutes at grade 4; but students take different subsets of test items.
• Students, principals and teachers complete background questionnaires.
• Tests are based on the school curriculum, with objectives specific to grades 4 and 8. In the background study, curricula are compared and core topics identified.
• Paper-and-pencil tests are used, a total of two hours per student.
• Items are a mixture of multiple-choice and constructed-response.
• Half the items are devoted to measuring trends.
What is PISA?
PISA: Programme for International Student Assessment.
• An internationally standardised sample-based assessment of 15-year-olds, implemented in 56 countries (2006 cycle).
• Between 4,500 and 10,000 students tested in each country.
• PISA covers reading, mathematical and scientific literacy. Each cycle explores one domain in depth: reading in 2000, mathematics in 2003, and science in 2006.
• Tests are based on important knowledge and skills needed in adult life, not the school curriculum. The emphasis is on mastery of processes, understanding of concepts and ability to function in various situations.
• Paper-and-pencil tests, two hours for each student; mixed multiple-choice and constructed-response items.
• Total 390 minutes of testing; but different students answer different subsets of items.
• Students and school principals answer a background questionnaire.
• Assessment takes place every three years, with a plan in place to 2015.
Similarities and Differences
• Both are sample-based international assessments: they assess the system, not individuals.
• Both use background questionnaires.
• But they are applied at different points: TIMSS at intermediate grades (4 and 8), PISA at the end of basic education (grade 9). Their different approaches reflect this.
The Primary Aim of the PISA Assessment
To determine the extent to which young people have acquired the wider knowledge and skills in reading, mathematical and scientific literacy that they will need in adult life.
Assessment of competencies is cross-curricular because:
• Application of specific knowledge acquired in school depends crucially on the acquisition of broader concepts and skills. For example, in mathematics, being able to reason quantitatively and to represent relationships or dependencies is more valuable in everyday life than the ability to answer familiar textbook questions.
• A focus on curriculum content might restrict attention to common elements, and make the assessment too narrow to inform governments about the strengths and innovations in the education systems of other countries.
• Broad skills including communication, adaptability, flexibility, problem solving and the use of information technologies, essential for students, are developed across the curriculum, and assessing them requires a broad cross-curricular focus.
PISA Definition of Mathematical Literacy
Mathematical literacy is the capacity to identify and understand the role mathematics plays in the world, to make well-founded judgements and to use mathematics in ways that meet the needs of that individual’s life as a constructive, concerned and reflective citizen.
Three components must be distinguished:
• the situations or contexts in which the problem is located;
• the mathematical content used to solve the problem, organised by certain overarching ideas; and
• the competencies that connect the real world, in which problems are generated, with mathematics, to solve the problems.
PISA Mathematical Content
PISA mathematical content is defined by four overarching ideas:
• Space and shape
• Change and relationships
• Quantity
• Uncertainty
Quantity includes:
• number sense (including relative size, different representations of numbers, and equivalent forms of numbers);
• understanding the meaning of operations (e.g. comparisons, ratios and percentages);
• having a feel for the magnitude of numbers (e.g. length, area, volume, height, speed, mass, air pressure, money value);
• elegant computations;
• mental arithmetic; and
• estimation (including providing a rationale for the selection of data and the level of precision required).
The PISA Competencies
• Thinking and reasoning: posing questions; knowing the kinds of answers that mathematics offers; understanding and handling the extent and limits of given mathematical concepts.
• Argumentation: knowing what mathematical proofs are; following and assessing mathematical arguments; and creating and expressing mathematical arguments.
• Communication: expressing oneself on matters with a mathematical content, and understanding others’ mathematical communication.
• Modelling: translating reality into mathematical structures; interpreting mathematical models in terms of reality; working with a mathematical model.
• Problem posing and solving: posing, formulating and defining different kinds of mathematical problems, and solving different kinds of mathematical problems in a variety of ways.
• Representation: decoding, encoding, translating and interpreting different forms of representation of mathematical objects and situations.
• Using symbolic, formal and technical language and operations: decoding and interpreting symbolic and formal language, and understanding its relationship to natural language.
• Use of aids and tools: knowing about, and being able to use, various aids and tools that may assist mathematical activity.
PISA Competency Clusters
PISA does not test competencies individually. Rather, competencies are grouped into three clusters: reproduction, connections and reflection.
The TIMSS Curriculum Model • TIMSS uses the curriculum as its major organizing concept. • The TIMSS curriculum model has three parts: the intended curriculum, the implemented curriculum, and the achieved curriculum. • These represent: the mathematics society intends students to learn; what is actually taught in classrooms; and, what students learned.
The TIMSS Analysis Model
• TIMSS uses curriculum-based achievement tests to describe student learning.
• Achievement results are related to information about the intended curriculum, teacher preparation, experience, and attitudes, instructional approaches, the organization and resources of schools and classrooms, and the experiences and attitudes of the students in the schools.
• TIMSS therefore allows countries to compare their curricula with international practices, as well as assessing learning achievement based on major curriculum goals.
TIMSS Content Domains The table shows the target percentages of testing time devoted to each content domain for both the fourth and eighth grade assessments. At fourth grade, the Algebra content domain is called Patterns, Equations, and Relationships.
Topic Areas: Number
Each content domain is a separate analysis and reporting category. Each is divided into topic areas. For example, “Number” is divided into:
• Whole numbers
• Fractions and decimals
• Integers
• Ratio, proportion, and percent
Number: Specific Objectives
Each topic area is subdivided into grade-specific assessment objectives written in terms of student understandings or abilities that items aligned with these objectives are designed to elicit. The grade 4 specific objectives for “Number” are:
• Represent whole numbers using words, diagrams or symbols, including recognizing and writing numbers in expanded form.
• Demonstrate knowledge of place value.
• Compare and order whole numbers.
• Identify sets of numbers according to common properties such as odd and even, multiples, or factors.
• Compute with whole numbers.
• Estimate computations by approximating the numbers involved.
• Solve routine and non-routine problems, including real-life problems.
TIMSS Cognitive Domains
• Students need to be familiar with mathematics content, but cognitive skills are just as important.
• As an aid in developing balanced tests in which appropriate weight is given to each cognitive domain across all topics, a full set of the learning outcomes that mathematics educators consider desirable is essential.
• Descriptions of the skills and abilities that will be assessed along with the content are thus defined in detail in the frameworks.
• These skills and abilities have been classified into four cognitive domains as in the table below.
Example: Knowing Facts and Procedures
• Facility in using mathematics, or reasoning about mathematical situations, depends primarily on mathematical knowledge.
• The more relevant facts a student is able to recall, the greater the potential for engaging with a wide range of problem-solving situations.
• Procedures form a bridge between more basic knowledge and using mathematics to solve routine problems.
The PISA Assessment Structure
• Total testing time for mathematics is distributed as evenly as possible across the four overarching ideas (space and shape, change and relationships, quantity, and uncertainty) and across the four situations described in the framework (personal, educational/occupational, public and scientific).
• The proportion of items reflecting the three competency clusters (reproduction, connections and reflection) is about 1:2:1.
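The 1:2:1 proportion can be turned into item counts once a test length is fixed. The following is only a sketch: the function name `allocate_by_ratio`, the largest-remainder rounding rule, and the 40-item test length are illustrative assumptions, not part of the PISA framework.

```python
from fractions import Fraction


def allocate_by_ratio(total_items, ratio):
    """Split total_items across categories in proportion to `ratio`.

    Uses largest-remainder rounding so the counts always sum to total_items.
    (Hypothetical helper; the rounding rule is an assumption.)
    """
    weight_sum = sum(ratio.values())
    # Exact (possibly fractional) share of each category.
    exact = {k: Fraction(total_items * w, weight_sum) for k, w in ratio.items()}
    # Start from the whole-number part of each share.
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the categories with the largest remainders.
    by_remainder = sorted(ratio, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Competency clusters in the stated proportion of about 1:2:1.
clusters = {"reproduction": 1, "connections": 2, "reflection": 1}

# 40 items is an assumed test length, chosen only for illustration.
print(allocate_by_ratio(40, clusters))
```

The same helper could be reused for any other target proportion, for example an equal split across item types.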
Item Types
Some question types are more suitable than others for large-scale testing. In general, questions should test a single specific objective, and should be objectively scored. Variety in format is not necessarily a virtue; but simplicity of format and language most definitely is.
Unsuitable question types:
• true-false (because it encourages guessing);
• multiple choice with fewer than four options (also encourages guessing);
• multiple-part matching items (complex scoring, tendency to test the same objective several times);
• “essay-type” constructed-response (i.e., those with a long scale and a relatively open scoring rubric).
Suitable question types:
• multiple choice with four or more options;
• dichotomously-scored (“closed”) constructed-response; and
• partial-credit (“open”) constructed-response with a short scale (e.g., 0-1-2) and a clear, easily followed marking scheme.
Multiple-choice questions (MCQs) can be scored reliably without complex instructions; they can be scored quickly and objectively; they are easily analysed statistically, and tend to be relatively valid and reliable. They are usually short, so students can answer a relatively large number in a given time, improving curriculum coverage. But because the options often guide test-takers, they are an imperfect guide to a test-taker’s knowledge of specific points.
Constructed-response items do not “lead” test-takers in the way that multiple choice does, so they are a more reliable guide to knowledge of specific goals. Some aspects of content (e.g., ability to plan, organise, present arguments) can only be tested through constructed-response items. But they are more difficult to score, and may be less reliable, than MCQs. This is particularly true of partial-credit items; the longer the scale, the less reliable the distinctions that have to be made.
In PISA, about one-third of the items are multiple-choice, about one-third are closed constructed-response, and about one-third are open constructed-response.
A PISA Multiple-Choice Item
Based on experience, the multiple-choice type is generally regarded as most suitable for assessing items that would be associated with the reproduction and connections competency clusters. The example shows a multiple-choice item associated with the connections competency cluster. Students must translate the problem into mathematical terms, devise a model to represent the periodic nature of the context described, and extend the pattern to match the result with one of the given options.
Example: SEAL
A seal has to breathe even if it is asleep. Martin observed a seal for one hour. At the start of his observation the seal dived to the bottom of the sea and started to sleep. In 8 minutes it slowly floated to the surface and took a breath. In 3 minutes it was back at the bottom of the sea again and the whole process started over in a very regular way.
Question: After one hour the seal was:
A. At the bottom
B. On its way up
C. Breathing
D. On its way down
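The periodic model that the SEAL item asks for can be sketched in a few lines. This is one possible reading of the item, not an official scoring key: the function name `seal_state` is hypothetical, and the sketch assumes the breath is instantaneous at the end of the 8-minute rise and that each 11-minute cycle (8 minutes up, 3 minutes down) repeats exactly.

```python
def seal_state(minutes, rise=8, descend=3):
    """Return the seal's state `minutes` after it started sleeping at the bottom.

    Assumptions (not stated in full by the item): the breath takes no time,
    and the rise/descend cycle repeats exactly.
    """
    cycle = rise + descend      # one full cycle lasts 8 + 3 = 11 minutes
    t = minutes % cycle         # position within the current cycle
    if t == 0:
        return "At the bottom"
    if t < rise:
        return "On its way up"
    if t == rise:
        return "Breathing"
    return "On its way down"


# One hour = 60 minutes = 5 full cycles (55 minutes) plus 5 minutes.
print(seal_state(60))
```

Five minutes into a cycle the seal is still floating upward, which corresponds to option B.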
A PISA Closed-Constructed Response Item
Closed-constructed response items pose questions similar to multiple-choice items, but students are asked to produce a response that can be easily judged to be either correct or incorrect. For items of this type, guessing is not likely to be a concern, and distractors (which influence the construct that is being assessed) are not necessary. The example shows a closed-constructed response item with one correct answer and many possible incorrect answers.
Example: FARMS Below is a student’s mathematical model of the farmhouse roof in the shape of a pyramid, with measurements added. The attic floor, ABCD in the model, is a square. The beams that support the roof are the edges of a block (rectangular prism) EFGHKLMN. E is the middle of AT, F is the middle of BT, G is the middle of CT and H is the middle of DT. All the edges of the pyramid in the model have length 12 m. Question: Calculate the area of the attic floor ABCD. The area of the attic floor ABCD = ............................ m²
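As a worked check of the FARMS question, using only the facts given in the item (ABCD is a square, and every edge of the model pyramid, including the sides of ABCD, is 12 m long), the single correct response follows in one step:

```latex
\[
\text{Area}(ABCD) = AB^{2} = (12\ \text{m})^{2} = 144\ \text{m}^{2}
\]
```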
A PISA Open-Constructed Response Item
Open-constructed response items require a more extended response, and may involve higher-order thinking. Students may be asked to show the steps taken or to explain the answer. Such items allow students to respond at a range of levels of mathematical complexity. Marking the responses may require an element of professional judgement, so there is potential for disagreement between markers.
Example: Data on the population of Indonesia and its distribution over the islands is shown in the table. One of the challenges Indonesia faces is the uneven distribution of the population. From the table we see that Java has less than 7% of the total area, but almost 62% of the population.
Question: Design a graph (or graphs) showing the uneven distribution of the Indonesian population.
TIMSS Booklet Design
A valid assessment of the TIMSS content (mathematics and science together) would take at least seven hours at grade 8 and more than five and a half hours at grade 4. It is not reasonable to expect each student to answer so many questions. Based on experience, testing time should not exceed 90 minutes for grade 8 and 65 minutes for grade 4, plus 15-30 minutes for the student questionnaire. TIMSS therefore divides the assessment material among students.
Allocation of Items to Blocks
The items in the pool are first grouped into clusters or blocks of items. In TIMSS 2003, there were 28 blocks, 14 in mathematics and 14 in science. Eighth-grade blocks contain 15 minutes of assessment items and fourth-grade blocks 12 minutes; otherwise the general design is identical. TIMSS includes items from earlier assessments to measure trends, as well as new items. Of the 14 item blocks in each subject, six (blocks 1 through 6) contain items from earlier TIMSS assessments and eight (blocks 7 through 14) contain new replacement items.
Block Design for Student Booklets
The main aim is to maximize coverage of the framework while ensuring that every student responds to sufficient items to provide reliable measurement of trends in both mathematics and science. A further aim is to ensure that trends in the mathematics and science content areas can be measured reliably. To enable linking among booklets, at least some blocks had to be paired with others. Since the number of booklets would be very large if each block were paired with all other blocks, block combinations were chosen to keep the number of student booklets to a minimum. The 28 assessment blocks are distributed across 12 student booklets. Each student booklet consists of six blocks of items. Half the booklets contain four mathematics blocks and two science blocks, and the other half contain four science blocks and two mathematics blocks. The same booklet design is used at both fourth and eighth grade.
The Student’s View • Each student will complete one of the twelve student booklets and a student questionnaire. • The booklets are distributed so that approximately equal numbers of students respond to each. • The individual student workload is 72 minutes for the test and 30 for the questionnaire at grade 4, 90 and 30 at grade 8.
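The figures quoted for the TIMSS 2003 design (14 blocks per subject, 12 booklets of six blocks, 12-minute blocks at grade 4 and 15-minute blocks at grade 8) can be cross-checked with simple arithmetic. The sketch below uses only those numbers; the constant and function names are illustrative, not TIMSS terminology.

```python
# Design figures as quoted in the slides for TIMSS 2003.
BLOCKS_PER_SUBJECT = 14          # 14 mathematics blocks and 14 science blocks
SUBJECTS = 2
BOOKLETS = 12
BLOCKS_PER_BOOKLET = 6
BLOCK_MINUTES = {"grade 4": 12, "grade 8": 15}


def student_test_minutes(grade):
    """Testing time for one student: six blocks in one booklet."""
    return BLOCKS_PER_BOOKLET * BLOCK_MINUTES[grade]


def pool_minutes(grade):
    """Testing time needed to cover the whole item pool (all 28 blocks)."""
    return BLOCKS_PER_SUBJECT * SUBJECTS * BLOCK_MINUTES[grade]


def subject_block_slots():
    """Block slots per subject across all booklets.

    Half the booklets carry 4 mathematics + 2 science blocks,
    the other half 2 mathematics + 4 science blocks.
    """
    half = BOOKLETS // 2
    mathematics = half * 4 + half * 2
    science = half * 2 + half * 4
    return mathematics, science


for grade in BLOCK_MINUTES:
    print(grade, student_test_minutes(grade), pool_minutes(grade))
print(subject_block_slots())
```

Each student sees 6 of the 28 blocks, so an individual workload of 72 or 90 minutes covers a pool that would otherwise take roughly five and a half to seven hours per student, and mathematics and science receive the same number of block slots overall.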
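PISA Language Assessment Content
• Reading and writing characters and words (mastering the basic tools for using written materials);
• Pinyin (using written materials to participate effectively in social activities);
• Understanding characters and words (understanding words in different contexts, at several levels of comprehension: direct explanation and some simple inferences);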
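PISA Language Assessment Content (continued)
• Reading comprehension (the ability to achieve personal goals, develop one's knowledge and potential, and participate effectively in society; it covers PISA's three levels of reading process, without distinguishing the purpose of reading or the form of the reading material).
• Practical writing (using written materials to participate effectively in social activities; in PISA and PIRLS these are usually structured questions).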
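PISA Mathematics (Arithmetic) Test Content
• whole numbers and understanding of number;
• the four operations with whole numbers;
• fractions, decimals and percentages;
• spatial relationships and geometric figures;
• simple word problems (money, weight, length, distance, etc.)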
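Implementation Methods and Techniques for the Multi-Cohort Longitudinal Evaluation Model
Background (1): the traditional research approach (non-standardized examinations)
Yearly examinations → pass rate → selection of students
This approach cannot answer the following questions:
(a) How much does the school help its students?
(b) How quickly do students progress in their learning?
(c) How much of the progress is due to special educational interventions, and how much to students' own development as they grow older?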
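Implementation Methods and Techniques for the Multi-Cohort Longitudinal Evaluation Model
Background (2): the experimental research approach
Compare achievement in experimental and non-experimental schools.
Problem 1: students start from different points, so their end-point scores cannot be compared directly.
Problem 2: differences in family and social background between experimental and non-experimental students can affect the results.
Problem 3: setting up a non-experimental control group adds to the cost of the research.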