The Collegiate Learning Assessment (CLA)
Stephen Klein and Roger Benjamin
June 12, 2008
Overview
• Purposes of the CLA
• Limitations of other approaches
• CLA’s measures
• CLA’s distinguishing features
• Indices of test quality
• Value-added score reporting
• Research and development plans
• Some silly criticisms and suggestions
Purposes of the CLA
• Assess certain abilities that colleges and employers say are important:
  • Critical thinking
  • Analytic reasoning
  • Problem solving
  • Writing skills
• Compare the amount of improvement in these skills over time between:
  • Colleges, after controlling for input
  • Programs within colleges
• Influence curriculum and instruction
Limitations of Other Assessment Methods
• Accreditation (measures only inputs)
• Actuarial indicators (graduation rate, access)
• US News & World Report (reputational rankings rather than student learning and improvement)
• NSSE (ambiguous response choices on items that focus on “engagement” rather than learning)
• Subject matter tests (too many majors and too little agreement on what to measure within each)
• Portfolios (very costly to score, unreliable grading, and no control for variation in task difficulty)
CLA’s Measures
• Analytic writing prompts:
  • Make-an-argument (45 minutes)
  • Break-an-argument (30 minutes)
• Performance Tasks (90 minutes)
• Several tasks of each type
• All tasks are administered at all schools
• A student takes no more than one task per type
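The allocation rule on this slide (several tasks per type, every task used at every school, at most one task of a given type per student) can be sketched as a simple rotation within each school. This is a minimal illustration, not CAE's actual assignment procedure; the function name `assign_tasks`, the task labels, and the student IDs are all hypothetical.

```python
import random
from collections import Counter

def assign_tasks(student_ids, task_pool, seed=None):
    """Give each student at one school exactly one task from task_pool.

    Tasks are dealt out in rotation after shuffling, so task counts differ by at
    most one and every task is used whenever the school tests at least as many
    students as there are tasks in the pool.
    """
    rng = random.Random(seed)
    students = list(student_ids)
    tasks = list(task_pool)
    rng.shuffle(students)   # random order decides which student gets which task
    rng.shuffle(tasks)      # random start, so no task is always the over-represented one
    return {s: tasks[i % len(tasks)] for i, s in enumerate(students)}

# Hypothetical pool of three performance tasks and two schools
pt_pool = ["PT-A", "PT-B", "PT-C"]
schools = {
    "School 1": [f"s1-{i}" for i in range(10)],
    "School 2": [f"s2-{i}" for i in range(8)],
}
for school, students in schools.items():
    plan = assign_tasks(students, pt_pool, seed=42)
    print(school, dict(Counter(plan.values())))
```

Calling the function once per task type (for example, once with the make-an-argument pool and once with the break-an-argument pool) keeps each student to at most one task of each type while spreading every task across every school.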
Make-An-Argument Prompt
“In our time, specialists of all kinds are highly overrated. We need more generalists – people who can provide broad perspectives.”
Directions: In 45 minutes, agree or disagree with the statement and explain the reasons for your position.
Answers are graded on a few holistic scales.
Break-An-Argument Prompt
Students are asked to discuss how well reasoned they find an argument to be (rather than simply agreeing or disagreeing with it).
A respected professional journal with a readership that includes elementary school principals published the results of a two-year study on childhood obesity. This study sampled 50 children, ages 5-11, from Smith Elementary School. A fast food restaurant opened near the school just before the study began. After two years, students who remained in the sample were more likely to be overweight, relative to the national average. Based on this study, the principal of Jones Elementary School decided to address her school’s obesity problem by opposing the opening of any fast food restaurants near her school.
Answers are graded on analytic and holistic dimensions.
Performance Tasks
• Realistic, job-sample-type tasks; role play
• 5 to 8 questions per task
• 6 to 10 diverse documents per task
• Split screen:
  • Left side: directions, a question, and a box into which students type their answers
  • Right side: a list of the documents students are instructed to review; each document pops up at the press of a key
• Detailed analytic and holistic scoring guides
CLA’s Distinguishing Features – Focus
• College mission statements guide the skills tested
• Measure high-level skills needed across majors
• Assess skills employers emphasize
• Report results in terms of value added:
  • Improvement within a school over time (e.g., between freshmen and seniors)
  • Improvement relative to students of comparable ability at other colleges
CLA: The Opposite of NCLB
• All colleges use the same tests and scoring rules
• Focus on improvement rather than on the percentage of students meeting an arbitrary standard that varies across states
• Tasks are matrix sampled across students
• Participation is voluntary
• Provides realistic benchmarks against which to assess progress
CLA’s Distinguishing Features – Format
• All open-ended, constructed-response tests
• Answers can be machine scored
• Analyses presently focus on schools and programs
• Matrix sampling of measures within schools
• Control for input (ACT/SAT scores from the registrar)
• Paperless test administration and score reporting
• Engaging “work samples” that assess an integrated combination of skills
Indices of Test Quality
• Validity
• Reliability
• Fairness
• Cost effectiveness
Validity
• Job sample tasks
• Matrix sampling reduces question/prompt-specific variance
• Content validity vetted by students and faculty
• Positive correlations with college grades
• Construct validity (empirical study underway)
• Rapid increase in colleges adopting the CLA
• Characteristics of participating schools are similar to those in the IPEDS national database
• Building the case for validity is a continuous process
Reliability
• Grading:
  • Inter-reader consistency
  • Agreement between hand- and machine-assigned scores
• Test scores: split-sample analyses show high correlations for:
  • School means on a task
  • School difference (“residual”) scores within a grade
  • School value-added scores across grades
• High correlations could not occur if scores were unreliable
• Results reported in peer-reviewed national journals (see the CAE website for details)
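A split-sample check of the kind listed above can be sketched as follows: randomly split each school's students into two halves, compute the school mean in each half, and correlate the two sets of means across schools. The Spearman-Brown step, which projects the half-sample correlation to full-sample size, is a standard psychometric correction added here for illustration; the slide does not say whether CAE applies it. Function names and the simulated data are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_sample_reliability(scores_by_school, seed=0):
    """Correlate school means computed from two random halves of each school's students.

    Each school needs at least two students. Returns
    (half-sample correlation, Spearman-Brown estimate for full-size samples).
    """
    rng = random.Random(seed)
    means_a, means_b = [], []
    for scores in scores_by_school.values():
        shuffled = list(scores)
        rng.shuffle(shuffled)                      # random split of students
        mid = len(shuffled) // 2
        means_a.append(sum(shuffled[:mid]) / mid)
        means_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
    r = pearson(means_a, means_b)
    return r, 2 * r / (1 + r)                      # Spearman-Brown step-up

# Simulated data only: 40 schools with different true means, 100 students each
sim = random.Random(1)
schools = {}
for k in range(40):
    true_mean = sim.gauss(1150, 100)
    schools[f"school-{k}"] = [sim.gauss(true_mean, 150) for _ in range(100)]

r_half, r_full = split_sample_reliability(schools)
print(f"half-sample r = {r_half:.2f}, Spearman-Brown = {r_full:.2f}")
```

If school means were mostly noise, the two halves would not agree and the correlation would sit near zero; a high correlation across schools is only possible when the school-level differences are stable.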
Fairness
• Standardized test administration and scoring
• Scores on different measures are converted to a common scale
• Differences in CLA scores among racial/ethnic groups disappear when SAT scores are controlled for
• No systematic interaction of tasks with student demographic characteristics
• Controls for contextual effects and reader drift
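The slide does not say how scores on different measures are put on a common scale. One common approach, assumed here purely for illustration, is a linear transformation that gives the raw scores on each measure the same mean and standard deviation as a reference distribution, such as the SAT scores of the students who took that measure. The function name and all numbers below are hypothetical.

```python
from statistics import mean, pstdev

def to_common_scale(raw_scores, ref_mean, ref_sd):
    """Linearly rescale raw scores so they have mean ref_mean and SD ref_sd.

    Relative standing is preserved: a student 1 SD above the raw mean ends up
    ref_sd points above ref_mean.
    """
    raw_mean, raw_sd = mean(raw_scores), pstdev(raw_scores)
    if raw_sd == 0:
        raise ValueError("raw scores have no spread; cannot rescale")
    return [ref_mean + ref_sd * (x - raw_mean) / raw_sd for x in raw_scores]

# Hypothetical raw scores on one task, mapped to a hypothetical SAT-like reference
raw = [2, 3, 4, 5, 6]
scaled = to_common_scale(raw, ref_mean=1000, ref_sd=150)
print([round(s) for s in scaled])
```

Because each measure is mapped to the same reference mean and spread, a task that happens to be graded more harshly does not pull down the scores of the students who drew it.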
Cost Effectiveness
• Paperless system
• Machine scoring of essay answers
• Some important skills cannot be measured (or measured well) with multiple-choice tests
• When the school is the unit of analysis for decision making:
  • Matrix sampling can be used to enhance validity
  • A sample of students is usually sufficient, so it is usually not necessary to test everyone
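Why a sample is usually enough follows from the standard error of a mean, SD / sqrt(n), which shrinks with the square root of the number of students tested. The sketch below uses the textbook normal-approximation formula for the number of students needed to estimate a school mean within a chosen margin. The SD of 150 points and the margin of 30 points are hypothetical values, not CLA figures, and the formula ignores the finite-population correction, which would lower the required n further at small schools.

```python
from math import ceil, sqrt

def standard_error(sd, n):
    """Standard error of the mean of n independent scores with standard deviation sd."""
    return sd / sqrt(n)

def students_needed(sd, margin, z=1.96):
    """Smallest n whose approximate 95% interval for the mean is within +/- margin."""
    return ceil((z * sd / margin) ** 2)

n = students_needed(sd=150, margin=30)   # hypothetical SD and target margin
print(n, round(standard_error(150, n), 1))
```

Under these assumptions about 100 students pin down a school mean to within roughly 30 points, regardless of whether the school enrolls 2,000 or 20,000 students.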
Value-Added Score Reporting
• Provides an estimate of a school’s contribution to student learning after controlling for input
• Involves computing whether a school’s mean CLA score is higher or lower than expected given (a) its mean SAT score and (b) the typical relationship between mean CLA and SAT scores among all the schools in the program
• Facilitates measuring and interpreting the progress a school’s students made relative to comparable students at other colleges
• Value added can be computed in different ways
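The school-level computation described above can be sketched as an ordinary least-squares fit of school mean CLA on school mean SAT across all participating schools, with each school's deviation from its expected score expressed in units of the regression's standard error of estimate. Function and variable names and the five schools' numbers are hypothetical, and CAE's reporting may include steps not shown here.

```python
from math import sqrt

def school_value_added(mean_sat, mean_cla):
    """School-level value added: residuals from an OLS fit of mean CLA on mean SAT.

    Returns (intercept, slope, standardized residuals), one residual per school,
    each divided by the regression's standard error of estimate.
    Needs at least 3 schools whose scores do not all lie exactly on a line.
    """
    n = len(mean_sat)
    x_bar = sum(mean_sat) / n
    y_bar = sum(mean_cla) / n
    sxx = sum((x - x_bar) ** 2 for x in mean_sat)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(mean_sat, mean_cla))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Observed minus expected mean CLA, given each school's mean SAT
    residuals = [y - (intercept + slope * x) for x, y in zip(mean_sat, mean_cla)]
    see = sqrt(sum(r * r for r in residuals) / (n - 2))  # standard error of estimate
    return intercept, slope, [r / see for r in residuals]

# Hypothetical school means for five schools
sat = [1000, 1050, 1120, 1180, 1250]
cla = [1030, 1090, 1110, 1230, 1260]
intercept, slope, z = school_value_added(sat, cla)
for s, score in zip(sat, z):
    print(s, round(score, 2))   # > 0: above expected; < 0: below expected
```

A school with a standardized residual of +1 scored one standard error above the line, that is, better than schools whose entering students had similar SAT scores.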
Research and Development Activities (a work in progress)
• Compare the effects of different ways of computing value added
• Conduct G-theory analyses to quantify the amount of variance (measurement error) due to different sources
• Investigate construct validity collaboratively with ACT and ETS
• Explore whether task and prompt type interact with student background characteristics and academic major
• Assess whether measures constructed from the same shell have more similar statistical properties than tasks created from other shells
• Evaluate the feasibility of extending the CLA to high schools, graduate schools, and colleges in other countries
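As an illustration of the G-theory item above, the simplest generalizability study has every student's answer scored by the same set of readers (a fully crossed persons × readers design). Variance components come from the two-way ANOVA mean squares, and the generalizability coefficient shows how much of the observed-score variance reflects real differences among students. The design and all names are assumptions for illustration; the slide does not say which facets CAE's analyses include.

```python
def g_study_p_by_r(scores):
    """One-facet G-study for a fully crossed persons x readers design.

    scores[p][r] is reader r's score for student p (no missing cells).
    Returns estimated variance components and the relative G coefficient
    for the mean over the same number of readers.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pr = ss_total - ss_p - ss_r          # interaction, confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions; negative estimates are set to zero by convention
    var_p = max((ms_p - ms_pr) / n_r, 0.0)
    var_r = max((ms_r - ms_pr) / n_p, 0.0)
    var_pr_e = ms_pr
    g_coef = var_p / (var_p + var_pr_e / n_r)
    return {"person": var_p, "reader": var_r, "residual": var_pr_e, "G": g_coef}

# Hypothetical scores: 4 students, each scored by the same 2 readers
print(g_study_p_by_r([[4, 5], [2, 3], [6, 6], [3, 4]]))
```

The reader component captures overall harshness or leniency; the residual captures reader-by-student disagreement plus random error, which is what averaging over more readers reduces.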
Some Silly Criticisms and Suggestions
• “CLA residual and value-added scores are unreliable.” BUT this is mathematically impossible given the high correlations in the split-sample studies and other empirical data.
• “Scores are less reliable when aggregated up to the school level.” BUT just the opposite is true.
• “Computer grading will solve the problem of scoring time (one hour per portfolio).” BUT portfolios cannot be machine scored.
• “The 0.90 correlation between school-level CLA and SAT scores shows these tests measure the same thing.” BUT:
  • The SAT and CLA require different types of preparation
  • High correlations between tests can occur even when they measure different things (e.g., students still need to learn the law to pass the bar exam despite the 0.92 correlation between school-level LSAT and bar exam scores)
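The point that aggregation increases reliability can be checked with the standard formula for the reliability of a group mean: if ρ (the intraclass correlation) is the share of student-level score variance that lies between schools, the reliability of a school mean over n students is nρ / (1 + (n - 1)ρ), which rises toward 1 as n grows. The ICC of 0.20 and the sample sizes below are hypothetical.

```python
def school_mean_reliability(icc, n):
    """Reliability of a school mean based on n students (Spearman-Brown form).

    icc: share of total student-level score variance that lies between schools.
    """
    return n * icc / (1 + (n - 1) * icc)

# Hypothetical ICC; n = 1 is the reliability of a single student's score
# as a measure of that student's school
for n in (1, 25, 100):
    print(n, round(school_mean_reliability(0.20, n), 3))
```

With these assumed values a single student's score says little about the school, but a mean over 100 students is highly reliable, which is the opposite of the criticism.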
Three Methods for Computing Value Added
• Linear regression with the school as the unit of analysis, the school’s mean SAT score as the sole predictor, and expected levels set by the standard error of the regression (this is the current method)
• Linear regression with the student as the unit of analysis, with the SAT and a dummy variable for each school as the predictors (and a separate standard error for each school)
• Hierarchical linear modeling (HLM), which treats students as nested within institutions, with the SAT as the student-level predictor (and a separate standard error for each school)
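The second method, student-level regression with a dummy variable for each school, can be computed without building the dummy matrix: by the Frisch-Waugh-Lovell theorem, the common SAT slope equals the slope from regressing within-school deviations of CLA on within-school deviations of SAT, and each school's dummy coefficient is its mean CLA minus that slope times its mean SAT. This sketch omits the per-school standard errors, centers the coefficients on their unweighted average as one possible reference point, and uses hypothetical names and records. HLM, the third method, would normally be fit with a mixed-model package and is not shown.

```python
from collections import defaultdict

def school_fixed_effects(records):
    """Student-level OLS of CLA on SAT plus one dummy per school (no common intercept).

    records: iterable of (school, sat, cla), one tuple per student.
    Returns (common SAT slope, {school: dummy coefficient},
             {school: coefficient minus the unweighted average coefficient}).
    """
    by_school = defaultdict(list)
    for school, sat, cla in records:
        by_school[school].append((sat, cla))

    means = {}
    sxx = sxy = 0.0
    for school, rows in by_school.items():
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        means[school] = (mx, my)
        # Within-school deviations identify the common slope (Frisch-Waugh-Lovell)
        for x, y in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
    slope = sxy / sxx
    effects = {s: my - slope * mx for s, (mx, my) in means.items()}
    avg = sum(effects.values()) / len(effects)
    relative = {s: e - avg for s, e in effects.items()}
    return slope, effects, relative

# Hypothetical student records: (school, SAT, CLA)
data = [("A", 1000, 1050), ("A", 1100, 1120), ("A", 1200, 1210),
        ("B", 1050, 1130), ("B", 1150, 1190), ("B", 1250, 1300)]
print(school_fixed_effects(data))
```

Unlike the school-level method, this uses every student's score, so schools that test more students get more precise estimates.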
Fig. 1: Relationship Between Mean ACT Scores and Mean Total CLA Scores for Freshmen
[Scatterplot: mean ACT score (x-axis) vs. mean total CLA score (y-axis); points for your institution and for other institutions’ freshmen. Regression line: intercept 8.02, slope 0.66, R-square 0.80.]
Fig. 2: Relationship Between Mean ACT Scores and Mean Total CLA Scores for Seniors
[Scatterplot: mean ACT score (x-axis) vs. mean total CLA score (y-axis); points for your institution and for other institutions’ seniors. Regression line: intercept 11.96, slope 0.62, R-square 0.75.]
Fig. 3: Relationship Between Mean ACT Scores and Mean Total CLA Scores for Freshmen and Seniors
[Scatterplot: mean ACT score (x-axis) vs. mean total CLA score (y-axis) for freshmen and seniors.]
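The regression lines reported for Figs. 1 and 2 (freshmen: intercept 8.02, slope 0.66; seniors: intercept 11.96, slope 0.62, on the ACT-like scale used in the figures) can be used to read off expected scores. The sketch below computes a school’s expected freshman and senior mean CLA score from its mean ACT score, its deviation from the line, and the gap between the two cross-sectional lines. The function names are hypothetical, and treating the gap between the lines as an expected gain is an illustrative reading, not necessarily how the report defines value added.

```python
# Intercept and slope of mean CLA regressed on mean ACT, as reported in the figures
FRESHMEN = (8.02, 0.66)   # Fig. 1
SENIORS = (11.96, 0.62)   # Fig. 2

def expected_cla(mean_act, line):
    """Expected mean CLA score for a school with the given mean ACT score."""
    intercept, slope = line
    return intercept + slope * mean_act

def deviation(mean_act, mean_cla, line):
    """Observed minus expected: positive means the school scored above the line."""
    return mean_cla - expected_cla(mean_act, line)

def expected_gap(mean_act):
    """Senior line minus freshman line at the same mean ACT score."""
    return expected_cla(mean_act, SENIORS) - expected_cla(mean_act, FRESHMEN)

for act in (19, 23, 27):
    print(act, round(expected_cla(act, FRESHMEN), 2),
          round(expected_cla(act, SENIORS), 2), round(expected_gap(act), 2))
```

At a mean ACT of 23, for example, the lines give 8.02 + 0.66 × 23 = 23.20 for freshmen and 11.96 + 0.62 × 23 = 26.22 for seniors, a gap of about 3.0 points; because the senior slope is slightly flatter, the gap narrows as mean ACT rises.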