Data mining with DataShop

Ken Koedinger, CMU Director of PSLC, Professor of Human-Computer Interaction & Psychology, Carnegie Mellon University

“Knowledge components are the germ of transfer” Goal of the week: What does Ken mean by this?

Overview • Motivation for data mining • Better understanding of students => better instructional design • Exploratory Data Analysis • Data Shop demo, Excel • Learning curves & Learning Factors Analysis • Example project from last summer

Data Mining Questions & Methods • What is going on with student learning & performance? • Exploratory data analysis • Summary & visualization tools in DataShop • Tools in Excel: Auto filter, Pivot Tables, Solver • How to reliably model student achievement? • Item Response Theory (IRT) • Basis for standardized tests, SAT, GRE, TIMSS… • Version of “logistic regression”
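
To make the "version of logistic regression" point concrete, here is a minimal sketch of the simplest IRT model (the one-parameter, or Rasch, model). The slide does not say which IRT variant is meant, so the choice of the Rasch form, the function name, and the example values are assumptions for illustration.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the one-parameter (Rasch) IRT model.

    The log-odds of success are simply ability minus difficulty,
    which is a logistic regression with one term per student and per item.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values: a student whose ability equals the item difficulty.
p_even = rasch_probability(0.0, 0.0)
# The same student on an easier item (lower difficulty).
p_easy = rasch_probability(0.0, -1.0)
```

Note that nothing in this model changes with practice: a student's ability is fixed, so it describes achievement at one point in time rather than learning.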

Data Mining Questions & Methods 2 • What’s the nature of knowledge students are learning? How can we discover cognitive models of student learning that fit their learning curves? • Learning Factors Analysis (LFA) • Extends IRT to account for learning • Search algorithm: Discover cognitive model(s) that capture how student learning transfers over tasks over time • What features of a tutor lead to the most learning? • Learning Decomposition • Extends LFA to explore different rates of learning due to different forms of instruction • How to extract reliable inferences about causal mechanisms from correlations in data? • Causal modeling using Tetrad
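
One way to read "extends IRT to account for learning" is the additive factors form commonly used with LFA: the log-odds of success add a student term, an easiness term for each knowledge component (KC) the step needs, and a learning-rate term multiplied by the number of prior practice opportunities. The slides do not spell out the formula, so treat this as a sketch under that assumption; the function name, parameter names, and numbers are hypothetical.

```python
import math

def afm_probability(student_ability, kc_params, practice_counts):
    """Sketch of an additive-factors style prediction.

    kc_params:       {kc: (easiness, learning_rate)}  (hypothetical values)
    practice_counts: {kc: prior opportunities} for the KCs this step requires
    """
    logit = student_ability
    for kc, opportunities in practice_counts.items():
        easiness, learning_rate = kc_params[kc]
        # Each prior opportunity on a KC shifts the log-odds by its learning rate.
        logit += easiness + learning_rate * opportunities
    return 1.0 / (1.0 + math.exp(-logit))

params = {"circle-area": (-1.0, 0.5)}
p_first = afm_probability(0.0, params, {"circle-area": 0})
p_third = afm_probability(0.0, params, {"circle-area": 2})
```

Because the opportunity count is tracked per KC, changing which KC a step is assigned to changes the predictions. That is what lets a search over KC models be scored against the data.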

Overview • Motivation for data mining • Better understanding of students => better instructional design • Exploratory Data Analysis • Demo: DataShop, Excel • Learning curves & Learning Factors Analysis • Example project from last summer Next

Data Shop Demo …

Before going to DataShop, let’s look at a tutor (1997 version!) that generated the example data set we’ll look at

TWO_CIRCLES_IN_SQUARE problem: Initial screen

TWO_CIRCLES_IN_SQUARE problem: An error a few steps later

TWO_CIRCLES_IN_SQUARE problem: Student follows hint & completes the problem

How to get to the DataShop: Go to http://learnlab.org & click …

PSLC’s DataShop • Researchers get data access, visualizations, statistical tools • Learning curves track student learning over time • Discover what concepts & skills students need help with

PSLC’s DataShop • Learning curves reveal over- and under-practiced knowledge components • Rectangle-area has an initial low error rate, but is practiced often
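
A learning curve of this kind can be sketched as the error rate on first attempts, grouped by how many times each student has already practiced the knowledge component. This is an illustrative computation, not DataShop's own code; the record layout and the sample data are made up.

```python
from collections import defaultdict

def learning_curve(observations):
    """observations: (student, kc, correct_on_first_attempt) tuples in time order.

    Returns {kc: [error rate at opportunity 1, opportunity 2, ...]}.
    """
    seen = defaultdict(int)                          # (student, kc) -> opportunities so far
    errors = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> error count
    totals = defaultdict(lambda: defaultdict(int))   # kc -> opportunity -> attempt count
    for student, kc, correct in observations:
        seen[(student, kc)] += 1
        opportunity = seen[(student, kc)]
        totals[kc][opportunity] += 1
        if not correct:
            errors[kc][opportunity] += 1
    return {
        kc: [errors[kc][o] / totals[kc][o] for o in sorted(totals[kc])]
        for kc in totals
    }

# Hypothetical data: two students, two opportunities each on one KC.
sample = [
    ("s1", "rectangle-area", True),
    ("s2", "rectangle-area", False),
    ("s1", "rectangle-area", True),
    ("s2", "rectangle-area", True),
]
curve = learning_curve(sample)
```

A KC whose curve starts low and stays low while opportunities keep accumulating is a candidate for being over-practiced.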

Other DataShop Features • Error Reports • Identify misconceptions by looking for common student errors • When do students ask for hints? • Are there alternative correct strategies? • Performance Profiler • Export Data • Get all or part of the data in tab-delimited file • Use your favorite analysis tools …
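
For those who prefer a script to Excel, a tab-delimited export can be read with standard tools. The sketch below tallies first-attempt outcomes per knowledge component. The column names (`student`, `kc`, `first_attempt`) and the sample rows are placeholders, not the actual DataShop export headers, so adjust them to match the file you download.

```python
import csv
import io
from collections import Counter

# Stand-in for the contents of an exported file (hypothetical columns and rows).
sample_export = (
    "student\tkc\tfirst_attempt\n"
    "s1\tcircle-area\tincorrect\n"
    "s1\tcircle-area\tcorrect\n"
    "s2\tcircle-area\thint\n"
)

def count_first_attempts(tab_text):
    """Count (kc, outcome) pairs in tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return Counter((row["kc"], row["first_attempt"]) for row in reader)

counts = count_first_attempts(sample_export)
```

With a real file, replace `io.StringIO(tab_text)` with an opened file handle; the same tally answers questions like "when do students ask for hints?"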

Exported File Loaded into Excel

Overview • Motivation for data mining • Better understanding of students => better instructional design • Exploratory Data Analysis • Data Shop demo, Excel • Learning curves & Learning Factors Analysis • Example project from last summer Next

Cognitive Model drives behavior of intelligent tutor systems … • Cognitive Model: expert component of intelligent tutors that models how students solve problems 3(2x - 5) = 9 If goal is solve a(bx+c) = d Then rewrite as abx + ac = d If goal is solve a(bx+c) = d Then rewrite as abx + c = d If goal is solve a(bx+c) = d Then rewrite as bx+c = d/a 6x - 15 = 9 2x - 5 = 3 6x - 5 = 9 • Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction
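
The three rules on this slide can be sketched as functions that each produce a candidate next step. Model tracing then amounts to finding which rule reproduces what the student typed. This is a toy illustration, not how the tutor is implemented; the rule names and the tuple encoding (x coefficient, constant, right-hand side) are assumptions.

```python
from fractions import Fraction

def candidate_steps(a, b, c, d):
    """Next steps for a(bx + c) = d, encoded as (x coefficient, constant, right side)."""
    return {
        "distribute": (a * b, a * c, d),          # abx + ac = d (correct)
        "distribute-bug": (a * b, c, d),          # abx + c = d  (forgot to multiply c)
        "divide": (b, c, Fraction(d, a)),         # bx + c = d/a (correct alternative)
    }

def trace(student_step, a, b, c, d):
    """Return the name of the rule that matches the student's step, or None."""
    for rule, result in candidate_steps(a, b, c, d).items():
        if result == student_step:
            return rule
    return None

# 3(2x - 5) = 9, so a=3, b=2, c=-5, d=9.
matched_correct = trace((6, -15, 9), 3, 2, -5, 9)   # 6x - 15 = 9
matched_bug = trace((6, -5, 9), 3, 2, -5, 9)        # 6x - 5 = 9
matched_divide = trace((2, -5, 3), 3, 2, -5, 9)     # 2x - 5 = 3
```

Matching a buggy rule is what lets the tutor give a targeted bug message instead of a generic "incorrect."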

Hint message: “Distribute a across the parentheses.” Bug message: “You need to multiply c by a also.” Known? = 85% chance Known? = 45% Cognitive Model drives behavior of intelligent tutor systems … • Cognitive Model: expert component of intelligent tutors that models how students solve problems 3(2x - 5) = 9 If goal is solve a(bx+c) = d Then rewrite as abx + ac = d If goal is solve a(bx+c) = d Then rewrite as abx + c = d 6x - 15 = 9 2x - 5 = 3 6x - 5 = 9 • Model Tracing: Follows student through their individual approach to a problem -> context-sensitive instruction • Knowledge Tracing: Assesses student's knowledge growth -> individualized activity selection and pacing
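
The "Known?" percentages are running estimates that a student has learned a rule. A common way to maintain such an estimate is a Bayesian update after each step, as sketched below. The slide does not give the update rule or its parameters, so the slip, guess, and learn values here are hypothetical.

```python
def update_known(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian knowledge-tracing style update (all parameters are assumed values).

    p_slip:  chance of an error even though the rule is known
    p_guess: chance of a correct step even though the rule is not known
    p_learn: chance of learning the rule at this opportunity
    """
    if correct:
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # The student may also learn the rule from this practice opportunity.
    return posterior + (1 - posterior) * p_learn

after_right = update_known(0.45, correct=True)
after_wrong = update_known(0.45, correct=False)
```

The tutor can keep selecting problems that exercise a rule until its estimate passes a mastery threshold, which is the "individualized activity selection and pacing" on the slide.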

Cognitive Modeling Challenge • Problem: Intelligent Tutoring Systems depend on Cognitive Model, which is hard to get right • Hard to program, but more importantly … • A high quality cognitive model requires a deep understanding of student thinking • Cognitive models created by intuition are often wrong (e.g., Koedinger & Nathan, 2004)

Significance of improving a cognitive model • A better cognitive model means: • better feedback & hints (model tracing) • better problem selection & pacing (knowledge tracing) • Making cognitive models better advances basic cognitive science

How can we use student data to build better cognitive models? • Cognitive Task Analysis methods • Think alouds, Difficulty Factors Assessment • General lecture Tuesday • Peer collaboration dialog analysis • TagHelper track • Newer: • Data mining of student interactions with on-line tutors

Back to DataShop to illustrate

Use log data to test alternative knowledge representations • Which “knowledge component” analysis is correct is an empirical question! • Log data from tutors provides data to compare different KC analyses • Find which “germ” accounts for student learning behaviors

Not a smooth learning curve -> this knowledge component model is wrong. Does not capture genuine student difficulties.

This more specific knowledge component (KC) model (2 KCs) is also wrong -- still no smooth drop in error rate.

Ah! Now we are getting a smooth learning curve. This even more specific decomposition (12 KCs) better tracks the nature of student difficulties & transfer from one problem situation to another.
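
The comparison across these three slides can be mimicked on a toy scale: label the same steps under a coarse and a fine KC model, then see which labeling gives per-KC error sequences that only go down. The "bump" count below is a crude stand-in for eyeballing smoothness, not a method from the slides, and the data is invented; real comparisons use aggregated curves and statistical fit.

```python
from collections import defaultdict

def error_sequences(steps, kc_of):
    """Group one student's ordered (step_type, correct) records by KC; 1 marks an error."""
    sequences = defaultdict(list)
    for step_type, correct in steps:
        sequences[kc_of(step_type)].append(0 if correct else 1)
    return dict(sequences)

def count_bumps(sequences):
    """Count places where the error indicator rises from one opportunity to the next."""
    return sum(
        later > earlier
        for seq in sequences.values()
        for earlier, later in zip(seq, seq[1:])
    )

# Hypothetical step log for a single student.
steps = [
    ("circle-area", False),
    ("circle-area", True),
    ("square-area", False),
    ("circle-area", True),
    ("square-area", True),
]
coarse_bumps = count_bumps(error_sequences(steps, lambda step_type: "area"))
fine_bumps = count_bumps(error_sequences(steps, lambda step_type: step_type))
```

Under the single "area" KC the error on the first square-area step looks like a relapse; under the finer model it is simply the first opportunity on a different KC.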

Overview • Motivation for data mining • Better understanding of students => better instructional design • Exploratory Data Analysis • Demo: DataShop, Excel • Learning curves & Learning Factors Analysis • Example project from last summer Next

Example project from 2006 • Rafferty (Stanford) & Yudelson (U Pitt) • Analyzed a data set from Geometry • Applied Learning Factors Analysis (LFA) • Driving questions: • Are students learning at the same rate as assumed in prior LFA models? • Do we need different cognitive models (KC models) to account for low-achieving vs. high-achieving students?

Rafferty & Yudelson Results 1 • Different student learning rates? • Yes

Rafferty & Yudelson Results 2 • Is it “faster” learning or “different” learning? • Fit with a more compact model is better for low pre for high learn • Students with an apparent faster learning rate are learning a more “compact”, general and transferable domain model • (Became basis of Anna Rafferty’s master’s thesis)

Data Mining-Data Shop Offerings Tomorrow Lectures in 3501 Newell-Simon Hall, activities here (Wean 5202) 1. Educational data mining overview & introduction to using the DataShop • Follow-up activities: • Exercise in using DataShop for exploratory data analysis • Use tutor/course that generated target data set. Begin data export, data scrubbing, exploratory data analysis 2. Learning from learning curves: Item Response Theory, Learning Factors Analysis 3. Other data mining techniques: Learning decomposition, causal models with Tetrad • Define metrics to address driving question, begin analysis

Questions?

What’s next? • Tomorrow: • Do you know which offerings you will go to tomorrow? • Any conflicts -- two you want to go to that are at the same time?

END
