
Identifying Students’ Gradual Understanding of Physics Concepts Using TagHelper Tools


Presentation Transcript


1. Identifying Students’ Gradual Understanding of Physics Concepts Using TagHelper Tools
Nava L. Livne nlivne@aoce.utah.edu
Oren E. Livne olivne@aoce.utah.edu
University of Utah
PSLC Summer School, June 21, 2007

2. Driving Research Question
• Can a machine identify students’ gradual understanding of physics concepts?
• Hypothesis (IBAT learning model): students learn in four stages.
[Figure: student conceptual learning vs. time, rising through four stages: ignoring irrelevant data → basic notions → advanced principles → transfer of principles to complex scenarios]

3. Outline
• Data Collection
  • Students’ constructed responses to physics questions
  • Human teacher response classification = the reference for analysis
• Data Analysis
  • TagHelper Tools
  • Discriminatory classifiers: Naïve Bayes, SMO
  • User-defined features
• Results
• Discussion
  • How well do TagHelper Tools delineate the four stages of students’ conceptual understanding?
• Lessons Learned from the Summer School & TagHelper Tools

4. Data Collection
• Data unit = student constructed response to an open-ended physics question:
  • “Acceleration is defined as the final amount subtracted from the initial amount divided by the time.”
• 840 student responses were collected:
  • Development set: 420 randomly selected responses
  • Validation set: the other 420 responses (see the split sketch below)
• Responses were classified by human teachers into 55 concepts, aggregated into four main categories:
  • Irrelevant
  • Basic notions: e.g. no gravity in vacuum, definition of force
  • Advanced principles: e.g. zero net force [implies body at rest]
  • Complex scenarios: e.g. man drops keys in an elevator
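The 50/50 split is straightforward to reproduce. Below is a minimal Python sketch, assuming the 840 responses are available as (text, category) pairs; the placeholder data, the fixed seed, and the variable names are our own illustration, not part of the original study.

```python
import random

# Placeholder stand-in for the 840 (response_text, category) pairs;
# loading the real responses is assumed to happen elsewhere.
responses = [("no gravity in a vacuum", "B")] * 840

random.seed(2007)                  # fixed seed, for a reproducible split
random.shuffle(responses)
development_set = responses[:420]  # 420 randomly selected responses
validation_set = responses[420:]   # the other 420 responses
```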

5. Data Analysis: Rationale
• TagHelper Tools can analyze any text response; which algorithm and option set is best for this type of data set?
• Objective: detect four ordered stages → use a discriminatory classifier
  • Naïve Bayes: uses cumulative evidence to distinguish among records
  • Support Vector Machines (SMO): finds distinguishable groups in the data
• Models must exhibit reasonable predictions on both the training and validation sets to ensure reliability
• User features should mainly delineate among scenarios (a rough sketch of such matching rules follows this slide):
  • ANY(EGG, CLOWN)
  • ALL(PUMPKIN, ANY(PERSON, MAN))
  • ANY(KEYS, ELEVATOR)
• Shooting for reliability index κ ≈ 0.6–0.7
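The feature expressions above use TagHelper’s own syntax. As a hedged illustration only (not TagHelper’s implementation), the following Python sketch shows how such ANY/ALL matching rules could be evaluated against a response; all function names here are our own.

```python
def tokens(response):
    """Lowercased word set of a student response."""
    return set(response.lower().split())

def ANY(*words):
    """Rule that fires if any of the given words appears in the response."""
    return lambda toks: any(w.lower() in toks for w in words)

def ALL(*conditions):
    """Rule that fires only if every sub-condition fires;
    bare strings act as single-word ANY conditions."""
    rules = [c if callable(c) else ANY(c) for c in conditions]
    return lambda toks: all(rule(toks) for rule in rules)

# The three scenario features listed on this slide:
features = {
    "egg_or_clown":   ANY("EGG", "CLOWN"),
    "pumpkin_person": ALL("PUMPKIN", ANY("PERSON", "MAN")),
    "keys_elevator":  ANY("KEYS", "ELEVATOR"),
}

toks = tokens("A man drops his keys in a moving elevator")
print({name: rule(toks) for name, rule in features.items()})
# {'egg_or_clown': False, 'pumpkin_person': False, 'keys_elevator': True}
```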

6. Data Analysis: Models
• Best models:
  • Model A: Naïve Bayes, no POS, no user-defined features
  • Model B: Naïve Bayes, no POS, with user-defined features
  • Model C: SMO, no POS, exponent = 2.0, no user-defined features
  • Model D: SMO, no POS, exponent = 2.0, with user-defined features
• Procedure:
  • Models were trained on the development set using cross-validation
  • Evaluation measures: κ (> 0.5), % correctly classified instances (> 60%)
  • If both measures were reasonable, the model was further tested on the validation set (a rough analogue of this procedure is sketched below)
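As a rough analogue of this procedure (scikit-learn rather than TagHelper, which is built on Weka), the sketch below pits Naïve Bayes against an SVM with a degree-2 polynomial kernel, mirroring the exponent = 2.0 setting, under 10-fold cross-validation. The κ and accuracy thresholds come from this slide; the toy data and everything else are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import cohen_kappa_score, accuracy_score

# Toy development set of 420 responses; real data loading is assumed.
texts = ["just some numbers", "no gravity in a vacuum",
         "zero net force implies rest", "keys fall with the elevator"] * 105
labels = ["I", "B", "A", "T"] * 105

def evaluate(model):
    """10-fold cross-validation on the development set (cf. slide 7)."""
    pipe = make_pipeline(CountVectorizer(), model)
    pred = cross_val_predict(pipe, texts, labels, cv=10)
    return cohen_kappa_score(labels, pred), accuracy_score(labels, pred)

for name, model in [("Model A-like (Naive Bayes)", MultinomialNB()),
                    ("Model C-like (degree-2 SVM)", SVC(kernel="poly", degree=2))]:
    kappa, acc = evaluate(model)
    ok = kappa > 0.5 and acc > 0.60   # thresholds from this slide
    print(f"{name}: kappa={kappa:.2f}, accuracy={acc:.0%}, acceptable={ok}")
```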

7. Results on Development Set*
* The model was trained on the development set by dividing it into 10 chunks and running cross-validation among the chunks.

8. Results: Development vs. Validation Set

9. Discussion #1
• The best model was Naïve Bayes with no user-defined features: it had the lowest κ on the development set, but the highest prediction on the validation set and uniform overall performance.
  • Watch out for, and optimize, the development/validation tradeoff
• Why didn’t the models generalize well? This may be due to the large skew of the data, which causes large variability even between the development and validation sets. The skew is evident when optimizing the SMO exponent (for non-skewed data the optimal exponent is 1; here it is 2), and may also be why SMO was not superior to Naïve Bayes.
  • Check data skew, indicated by an optimal SMO exponent other than 1 (a quick check is sketched below)
• Analysis on the non-aggregated 55 concepts resulted in a higher κ = 0.61; however, the confusion matrix is much larger and its errors are difficult to interpret.
  • Strive for a small number of distinct categories
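To make the “check data skew” takeaway concrete, here is a trivial sketch (our own addition, with made-up counts): it prints the category distribution, since heavily unbalanced categories warn that κ and accuracy may swing between the development and validation sets.

```python
from collections import Counter

# Made-up I/B/A/T label counts; the real labels are assumed loaded elsewhere.
labels = ["B"] * 250 + ["I"] * 90 + ["A"] * 55 + ["T"] * 25

counts = Counter(labels)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {n:4d} ({n / total:.0%})")
# Crude skew flag: one category dominating the data set.
print("skewed:", max(counts.values()) / total > 0.5)
```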

10. Discussion #2: Error Analysis
• Error analysis provides a fine-grained perspective on the data and sheds light on the characteristic error patterns made by TagHelper:
  • Identify large entries in the confusion matrix (see the sketch below)
  • Look at response examples that represent dominant error types
  • Design user features to eliminate the errors
• Notation: I = irrelevant responses, B = basic notions, A = advanced principles, T = transfer to complex scenarios
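A minimal sketch of the first step, finding the large off-diagonal entries of the confusion matrix; the label vectors are placeholders, and scikit-learn’s confusion_matrix stands in for TagHelper’s own output format.

```python
from sklearn.metrics import confusion_matrix

CATEGORIES = ["I", "B", "A", "T"]                  # notation from this slide
y_true = ["B", "A", "T", "B", "A", "I", "T", "B"]  # placeholder teacher labels
y_pred = ["B", "B", "A", "B", "A", "I", "A", "A"]  # placeholder model output

cm = confusion_matrix(y_true, y_pred, labels=CATEGORIES)
# Off-diagonal entries are misclassifications; list the dominant ones first.
errors = sorted(((cm[i, j], t, p)
                 for i, t in enumerate(CATEGORIES)
                 for j, p in enumerate(CATEGORIES)
                 if i != j and cm[i, j] > 0), reverse=True)
for count, true_cat, pred_cat in errors:
    print(f"{count} responses: human said {true_cat}, model said {pred_cat}")
```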

11. Summary
• In short, the answer to the driving research question is: YES, A MACHINE CAN IDENTIFY STUDENTS’ GRADUAL LEARNING IN PHYSICS.
• Students develop their conceptual understanding of physics in four stages that correspond to the four categories found in the data (see slide 2):
  1. Learning to ignore irrelevant data and focus on the relevant knowledge components
  2. Getting familiar with basic notions
  3. Learning advanced principles that use the basic notions
  4. Transferring the principles to complex real-life scenarios; each scenario is likely to involve multiple principles

12. Lessons Learned
• TagHelper Tools can distinguish between data categories that represent different knowledge components.
• There is a trade-off between fit to the training set and performance on the validation set; we chose the model that best balanced this trade-off.
• The quality of the conclusions is limited by the quality of the data. In our case the model validation was reasonable because the responses were drawn from multiple students, although the individual students were not identified in the data.
• TagHelper Tools is a state-of-the-art machine learning framework, but its analysis is limited to identifying structured patterns within its feature space. The default feature space includes only simple patterns; adding creative user features is the key to making TagHelper Tools even more powerful.
• Future directions may generalize TagHelper Tools to more flexible types of structural text patterns and incorporate data imported from other parsers (e.g. mathematical expression parsers).
