Towards educational data mining: Using data mining methods for automated chat analysis to understand and support inquiry learning processes Anjo Anjewierden, Bas Kollöffel and Casper Hulshof Anjo Anjewierden http://anjo.blogs.com Department of Instructional Technology Faculty of Behavioural Sciences University of Twente The Netherlands
Overview (1) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Motivation • Chats can structure collaborative learning • Doing vs. doing and discussing with other learners • Current use of chats is limited to • Logging the messages for later analysis • Our goals related to chat analysis • Provide adaptive feedback based on on-line analysis of the chats • Make the learner part of the simulation by visualising her actions and behaviour (e.g. through avatars)
Approach • Define models by which messages can be classified • One model is based on term usage • Another model is based on the grammar • Later we want to combine the models to find "semantic patterns" • By applying the models, each message of a particular chat can be assigned a class • The aggregation of class assignments over time is what an avatar can visualise (sketched below)
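As an illustration of the last two bullets, the sketch below takes messages that already carry a (hypothetical) class label and aggregates the labels per time window; the window size and the toy data are assumptions, not the authors' implementation.

```python
from collections import Counter

# Hypothetical classified chat: (minute, assigned class) per message.
classified = [(0, "regulative"), (1, "social"), (2, "domain"),
              (3, "domain"), (4, "regulative"), (6, "domain")]

def aggregate(messages, window=5):
    """Count class assignments per time window of `window` minutes."""
    counts = {}
    for minute, label in messages:
        counts.setdefault(minute // window, Counter())[label] += 1
    return counts

print(aggregate(classified))
# e.g. window 0 -> regulative: 2, domain: 2, social: 1; window 1 -> domain: 1
```

An avatar could then render these per-window counts, for instance through colour or posture, to reflect how regulative, domain, technical and social talk are balanced over time.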
Learning environment • Both learners see the same simulation on two different screens • One learner can run the simulation • Learners use chat to discuss: • Simulations to run, variable settings, etc. • Interpretation of the results of simulations • Which answer to give to a question • etc.
Overview (2) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Classification of chats • Which functions should we distinguish in chat messages? • We use the classification proposed by Gijlers and De Jong (2005): • Regulative: planning, monitoring, agreeing, etc. • Domain: transformative (talk about the domain content) • Technical: about the learning environment • Social: greetings, compliments and other off-task messages
Examples • Regulative: • Ok // Yes // Next • I think the answer is 3 • Perhaps we should try again • Domain: • The momentum becomes negative • Speed of the red ball is 2 m/s • Technical: • Move the mouse to the right • Social: • Well done partner
Data used • Chats collected by Nadira Saab for her Ph.D. research (University of Amsterdam, 2005) • Domain: simulations related to collisions (e.g. momentum for elastic and inelastic collisions) • Language: Dutch • 78 chat sessions • 16879 chat messages
Data normalisation • Messages are extremely noisy • Misspellings (accidental and deliberate) • Chat language (w8 = wait) • See paper for Dutch examples • Messages have been manually corrected so that the words can be found in the dictionary (an automated lookup is sketched below) • The grammar has not been corrected
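The correction was done manually; purely as an illustration of what an automated lookup for chat shorthand could look like, here is a minimal sketch (the table entries are illustrative, the paper lists the actual Dutch examples):

```python
# Illustrative shorthand table; the real corpus is Dutch and was corrected by hand.
SHORTHAND = {"w8": "wait"}

def normalise(message):
    """Replace known chat shorthand by dictionary words, leave the rest untouched."""
    return " ".join(SHORTHAND.get(token.lower(), token) for token in message.split())

print(normalise("w8 I think the answer is 3"))  # -> "wait I think the answer is 3"
```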
Overview (3) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Types of features • For each class one can define • Characterising terms (domain: speed, increases) • Grammatical patterns: • the speed increases (<article> <noun> <verb>) • I think (<personal pronoun> <verb>) • Both terms and syntactic patterns are used by humans to classify the messages • Data mining • Discover the terms and patterns automatically
Words as features • Each word in a message is a feature • Order is not taken into account • Smileys, !, ?, integers are separate words • Example • The answer is 5!!!! :-) • Features: { answer, is, the, #, !, <smiley> } • (where # is any integer)
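A minimal sketch of this feature extraction, reproducing the example above (the regular expressions, and the smiley pattern in particular, are assumptions rather than the authors' code):

```python
import re

def word_features(message):
    """Unordered word features: integers map to '#', smileys to '<smiley>'."""
    text = re.sub(r"[:;]-?[)(DP]", " <smiley> ", message)  # crude smiley pattern (assumption)
    tokens = re.findall(r"<smiley>|\w+|[!?]", text.lower())
    return {"#" if token.isdigit() else token for token in tokens}

print(word_features("The answer is 5!!!! :-)"))
# -> {'answer', 'is', 'the', '#', '!', '<smiley>'} (a set, so order may vary)
```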
Grammar as features • Each message is parsed by a part-of-speech (POS) tagger • This determines the role words play in a message (noun, verb, etc.) • A POS sequence is a feature if: • It occurs at least 20 times, and • It does not fully overlap a longer sequence (pruning sketched below) • Example: • the speed: {<article>, <noun>, <article> <noun>} • Remove full overlaps: {<article> <noun>}
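One reading of the overlap rule is sketched below: count all POS n-grams (here up to length 3, an assumption), keep those occurring at least 20 times, and drop any kept sequence that is fully contained in a longer kept sequence. The tags themselves would come from an off-the-shelf POS tagger.

```python
from collections import Counter

def pos_sequence_features(tagged_messages, min_count=20, max_len=3):
    """tagged_messages: list of POS-tag lists, one per message."""
    counts = Counter()
    for tags in tagged_messages:
        for n in range(1, max_len + 1):              # all n-grams up to max_len
            for i in range(len(tags) - n + 1):
                counts[tuple(tags[i:i + n])] += 1
    frequent = [seq for seq, c in counts.items() if c >= min_count]

    def contained(short, long_):                     # does short fully overlap long_?
        return any(long_[i:i + len(short)] == short
                   for i in range(len(long_) - len(short) + 1))

    return [s for s in frequent
            if not any(len(t) > len(s) and contained(s, t) for t in frequent)]
```

For the example above, <article> and <noun> are removed because they are fully contained in the frequent bigram <article> <noun>.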
Naive Bayes classifier • Standard Naive Bayes classifier is used • Once for the word features • Once for the grammar features • See paper for technical details
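A minimal sketch of the word-feature model with scikit-learn's multinomial Naive Bayes; the library choice and the toy training set (taken from the Examples slide) are assumptions, and the paper gives the actual technical details.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data; the real models were trained on the 1280 hand-classified messages.
messages = ["ok next", "i think the answer is 3", "the momentum becomes negative",
            "move the mouse to the right", "well done partner"]
labels = ["regulative", "regulative", "domain", "technical", "social"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["the momentum of the red ball becomes negative"]))  # -> ['domain']
```

The grammar model would be trained in the same way, with the POS-sequence features taking the place of the word counts.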
Overview (4) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Experiment • Four researchers each classified 400 messages • Randomly selected with a bias towards longer messages (nearly all short messages are regulative) • 1280 unique messages were classified • Expert manually checked whether the classifications were "correct" • Result was used to create two classification models (words, grammar) using Naive Bayes
Overview (5) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Overview (6) • Motivation • Classification of educational chats • Methods for automated analysis • Experiment • Results • Conclusions
Conclusions • Automatic classification of messages • Naive Bayes works surprisingly well • Even for the small feature set per item (a single chat message) • And the large number of features over all items • Sufficiently accurate for • The classes we used • Visualising aggregated learner behaviour through avatars • Misspellings remain a source of concern
Future work • Combining manual and automatic classification • Started: see interaction classification tool • Can speed up chat coding in general (also for research) • Find "semantic patterns" in chats • Based on combining information from the word and grammar models • Relate these "semantic patterns" to learner actions in the simulation environment
Thank you! • And thanks to • Nadira Saab • Hannie Gijlers • Petra Hendrikse • Sylvia van Borkulo • Jan van der Meij • Wouter van Joolingen • and the anonymous reviewers