
Applying RL to Take Pedagogical Decisions in Intelligent Tutoring Systems




Presentation Transcript


  1. Applying RL to Take Pedagogical Decisions in Intelligent Tutoring Systems Ana Iglesias Maqueda Computer Science Department Carlos III of Madrid University

  2. Content • Intelligent Tutoring Systems (ITSs) • Definition • Problems • Aims • Reinforcement Learning (RL) • Proposal • RL Application in ITSs • Working Example • Conclusions and Further Research

  3. Intelligent Tutoring Systems (ITSs) • Intelligent Tutoring Systems (ITSs): “computer-aided instructional systems with models of instructional content that specify what to teach, and teaching strategies that specify how to teach” [Wenger, 1987].

  4. ITS Modules (Burns and Capps, 1988) • Domain Module: domain knowledge, the instructional content organised as a knowledge tree (what to teach) • Student Module: student knowledge and the student's learning characteristics • Pedagogical Module: pedagogical knowledge, the pedagogical strategies (how to teach it) • Interface: interaction with the student

  5. ITS. Knowledge Tree [Diagram: a knowledge tree for the Database Design domain. Top-level topics (Conceptual Design: E/R Model; Logical Design: Relational Model) branch into sub-topics (Basic Elements; Entities; Attributes; Binary Relationships, with Degree, Cardinality and Connectivity, e.g. 1:N, N:M), and each node carries leaf items: definitions, examples, problems, exercises and tests (Def.1, Def.2, Ex.1, Test.1, Test.2, Test.3, ...).]

  6. ITS. Knowledge Tree. E/R

  7. ITS. Pedagogical Strategies (PS) • Specify [Murray, 1999]: • how the content is sequenced • what kind of feedback to provide • when and how to show information (when to summarise, explain, give an exercise, definition, example, etc.) • Problems [Beck, 1998]: • encoding them • there are a lot of them • incorporating all the experts' knowledge • How many strategies are necessary? • What are the differences among them? • When should each be applied? • Why do they fail, and how can that be solved?

  8. Aims • To eliminate the pre-defined PS • The tutor learns to teach effectively • Representing the pedagogical information with an RL model • what, when and how to show the content • Adapting to the student's needs at each moment • Based only on experience acquired by interacting with other students with similar learning characteristics

  9. Reinforcement Learning (RL) • Definition [Kaelbling et al., 1996]: • An agent is in a given state (s) • The agent executes an action (a) • The execution produces a state transition (T) to another state (s') • The agent perceives the current state through its perception module (I) • The environment provides a reinforcement signal (r) to the agent • The agent's aim is to maximise the long-run reward
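The agent/environment loop defined above can be sketched as follows. This is a minimal, hypothetical sketch: the `policy`, `transition` and `reward` functions are illustrative stand-ins, not part of the original system.

```python
def run_episode(policy, transition, reward, start_state, goal_state):
    """One episode of the RL loop: the agent perceives state s, executes
    action a, the environment moves to s' and returns a reinforcement r."""
    s, total = start_state, 0.0
    while s != goal_state:
        a = policy(s)                   # agent chooses an action in state s
        s_next = transition(s, a)       # state transition T to s'
        total += reward(s, a, s_next)   # reinforcement signal r
        s = s_next                      # perception module I observes the new state
    return total
```

With a trivial one-step environment (every action reaches the goal and pays 1), the episode returns a total reward of 1.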

  10. Proposal. RL Components (1/3) • Agent → the ITS • Set of states (S): the student's knowledge state, encoded as a binary vector over the knowledge-tree items (e.g. .... 0 1 0 0 1 1 ....) • Set of actions (A): to show items of the knowledge tree, e.g. A1 = to show Def.1 = {def1}, A2 = {def2}, A3 = {ex1}, A4 = {def1 + ex1}, ....
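One possible encoding of these components is sketched below, assuming the state is a binary vector over the knowledge-tree items and each action is the set of items shown next. The item and action names (`def1`, `A4`, etc.) follow the slide's example; everything else is illustrative.

```python
# Knowledge-tree items, in a fixed order; the state is one bit per item.
items = ["def1", "def2", "ex1"]

# Actions: each action shows a set of items from the knowledge tree.
actions = {
    "A1": {"def1"},
    "A2": {"def2"},
    "A3": {"ex1"},
    "A4": {"def1", "ex1"},
}

def apply_action(state, action):
    """State transition: mark every shown item as known (bit = 1)."""
    shown = actions[action]
    return tuple(1 if items[i] in shown or bit else bit
                 for i, bit in enumerate(state))
```

For example, executing A4 from the empty state (0, 0, 0) marks def1 and ex1 as seen, giving (1, 0, 1).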

  11. Proposal. RL Components (2/3) • Perception of the environment (I: S → S): • how the ITS perceives the student's knowledge state • evaluating his/her knowledge with tests • Reinforcement (R: S×A → ℝ): • reinforcement signals provided by the environment • maximum value upon reaching the ITS's goals

  12. Proposal. RL Components (3/3) • Value-action function (Q: S×A → ℝ): • estimates the usefulness of executing an action when the agent is in a given state • ITS aim: to find the maximum value of the Q function • Algorithm: Q-learning (deterministic) [Watkins, 1989]: Q(s, a) = r + γ · max_a' Q(s', a'), where γ is the discount parameter for future actions
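The deterministic Q-learning update on this slide can be sketched as a tabular update. The state and action names below are placeholders for a toy two-state problem, not the actual ITS state space.

```python
GAMMA = 0.9  # discount parameter for future actions

# Q-table: Q[state][action] -> estimated usefulness of the action in that state.
Q = {
    "s":    {"A1": 0.0, "A2": 0.0, "A3": 0.0, "A4": 0.0},
    "goal": {"A1": 0.0, "A2": 0.0, "A3": 0.0, "A4": 0.0},
}

def q_update(state, action, reward, next_state):
    """Deterministic Q-learning: Q(s,a) = r + gamma * max_a' Q(s',a')."""
    Q[state][action] = reward + GAMMA * max(Q[next_state].values())
    return Q[state][action]
```

For instance, executing an action that reaches the goal with reward 1 (and all goal-state Q-values at 0) sets Q(s, a) = 1.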

  13. Proposal. Q-learning

  14. Proposal. Example (1/2) [Diagram: knowledge states as binary vectors over the items (Relationship, Cardinality, Degree, Connectivity, 1:N, N:M); from state S (.... 0 1 0 0 1 1 ....), each action leads to the goal state (.... 1 1 1 1 1 1 ....).] Q(s, a) table: in state S, Q(S, A1) = Q(S, A2) = Q(S, A3) = Q(S, A4) = 0.8; in the goal state, all Q-values are 0.0. Actions: A1 = {def1}, A2 = {def2}, A3 = {ex1}, A4 = {def1 + ex1}.

  15. Proposal. Example (2/2) • Update rule (1): Q(s, a) = γ^(size(a)−1) · r + γ^size(a) · max_a' Q(s', a') • Let us suppose: • r = 1 if s' = goal, 0 if s' ≠ goal • γ = 0.9 • Example • Student 1: • A1 action is randomly chosen: (2) Q(S, A1) = 0 + 0.9^1 · max{0.8, 0.8, 0.8, 0.8} = 0.72 • A4 is executed next: (3) Q(S, A4) = 0.9^(2−1) · 1 + 0.9^2 · max{0, 0, 0, 0} = 0.9 • Student 2: • A2 is randomly chosen: (4) Q(S, A2) = 0.9^(1−1) · 1 + 0.9^1 · max{0, 0, 0, 0} = 1
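The worked values above can be reproduced with a small helper for the size-weighted update rule (1), where size(a) is the number of items the action shows. This is a sketch of the arithmetic only; the function name is invented.

```python
GAMMA = 0.9  # discount parameter

def q_value(size_a, reward, max_next_q):
    """Update rule (1): Q(s,a) = gamma^(size(a)-1) * r + gamma^size(a) * max_a' Q(s',a')."""
    return GAMMA ** (size_a - 1) * reward + GAMMA ** size_a * max_next_q

# Student 1, step (2): A1 shows one item, goal not reached (r = 0),
# successor Q-values are all 0.8 from the previous table -> 0.72.
q1 = q_value(1, 0.0, 0.8)
# Student 1, step (3): A4 shows two items and reaches the goal (r = 1) -> 0.9.
q4 = q_value(2, 1.0, 0.0)
# Student 2, step (4): A2 shows one item and reaches the goal -> 1.
q2 = q_value(1, 1.0, 0.0)
```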

  16. Conclusions • To eliminate the pre-defined PS • The system adapts to the student • in real time, by trial and error • based only on previous information from interactions with other students with similar characteristics • General technique • domain independent

  17. Further Research • Experiments • Implement the theoretical model • Test the ITS with real students • Validate the model • Others • Classify students • Use hierarchical RL algorithms • Use planning
