
Student simulation and evaluation

DOD meeting. Hua Ai (hua@cs.pitt.edu), 03/03/2006.


Presentation Transcript


  1. Student simulation and evaluation • DOD meeting • Hua Ai (hua@cs.pitt.edu) • 03/03/2006

  2. Outline • Motivations • Backgrounds • Corpus • Student Simulation Model • Comparisons • Conclusions & Future Work

  3. Motivations • For a larger corpus • Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically • The best strategy is often not even present in a small dataset • For a cheaper corpus • Human subjects are expensive

  4. Strategy learning using a simulated user (Schatzmann et al., 2005) • [Figure; component labels: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Reinforcement Learning, Strategy]

  5. Backgrounds (1) • Education community • Focuses on changes in the student’s internal knowledge representation • Usually not dialogue based • Simulated students (VanLehn et al., 1994) are used for • Tutor training • Collaborative learning

  6. Backgrounds (2) • Dialogue community • Focuses on interactions and dialogue behaviors • Simulated users have a limited set of actions to take • (Schatzmann et al., 2005) • Simulation at the dialogue act (DA) level

  7. Corpus (1) • Spoken dialogue physics tutor (ITSPOKE)

  8. Corpus (2) • 5 problems • Tutoring procedure • [Figure; labels: (T) Question, (S) Answer, Dialogue, Essay revision, Dialogue, …]

  9. Corpus (3) • Tutor’s behaviors • Defined in KCDs (Knowledge Construction Dialogues) • [Figure; branch labels: Correct, Incorrect/Partially Correct]

  10. Corpus (4) • f03 vs. s05: different groups of subjects

  11. Simulation Models (1) • Simulating at the word level • Students have more complex behaviors • DA info alone isn’t enough for the system • Two models (ProbCorrect, Random) trained on two corpora (f03, s05) • f03: 03ProbCorrect, 03Random • s05: 05ProbCorrect, 05Random

  12. Simulation Models (2) • ProbCorrect Model • Simulates the average knowledge level of real students • Simulates meaningful dialogue behaviors • Random Model • Produces nonsense • Serves as a contrast

  13. ProbCorrect Model • Real corpus (c = correct, ic = incorrect) • question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic) • question2: Answer2_1 (c), Answer2_2 (ic) • Candidate answers • For question1, c:ic = 1:2; c: Answer1_1; ic: Answer1_2, Answer1_3 • For question2, c:ic = 1:1; c: Answer2_1; ic: Answer2_2 • ProbCorrect Model, asked Question 1, answers as follows • Choose to give a c or ic answer with the same average probability as real students • Randomly choose one answer from the corresponding answer set
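The two-step procedure on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the function names `train_prob_correct` and `simulate_prob_correct`, the `(question, answer, label)` tuple layout, and the choice to estimate the probability of a correct answer from per-question counts are all hypothetical.

```python
import random
from collections import defaultdict


def train_prob_correct(corpus):
    """Group observed answers by question and by label ('c' or 'ic').

    Assumption: `corpus` is an iterable of (question, answer, label) tuples.
    """
    model = defaultdict(lambda: {"c": [], "ic": []})
    for question, answer, label in corpus:
        model[question][label].append(answer)
    return dict(model)  # plain dict, so an unseen question raises KeyError


def simulate_prob_correct(model, question, rng=random):
    """Pick c/ic at the rate observed for this question, then pick
    one answer uniformly from the matching candidate set."""
    answer_sets = model[question]
    n_correct = len(answer_sets["c"])
    n_incorrect = len(answer_sets["ic"])
    p_correct = n_correct / (n_correct + n_incorrect)
    label = "c" if rng.random() < p_correct else "ic"
    return label, rng.choice(answer_sets[label])


# The toy corpus from the slide: question1 has c:ic = 1:2, question2 has 1:1.
real_corpus = [
    ("question1", "Answer1_1", "c"),
    ("question1", "Answer1_2", "ic"),
    ("question1", "Answer1_3", "ic"),
    ("question2", "Answer2_1", "c"),
    ("question2", "Answer2_2", "ic"),
]
model = train_prob_correct(real_corpus)
label, answer = simulate_prob_correct(model, "question1", random.Random(0))
print(label, answer)
```

With this toy corpus, the simulated student answers question1 correctly about one time in three, matching the 1:2 ratio of the real answers.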

  14. Random Model • Corpus (HC03&05) • Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4 • Question2: Answer2_1, Answer2_2 • Candidate answers: 1) Answer1_1 2) Answer1_2 3) Answer1_3 4) Answer1_4 5) Answer2_1 6) Answer2_2 • Big random model, asked Question i, answers with any of the 6 answers with equal probability (regardless of the question!)
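For contrast, the Random model can be sketched in a few lines of Python. Again this is a hedged illustration rather than the original code: `build_random_pool`, `simulate_random`, and the `(question, answer)` tuple layout are hypothetical names chosen for this example.

```python
import random


def build_random_pool(corpus):
    """Pool every observed answer, discarding which question it answered.

    Assumption: `corpus` is an iterable of (question, answer) tuples.
    """
    return [answer for _question, answer in corpus]


def simulate_random(pool, question, rng=random):
    """Return any pooled answer with equal probability.

    The `question` argument is deliberately ignored; that is what
    makes this model a nonsense baseline.
    """
    return rng.choice(pool)


# The toy corpus from the slide: six candidate answers in total.
corpus = [
    ("Question1", "Answer1_1"),
    ("Question1", "Answer1_2"),
    ("Question1", "Answer1_3"),
    ("Question1", "Answer1_4"),
    ("Question2", "Answer2_1"),
    ("Question2", "Answer2_2"),
]
pool = build_random_pool(corpus)
print(simulate_random(pool, "Question2", random.Random(0)))
```

Because the question is ignored, an answer recorded for Question1 is just as likely to be given to Question2 as one of Question2's own answers.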

  15. Experiments • Comparisons between real corpora • Comparisons between real & simulated corpora • Comparisons between simulated corpora

  16. Real Corpora Comparisons (1) • Evaluation metrics • High-level dialog features • Dialog style and cooperativeness • Dialog Success Rate and Efficiency • Learning Gains

  17. Real corpora comparisons (2) • High-level dialog features

  18. Real corpora comparisons (3) • Dialogue style features

  19. Real corpora comparisons (3) • Dialogue success rate

  20. Real corpora comparisons (4) • Learning gains features

  21. Results • The differences captured by these simple metrics cannot tell us whether a corpus is real or not (Schatzmann et al., 2005) • The differences could be due to different user populations

  22. Real vs. Simulated Corpora Comparisons

  23. Results (1) • Most of the measurements are able to distinguish between the Random and ProbCorrect models • The ProbCorrect model generates more realistic behaviors • We can’t draw conclusions about the power of these metrics, since the two simulated corpora are very different

  24. Results (2) • Differences between the real and Random corpora are captured clearly, but differences between the real and ProbCorrect corpora are not clear • We don’t expect this simple model to produce a very realistic corpus, so it is surprising that the differences are small

  25. Results (3) • s05 variety > f03 variety, so 05ProbCorrect variety > 03ProbCorrect variety • However, we don’t get significantly more variety in the simulated corpora than in the real ones • Possibly because the computer tutor is simple (c/ic) • We’re using the same candidate answer set

  26. Results (4) • ProbCorrect models trained on different real corpora are quite different • Each ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus

  27. Comparisons between simulated dialogues with different dialogue structure

  28. Results • Larger differences between the two simulated corpora in prob7 than in prob34 • Dialogue structure of prob34 is more restricted • The power of these simple metrics is restricted by the dialogue structure

  29. Conclusions • The simple measurements can distinguish between • Real corpora • Different populations • Simulated and real corpora • To different extents • Simulated corpora • Different models • Trained on different corpora • Limited by the dialogue structure

  30. Future work • Explore “deep” evaluation metrics • Test simulated corpus on policy • More simulation models • More human features • Emotion, learning • Special cases • Quick learners, slow learners
