
Did I really mean that? Applying automatic summarisation techniques to formative feedback



Presentation Transcript


  1. Supportive Automatic Feedback for Short Essay Answers • Did I really mean that? Applying automatic summarisation techniques to formative feedback • Debora Field, Stephen Pulman, Nicolas Van Labeke: Dept of Computer Science, University of Oxford, UK • Denise Whitelock, John T.E. Richardson: Institute of Educational Technology, The Open University, UK • debora.field.work@gmail.com • Background material supporting a poster paper at RANLP 2013

  2. Overview • We use automatic summarisation methods (following Mihalcea & Tarau, 2004, 2005) to create input for a web-based essay feedback system • The aim of our fully operational OpenEssayist formative feedback system is to improve users' confidence and skills by promoting self-directed learning through metacognition • The prototype was developed by working with a corpus of 267 real student essays answering the same assignment question: • "Write a report about the main accessibility challenges for disabled learners that you work with or support in your own work context(s)" • Essay topics, content, structure, presentation, and formatting were not prescribed or guided, and vary greatly across the corpus • OpenEssayist will be the subject of two full-scale empirical evaluations with real students starting in September 2013 • Our main contribution is the application and adaptation of graph-based ranking methods to a novel purpose

  3. NLP Prep and Structure Recognition • First do the necessary NLP pre-processing: • Tokenise (paragraphs, sentences, words), POS tag, remove punctuation and stop words, lemmatise, and store original surface forms for feedback to the student • Identify which parts of the essay should be used in the graphs • Key sentences should be only those that were composed by the author and that are true sentences (not headings, references, the assignment question...) • Key word scores should not be skewed by table of contents entries, etc. • Label each sentence with its structural role, currently including: • Table of contents entry, abstract, introduction, discussion, conclusion, appendix, bibliography, title, special heading, general heading, caption, table entry, list item, sentence quoted from the assignment question • The manually-crafted structure-recognition rules employ these features: • Overall sentence position, sentence position relative to particular items (other paragraphs, sentences, particular structural roles...), sentence and paragraph length, and the inclusion of particular terms in the sentence
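As an illustration of this pre-processing stage, here is a minimal sketch in Python using NLTK. The choice of toolkit and the function names are assumptions made for illustration; the slides do not say which NLP libraries OpenEssayist uses.

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

STOPS = set(stopwords.words('english'))
LEMMATIZER = WordNetLemmatizer()

def preprocess(essay_text):
    """Tokenise, POS-tag, filter, and lemmatise one essay."""
    processed = []
    for sent in nltk.sent_tokenize(essay_text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sent))
        kept = []
        for word, tag in tagged:
            if not word.isalpha() or word.lower() in STOPS:
                continue  # drop punctuation and stop words
            pos = 'v' if tag.startswith('VB') else 'n'  # crude POS mapping
            # Keep the surface form next to the lemma, so that feedback
            # can quote the student's own words.
            kept.append((word, LEMMATIZER.lemmatize(word.lower(), pos)))
        processed.append(kept)
    return processed
```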

  4. Key Word Extraction • The derived lemmas become vertices in a graph • Adjacent words are represented by links (edges) between vertices • A centrality algorithm traverses the graph to derive the key words • A key word is one that co-occurs (within a window of N words) with lots of words that co-occur with lots of words that co-occur... • No external trained model or reference source is involved • Different from word frequency and from collocation counts • The algorithm captures a word's connectedness to the entire text • Two centrality algorithms tried so far: • PageRank (a variant of eigenvector centrality): Brin & Page 1998 • Betweenness centrality: Freeman 1977 • The result is a list of key lemmas (lemmas scoring above a certain threshold) and a list of ngrams: within-sentence sequences of key words occurring in the original text (a key word is the surface form of a key lemma) • [Figure: an excerpt of a weighted edge list (vertex-pair indices with weights) and a small illustrative graph with vertices A, B, C, D and edge weights .1, .3, .5, .7]
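A minimal sketch of this step in Python, using networkx for the graph and its centrality implementations (an assumption: the slides do not name a graph library, and `keyword_scores` is a hypothetical helper):

```python
import networkx as nx

def keyword_scores(lemmas, window=2):
    """Score lemmas by centrality in a co-occurrence graph.

    lemmas: flat list of content-word lemmas in document order.
    window: co-occurrence window size (window=2 links adjacent words).
    """
    g = nx.Graph()
    for i, w in enumerate(lemmas):
        for v in lemmas[i + 1:i + window]:  # words within the window
            if v != w:
                g.add_edge(w, v)
    pagerank = nx.pagerank(g, alpha=0.85)       # Brin & Page 1998
    betweenness = nx.betweenness_centrality(g)  # Freeman 1977
    return pagerank, betweenness
```

Key lemmas would then be those whose score exceeds the chosen threshold.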

  5. Ranked key lemma results for Illustration 1
  Betweenness: 1 fox, 2 cat, 3 kill, 4 dog, 5 night, 6 rare, 7 keep, 8 attack, 9 encounter, ... 11 small
  PageRank: 1 cat, 2 fox, 3 dog, 4 night, 5 kill, 6 keep, 7 rare, 8 attack, 9 small, ... 34 encounter
  Frequency: fox 7, cat 7, dog 4, kill 3, night 2, rare 2, keep 2, attack 2, small 2, encounter 1
  [Figure: a subgraph (vertices and edges) of the graph generated by Illustration 1]

  6. PageRank is recursive in such a way that a vertex's score is promoted if it has a high-scoring neighbouring vertex • This promotion is ideal for web search but perhaps not for key word search • The PageRank score of vertex u is: PR(u) = (1 - d) + d · Σ_{v ∈ In(u)} PR(v) / |Out(v)|, where d is the damping factor, In(u) is the set of vertices that point to u, and |Out(v)| is the number of edges that point from v • Betweenness centrality is "the degree to which a point falls on the shortest path between others" (Freeman 1977, p. 35): C_B(k) = Σ_{i ≠ j ≠ k} σ_ij(k) / σ_ij, the number of shortest paths from i to j in the graph that pass through vertex k, divided by the total number of shortest paths from i to j • No promotion by neighbours here. Is it better? And how does RAKE (Rose et al. 2010) compare?
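The unweighted PageRank formula above transcribes almost directly into code. This hand-rolled sketch (illustrative only; it uses a fixed iteration count rather than a convergence test) makes the neighbour-promotion effect explicit:

```python
def pagerank_scores(graph, d=0.85, iters=50):
    """graph: dict mapping each vertex to the set of vertices it points to."""
    incoming = {u: set() for u in graph}
    for v, outs in graph.items():
        for u in outs:
            incoming[u].add(v)  # record who points *to* each vertex
    pr = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        # PR(u) = (1 - d) + d * sum over in-neighbours v of PR(v) / |Out(v)|
        pr = {u: (1 - d) + d * sum(pr[v] / len(graph[v]) for v in incoming[u])
              for u in graph}
    return pr
```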

  7. Is PageRank suitable for key word search? • We first wondered about PageRank's suitability for key word search when working with an essay about the Open University • PageRank ranked 'university' 6th and betweenness ranked it 5th • PageRank returned 'open' ranked 7th, whereas betweenness ranked 'open' 26th! • The key-ness score of 'open' had been strongly promoted by the high score of its neighbour 'university' • 'open' hardly appears in the essay apart from in the phrase 'Open University' (only 3 of its 25 occurrences) • Is it sensible to infer that an essay that talks a lot about the Open University is also talking a lot about openness? • How would RAKE (Rose et al. 2010) compare? • RAKE uses stop words as phrase delimiters, and whole phrases (as well as individual words) are treated as nodes in the graph • The score of a node depends on its degree (its immediately neighbouring nodes), so it is more similar to PageRank than to betweenness

  8. PageRank is strongly influenced by word frequency • Frequency of occurrence evidently has a stronger influence on key-ness scores under PageRank than it does under betweenness • If we take frequency out of the picture, will that tell us anything about the differences between PageRank and betweenness with respect to key word search? • Randomise the word order of one essay and calculate centrality scores • Repeat many times (200 shufflings for a single essay) • Build a distribution of expected scores given the frequency of each word in the essay (word frequency remains constant across randomisations, while structure, and therefore centrality relations, change) • Compare the randomised distribution with the real scores
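A sketch of this randomisation procedure, reusing the hypothetical `keyword_scores` helper from the earlier sketch:

```python
import random

def null_scores(lemmas, word, n=200, window=2):
    """PageRank scores for `word` over n random shufflings of one essay."""
    shuffled = list(lemmas)
    scores = []
    for _ in range(n):
        random.shuffle(shuffled)  # frequencies unchanged; structure destroyed
        pr, _ = keyword_scores(shuffled, window)
        scores.append(pr.get(word, 0.0))
    return scores
```

The true score for each word is then compared against its null distribution; the next slide uses a Kolmogorov-Smirnov test for this comparison.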

  9. Randomised distribution results for the top thirty key words of one essay • [Charts: randomised score distributions versus true scores, one panel per centrality measure (betweenness centrality, PageRank centrality)] • Red dots: true result significantly different from its randomised distribution (Kolmogorov-Smirnov (K-S) test) • For 6 words in each chart, the true result is significantly different from its randomised distribution • 4 of the 6 significant PageRank results are significantly lower

  10. Key Sentence Extraction • Roughly similar process to the key word graph, but each sentence becomes a vertex in the graph • Every sentence is compared with every other sentence to derive a similarity score for each pair • Currently using cosine similarity (vectors based on co-occurrence) • Edges connect pairs of vertices that have a similarity score > 0 • Similarity score = strength of connection (edge weight) • The TextRank algorithm (Mihalcea & Tarau 2004) calculates a 'global importance' score for each sentence • It uses the graph structure (what links to what) and the edge weights • Like PageRank, but with edge weights in the mix • The result is a list of all the essay's sentences, returned in order of global importance
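A sketch of the sentence graph in Python, again assuming networkx: with edge weights supplied, nx.pagerank computes a weighted PageRank, which is essentially what TextRank defines (the full formula is on the next slide).

```python
import math
import networkx as nx

def cosine(a, b):
    """Cosine similarity of two bag-of-words dicts."""
    dot = sum(n * b.get(w, 0) for w, n in a.items())
    norms = (math.sqrt(sum(n * n for n in a.values()))
             * math.sqrt(sum(n * n for n in b.values())))
    return dot / norms if norms else 0.0

def rank_sentences(bags):
    """bags: one bag-of-words dict per true sentence, in essay order."""
    g = nx.Graph()
    g.add_nodes_from(range(len(bags)))
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            sim = cosine(bags[i], bags[j])
            if sim > 0:  # connect only sentences with some overlap
                g.add_edge(i, j, weight=sim)
    scores = nx.pagerank(g, weight='weight')  # weighted PageRank
    return sorted(scores, key=scores.get, reverse=True)  # most important first
```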

  11. TextRank Algorithm (Mihalcea & Tarau, 2004, 2005) • Based on PageRank • The global weight score of vertex V_i is: WS(V_i) = (1 - d) + d · Σ_{V_j ∈ In(V_i)} [ w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ] · WS(V_j), where d is the damping factor, In(V_i) is the set of vertices that point to V_i, Out(V_j) is the set of vertices that V_j points to, and w_ji is the weight of the edge from V_j to V_i • The WS score of every vertex is first seeded with the same value; iterations are then repeated until the difference between the most recent WS score and the score at the previous iteration falls below a specified tiny threshold
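A direct transcription of the formula as a sketch (assumptions: an undirected sentence graph, so In(V) and Out(V) are both just V's neighbours, and a weights dict containing both (j, i) and (i, j) for every edge):

```python
def textrank(weights, d=0.85, tol=1e-6):
    """weights: dict mapping ordered vertex pairs (j, i) to edge weights."""
    vertices = {v for pair in weights for v in pair}
    nbrs = {i: [j for j in vertices if (j, i) in weights] for i in vertices}
    ws = dict.fromkeys(vertices, 1.0)  # seed every vertex with the same value
    while True:
        new = {i: (1 - d) + d * sum(
                   weights[(j, i)] / sum(weights[(j, k)] for k in nbrs[j]) * ws[j]
                   for j in nbrs[i])
               for i in vertices}
        if max(abs(new[v] - ws[v]) for v in vertices) < tol:  # converged
            return new
        ws = new
```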

  12. Cosine Similarity illustration • Take two word-tokenised sentences • S1: ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'cat'] • S0: ['the', 'cat', 'sat', 'on', 'the', 'mat'] • Compare them to make two vectors • (vector = quantity with magnitude and direction) • S1: [('the', 2), ('quick', 1), ('brown', 1), ('fox', 1), ('jumped', 1), ('over', 1), ('lazy', 1), ('cat', 1), ('sat', 0), ('on', 0), ('mat', 0)] • S0: [('the', 2), ('quick', 0), ('brown', 0), ('fox', 0), ('jumped', 0), ('over', 0), ('lazy', 0), ('cat', 1), ('sat', 1), ('on', 1), ('mat', 1)] • Final vectors (11-dimensional in this case): • [2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0] • [2, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1] • Cosine of the angle between them: 0.533001790889 • A (roughly normalised) way of comparing the content of two sentences of unequal length
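The worked example can be reproduced in a few lines (a sketch; the Counter union builds the shared vocabulary):

```python
import math
from collections import Counter

s1 = ['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'cat']
s0 = ['the', 'cat', 'sat', 'on', 'the', 'mat']

c1, c0 = Counter(s1), Counter(s0)
vocab = list(c1 | c0)        # union vocabulary, 11 words
v1 = [c1[w] for w in vocab]  # [2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
v0 = [c0[w] for w in vocab]  # [2, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

dot = sum(a * b for a, b in zip(v1, v0))
cos = dot / (math.sqrt(sum(a * a for a in v1))
             * math.sqrt(sum(b * b for b in v0)))
print(cos)  # 0.533001790889..., matching the slide
```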

  13. References • Brin, S. and L. Page (1998). The anatomy of a large-scale hypertextual web search engine. In Seventh International World-Wide Web Conference (WWW 1998), Brisbane, Australia, 14–18 April 1998. • Freeman, L. (1977). A set of measures of centrality based on betweenness. Sociometry 40: 35–41. • Lohmann, G., Margulies, D.S., Horstmann, A., Pleger, B., Lepsien, J., et al. (2010). Eigenvector centrality mapping for analyzing connectivity patterns in fMRI data of the human brain. PLoS ONE 5(4): e10232. doi:10.1371/journal.pone.0010232. • Mihalcea, R. and P. Tarau (2004). TextRank: Bringing order into texts. In D. Lin and D. Wu (eds.), Proceedings of Empirical Methods in Natural Language Processing (EMNLP) 2004, pp. 404–411, Association for Computational Linguistics, Barcelona, Spain, July 2004. • Mihalcea, R. and P. Tarau (2005). A language independent algorithm for single and multiple document summarization. In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP'05), Korea, pp. 602–607, 11–13 October 2005. • Rose, S., Engel, D., Cramer, N. and W. Cowley (2010). Automatic keyword extraction from individual documents. In M.W. Berry and J. Kogan (eds.), Text Mining: Applications and Theory, pp. 1–20. Chichester: John Wiley and Sons, Ltd.
