14th July 2010 Uppsala, Sweden

Incorporating Extra-linguistic Information into Reference Resolution in Collaborative Task Dialogue
Ryu Iida, Shumpei Kobayashi, Takenobu Tokunaga
Tokyo Institute of Technology
{ryu-i,skobayashi,take}@cl.cs.titech.ac.jp
ACL 2010, 14th July 2010, Uppsala, Sweden

Research background • The task of identifying reference relations, including anaphora and coreference, within texts has received a great deal of attention in NLP • Research trends in reference resolution have shifted drastically from hand-crafted rule-based approaches to corpus-based approaches • Many researchers have examined ways of introducing various linguistic clues (Ge et al. 1998, Soon et al. 2001, Ng and Cardie 2002, Yang et al. 2003, 2005, Poon and Domingos 2008, etc.)

Typical problem setting of reference resolution • Annotated data sets are provided by the Message Understanding Conference (MUC) and Automatic Content Extraction (ACE) • Limited version of coreference: relations where expressions refer to named entities → more information extraction-oriented • The coreference task as defined by MUC and ACE is geared toward identifying only coreference relations anchored to an entity within the text

Treatment of referential behaviour in the language generation community • Investigations of referential behaviour in real-world situations (Di Eugenio et al. 2000, Byron 2005, van Deemter 2007, Foster 2008, Spanger et al. 2009) • Applications: e.g. human-robot interaction • Spanger et al. (2009): dialogues of two participants collaboratively solving a Tangram puzzle • The corpus includes extra-linguistic information synchronised with utterances (e.g. operations on the puzzle pieces) • They revealed that a multi-modal perspective on reference is needed for more practical reference understanding

Challenging issue • Create a model that bridges a referring expression in text and its object in the real world • Focus on incorporating extra-linguistic information into an existing corpus-based approach • Target corpus: the REX-J corpus of Spanger et al. (2009)

Table of contents • Research background • Collaborative work dialogue corpus: REX-J corpus • Reference resolution model and use of extra-linguistic information • Empirical evaluation • Summary and future work

REX-J corpus (Spanger et al. 2009) • Collaborative work dialogues in Japanese for solving a Tangram puzzle • The operations used to solve the puzzle, and the situations updated by each series of operations, are recorded by a puzzle simulator on a computer • The relationship between referring expressions and their referents on the computer display is manually annotated

Screenshot of Tangram simulator (figure: working area and goal shape area) • 3 operations on puzzle pieces: move, rotate, flip • Positions of every piece and every action are recorded at intervals of 10 msec

Experimental environment • Participants share only the working area and the linguistic information in the dialogue • Two different roles: “solver” and “operator” • Solver: can see a certain goal shape, cannot manipulate pieces • Operator: cannot see the goal shape, can manipulate pieces

REX-J Corpus: statistics • Recruited 12 Japanese graduate students • 6 pairs × 4 different goal shapes = 24 dialogues


Task definition • Task: select a piece out of a fixed set of pieces given a referring expression, by referring to both the preceding utterances and the series of recent operations
(figure: puzzle pieces numbered 1 to 7)
Operation history:
Time      piece  operation
...
12:01:03  1      rotate
12:01:05  3      move
12:01:10  6      move
12:01:12  6      rotate
Utterances:
… A: move it more to the right.
B: which triangle? Is this?
• Referent of ‘it’: piece 6 (no antecedent in preceding utterances)

Ranking model to identify referents • Machine learning-based approaches (Soon et al. 2001, Ng and Cardie 2002, etc.) • Take into account linguistic factors: relative salience • Ranking candidate antecedents in the preceding discourse (Iida et al. 2003, Yang et al. 2003, Denis and Baldridge 2008) • Denis and Baldridge (2008) reported that an appropriately constructed model for ranking all candidates achieved better performance than pairwise ranking • Adopt a ranking-based model in which all candidates compete with one another • Use ranking SVM instead of Maximum Entropy
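The ranking setup on this slide, in which all candidate pieces compete and the top-scoring one is selected, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it trains a linear ranker with the pairwise hinge loss that underlies ranking SVM, using plain subgradient descent rather than an SVM solver. The feature layout, function names and hyperparameters are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One training instance: feature vectors of all candidates, index of the true referent.
Instance = Tuple[List[Vector], int]


def score(w: Sequence[float], x: Vector) -> float:
    """Linear score of one candidate."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(data: List[Instance], dim: int, epochs: int = 50,
                 lr: float = 0.1, reg: float = 0.01) -> List[float]:
    """Subgradient descent on the pairwise hinge loss with L2 regularisation."""
    w = [0.0] * dim
    for _ in range(epochs):
        for candidates, gold in data:
            for i, other in enumerate(candidates):
                if i == gold:
                    continue
                # Difference vector: true referent minus competing candidate.
                diff = [g - o for g, o in zip(candidates[gold], other)]
                margin = score(w, diff)
                for k in range(dim):
                    grad = reg * w[k] - (diff[k] if margin < 1.0 else 0.0)
                    w[k] -= lr * grad
    return w


def select_referent(w: Sequence[float], candidates: List[Vector]) -> int:
    """All candidates compete; return the index of the top-scoring one."""
    return max(range(len(candidates)), key=lambda i: score(w, candidates[i]))


# Hypothetical binary features per candidate:
# [cursor was over piece, most recently manipulated, referred to by last RE]
train = [
    ([[1, 1, 0], [0, 0, 1], [0, 0, 0]], 0),
    ([[0, 0, 0], [1, 1, 0], [0, 0, 1]], 1),
    ([[0, 0, 1], [0, 0, 0], [1, 0, 0]], 2),
]
weights = train_ranker(train, dim=3)
predicted = select_referent(weights, [[0, 0, 1], [1, 1, 0], [0, 0, 0]])
```

Scoring every candidate with one shared weight vector is what lets all candidates compete at once; only training looks at pairs.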

Extra-linguistic information (1/2): history of mouse movement • Current position of the mouse cursor and history of mouse movements • Represent the temporal salience of the participant’s focus of attention and its transition (figure: pieces 1 to 7 with the mouse cursor)

Extra-linguistic information (1/2): action history features • the mouse cursor was over a piece (i.e. a candidate referent) at the beginning of uttering an RE • a piece is the last piece that the mouse cursor was over • time distance since the mouse cursor was over a piece: x < 10 sec / 10 sec ≤ x < 20 sec / 20 sec ≤ x • the mouse cursor was never over a piece during the preceding utterances
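As a rough sketch of how these action history features might be computed for one candidate piece, the code below assumes a time-sorted mouse log of `(seconds, piece id or None)` samples. The log format, the feature names and the function names are hypothetical, and "never over a piece" is simplified to "never over the piece anywhere in the log so far".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log format: (time in seconds, id of the piece under the cursor or None).
MouseEvent = Tuple[float, Optional[int]]


def time_bin(elapsed: float) -> str:
    """Three-way binning of elapsed time, following the thresholds on the slide."""
    if elapsed < 10:
        return "lt10"
    if elapsed < 20:
        return "10to20"
    return "ge20"


def action_history_features(piece: int, log: List[MouseEvent],
                            re_onset: float) -> Dict[str, bool]:
    """Binary mouse-based features for one candidate piece at the onset of an RE."""
    past = [(t, p) for t, p in log if t <= re_onset]
    over_any = [(t, p) for t, p in past if p is not None]
    over_piece = [t for t, p in over_any if p == piece]
    feats = {
        # The latest cursor sample before the RE onset is on this piece.
        "cursor_on_piece_at_onset": bool(past) and past[-1][1] == piece,
        "last_piece_under_cursor": bool(over_any) and over_any[-1][1] == piece,
        "never_under_cursor": not over_piece,
        "elapsed_lt10": False,
        "elapsed_10to20": False,
        "elapsed_ge20": False,
    }
    if over_piece:
        feats["elapsed_" + time_bin(re_onset - over_piece[-1])] = True
    return feats


# Made-up log: cursor on piece 1 at 0 s, on piece 6 at 8 s, on no piece afterwards.
log = [(0.0, 1), (3.0, None), (8.0, 6), (9.5, None)]
feats_piece6 = action_history_features(6, log, re_onset=12.0)
feats_piece1 = action_history_features(1, log, re_onset=12.0)
feats_piece3 = action_history_features(3, log, re_onset=12.0)
```

In this made-up log, piece 6 is the last piece under the cursor and falls into the under-10-seconds bin, piece 1 falls into the 10 to 20 second bin, and piece 3 only fires the "never" feature.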

Extra-linguistic information (2/2): history of series of operations • Recently manipulated pieces tend to be paid more attention than the other pieces
(figure: puzzle pieces numbered 1 to 7)
Operation history:
Time      piece  operation
...
12:01:03  1      rotate
12:01:05  3      move
12:01:10  2      move
12:01:12  2      rotate

Extra-linguistic information (2/2): current operation features • a piece is being manipulated at the beginning of uttering an RE • a piece is the most recently manipulated piece • time distance since a piece was most recently manipulated: x < 10 sec / 10 sec ≤ x < 20 sec / 20 sec ≤ x • a piece has never been manipulated
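The current operation features can be derived from an operation history like the one on the preceding slide. The sketch below is illustrative only: the RE onset time `12:01:25` is invented, the feature and function names are hypothetical, and since the log holds only timestamps, the piece being manipulated at the RE onset is passed in as an assumed `active_piece` argument.

```python
from typing import Dict, List, Optional, Tuple

Operation = Tuple[str, int, str]  # (HH:MM:SS, piece id, operation name)


def to_seconds(clock: str) -> int:
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    h, m, s = (int(part) for part in clock.split(":"))
    return h * 3600 + m * 60 + s


def current_operation_features(piece: int, history: List[Operation], re_onset: str,
                               active_piece: Optional[int] = None) -> Dict[str, bool]:
    """Binary operation-based features for one candidate piece at the RE onset."""
    onset = to_seconds(re_onset)
    past = [(to_seconds(t), p) for t, p, _ in history if to_seconds(t) <= onset]
    times = [t for t, p in past if p == piece]
    feats = {
        "being_manipulated_at_onset": active_piece == piece,
        "most_recently_manipulated": bool(past) and past[-1][1] == piece,
        "never_manipulated": not times,
        "elapsed_lt10": False,
        "elapsed_10to20": False,
        "elapsed_ge20": False,
    }
    if times:
        elapsed = onset - times[-1]
        if elapsed < 10:
            feats["elapsed_lt10"] = True
        elif elapsed < 20:
            feats["elapsed_10to20"] = True
        else:
            feats["elapsed_ge20"] = True
    return feats


# Operation history from the slide; the RE onset time is an invented example.
history = [("12:01:03", 1, "rotate"), ("12:01:05", 3, "move"),
           ("12:01:10", 2, "move"), ("12:01:12", 2, "rotate")]
features_by_piece = {p: current_operation_features(p, history, "12:01:25")
                     for p in range(1, 8)}
```

With this onset, piece 2 is the most recently manipulated piece (13 seconds earlier), pieces 1 and 3 fall into the 20-seconds-or-more bin, and pieces 4 to 7 have never been manipulated.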


Empirical evaluation • Investigate the impact of the extra-linguistic information • Data set: referring expressions in the REX-J corpus (2,048 referring expressions in 40 dialogues) • 13 expressions are excluded: expressions referring to more than one object, and vague expressions (e.g. “biggest triangle” in a situation where there are two biggest triangles on the display) • The remaining 2,035 expressions are used in 10-fold cross-validation
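The data-set arithmetic on this slide can be checked with a few lines. The fold sizes below assume the expressions are split as evenly as possible; how the expressions were actually assigned to folds is not stated in the slides, and `fold_sizes` is a hypothetical helper.

```python
from typing import List


def fold_sizes(n_items: int, n_folds: int) -> List[int]:
    """Sizes of n_folds folds that differ by at most one item."""
    base, extra = divmod(n_items, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


total_expressions = 2048
excluded = 13
usable = total_expressions - excluded  # 2035
sizes = fold_sizes(usable, 10)         # five folds of 204, five folds of 203
```

Under that assumption, each round would train on roughly 1,830 expressions and test on about 204.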

Two models • Pronouns are likely to be more directly associated with actions pointing to a piece • Denis and Baldridge (2008): when the number of training instances is relatively small, the models induced by learning algorithms should be created separately with regard to distinct features • Separated model: create two rankers that learn pronouns and non-pronouns independently • Pronoun model: uses the training instances whose REs are pronouns • Non-pronoun model: uses all other training instances • Combined model: create one ranker induced from all training instances
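The difference between the separated and combined models is mostly a matter of how training instances are partitioned and how an RE is routed at prediction time. The sketch below shows only that routing. The rankers are hand-set linear scorers standing in for trained models, and the `is_pronoun` flag, feature layout and weights are assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidates = List[Sequence[float]]
Ranker = Callable[[Candidates], int]


def make_linear_ranker(weights: Sequence[float]) -> Ranker:
    """Stand-in for a trained ranker: returns the index of the best candidate."""
    def rank(candidates: Candidates) -> int:
        return max(range(len(candidates)),
                   key=lambda i: sum(w * x for w, x in zip(weights, candidates[i])))
    return rank


def partition(instances: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split training instances into pronoun and non-pronoun subsets."""
    pronouns = [inst for inst in instances if inst["is_pronoun"]]
    others = [inst for inst in instances if not inst["is_pronoun"]]
    return pronouns, others


class SeparatedModel:
    """Two rankers; each RE is routed by whether it is a pronoun."""

    def __init__(self, pronoun_ranker: Ranker, non_pronoun_ranker: Ranker) -> None:
        self.pronoun_ranker = pronoun_ranker
        self.non_pronoun_ranker = non_pronoun_ranker

    def resolve(self, is_pronoun: bool, candidates: Candidates) -> int:
        ranker = self.pronoun_ranker if is_pronoun else self.non_pronoun_ranker
        return ranker(candidates)


# Hypothetical feature layout: [action history feature, discourse history feature].
# The weights are invented to show that the two rankers may weigh features differently.
model = SeparatedModel(make_linear_ranker([1.0, 0.2]), make_linear_ranker([0.2, 1.0]))
candidates = [[1, 0], [0, 1]]
pronoun_choice = model.resolve(True, candidates)
non_pronoun_choice = model.resolve(False, candidates)

pronoun_part, other_part = partition([
    {"re": "it", "is_pronoun": True},
    {"re": "the big triangle", "is_pronoun": False},
    {"re": "that", "is_pronoun": True},
])
```

A combined model corresponds to skipping `partition` and using a single ranker for every RE.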

Features • 3 types of features: action history features, current operation features and discourse history features • Discourse history features are acquired from the expressions of a given referring expression and its candidate antecedent in the preceding utterances, e.g. “a piece is referred to by the most recent RE” and “case markers (o (accusative) or ni (dative)) follow the RE” • Baseline model: uses only discourse history features

Results

Results (results table not recoverable; highlighted values: 0.227, 0.004) • Pronouns are more sensitive to the usage of the action history features

Results (results table not recoverable) • Partially overlapped • Other current operation features may have bad effects on ranking referents due to their ill-formed definitions

Summary and future directions [Summary] • We presented our first results of incorporating extra-linguistic clues into a corpus-based approach to reference resolution • The performance increased by at most 12 points in comparison to the baseline model • Extra-linguistic information in this domain is useful [Future work] • Explore the effect of other extra-linguistic information, e.g. eye-gaze information • Investigate general aspects of the relation between REs and their objects • Further evaluation on different multimodal tasks
