Speaking while monitoring addressees for understanding. Seminar „Gaze as function of instructions - and vice versa“. Herbert H. Clark and Meredyth A. Krych. Torsten Jachmann, 16.12.2013.
Research Question • Speaking and listening in dialog • Unilateral • Speakers and listeners act autonomously • No interaction • Bilateral • Speakers and listeners monitor their respective partner • Joint activity • What do speakers monitor? • How do they use that information?
Grounding • Level 1 • Attend to vocalization • Level 2 • Identify words, phrases and sentences • Level 3 • Understand the meaning • Level 4 • Consider answering
Grounding • A: Were you there when they erected the new signs? • B: Th… which new signs? (Level 3) • A: Little notice boards, indicating where you had to go for everything • B: No. • Bilateral account
Monitoring • Voices • Attendance to partner's utterances • Faces • Gaze and facial expressions as indicators of understanding • Workspaces • Region in front of the body • Manual gestures (but also games, etc.)
Monitoring • Bodies • Head and torso movement as indicator • Shared Scenes • Scenery beyond workspace • Signals vs. Symptoms • Signals are constructed to get meaning across • Symptoms are not intentionally created
Least joint effort • Opportunistic • Selection of the available methods that take the least effort to produce • “Tailored” • Overhearers (not monitored by speaker) may misunderstand utterances
Method • Pairs of directors and builders • 76 students (34 male / 42 female) • Instructions to build 10 simple Lego models • 2 x 2 design (interactive) • 28 pairs • Additional non-interactive condition • 10 pairs • Video and audio analyses
Interactive • Mixed design • Workspace (between subjects) • Visible • Invisible • Faces (within subjects) • Visible • Invisible • No restrictions on time or talk
Non-interactive • Only one condition • Director records instructions • No time or talk constraints • Prototype can be examined as long as wanted before recording • Builders listen to instructions • No constraints on actions • Start, stop, rewind
Results • Efficiency • Turns • Gestures and grounding • Deictic expressions • Gestures by addressees • Cross-timing of actions • Timing strategies • Visual monitoring
Efficiency • Visibility of workspace improves efficiency
Efficiency • Non-interactive • Building time much longer (245 s non-interactive vs. 183 s interactive) • Strong drop in accuracy • Inadequate instructions
Turns • Fewer SPOKEN turns by the builder when the workspace is visible
Deictic expressions • Mainly unusable when the workspace is hidden • Joint attention needed • Otherwise only usable to refer to a previously mentioned situation
Gestures by addressees • Mostly accompanied by deictic utterances (if any) • Explicit verdict usually given only on such utterances (otherwise the director continues)
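To make the size of that gap concrete, the two building times quoted on the slide can be compared directly. This is only a back-of-the-envelope sketch: the two numbers come from the slide, the variable names are illustrative, and the slide does not say which interactive condition the 183 s figure refers to.

```python
# Building times quoted on the slide, in seconds.
NON_INTERACTIVE_S = 245
INTERACTIVE_S = 183

# Absolute and relative difference, taking the interactive time as baseline.
extra_s = NON_INTERACTIVE_S - INTERACTIVE_S
relative_increase = extra_s / INTERACTIVE_S

print(f"{extra_s} s longer, about {relative_increase:.0%}")
# prints "62 s longer, about 34%"
```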
Cross-timing • Gestural signals • Reflect understanding at that moment
Cross-timing • Overlapping signals • Usually not in spoken dialog • Start with “sufficient information”
Cross-timing • Projecting • Prediction of following actions/instructions
Cross-timing • Initiation time • Waiting until the partner is able to attend to the following utterance
Cross-timing • Time uptake • Responses have to be timed exactly to the action and situation
Timing strategies • Self-interruption • Dealing with evidence from the addressee • Usually not continued
Timing strategies • Collaborative references • Deictic references rely on addressees' actions
Visual monitoring • Mainly used when director reaches a problem • Eye gaze as support
Conclusion • Grounding is fundamental • Visible workspace enhances grounding speed • In task-oriented dialogs faces are not important • Compensation possible (only if any monitoring is available)
Conclusion • Updating common ground • Increments are determined jointly • Much evidence for bilateral account • Addressees provide statement about current understanding • Speakers monitor to update and change utterances
Conclusion • Opportunistic process • Offering options • Self-interruptions • Waiting • Instant revision • Multi-modal process • Speech and gestures are combined if possible • Speech alone takes more time
Remarks • Gaze only important for certain types of tasks • Measurement of time may be outdated (“old” study) • No contradicting studies (to some extent common sense)
Gaze and Turn-Taking Behavior in Casual Conversation Interactions • Kristiina Jokinen, Hirohisa Furukawa, Masafumi Nishida and Seiichi Yamamoto
Differences • Three-party dialogue • No instructional task • Stronger focus on eye gaze
Research Question • How well can eye gaze help in predicting turn taking? • What is the role of eye gaze when the speaker holds the turn? • Is the role of eye gaze as important in three-party dialogs as in two-party dialogs?
Hypothesis • In group discussions, eye gaze is important in turn management (especially in turn-holding cases) • The speaker is more influential than the other partners in coordinating interactions (selects the next speaker)
Method • Three-person conversational eye gaze corpus • Natural conversations • Balanced familiarity (50% familiar; 50% unfamiliar) • Balanced gender (male-only; female-only; mixed)
Method • 28 conversations among Japanese students in their early 20s with three participants each • Each conversation about 10 minutes • Eye gaze recorded for one participant
Method • Eye tracker fixed on the table to preserve naturalness
Used data • Estimated over the last 300 ms of an utterance if it is followed by a 500 ms pause
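The selection rule on this slide can be sketched in a few lines. This is a minimal illustration, not the procedure used by Jokinen et al.: the `Utterance` record and the `gaze_windows` function are hypothetical, the pause is assumed to be the silence until the next utterance by any speaker, "a 500 ms pause" is read as "at least 500 ms", and the final utterance is skipped because no following pause can be measured.

```python
from dataclasses import dataclass

WINDOW_MS = 300     # from the slide: last 300 ms of an utterance
MIN_PAUSE_MS = 500  # from the slide: read here as "at least 500 ms" (assumption)


@dataclass
class Utterance:
    """Hypothetical record for one annotated utterance."""
    speaker: str
    start_ms: int
    end_ms: int


def gaze_windows(utterances):
    """Return (speaker, window_start_ms, window_end_ms) for each utterance
    that is followed by a sufficiently long pause."""
    ordered = sorted(utterances, key=lambda u: u.start_ms)
    windows = []
    # Pair every utterance with its successor; the last one has none.
    for current, following in zip(ordered, ordered[1:]):
        pause_ms = following.start_ms - current.end_ms
        if pause_ms >= MIN_PAUSE_MS:
            # Clamp so the window never starts before the utterance does.
            window_start = max(current.start_ms, current.end_ms - WINDOW_MS)
            windows.append((current.speaker, window_start, current.end_ms))
    return windows


example = [
    Utterance("A", 0, 2000),
    Utterance("B", 2600, 4000),  # follows a 600 ms pause after A
    Utterance("C", 4200, 5000),  # follows only 200 ms after B
    Utterance("A", 5600, 5700),  # follows a 600 ms pause after C
]
print(gaze_windows(example))
```

Gaze samples falling inside each returned window would then be the ones considered for that utterance; overlapping speech yields a negative pause and is excluded by the same comparison.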
Used data • Dialog acts • Speech features • Values of F0, etc. • Eye gaze
Conclusion • Speaker uses eye gaze to signal whether he intends to give the turn or hold it • Fixating the listener vs. focusing attention elsewhere • Eye gaze in multi-participant conversations is as important as in two-participant conversations
Conclusion • Eye gaze is used to select next speaker (seems to be correct) • Maybe Japanese data interferes with value of speech data • Comparison Study? • Listeners focus on speaker not vice versa
Remarks • Vague information and data presentation • Although various data exist, the interaction of factors is not presented • Some conclusions rely on the previously mentioned point • Setup only takes one participant into consideration • Much of the data was unused • Lacking in quality and in the way it was created
Remarks • Study is based on data collected for another study • Setup is not optimal • Realistic design • Yet contains biasing flaws (situation of the participants, only one eye tracker)
Comparison • Clark and Krych present interesting ideas, but eye gaze is only rarely handled • How could this be altered? • Jokinen et al. focus on eye gaze in a (more or less) natural situation but are lacking in scientific results and setup • What points and ideas of this setup could be beneficial?