Matt Huenerfauth

A Multi-Path Architecture for Machine Translation of English Text into American Sign Language Animation. Matt Huenerfauth. Student Workshop of the Human Language Technologies conference / North American chapter of the Association for Computational Linguistics annual meeting. Boston, MA, USA. May 2, 2004. Computer and Information Science, University of Pennsylvania. Research Advisors: Mitch Marcus & Martha Palmer

Motivations and Applications • One half of Deaf high school graduates (age 18) can read English at a fourth-grade level (age 10). • But most are fluent in ASL. (ASL ≠ English.) • Many accessibility technologies assume English-fluency. • ASL used by 500,000 Deaf people in North America. • Applications for a Machine Translation System: • TV captioning, teletype telephones. • Computer user-interfaces in ASL. • Educational tools, access to information/media.

MT: Input / Output What’s the input? English Text. What’s the output? Less clear… Imagine a 3D virtual reality human being… One that can perform sign language… But this character needs a set of instructions telling it how to move! The task: English → These Instructions. (Image: Vcom3D)

Off-the-Shelf Virtual Humans Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al., 2000); Vcom3D Corporation

American Sign Language • Sentence without classifier predicate: Which university does Billy attend? ASL gloss: wh #BILLY IXx GO-TO UNIVERSITY WHICH • Sentence with classifier predicate: The car drove down the bumpy road past a cat. ASL gloss: CAT ClassPred-bentV-{location of cat} CAR ClassPred-3-{drive on bumpy road}

Difficult to Generate but Important The car drove down the bumpy road past the cat. Where’s the cat? The road? The car? How close are they? Where does the path start/stop? How do we show that the path is bumpy vs. winding vs. hilly? Some English sentences require a classifier predicate to be translated fluently. Spatial prepositions, adverbs, other phrases… Signers use classifier predicates frequently. Depending on genre, one to 17 times per minute.

Initial Approaches to ASL MT Non-statistical Direct and Transfer MT Architectures

Why not Statistical MT? • ASL has no written form. • Corpora are hard to collect and transcribe. • Annotating video means capturing multiple simultaneous channels of face, body, hand, and arm movements. • There’s no training data.

MT Pyramid (Dorr, 1998). [Figure: Machine Translation Pyramid. Analysis side, bottom to top: Source Text, Morphological Analysis, Word Structure, Syntactic Analysis, Syntactic Structure, Semantic Analysis, Semantic Structure, Semantic Composition, Interlingua. Generation side, top to bottom: Interlingua, Semantic Decomposition, Semantic Structure, Semantic Generation, Syntactic Structure, Syntactic Generation, Word Structure, Morphological Generation, Target Text. Links across the two sides: Direct, Syntactic Transfer, Semantic Transfer.] • Options in MT design. • More work • Domain size • Subtler divergences handled

Direct ‘ASL’ MT Systems • Word-for-sign dictionary look-up system. • Produces Signed English, not ASL. • Definitely can’t generate classifier predicates.
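To make the limitation concrete, here is a minimal sketch of a word-for-sign look-up. The lexicon entries, the function names, and the choice to fingerspell unknown words (marked with "#", as in the #BILLY gloss) are assumptions made for illustration, not part of any system described in the talk.

```python
import re
from typing import List

# Hypothetical toy lexicon: English word -> ASL sign gloss (illustrative entries only).
SIGN_LEXICON = {
    "which": "WHICH",
    "university": "UNIVERSITY",
    "attend": "GO-TO",
    "car": "CAR",
    "cat": "CAT",
}


def fingerspell(word: str) -> str:
    """Assumed fallback for out-of-lexicon words: mark them as fingerspelled with '#'."""
    return "#" + word.upper()


def direct_translate(sentence: str) -> List[str]:
    """Map each English word to a sign gloss, keeping English word order."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return [SIGN_LEXICON.get(word, fingerspell(word)) for word in words]


print(direct_translate("Which university does Billy attend?"))
# ['WHICH', 'UNIVERSITY', '#DOES', '#BILLY', 'GO-TO']
```

The output keeps English word order and even carries over the auxiliary "does", which is why a look-up of this kind yields Signed English rather than ASL.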

Transfer ASL MT Systems • Syntactically analyze English text before crossing over to ASL. • Capture more divergences, more phenomena • Still can’t handle the complex use of space. • Still can’t generate classifier predicates.

When the going gets tough… …the tough try an interlingua. • Direct or transfer architectures are insufficient. • If not an interlingua, then at least an approach with more spatial knowledge/representation. Of course, there’s a problem. • It’s hard/impossible to build an interlingua system for an open-ended domain.

Getting by with limited domain? We can identify sentences that need complex translation. (That need classifier predicates.) When do we use classifier predicates? • Locations, orientations, or movements • Spatial verbs, prepositions, adverbs • Concrete or animate entities • Don’t worry about abstractions, beliefs, intentions
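One way to picture this filtering step is a keyword heuristic like the sketch below. The cue-word lists and the rule (a spatial verb plus a spatial preposition, with no abstract cue) are hypothetical simplifications; a real system would presumably decide from a syntactic and semantic analysis rather than from word lists.

```python
import re

# Hypothetical cue lists, chosen only to illustrate the criteria on the slide.
SPATIAL_VERBS = {"drove", "drive", "walked", "walk", "moved", "move", "parked", "sat"}
SPATIAL_PREPOSITIONS = {"past", "down", "behind", "beside", "under", "between", "toward", "onto"}
ABSTRACT_CUES = {"believe", "believes", "intend", "intends", "idea", "hope", "hopes"}


def needs_classifier_predicate(sentence: str) -> bool:
    """Guess whether a sentence describes concrete location or movement."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if words & ABSTRACT_CUES:
        # Abstractions, beliefs, and intentions are left to the simpler pathways.
        return False
    return bool(words & SPATIAL_VERBS) and bool(words & SPATIAL_PREPOSITIONS)


print(needs_classifier_predicate("The car drove down the bumpy road past a cat."))  # True
print(needs_classifier_predicate("Which university does Billy attend?"))  # False
```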

“Multi-Path” MT • Only when needed, use complex, sophisticated MT: Interlingua? • Otherwise, use simpler, easier-to-build MT: Transfer. • Use the linguistic ‘breadth’ of one approach and the knowledge/spatial ‘depth’ of the other.

“Multi-Path” MT • Only when needed, use complex, sophisticated MT: Interlingua? • Otherwise, use simpler, easier-to-build MT: Transfer. • If all else fails, use word-for-sign Direct transliteration.
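The fallback order on this slide can be sketched as a simple dispatcher that tries each pathway in turn. Everything here is an assumption for illustration: the three pathway functions are stubs with made-up acceptance rules, and the convention that a pathway returns None when it cannot handle a sentence is not taken from the talk.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pathway = Callable[[str], Optional[List[str]]]


def interlingua_path(sentence: str) -> Optional[List[str]]:
    # Stub: pretend only sentences mentioning "drove" count as spatial text.
    if "drove" in sentence.lower():
        return ["<classifier predicate animation>"]
    return None


def transfer_path(sentence: str) -> Optional[List[str]]:
    # Stub: pretend the parser only accepts sentences ending in '.' or '?'.
    if sentence.strip().endswith((".", "?")):
        return ["<transfer output>"]
    return None


def direct_path(sentence: str) -> Optional[List[str]]:
    # Stub: word-for-sign transliteration always produces something.
    return [word.upper() for word in sentence.split()]


PATHWAYS: Sequence[Tuple[str, Pathway]] = (
    ("interlingua", interlingua_path),
    ("transfer", transfer_path),
    ("direct", direct_path),
)


def multi_path_translate(sentence: str) -> Tuple[str, List[str]]:
    """Return the name and output of the deepest pathway that accepts the sentence."""
    for name, pathway in PATHWAYS:
        result = pathway(sentence)
        if result is not None:
            return name, result
    raise ValueError("no pathway produced output")


print(multi_path_translate("The car drove past a cat.")[0])  # interlingua
print(multi_path_translate("Which university does Billy attend?")[0])  # transfer
print(multi_path_translate("uh billy university"))  # ('direct', ['UH', 'BILLY', 'UNIVERSITY'])
```

Ordering the pathways from deepest to shallowest means the expensive analysis is only attempted first and the broad-coverage fallbacks guarantee that some output is always produced.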

MT Pyramid (Dorr, 1998). [Figure: Machine Translation Pyramid, from Source Text up through Word Structure, Syntactic Structure, and Semantic Structure to Interlingua, then down to Target Text, with Direct, Syntactic Transfer, and Semantic Transfer links across.] “Pyramidal” MT • No longer a set of options. • Now a design for a new multi-path architecture.

MT Pyramid (Dorr, 1998). [Figure: Machine Translation Pyramid annotated with three pathways. Interlingual: Spatial Text. Transfer: Most Sentences. Direct: Unanalyzable Text.] “Pyramidal” MT • No longer a set of options. • Now a design for a new multi-path architecture.

But what’s our interlingua? And is it really an interlingua?

What do human interpreters do? Listen to English about spatial topics → make 3D mental model of what’s said → produce ASL classifier predicates • Using a spatial representation of reality…

What could a computer do? Computer analyzes English text → builds 3D virtual reality of the scene → uses VR as basis for generating the spatial classifier predicate movements • University of Pennsylvania AnimNL system: • 3D virtual reality model with characters/objects. • Input: English directions for characters to follow. • Builds animation: characters obey commands. (Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer, 2000; Schuler, 2003.)

English-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

How it Works English Text → Syntactic Analysis → Select a PAR Template → Fill the PAR Template → “Planning Process” → Animation Output. PAR = “Parameterized Action Representation” (on next slide)

Parameterized Action Representation. Example sentence: Bob tripped on the ball.
participants: [
  agent: AGENT (example: Bob)
  objects: OBJECT list (example: { ball_1 })
]
semantics: [
  motion: {Object, Translate?, Rotate?} (example: {Bob, translate…, rotate…})
  path: {Direction, Start, End, Distance} (specifics of the path taken…)
  termination: CONDITION (example: “…until dawn.” gives End at 6am.)
  duration: TIME-LENGTH (example: “…for 3 hours.” gives 3 Hours.)
  manner: MANNER (example: “…rapidly.” gives Rapidly; “…tripped…” gives Accidentally.)
]
start: TIME
prep conditions: CONDITION boolean-exp
sub-actions: sub-PARs
parent action: PAR24
previous action: PAR35
next action: PAR64
Planning Operator: linked to 3D VR animated movements. The planning algorithm works out movement details. This is a subset of PAR info. http://hms.upenn.edu/software/PAR
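The PAR slots on this slide can be pictured as a record with optional fields, as in the sketch below. The class layout, the use of plain strings for slot values, and the unfilled_slots helper are assumptions made for illustration; they are not the AnimNL data structures.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PAR:
    """Illustrative subset of a Parameterized Action Representation."""
    agent: str
    objects: List[str] = field(default_factory=list)
    motion: Optional[str] = None
    path: Optional[str] = None
    termination: Optional[str] = None
    duration: Optional[str] = None
    manner: Optional[str] = None


def unfilled_slots(par: PAR) -> List[str]:
    """List the slots the English text left open, in declaration order."""
    return [f.name for f in fields(par) if getattr(par, f.name) is None]


# "Bob tripped on the ball." fills the participants and implies a manner.
tripped = PAR(agent="Bob", objects=["ball_1"], manner="accidentally")
print(unfilled_slots(tripped))
# ['motion', 'path', 'termination', 'duration']
```

In this picture, the slots left unfilled by the sentence are the details a planning step would still have to work out before any animation could be produced.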


Using this technology… http://hms.upenn.edu/software/PAR/images.html An NL-Controlled 3D Scene


Using this technology… Original image from: Simon the Signer (Bangham et al. 2000.) An NL-Controlled 3D Scene Signing Character

“Invisible World” Approach • Tiny 3D virtual reality in front of signer’s hands. • AnimNL: English sentences about locomotion → move invisible objects accordingly. • Put hand on top of an object: go along for the ride! We just built a CLASSIFIER PREDICATE.
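As a rough sketch of the "go along for the ride" idea, the code below shrinks an object's path from the 3D scene into a small volume in front of the signer and uses the result as the hand's path. The function name, the uniform scaling, the 0.5 extent, and the origin coordinates are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def classifier_hand_path(
    object_path: List[Point],
    scene_extent: float,
    space_extent: float = 0.5,
    origin: Point = (0.0, 1.2, 0.4),
) -> List[Point]:
    """Scale a scene path into the signing space; the hand follows the invisible object."""
    scale = space_extent / scene_extent
    ox, oy, oz = origin
    return [(ox + x * scale, oy + y * scale, oz + z * scale) for x, y, z in object_path]


# A car moving 20 units along x while drifting 5 units along z in the scene.
car_path = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 5.0)]
for point in classifier_hand_path(car_path, scene_extent=20.0):
    print(point)
```

A handshape choice (for example, the "3" handshape used for vehicles in the earlier gloss) would be layered on top of this path; that step is not shown here.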

Classifier Predicate Pathway

Interlingual: Spatial Text • Transfer: Most Sentences • Direct: Unanalyzable Text

Design Issues and Discussion

Is the VR really an interlingua? • Depends on your definition & how implemented. • Semantic representation: Yes, model for 3D spatial domains. • Useful for translation: We’ve shown how it can be. • World knowledge beyond input semantics: Yes, in that it handles spatial/physical constraints. • Language neutral: 3D coordinates: not just interlingual, it’s non-lingual. But might need other semantic/discourse information…

Other Languages • Alleviates tradeoff: • Domain specificity vs. divergence-handling power. • Use deeper approach in a broad coverage system. • Translate variety of texts but perform deeper processing on certain inputs. • Important or well-understood sentences. • Sublanguage that requires special handling. • Transfer or deeper/interlingual approach for “special” text and resource-lighter approach for the rest.

Mixing Statistical/Symbolic MT • This system had no statistical pathways. • Nothing prevents their use with this design. • Statistical approach for most inputs; manually override translation of certain texts. • Statistical approach for direct (and transfer). • Hand-build the higher pathways.

Project Status • Finishing design specification. • Beginning implementation. • Other considerations: • Evaluation? • Initial Applications? • How to generate multiple classifier predicates? • Representations to use in transfer pathway?

Questions?

References
Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Parameterized action representations and natural language instructions for dynamic behavior modification of embodied agents. AAAI Spring Symposium.
Bangham, Cox, Lincoln, and Marshall. 2000. Signing for the deaf using virtual humans. IEE2000.
DeMatteo, A. 1977. Visual Analogy and the Visual Analogues in American Sign Language. In Lynn Friedman (ed.), On the Other Hand: New Perspectives on American Sign Language, pp. 109-136. New York: Academic Press.
Holt, J. 1991. Demographic, Stanford Achievement Test - 8th Edition for Deaf and Hard of Hearing Students: Reading Comprehension Subgroup Results.
Liddell. 2003. Sources of Meaning in ASL Classifier Predicates. In Karen Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages. Workshop on Classifier Constructions, La Jolla, San Diego, California.
Liddell. 2003. Grammar, Gesture, and Meaning in American Sign Language. UK: Cambridge U. Press.
Morford and MacFarlane. 2003. Frequency Characteristics of ASL. Sign Language Studies, 3:2.
Schuler. 2003. Using model-theoretic semantic interpretation to guide statistical parsing and word recognition in a spoken language interface. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL’03), Sapporo, Japan.
Supalla, T. 1978. Morphology of Verbs of Motion and Location. In F. Caccamise and D. Hicks (eds.), Proceedings of the Second National Symposium on Sign Language Research and Teaching, pp. 27-45. Silver Spring, MD: National Association for the Deaf.
Supalla, T. 1982. Structure and Acquisition of Verbs of Motion and Location in American Sign Language. Ph.D. Dissertation, University of California, San Diego.
Supalla, T. 1986. The Classifier System in American Sign Language. In C. Craig (ed.), Noun Phrases and Categorization, Typological Studies in Language, 7, pp. 181-214. Philadelphia: John Benjamins.

Advantages of Virtual Reality • ASL signers can arrange objects under discussion in the space around them. • Presence of a virtual reality model in this system enables sophisticated management of these positioned objects. • The AnimNL system can also control the movements of virtual human figures who participate in the 3D scene. These figures possess skills useful for ASL signing; so, we can use one as our signer. • Same technology for signer and 3D spatial model.

System Diagram


