
Matt Huenerfauth

Spatial Representations of Classifier Predicates for Machine Translation into American Sign Language. Matt Huenerfauth. Workshop on the Representation and Processing of Signed Languages, 4th International Conference on Language Resources and Evaluation, May 30, 2004, Lisbon, Portugal.


Presentation Transcript


  1. Spatial Representations of Classifier Predicates for Machine Translation into American Sign Language. Matt Huenerfauth. Workshop on the Representation and Processing of Signed Languages, 4th International Conference on Language Resources and Evaluation, May 30, 2004, Lisbon, Portugal. Computer and Information Science, University of Pennsylvania. Research Advisors: Mitch Marcus & Martha Palmer

  2. Motivations and Applications • Only half of Deaf high school graduates (age 18+) can read English at a fourth-grade (age 10) level, despite ASL fluency. • Many Deaf accessibility tools forget that English is a second language for these students (and is different from ASL). • Applications for a Machine Translation System: • TV captioning, teletype telephones. • Computer user-interfaces in ASL. • Educational tools, access to information/media. • Transcription, storage, and transmission of ASL.

  3. Input / Output. What’s our input? English text. What’s our output? Less clear… Imagine a 3D virtual reality human being… One that can perform sign language… But this character needs a set of instructions telling it how to move! Our job: English → these instructions. VCom3d

  4. Off-the-Shelf Virtual Humans. Photos: Seamless Solutions, Inc.; Simon the Signer (Bangham et al. 2000); Vcom3D Corporation.

  5. Classifier Predicates. The car drove down the bumpy road past a cat. → CAT ClassPred-bentV-{location of cat}, CAR ClassPred-3-{drive on bumpy road}. Where’s the cat, the road, and the car? How close? Where does the path start/stop? How to show the path is bumpy, winding, or hilly? • Pushing the boundaries of ‘language.’ • Hard to handle with traditional computational linguistic representations (lexicons, grammars).

  6. Previous ASL MT Systems • Word-for-Sign direct transliteration. • Produces Signed English, not ASL • Syntactic analysis, transfer, generation. • Handles much of the non-spatial phenomena. • All ignore classifier predicates. • Need ASL classifiers to fluently translate many English input texts. • Signers use classifier predicates once per minute in most genres (17x/minute in some). Morford and McFarland, 2003.

  7. Focus and Assumptions • Other systems: non-spatial ASL sentences. • This project: spatially complex ASL. • This means classifier predicates! • Predicates of movement and location • Generating a single classifier predicate (multi-predicate issues also being studied)

  8. Motivating a Design for a Classifier Predicate Generator Four progressively better designs…

  9. Four Designs: Keep Improving

  10. Design 1: List them all… • Multi-word English lexical entries. • Associate a classifier predicate with each. • Exhaustively list them all… Problem? • Anticipate all of them? • ClassPreds are very productive. • Many ways to modulate performance. • “…drive up the hill…” • This approach is impractical.

  11. Four Designs: Keep Improving

  12. Design 2: Composition Rules • Identify minimal components of meaning. • Corresponding element of movement/shape: • path contour, hand elevation, palm orientation… • e.g. “…which way is the person facing…” • Compositional rules to combine these ‘morphemes’ into full classifier predicate.
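Design 2 can be sketched in a few lines: minimal meaning components each contribute a feature, and a compositional rule merges them into one predicate specification. This is an illustrative sketch only; the morpheme names and features are invented, not taken from the talk.

```python
# Sketch of Design 2: combine minimal meaning "morphemes" into one
# classifier predicate spec. Morpheme names/features are illustrative only.

MORPHEMES = {
    "vehicle-class":   {"handshape": "3"},
    "upright-person":  {"handshape": "1"},
    "uphill-contour":  {"path_contour": "rising"},
    "facing-left":     {"palm_orientation": "left"},
}

def compose(*morpheme_names):
    """Merge the features of each morpheme into a single predicate spec."""
    spec = {}
    for name in morpheme_names:
        spec.update(MORPHEMES[name])
    return spec

# "...drive up the hill..." as a combination of morphemes:
pred = compose("vehicle-class", "uphill-contour")
print(pred)  # {'handshape': '3', 'path_contour': 'rising'}
```

The design's weakness shows up immediately: every distinct 3D location, orientation, and path shape would need its own entry in the morpheme inventory, which is what the next slides call the "morpheme count explosion."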

  13. Design 2: Linguistic Analogy • Analogous to Supalla’s polymorphemic model of classifier predicate generation. (1978, 1982, 1986) • Every piece of information = morpheme. • Build the predicate = combining lots of them. • E.g. “…two people meet…” (Liddell, 2003). Morpheme count explosion! Not practical.

  14. So, what’s the problem? • Every 3D location/path = a new morpheme. • No model of how objects arranged in space… • 3D model = more intuitive. • Easier to select the motion path of our hand. • Need many fewer morphemes. • Analyze English text → make a 3D model. • 3D coordinates → how to move our hand.

  15. Virtual Reality Spatial Model Four Designs: Keep Improving

  16. A Useful Technology… Controlling a virtual reality with English input commands…

  17. Controlling a Virtual Reality with NL • AnimNL System • Virtual reality model of characters/objects in 3D. • Input: English sentences. Directions for characters/objects to follow. • Produces an animation: characters/objects obey the English commands. • Updates the 3D scene to show changes. Badler, Bindiganavale, Allbeck, Schuler, Zhao, Lee, Shin, and Palmer. 2000. Schuler. 2003.

  18. English-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

  19. How it Works. English Text → Syntactic Analysis → Select a PAR Template → Fill the PAR Template → “Planning Process” → Animation Output. PAR = “Parameterized Action Representation” (on next slide)
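The pipeline on this slide can be sketched as a chain of stub functions. Every function body and intermediate structure here is a hypothetical stand-in for AnimNL's actual components, included only to show how the stages hand data to one another.

```python
# Stub pipeline mirroring slide 19. Each stage is a placeholder for the
# corresponding AnimNL component; the data structures are illustrative only.

def syntactic_analysis(text):
    # Stand-in parser: a real system would parse arbitrary input.
    return {"verb": "tripped", "agent": "Bob", "object": "ball_1"}

def select_par_template(parse):
    # Pick a PAR template keyed on the verb.
    return {"action": parse["verb"], "participants": None, "path": None}

def fill_par_template(template, parse):
    # Fill template slots from the parse.
    filled = dict(template)
    filled["participants"] = {"agent": parse["agent"],
                              "objects": [parse["object"]]}
    return filled

def planning_process(par):
    # A real planner would compute preconditions, sub-actions, effects, etc.
    return ["animate:" + par["action"]]

def run(text):
    parse = syntactic_analysis(text)
    par = fill_par_template(select_par_template(parse), parse)
    return planning_process(par)

print(run("Bob tripped on the ball."))  # ['animate:tripped']
```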

  20. Parameterized Action Representation. Example: “Bob tripped on the ball.” A PAR is a planning operator (an artificial intelligence formalism for deciding how to act in a complex situation). A subset of PAR fields, with the slide’s example annotations: • participants: [ agent: AGENT, objects: OBJECT list ] — e.g. Bob, { ball_1 } • semantics: [ motion: {Object, Translate?, Rotate?} — e.g. {Bob, translate…, rotate…} from “…tripped…”; path: {Direction, Start, End, Distance} — specifics of the path taken; termination: CONDITION — e.g. “…until dawn.” / end at 6am; duration: TIME-LENGTH — e.g. “…for 3 hours.” / 3 hours; manner: MANNER — e.g. “…rapidly.” / accidentally ] • start: TIME • prep conditions: CONDITION boolean-exp • sub-actions: sub-PARs • parent action: PAR24 • previous action: PAR35 • next action: PAR64. What is a “planning” algorithm good for? This is a subset of PAR info. http://hms.upenn.edu/software/PAR
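The PAR field subset on this slide maps naturally onto a record type. The following is a hypothetical Python rendering of those fields, not the actual PAR implementation; defaults and example values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rendering of the PAR field subset shown on slide 20.
@dataclass
class PAR:
    agent: str                          # participants: agent
    objects: list                       # participants: object list
    motion: dict                        # {object, translate?, rotate?}
    path: dict                          # {direction, start, end, distance}
    termination: Optional[str] = None   # e.g. "until dawn"
    duration: Optional[str] = None      # e.g. "for 3 hours"
    manner: Optional[str] = None        # e.g. "accidentally"
    start: Optional[str] = None         # e.g. "at 6am"
    sub_actions: list = field(default_factory=list)  # sub-PARs

# "Bob tripped on the ball."
par = PAR(
    agent="Bob",
    objects=["ball_1"],
    motion={"object": "Bob", "translate": True, "rotate": True},
    path={"direction": None, "start": None, "end": None, "distance": None},
    manner="accidentally",
)
print(par.agent, par.manner)  # Bob accidentally
```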

  21. Adding Detail, Making Animation • PAR is missing details needed to create animation. • “…turn the handle…” • Use an artificial intelligence “planning” algorithm • Calculate preconditions, physical constraints, sub-actions, effects, etc. of each animation movement. • Works out the details needed to build animation.

  22. Diagram of AnimNL

  23. A 3D Spatial Model for American Sign Language. Using the virtual reality English-command technology

  24. English-Controlled 3D Scene http://hms.upenn.edu/software/PAR/images.html

  25. Using this technology… http://hms.upenn.edu/software/PAR/images.html An NL-Controlled 3D Scene

  26. Using this technology… An NL-Controlled 3D Scene

  27. Using this technology… Original image from: Simon the Signer (Bangham et al. 2000.) An NL-Controlled 3D Scene Signing Character

  28. Using this technology… Original image from: Simon the Signer (Bangham et al. 2000.) An NL-Controlled 3D Scene Signing Character

  29. “Invisible World” Approach • Invisible objects floating in front of the signer. • English sentences → commands for virtual reality. • Positions, moves, and orients objects in this world. • So, we’ve got all these floating invisible objects… What do we do with them?
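The "invisible world" is essentially a scene model that tracks where unseen objects sit in the signing space and updates them as English sentences arrive. A toy sketch, with invented class, method, and coordinate conventions (none of these names come from the system itself):

```python
# Toy "invisible world": track 3D positions of unseen objects floating
# in front of the signer. All names and coordinates are illustrative.

class InvisibleWorld:
    def __init__(self):
        self.objects = {}  # object name -> (x, y, z) in signing space

    def place(self, name, xyz):
        """Position an object at a point in the signing space."""
        self.objects[name] = xyz

    def move(self, name, dxyz):
        """Translate an object along a displacement vector."""
        x, y, z = self.objects[name]
        dx, dy, dz = dxyz
        self.objects[name] = (x + dx, y + dy, z + dz)

world = InvisibleWorld()
world.place("cat", (0.3, 1.0, 0.4))    # CAT located off to one side
world.place("car", (-0.5, 1.0, 0.5))   # CAR starts across the space
world.move("car", (1.0, 0.0, 0.0))     # the car drives past the cat
print(world.objects["car"])            # (0.5, 1.0, 0.5)
```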

  30. Using the 3D Virtual Reality Design 3 and Design 4

  31. Four Designs: Keep Improving

  32. Design 3: Directly Pictorial • Invisible 3D Objects → Classifier Predicate • Put hand in the proper handshape • Place hand directly on top of (inside of) object in the 3D scene. • Follow the paths objects trace through space. We go along for the ride!

  33. Diagram of Design 3

  34. The AnimNL Technology Diagram of Design 3

  35. Linguistic Analogy / Problems • DeMatteo’s gestural model of classifier predicates (1977) • Mental model of scene. • Move hands in topologically analogous manner. • Merely iconic gestural movements. • Problem? Overgenerative. • Doesn’t explain conventions/restrictions: • legal combinations of handshape/movement. • some movements not visually representative. • discourse factors / multi-predicate concerns. (Liddell, 2003) Design 3 has the same problem!

  36. This process is harder than it seems. Diagram of Design 3

  37. Four Designs: Keep Improving

  38. The Solution? More Templates! • Can’t just ‘go along for the ride.’ • Making a ClassPred is more complicated. • Our last complicated animation task? • Move 3D objects based on English text. • We used templates and ‘planning’. • Can we do something like this again? • This time: how to move the arm to do a ClassPred.

  39. Insert a template library here… Insert a planning process here… Diagram of Design 3

  40. Diagram of Design 4

  41. Library 1 Library 2 A Second PAR Template Library • First library of templates: Possible movements of invisible objects in virtual reality. • Second library: Possible movements of the signer’s hands while performing classifier predicates to describe these objects. Original image from: Simon the Signer (Bangham et al. 2000.)

  42. Selecting/Filling a Template • Big list of prototypical classifier predicates stored as templates. • Select a template based upon: • English lexical items • Linguistic features in English sentence • 3D coordinates and motion paths of objects • Let planning process build animated output. • How is this better than design 3?

  43. …leisurely walking along… • AnimNL: English Text → Virtual Reality. • Parse of Sentence → Select a Template: Leisurely-Walking-Upright-Figure • Specifies handshape, palm orientation, “bouncing” path contour, and speed/timing. • Still needs 3D starting/stopping coordinates. • Get coordinates from “invisible world,” fill template, let animation software make output. How’s it better? Invisible world motion path ≠ hand motion path.
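The select-and-fill step above can be sketched as follows. The template fixes the linguistically conventional parts (handshape, orientation, path contour), and only the start/stop coordinates are filled in from the invisible world. Template contents, field names, and coordinates are hypothetical.

```python
# Design 4 sketch: a classifier predicate template fixes conventional
# parts; 3D endpoints come from the invisible-world model. All names
# and values are illustrative, not the system's actual inventory.

TEMPLATES = {
    "Leisurely-Walking-Upright-Figure": {
        "handshape": "1",              # upright-person classifier
        "palm_orientation": "signer",
        "path_contour": "bouncing",    # hand path, NOT the object's path
        "speed": "slow",
        "start": None,                 # filled from invisible world
        "end": None,
    },
}

def fill_template(name, world, obj):
    """Copy a template and fill its 3D endpoints from the world model."""
    cp = dict(TEMPLATES[name])
    cp["start"] = world[obj]["start"]
    cp["end"] = world[obj]["end"]
    return cp

world = {"person_1": {"start": (0.0, 1.0, 0.4), "end": (0.5, 1.0, 0.4)}}
cp = fill_template("Leisurely-Walking-Upright-Figure", world, "person_1")
print(cp["path_contour"], cp["end"])  # bouncing (0.5, 1.0, 0.4)
```

This captures why Design 4 beats Design 3: the person in the invisible world moves in a straight line, but the hand performs a conventional "bouncing" contour supplied by the template rather than copying the object's path.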

  44. Linguistic Motivations • “Blended Spaces” Lexicalized Classifier Predicate Model of Scott Liddell (2003). • Signers imagine objects occupying space. • Classifier predicates stored as: lexicon of templates that are parameterized on locations/orientations of these spatial entities. • Both engineering & linguistic motivations.

  45. Exciting Possibilities The Four Designs: Wrap Up

  46. Generating Multiple Classifier Predicates

  47. Generating Multiple ClassPreds • One English sentence → one classifier predicate? • Sometimes one-to-many or many-to-one. • Change in ordering or organization. • Interact or constrain one another. • Emergent meaning from multiple predicates. • Need to think about generation decisions at a multi-predicate level for the entire scene being described. Need a representation of how several predicates work together for one 3D scene.

  48. Generating Multiple ClassPreds • We’re using PARs as classifier predicate templates. • But PARs can store sub-PARs inside of them to represent sub-parts of a movement. • Instead of using a PAR to store only one classifier predicate template, let’s store several templates together as a group. • We’ll call this group of CPs a “motif.”
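A "motif" can be pictured as one nested structure, like a PAR with sub-PARs, that groups several classifier-predicate templates and is later expanded by the planner into an ordered sequence. The structure and template names below are hypothetical illustrations of that idea.

```python
# Motif sketch: a group of classifier-predicate templates stored as one
# nested structure (like a PAR holding sub-PARs). Expansion flattens it
# into an ordered predicate sequence. Names are illustrative only.

motif = {
    "name": "locate-then-move",
    "sub": [
        {"name": "locate-ground-object", "sub": []},    # e.g. place the CAT
        {"name": "locate-figure-object", "sub": []},    # e.g. place the CAR
        {"name": "move-figure-along-path", "sub": []},  # e.g. CAR drives past
    ],
}

def expand(node):
    """Depth-first expansion of a motif into a classifier-predicate list."""
    if not node["sub"]:
        return [node["name"]]
    out = []
    for child in node["sub"]:
        out.extend(expand(child))
    return out

print(expand(motif))
# ['locate-ground-object', 'locate-figure-object', 'move-figure-along-path']
```

Because the whole scene lives in one structure, a planner operating on it could reorder the sub-templates or insert additional ones to satisfy ASL grammatical constraints, as the next slide describes.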

  49. Generating Multiple ClassPreds • Several English sentences can trigger a large structure containing rules for how to compose several classifier predicates. • Allows us to plan out a whole scene. • Allows us to rearrange or introduce additional classifier predicates to satisfy ASL grammatical constraints. • Planning process expands motif → several CPs. (More details in paper.)

  50. Benefits of Virtual Reality
