
SHE WOMAN HER HOUSE FIRE






Presentation Transcript


  1. Video Models for Automated Sign Language Recognition
Sunita Nayak¹, Sudeep Sarkar¹, Barbara Loeding²
¹Department of Computer Science & Engineering, ²Department of Special Education

[Pipeline figure: Video → Edge Sequence → Features (Relational Distributions) → Motion Representation (Curves in SoRD) → Learning Sign Models (Signemes) → Matching (Dynamic Programming), illustrated on the sentence "SHE WOMAN HER HOUSE FIRE".]

• We report results in terms of Start Offset and End Offset, i.e. the difference in frames between the ground-truth sign and our retrieved signeme.
• Ground truth is taken from Boston University's SignStream annotations.
• Tested with 18 signs.
• Most previous work in sign language recognition uses Hidden Markov Models together with color gloves or magnetic trackers. Our work uses plain color video without any wearable aids. It is based on computing a relational distribution for each image in the video and embedding the distributions in a lower-dimensional space called the Space of Relational Distributions.
• Relational Distribution: given an edge image of a frame in the video, the probability that the horizontal and vertical distances between any two edge pixels are x and y, respectively.
• The Space of Relational Distributions (SoRD) is obtained by principal component analysis of all the relational distribution images.
• The series of points representing the frames of a sentence is linearly interpolated in SoRD space to form a point series that represents the sentence.
• Frames with less motion in the sentence are represented by points closer together along the curve, and frames with larger motion by points farther apart, thus achieving speed invariance.
• To extract a sign, we consider two linearly interpolated, time-normalized sentences having a common sign but different adjacent signs.
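The feature and embedding steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes edge pixels are given as (row, col) coordinates, and the function names, bin count, and number of principal components are our own choices.

```python
import numpy as np

def relational_distribution(edge_pixels, img_shape, bins=32):
    """Histogram of absolute (horizontal, vertical) offsets between all
    pairs of edge pixels, normalized to a probability distribution."""
    pts = np.asarray(edge_pixels, dtype=float)
    dx = pts[:, 1][None, :] - pts[:, 1][:, None]  # horizontal distances
    dy = pts[:, 0][None, :] - pts[:, 0][:, None]  # vertical distances
    h, w = img_shape
    hist, _, _ = np.histogram2d(np.abs(dx).ravel(), np.abs(dy).ravel(),
                                bins=bins, range=[[0, w], [0, h]])
    return hist / hist.sum()  # P(a random pair of edge pixels is (x, y) apart)

def sord_embedding(rel_dists, n_components=10):
    """Project flattened relational distributions onto their top principal
    components -- a sketch of the Space of Relational Distributions."""
    X = np.stack([rd.ravel() for rd in rel_dists])
    X = X - X.mean(axis=0)
    # PCA via SVD; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T  # one low-dimensional point per frame
```

Each video frame then becomes one point in SoRD space, and a sentence becomes the curve traced by its frames.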
For example, for the sign 'HOUSE', the first two training sentences considered are "SHE WOMAN HER HOUSE FIRE" and "fs-JOHN CAN BUY HOUSE FUTURE".
• Segments of length k points from the first sentence are compared with all segments of length k points from the second sentence, and the best matching segments are found using dynamic programming.

Problem Statement: Extract sign models from continuous sentences of American Sign Language. In the following two sentences, the target word to be extracted is HOUSE. The frames representing the sign 'HOUSE' are marked in red, and neighboring words are marked in magenta; the frames in between represent co-articulation between signs.
"SHE WOMAN HER HOUSE FIRE"
"fs-JOHN CAN BUY HOUSE FUTURE"

Some extracted signemes are shown explicitly below (the ground-truth sign is marked in RED, while the localized signemes are marked in GREEN):
Sign 'BUY', test sentence 'JOHN BUY WHAT?'
Sign 'FUTURE', test sentence 'FUTURE JOHN BUY HOUSE'

The Approach:
• The segment extracted from the first sentence is then used to extract the corresponding segments from all other training sentences.
• The signeme is defined as the mean of all the extracted segments.

Conclusion & Future Work:
• We presented a novel approach for automatically extracting sign models from continuous American Sign Language sentences using a time-normalized, continuous representation of signs and sentences.
• We proposed the concept of signemes, which is robust to coarticulation effects, for modeling signs.
• In the future, we plan to expand the number of signs and work towards signer independence.

Advantages:
• Does not use tracking.
• Captures the motion required for discriminating signs in video sequences without color gloves or magnetic trackers.
• Trains the segmentation noise into the sign models themselves, resulting in models that are practical with respect to segmentation.
• Takes into account the relative position of the head and the two hands, which is quite important for sign language recognition.

Broader Scope: Our research leads towards increasing the ways people can communicate with computers. It would help Deaf signers communicate naturally with hearing people who do not understand sign language, by translating their signs into plain English words that others can read. For the hearing community, it would likewise increase the ways they can communicate with machines, helping people interact with computers visually using cameras.

Tests and Results:
• Signeme models were used to localize the signs in new test sentences.
• The models were compared with the speed-normalized SoRD test curves to find the position of best match, using Euclidean matching similar to the signeme extraction process.

Acknowledgment: This work was supported by the US National Science Foundation ITR Grant No. IIS 0312993.
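The matching, signeme extraction, and test-time localization steps above can be sketched as follows. This is a hedged, minimal version assuming time-normalized SoRD curves as NumPy arrays of shape (frames, dims): the exhaustive segment search stands in for the poster's dynamic-programming search, and all function names are illustrative.

```python
import numpy as np

def best_common_segment(curve_a, curve_b, k):
    """(start_a, start_b) of the length-k segments of the two curves
    with the smallest summed Euclidean distance."""
    best = (0, 0, np.inf)
    for i in range(len(curve_a) - k + 1):
        seg_a = curve_a[i:i + k]
        for j in range(len(curve_b) - k + 1):
            d = np.linalg.norm(seg_a - curve_b[j:j + k], axis=1).sum()
            if d < best[2]:
                best = (i, j, d)
    return best[0], best[1]

def extract_signeme(curves, k):
    """Use the first two training curves to pick a reference segment,
    extract the matching segment from every other curve, and return
    the mean segment (the signeme)."""
    i, _ = best_common_segment(curves[0], curves[1], k)
    ref = curves[0][i:i + k]
    segs = [c[best_common_segment(c, ref, k)[0]:][:k] for c in curves[1:]]
    return np.mean([ref] + segs, axis=0)

def localize_signeme(test_curve, signeme):
    """Start index of the window in a test curve closest (Euclidean)
    to the signeme model."""
    k = len(signeme)
    dists = [np.linalg.norm(test_curve[i:i + k] - signeme)
             for i in range(len(test_curve) - k + 1)]
    return int(np.argmin(dists))
```

With a localized start index `s` and window length `k`, the Start Offset and End Offset reported above would simply be `s - gt_start` and `(s + k - 1) - gt_end` against the ground-truth frame range.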
