Semantic Video Classification Based on Subtitles and Domain Terminologies

Polyxeni Katsiouli, Vassileios Tsetsos, Stathes Hadjiefthymiades
Pervasive Computing Research Group, Communication Networks Laboratory
Department of Informatics and Telecommunications, University of Athens – Greece
KAMC ‘07 @ Genoa, Italy
polina@di.uoa.gr, b.tsetsos@di.uoa.gr, shadj@di.uoa.gr

Outline • The Polysema Platform • Introduction - Motivation • Video Categorization Method • Experimental Evaluation • Conclusions - Future Work

Polysema platform • Develops an end-to-end platform for iTV services • Semantics-related research focuses on the development of: • semantics extraction techniques for automatic annotation of audiovisual content, • a personalization framework for iTV services based on Semantic Web (SW) technologies, • a GUI tool for video annotation and MPEG-7 metadata creation http://polysema.di.uoa.gr

Introduction - Motivation • Multimedia databases are becoming popular • Most video classification methods are based on visual/audio signal processing • Text processing is more lightweight than visual/audio processing • High-level semantics are more closely related to human language than to visual features • Subtitles capture the semantics of the corresponding video

Step 1: Text Preprocessing • Subtitles are segmented into sentences • A Part of Speech Tagger is applied to each sentence • Stop words (e.g., “to”, “him”) are removed based on a stop words list
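A minimal sketch of Step 1, assuming plain subtitle text as input. The stop-word list and the function names are illustrative assumptions, not taken from the slides, and the part-of-speech tagging step is left out because it needs an external tagger.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "to", "him", "is", "are", "of", "in", "and", "it"}

def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def remove_stop_words(sentence):
    """Lowercase the sentence, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(text):
    """Return one list of content tokens per sentence."""
    return [remove_stop_words(s) for s in split_sentences(text)]

if __name__ == "__main__":
    print(preprocess("The lion gave the meat to him. Lions hunt in groups!"))
```

In a full pipeline, the tagger would run on each sentence before stop-word removal, so that later steps can restrict keyword candidates by part of speech.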

Step 2: Keyword extraction • We used the TextRank algorithm to extract keywords • TextRank • represents the text as a graph, • applies to the vertices a ranking algorithm based on Google’s PageRank, • sorts vertices in decreasing rank order, • extracts the top highly ranked vertices for further processing TextRank: Mihalcea, R., Tarau, P.: TextRank: Bringing Order into Texts, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain, July 2004
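The ranking idea behind TextRank can be sketched as follows. This is a simplified illustration, not the authors' implementation: the co-occurrence window, damping factor, iteration count, and the function name `textrank_keywords` are assumptions chosen for the example.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with a PageRank-style score on a co-occurrence graph."""
    # Build an undirected graph: link words at most `window` positions apart.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Iterate the PageRank update; each vertex shares its score among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Sort by decreasing score (ties broken alphabetically) and keep the top vertices.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]

if __name__ == "__main__":
    words = ["lion", "prey", "lion", "savanna", "lion", "herd"]
    print(textrank_keywords(words, window=1, top_k=2))
```

A word that co-occurs with many different words collects score from all of them, which is why it rises to the top of the ranking.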

Step 3: Word Sense Disambiguation • Words have many possible meanings, called senses • A Word Sense Disambiguation (WSD) algorithm is applied to determine the correct sense of each word • WSD • is based on the lexical database WordNet, • is a variation of Lesk’s WSD algorithm WSD: Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In the Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLING-02) Mexico City, Mexico (2002)
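To illustrate the gloss-overlap idea that Lesk-style algorithms build on, here is a simplified Lesk sketch. It is not the adapted variant cited above, which also takes into account glosses of related WordNet synsets. The sense identifiers and glosses are a hypothetical toy inventory, not real WordNet entries.

```python
# Hypothetical toy sense inventory; a real system would read glosses from WordNet.
SENSES = {
    "bank": {
        "bank.river": "sloping land beside a river or lake",
        "bank.finance": "institution that accepts deposits and lends money",
    }
}

def simplified_lesk(word, context_tokens, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(context_tokens)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.get(word, {}).items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None when the word has no known senses

if __name__ == "__main__":
    print(simplified_lesk("bank", ["fish", "swim", "near", "river", "bank"]))
```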

Step 4: WordNet Domains Extraction (1/2) • WordNet Domains: • augments WordNet with domain labels • a taxonomy of ~200 domain labels • synsets have been annotated with at least one domain label WN domains: http://wndomains.itc.it/wordnetdomains.html

Step 4: WordNet Domains Extraction (2/2) • For each video: • Extract the WordNet domains for each keyword’s sense • Calculate the occurrence frequency of each domain label • Sort domain labels in decreasing order of occurrence frequency
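The per-video counting and sorting can be sketched as below. The sense-to-domain mapping is a hypothetical stand-in for the WordNet Domains annotation, and the sense names and function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical mapping from disambiguated senses to domain labels.
SENSE_DOMAINS = {
    "lion": ["animals"],
    "habitat": ["animals", "biology"],
    "river": ["geography"],
}

def rank_domains(keyword_senses, sense_domains=SENSE_DOMAINS):
    """Count domain labels over all keyword senses and sort by frequency."""
    counts = Counter()
    for sense in keyword_senses:
        counts.update(sense_domains.get(sense, []))
    # Decreasing frequency; ties are broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    print(rank_domains(["lion", "habitat", "river"]))
```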

Step 5: Correspondences between categories & WN domains • For each category label: • Look up in WordNet the senses related to it (include senses related through hypernym & hyponym relations) • Obtain the corresponding WordNet domains • Calculate the occurrence score for each domain • Sort domains in decreasing occurrence order Example:

Step 6: Category label assignment • Compare the ordered list of WN domains of each video with the ordered list of WN domains of each category Example: WN domains of a video: science, animals

Experimental Evaluation (1/2) • 36 subtitle files of documentaries Statistical information of files (average values): • Classify under the categories: geography, animals, history, war, technology, science, accidents, music, transportation, people, religious, politics, arts

Experimental Evaluation (2/2) • Classifiers: • Proposed method • Proposed method in which Step 6 has been replaced with Spearman’s footrule distance • J4.8 • decision tree classifier • supervised approach
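For the variant that uses Spearman's footrule distance, a possible sketch follows. The footrule sums the absolute differences between each item's positions in two rankings. How the authors handled domains that appear in only one list is not stated in the slides, so placing a missing item just past the end of the list is an assumption here, as are the function names.

```python
def footrule_distance(ranking_a, ranking_b):
    """Sum of absolute position differences between two ranked lists."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    items = set(pos_a) | set(pos_b)
    # Assumption: an item missing from a list is placed just past its end.
    return sum(
        abs(pos_a.get(x, len(ranking_a)) - pos_b.get(x, len(ranking_b)))
        for x in items
    )

def closest_category(video_domains, category_domains):
    """Return the category whose ranked domain list is nearest to the video's."""
    return min(
        sorted(category_domains),
        key=lambda c: footrule_distance(video_domains, category_domains[c]),
    )

if __name__ == "__main__":
    categories = {
        "animals": ["animals", "biology"],
        "science": ["science", "animals"],
    }
    print(closest_category(["science", "animals"], categories))
```

A distance of zero means the two rankings are identical, so the category with the smallest distance is the natural candidate label.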

Conclusions – Future Work • Conclusions • A novel approach that is based only on text and uses natural language processing techniques • No training phase is required (unsupervised approach) • Future Work • Applying the method on a per-video-segment basis • Defining domain knowledge closer to movie classification • Performance comparison with other unsupervised approaches

Thank you! Questions??? http://p-comp.di.uoa.gr
