MediaEval Workshop 2011. Pisa, Italy, 1-2 September 2011.
Introduction • Genre Tagging task: Given 1727 videos and 26 genre tags, decide which tag goes to which video. • Genres included art, health, literature, technology, sports, blogs, religion, travel, etc. • Videos came from an online video hosting site called blip.tv.
Introduction cont. • Data given to us: videos, speech transcripts, metadata and some user-defined tags. • The videos were divided into two sets. • Development set: 247 videos for which we were given the ground truth, so that we could experiment with our algorithms. • Test set: 1727 videos for which we were not given the ground truth; we had to submit our results for these at the workshop.
TUD-MIR at MediaEval 2011 Genre Tagging Task: Query Expansion from a Limited Number of Labeled Videos
Main Idea • Information Retrieval approach • Used only the textual data • Used the relatively small number of labeled videos in the development set to mine query expansion terms that are characteristic of each genre.
Approach • Combined all development-set videos of the same genre into one genre document. • Applied preprocessing such as stop word removal and stemming. • Weighted and ranked all terms in the development set vocabulary. • Used the top 20 terms from each genre document as the expanded query terms.
Offer Weighting Formula • In the Offer Weight formula, r is the number of videos of a particular genre in which term t(i) appears, R is the total number of videos of that genre, N is the total number of videos in the collection, and n is the number of videos in the collection in which term t(i) appears.
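The formula itself did not survive in this transcript. The sketch below assumes the standard Robertson Offer Weight, which matches the symbols defined on the slide: OW = r * log(((r + 0.5)(N - n - R + r + 0.5)) / ((n - r + 0.5)(R - r + 0.5))). The function names and the input layout (a list of genre/term-set pairs) are illustrative assumptions, not the TUD-MIR implementation.

```python
import math
from collections import Counter


def offer_weight(r, R, n, N):
    """Offer Weight of one term for one genre (assumed Robertson form).

    r: videos of the genre containing the term
    R: videos of the genre
    n: videos in the collection containing the term
    N: videos in the collection
    """
    relevance_weight = math.log(
        ((r + 0.5) * (N - n - R + r + 0.5))
        / ((n - r + 0.5) * (R - r + 0.5))
    )
    return r * relevance_weight


def top_expansion_terms(videos, genre, k=20):
    """Rank the terms of one genre by Offer Weight and keep the best k.

    videos: list of (genre_label, set_of_terms) pairs (hypothetical layout).
    """
    N = len(videos)
    R = sum(1 for g, _ in videos if g == genre)
    n_counts = Counter(t for _, terms in videos for t in terms)
    r_counts = Counter(t for g, terms in videos if g == genre for t in terms)
    scored = {t: offer_weight(r, R, n_counts[t], N) for t, r in r_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

Calling `top_expansion_terms(videos, "sports")` on preprocessed development-set videos would return up to 20 candidate expansion terms for that genre.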
A Few Other Query Expansion Techniques • They also ran several other query expansions: PRF, WordNet, Google Sets and YouTube. • To expand queries via YouTube, they first downloaded the metadata (e.g. title, description and tags) of the top-50 ranked videos returned by YouTube for each genre label except the default category, and then sampled 20 expansion terms from that metadata using the Offer Weight explained earlier.
LIA @ MediaEval 2011: Compact Representation of Heterogeneous Descriptors for Video Genre Classification
Main Idea • Classification approach • A method that extracts a low-dimensional feature space from text, audio and video information. • Late fusion of the SVM results for each modality.
Data Collection • The training data set was collected from the web. • They first expanded the query terms using Latent Dirichlet Allocation (LDA) on the Gigaword corpus and then used the top 10 expanded terms for each genre. • They queried YouTube and Dailymotion for the videos (3120 videos in total). • For textual data they used web pages returned by Google (1560 documents/web pages).
Features Extracted • Text: TF-IDF weights. • Audio: MFCC acoustic frames computed every 10 ms over a 20 ms Hamming window. • Visual: descriptors such as the color structure descriptor, dominant color, homogeneous texture descriptor, or edge histogram descriptor. According to them, texture was the best visual feature.
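To make the audio framing concrete, here is a minimal sketch of cutting a signal into 20 ms Hamming-weighted frames every 10 ms. The 16 kHz sample rate and the function names are assumptions, and the MFCC computation that would follow on each frame is not shown.

```python
import math


def hamming(length):
    """Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(samples, sample_rate=16000, window_ms=20, hop_ms=10):
    """Split samples into overlapping, Hamming-weighted frames.

    sample_rate is an assumed value; the slide only gives the window
    length (20 ms) and the frame step (10 ms).
    """
    window_len = sample_rate * window_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    window = hamming(window_len)
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop_len):
        chunk = samples[start:start + window_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each returned frame overlaps its neighbour by half a window, which is what "every 10 ms in a 20 ms window" implies.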
Classification • Each modality is fed separately to an SVM classifier, and the per-modality scores are combined using linear interpolation.
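The late-fusion step can be sketched as a weighted sum of per-genre scores. The weights, the score dictionaries, and the function names below are hypothetical; the slide does not say how the interpolation weights were chosen.

```python
def late_fusion(scores_by_modality, weights):
    """Linearly interpolate per-genre scores from several classifiers.

    scores_by_modality: {"text": {"sports": 0.8, ...}, "audio": {...}, ...}
    weights: {"text": 0.6, ...}, assumed non-negative and summing to 1.
    """
    genres = set()
    for scores in scores_by_modality.values():
        genres.update(scores)
    fused = {}
    for genre in genres:
        fused[genre] = sum(
            weights[modality] * scores.get(genre, 0.0)
            for modality, scores in scores_by_modality.items()
        )
    return fused


def predict(scores_by_modality, weights):
    """Return the genre with the highest fused score."""
    fused = late_fusion(scores_by_modality, weights)
    return max(fused, key=fused.get)
```

Because fusion happens on classifier outputs, each modality can keep its own feature space and its own SVM.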
User Name Similarity • They also tried to exploit user names in the training set. They treat the relation between genres and user names as a knowledge base and use it to boost the genre scores. • That is, they increase a genre's score for a video if the user name of that video appears in the knowledge base (development set).
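A minimal sketch of this boosting idea follows. The fixed additive bonus and the function names are assumptions; the slide does not state how large the boost was or how it was applied.

```python
def build_uploader_knowledge_base(dev_videos):
    """Map each user name to the genres seen for it in the development set.

    dev_videos: iterable of (user_name, genre) pairs (hypothetical layout).
    """
    knowledge_base = {}
    for user_name, genre in dev_videos:
        knowledge_base.setdefault(user_name, set()).add(genre)
    return knowledge_base


def boost_scores(genre_scores, user_name, knowledge_base, boost=0.2):
    """Add a bonus to every genre already associated with this user name.

    boost=0.2 is an illustrative value, not one taken from the paper.
    """
    known_genres = knowledge_base.get(user_name, set())
    return {
        genre: score + boost if genre in known_genres else score
        for genre, score in genre_scores.items()
    }
```

Videos from user names that never appear in the development set keep their original scores.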
TUB @ MediaEval 2011 Genre Tagging Task: Prediction using Bag-of-(visual)-Words Approaches
Main Idea • Classification task • Bag-of-words approaches with different features derived from visual content and associated textual information
Features Extracted • Mainly textual features: • They translated the ASR transcripts of foreign-language programs into English using Google Translate. • They used a Bag-of-Words (TF-IDF) model for the textual features. • For visual features: • They used local SURF features extracted from each key frame of the video sequence.
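As an illustration of the bag-of-words text representation, the sketch below computes TF-IDF weights for tokenized transcripts. The particular TF-IDF variant (relative term frequency times log inverse document frequency) is an assumption; the slide does not say which variant was used.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    documents: list of token lists, e.g. tokenized ASR transcripts.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document receive weight 0 under this variant, so only terms that discriminate between videos contribute.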
Classification • Fusion: • Early fusion of the visual and textual features, followed by SVM classification. • Classification: • They used multi-class SVM, Multinomial Naïve Bayes and Nearest Neighbor classifiers.
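Early fusion means the modalities are merged at the feature level before any classifier sees them. A minimal sketch, assuming fusion by simple concatenation and using a 1-nearest-neighbor classifier with Euclidean distance as a stand-in for the classifiers listed above (all names are illustrative):

```python
import math


def early_fusion(text_vector, visual_vector):
    """Concatenate per-modality feature vectors into a single vector."""
    return list(text_vector) + list(visual_vector)


def nearest_neighbor(query, labeled_examples):
    """Return the genre of the labeled vector closest to the query.

    labeled_examples: list of (feature_vector, genre) pairs whose
    vectors have the same length as the query.
    """
    _, best_genre = min(
        labeled_examples, key=lambda example: math.dist(query, example[0])
    )
    return best_genre
```

In contrast to late fusion, a single classifier is trained on the concatenated vector, so the modalities should be scaled comparably before they are joined.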
SINAI: Genre tagging of videos based on information retrieval and semantic similarity using WordNet
Main Idea • IR approach • Query expansion using WordNet • A similarity measure other than cosine similarity
Approach • Query Expansion: Produce a bag of words for each genre term using WordNet's synonyms, hyponyms and domain terms. • An existing framework, the Terrier IR system, was used to obtain a measure of relatedness between the videos and the genre terms.
Second Approach • They also used a formula proposed by Lin, which is based on WordNet, to measure the semantic similarity between the nouns detected in each test video and the bag of words generated for each genre category. • They kept only the matches whose score exceeded a threshold of 0.75. • Finally, the accumulated similarity score was divided by the number of words detected in the video to obtain the final semantic similarity score.
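The scoring described above can be sketched as follows. Lin's measure is sim(a, b) = 2 * IC(lcs(a, b)) / (IC(a) + IC(b)), where IC is information content and lcs is the lowest common subsumer in WordNet. It is not implemented here: the `similarity` argument is a stand-in callable. Accumulating over all noun/bag-word pairs, rather than only the best match per noun, is an assumption.

```python
def semantic_score(video_nouns, genre_bag, similarity, threshold=0.75):
    """Thresholded, length-normalized similarity of a video to a genre.

    video_nouns: nouns detected in the video
    genre_bag: bag of words generated for the genre
    similarity: callable (word_a, word_b) -> float, a placeholder for a
        WordNet-based Lin similarity
    """
    if not video_nouns:
        return 0.0
    total = 0.0
    for noun in video_nouns:
        for word in genre_bag:
            score = similarity(noun, word)
            if score > threshold:  # keep only sufficiently strong matches
                total += score
    return total / len(video_nouns)
```

A video would then be tagged with the genre whose bag of words gives the highest score.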