In-Video Product Annotation with Web Information Mining
Guangda Li, Tat-Seng Chua
School of Computing, National University of Singapore





Presentation Transcript


Introduction

Product annotation in videos is of great importance for video browsing, search, and advertising. However, most existing automatic video-annotation techniques focus on annotating high-level concepts such as events, scenes, and object categories. This work introduces a novel solution for annotating specific products in videos by mining information from the Web. It first collects a set of high-quality training images for the product of interest by simultaneously leveraging Amazon and the Google image search engine. A visual signature of the product is then built from the bag-of-visual-words (BoVW) representation of the training images, and a correlative sparsification approach removes noisy visual words from the signature. These signatures are used to annotate video frames. Illustrative sketches of the signature construction, sparsification, and frame-annotation steps appear after the transcript.

Result Analysis

Figure: Framework of the visual reference system.

Figure: Example annotated frames of products in YouTube videos.

Figure: AP comparison of automated product annotation in videos using only the text clue, only the visual clue, and both clues together. The results based on visual information are much better than those obtained using only text information; combining text and visual information achieves the best results.

Figure: BoVW sparsification reduces noisy visual words. The left column shows the visual signature file; the right column shows the corresponding visual words in a video frame. Due to privacy concerns, the human face in the figure has been blurred.
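
The poster does not include code; the following is a minimal sketch of the bag-of-visual-words signature construction it describes, using SIFT features and k-means clustering. The vocabulary size, the choice of SIFT, and the L1 normalization are illustrative assumptions, not the authors' exact settings.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(image_paths, vocab_size=500):
    """Cluster SIFT descriptors from the training images into a visual vocabulary."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    all_desc = np.vstack(descriptors)
    # vocab_size=500 is an assumed value; the poster does not state one.
    return KMeans(n_clusters=vocab_size, n_init=4, random_state=0).fit(all_desc)

def bovw_histogram(image_path, kmeans):
    """Quantize an image's descriptors against the vocabulary; L1-normalize."""
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros(kmeans.n_clusters)
    words = kmeans.predict(desc)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()
```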
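
The poster names a correlative sparsification approach but does not spell it out; the sketch below is a simplified stand-in that zeroes out visual words appearing only sporadically across a product's training images, keeping the consistently occurring ones. The keep ratio and the document-frequency criterion are assumptions.

```python
import numpy as np

def sparsify_signature(histograms, keep_ratio=0.2):
    """Average the training-image histograms into a product signature, then
    zero out visual words that occur in few of the training images (a crude
    proxy for the correlative sparsification described in the poster)."""
    H = np.asarray(histograms)          # shape: (num_images, vocab_size)
    signature = H.mean(axis=0)          # mean BoVW histogram of the product
    doc_freq = (H > 0).mean(axis=0)     # fraction of images containing each word
    cutoff = np.quantile(doc_freq, 1.0 - keep_ratio)
    signature[doc_freq < cutoff] = 0.0  # drop inconsistent (likely noisy) words
    total = signature.sum()
    return signature / total if total > 0 else signature
```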
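
Finally, a sketch of how the sparsified signatures could annotate video frames, with an optional late fusion of a text clue (e.g. a match score against the video's metadata) mirroring the poster's text-plus-visual comparison. The cosine similarity, the fusion weight alpha, and the acceptance threshold are all illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def annotate_frame(frame_hist, signatures, text_scores=None, alpha=0.7, min_score=0.3):
    """Score a frame's BoVW histogram against each product signature and,
    when a text score is available, fuse the two clues with a weighted sum."""
    labels = []
    for product, sig in signatures.items():
        visual = cosine(frame_hist, sig)
        text = (text_scores or {}).get(product, 0.0)
        score = alpha * visual + (1.0 - alpha) * text  # simple late fusion
        if score >= min_score:
            labels.append((product, score))
    return sorted(labels, key=lambda item: -item[1])
```

Frames whose fused score clears the threshold would be tagged with the product label, matching the annotated-frame examples shown in the poster.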
