Video Data Retrieval. Pratyusha Koduri Anish Reddy Devireddy Akaash Vankamamidi. Outline. Introduction Problem Statement Our Contributions Related Work Methodology Comparison Evaluation Future Work Conclusion References. Introduction. Problem: Growing amounts of video data.
Video Data Retrieval Pratyusha Koduri Anish Reddy Devireddy Akaash Vankamamidi
Outline • Introduction • Problem Statement • Our Contributions • Related Work • Methodology • Comparison • Evaluation • Future Work • Conclusion • References
Introduction • Problem: Growing amounts of video data. • Video data comes in the form of news video, film archives, surveillance footage, user-generated content, distance learning, video conferencing, medical applications, and sports. • Video data is dynamic. • With the development of multimedia data types and the growth of available bandwidth, there is huge demand for video retrieval systems. • Digital video can be stored on tapes, CD-ROMs, DVDs, or any similar device. • Goal: Effective video retrieval.
Problem Statement • All the papers we studied relate to the retrieval of video data, and to how retrieval can be done on compressed video data. • In content-based video retrieval systems, how do we choose features that reflect real human interest, and how does feature extraction affect video retrieval?
Our Contributions • First, we identify video retrieval approaches based on spatial and temporal analysis. • We focus on content-based video retrieval systems and video retrieval in compressed data. • We classify the methods and summarize the future trends and open problems of video retrieval.
Related Work • Because the amount of video data is large, it is stored in compressed form. • Retrieving data from compressed video without the processing overhead of decompression. • To index and retrieve semantic data, we use semantic indexing of video data based on generalized n-ary operators. • Dominant regions are used in video indexing and retrieval systems that serve all types of users. • The main objective is to provide concurrency control for virtual editing of video data among different users.
Related Work • Framework for semantic retrieval from a video database. • Each frame of a video clip, characterized by its HSV (hue-saturation-value) color feature, is first projected onto the spatial principal components. • An efficient video retrieval method takes user feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vector (QFV) for improved video retrieval.
Key Concepts • QFV reformulation is performed by an optimization method based on the Simultaneous Perturbation Stochastic Approximation (SPSA) technique. • Relevance feedback (RF) is a popular technique in the area of content-based image retrieval (CBIR). • Tracking semantic objects in a video and then modeling spatio-temporal events based on object trajectories and object interactions. • Mining spatio-temporal data.
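The QFV reformulation idea can be sketched as follows. This is a minimal illustration, not the method of the cited SPSA paper: the loss function `feedback_loss` (mean squared distance from the query vector to the feature vectors the user marked as relevant) and the gain values are assumptions chosen for the example. The exponents 0.602 and 0.101 are the decay rates commonly recommended in the SPSA literature.

```python
import random


def feedback_loss(query, relevant):
    """Hypothetical loss: mean squared distance from the query to relevant vectors."""
    total = 0.0
    for vec in relevant:
        total += sum((q - r) ** 2 for q, r in zip(query, vec))
    return total / len(relevant)


def spsa_minimize(loss, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize `loss` with SPSA, using two loss evaluations per iteration."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iterations):
        a_k = a / (k + 1) ** 0.602  # step-size gain (assumed schedule)
        c_k = c / (k + 1) ** 0.101  # perturbation size (assumed schedule)
        # Perturb all components at once with random +/-1 signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        # Gradient estimate for component i is diff / delta[i].
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


if __name__ == "__main__":
    relevant = [[0.5, 2.0, 4.5], [1.5, 2.0, 3.5]]  # vectors marked relevant by the user
    query = [0.0, 0.0, 0.0]                        # initial QFV
    new_query = spsa_minimize(lambda q: feedback_loss(q, relevant), query)
    print(new_query, feedback_loss(new_query, relevant))
```

SPSA needs only two loss evaluations per iteration regardless of the vector dimension, which is one reason it suits feedback loops where each evaluation is expensive.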
Video Retrieval Useful in • Historical archives • Forensic documents • Fingerprint & DNA matching • Security usage • Retrieval granularity is also important: • How do users want to retrieve materials? • What is the purpose of retrieval? • What is the user's expertise?
Content Based Video Retrieval • Content-based video retrieval systems automatically index video material by segmenting it into clips and extracting features such as text, color, texture, and motion from each clip to support search. • As digital video collections become more widely available, content-based video retrieval tools will likely grow in importance for an even wider group of users. • A CBVR system aims to assist a human operator (user) in retrieving a sequence (target) from a potentially large database.
Content Based Video Retrieval • The selection of extracted features plays an important role in content-based video retrieval. • Content-based Video Indexing and Retrieval (CBVIR) is an extension of the image retrieval problem. • "Content-based" means that the search analyzes the actual content of the video. The term "content" in this context might refer to colors, shapes, or textures. • These systems aim at accessing video by its content, namely the spatio-temporal (video) information.
Methodology • The first step in video-content analysis and content-based video browsing and retrieval is partitioning a video sequence into shots. • Once key frames are extracted, the next step is to extract features. • Breakdown: sequence -> scene -> shot -> frame -> object
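The shot-partitioning step can be illustrated with a simple histogram-difference detector. This is a sketch under stated assumptions, not a method taken from the surveyed papers: frames are flat lists of 8-bit grayscale values, the bin count and threshold are arbitrary, and the key frame of each shot is simply taken to be its first frame.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a non-empty frame (flat list of 0-255 values)."""
    hist = [0] * bins
    for pixel in frame:
        hist[min(pixel * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [count / total for count in hist]


def shot_boundaries(frames, threshold=0.5):
    """Return indices of frames that start a new shot (histogram change > threshold)."""
    boundaries = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame)
        if previous is not None:
            # L1 distance between consecutive histograms, in the range [0, 2].
            distance = sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                boundaries.append(index)
        previous = current
    return boundaries


def key_frame_indices(frames, threshold=0.5):
    """Assumed key-frame rule: pick the first frame of every shot."""
    if not frames:
        return []
    return [0] + shot_boundaries(frames, threshold)


if __name__ == "__main__":
    dark = [[10] * 16, [12] * 16]      # two frames of one dark shot
    bright = [[200] * 16, [205] * 16]  # two frames of a bright shot
    print(key_frame_indices(dark + bright))
```

Features such as color or texture would then be extracted only from the selected key frames rather than from every frame.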
Features • Two types • Low-level • High-level • Low-level features such as object motion, color, shape, texture, loudness, power spectrum, bandwidth, and pitch are extracted directly from video in the database. • High-level features are also called semantic features. Features such as timbre, rhythm, instruments, and events involve different degrees of semantics contained in the media.
Issues • One of the key issues in CBVR is to bridge the "semantic gap", which refers to the gap between low-level features and the high-level semantic meaning of content. • Low-level features such as color and texture are easy to measure and compute. • But it is a challenge to connect the low-level features to a semantic meaning, especially one involving intellectual and emotional aspects of the human operator (user). • Another issue is how to efficiently access the rich content of video information; this involves video content analysis and spatial and temporal analysis of videos.
Generalized n-ary relation • The principal component of video data is the spatial/temporal semantics associated with it. • Generalization in both the spatial and temporal domains simplifies the description of complex spatial or temporal events. • In the spatial domain the operands represent the physical locations of the objects. • In the temporal case they represent the duration of a certain temporal event.
N-Ary • Spatial event: consider a player holding the ball in a basketball game. • A frame contains the event "player holding the ball". • This is characterized by six of the n-ary relations in both x and y coordinates: M, O, C, S, CO, E. • Spatial events can serve as the low-level (fine-grain) indexing mechanism for video data. • A temporal event is an extension of the spatial event "holding a ball" to "passing of a ball between two players". • B is the before n-ary operation, and d(Events) are the durations of the spatial events.
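The relations named on this slide can be sketched for the two-operand case. The slide does not define the abbreviations, so the reading used here (B = before, M = meets, O = overlaps, C = contains, S = starts, CO = completes, E = equals) and the exact interval conditions are assumptions. The function and box layout are hypothetical, and the true n-ary form applies to any number of operands.

```python
def interval_relation(a, b):
    """Classify intervals a=(start, end) and b=(start, end) along one axis."""
    a0, a1 = a
    b0, b1 = b
    if a0 == b0 and a1 == b1:
        return "E"   # equals
    if a1 < b0:
        return "B"   # a ends before b starts
    if a1 == b0:
        return "M"   # a meets b
    if a0 == b0:
        return "S"   # same start, different end
    if a1 == b1:
        return "CO"  # same end, different start
    if a0 < b0 and b1 < a1:
        return "C"   # a contains b
    if a0 < b0 < a1 < b1:
        return "O"   # a overlaps b
    return None      # inverse case: call again with the operands swapped


def spatial_relation(box_a, box_b):
    """Relation pair (x, y) for bounding boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    return (interval_relation((ax0, ax1), (bx0, bx1)),
            interval_relation((ay0, ay1), (by0, by1)))


if __name__ == "__main__":
    player = (10, 0, 50, 100)  # hypothetical bounding box of the player
    ball = (30, 40, 40, 50)    # hypothetical bounding box of the ball
    print(spatial_relation(player, ball))
    # Temporal case: two spatial events given as (start_time, end_time).
    print(interval_relation((0, 5), (8, 12)))
```

Here the spatial event "player holding the ball" is approximated by the ball's box lying inside the player's box on both axes, and a temporal event such as a pass is a sequence of such spatial events related by B.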
Architecture • The system is hierarchical in nature and allows a multi-level indexing and searching mechanism by modeling information at various levels of semantic granularity. It therefore allows content-based queries to be processed without processing raw image or video data.
Retrieval In Compressed Data • To avoid the processing overhead of decompressing the video stream into individual frames, it is better to detect these features directly from the compressed video data. • Spatio-temporal data extracted from compressed video can include dominant regions, color information, and motion. • Dominant regions are used in video indexing and retrieval; they are extracted from intensity data. • Pipeline: DC image data -> quantization -> flat regions -> simplified data -> filtering -> watershed algorithm -> dominant regions
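The first two stages of this pipeline (DC image and quantization) can be sketched as follows. In a real compressed-domain system the DC coefficients would be read from the bitstream without full decoding. This example instead simulates them by averaging 8x8 blocks of an already decoded grayscale frame, which works because the DC coefficient of a block DCT is proportional to the block mean. The function names and the number of quantization levels are assumptions, and the filtering and watershed stages are not shown.

```python
def dc_image(frame, block=8):
    """Approximate the DC image: one mean intensity per block x block tile.

    `frame` is a list of rows of 0-255 values; partial edge tiles are ignored.
    """
    rows, cols = len(frame), len(frame[0])
    result = []
    for top in range(0, rows - block + 1, block):
        out_row = []
        for left in range(0, cols - block + 1, block):
            total = sum(frame[y][x]
                        for y in range(top, top + block)
                        for x in range(left, left + block))
            out_row.append(total / (block * block))
        result.append(out_row)
    return result


def quantize(image, levels=4, max_value=256):
    """Map intensities to a few levels so that similar blocks form flat regions."""
    step = max_value / levels
    return [[min(int(value // step), levels - 1) for value in row] for row in image]


if __name__ == "__main__":
    # 16x16 frame: dark left half, bright right half.
    frame = [[0] * 8 + [255] * 8 for _ in range(16)]
    print(quantize(dc_image(frame)))
```

Because the DC image has 1/64 of the pixels of the full frame, later region-extraction steps run on far less data.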
Retrieval In Compressed Data... • Color information is computed from an HSV quantization table. • Camera motion is detected from region-based segmented data. • Based on the above features, we can extract semantic information about the video content. • This information is useful for content-based video indexing and retrieval.
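A quantized HSV color feature might be computed as in the sketch below. The 8 x 3 x 3 bin layout is an assumed quantization table, not the one used in the cited work, and `hsv_bin` and `hsv_histogram` are hypothetical names. The RGB-to-HSV conversion uses the standard-library `colorsys` module.

```python
import colorsys


def hsv_bin(r, g, b, h_bins=8, s_bins=3, v_bins=3):
    """Map an 8-bit RGB pixel to a single quantized HSV bin index."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hi = min(int(h * h_bins), h_bins - 1)
    si = min(int(s * s_bins), s_bins - 1)
    vi = min(int(v * v_bins), v_bins - 1)
    return (hi * s_bins + si) * v_bins + vi


def hsv_histogram(pixels, h_bins=8, s_bins=3, v_bins=3):
    """Normalized color histogram over a non-empty list of (r, g, b) pixels."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        hist[hsv_bin(r, g, b, h_bins, s_bins, v_bins)] += 1.0
    return [count / len(pixels) for count in hist]


if __name__ == "__main__":
    pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]  # mostly red, some blue
    histogram = hsv_histogram(pixels)
    print([(i, v) for i, v in enumerate(histogram) if v > 0])
```

Two key frames can then be compared by a distance between their histograms, which is cheap compared with pixel-level comparison.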
Comparison Study Summary • Key issues we noticed in this study are: • 1) Bridging the semantic gap • To do annotation automatically or semi-automatically, we need to bridge the "semantic gap", i.e., to find algorithms that infer high-level semantic concepts (sites, objects, events) from low-level image/video features that can be easily extracted from the data (color, texture, shape, structure, etc.). • One sub-problem is Audio Scene Analysis. Researchers have worked on Visual Scene Analysis (Computer Vision) for many years, but Audio Scene Analysis is still in its infancy and remains an under-explored field.
Comparison Study Summary • 2) Human intelligence and machine intelligence • One advantage of information retrieval is that in most scenarios there is a human (or humans) in the loop. One prominent example of human-computer interaction is Relevance Feedback. • 3) New Query Paradigms • For image/video retrieval, people have tried query by keywords, similarity, sketching an object, sketching a trajectory, painting a rough image, etc. Can we think of useful new paradigms? • 4) Data Mining • Searching for interesting/unusual patterns and correlations in video has many important applications, including Web Search Engines and dealing with intelligence data. Work to date on Data Mining has been mainly in Text data.
Comparison Study Summary • 5) Unlabeled Data • Can we use the large number of unlabeled samples in the database to help? • Another problem related to image/video data annotation is Label Propagation. Can we label a small set of data and let the labels propagate to the unlabeled samples? • 6) Incremental Learning • In most applications, we keep adding new data to the database. We should be able to change the parameters of the retrieval algorithms incrementally, not needing to start from scratch every time we have new data.
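The label propagation question can be made concrete with a deliberately simple nearest-neighbour variant. This is an assumption for illustration only, not an algorithm from the surveyed papers, and practical systems more often use graph-based methods. An unlabeled sample takes the label of its closest labeled sample if that sample lies within `max_distance`, and newly labeled samples can pass their label on.

```python
import math


def propagate_labels(features, labels, max_distance=1.0):
    """Spread labels to unlabeled samples (label None) through nearby neighbours."""
    labels = list(labels)
    changed = True
    while changed:
        changed = False
        for i, label in enumerate(labels):
            if label is not None:
                continue
            best = None  # (distance, label) of the closest labeled sample so far
            for j, other in enumerate(labels):
                if other is None:
                    continue
                dist = math.dist(features[i], features[j])
                if dist <= max_distance and (best is None or dist < best[0]):
                    best = (dist, other)
            if best is not None:
                labels[i] = best[1]
                changed = True
    return labels


if __name__ == "__main__":
    # Hypothetical 1-D feature values for six video clips; two are hand-labeled.
    features = [[0.0], [0.8], [1.6], [10.0], [10.5], [100.0]]
    labels = ["news", None, None, "sports", None, None]
    print(propagate_labels(features, labels))
```

Samples that are far from every labeled sample stay unlabeled, which flags them as candidates for manual annotation.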
Comparison Study Summary • 7) Using Virtual Reality Visualization To Help • Can we use 3D audio/visual visualization techniques to help a user to navigate through the data space to browse and to retrieve? • 8) Structuring Very Large Databases • Researchers in audio/visual scene analysis and those in Databases and Information Retrieval should really collaborate CLOSELY to find good ways of structuring very large video databases for efficient retrieval and search.
Comparison Study Summary • 9) Applications of Video Retrieval • Few real applications of video retrieval have been accepted by the general public so far. Is the web video search engine going to be the next killer application? It remains to be seen. With no clear answer to this question, it is still a challenge to do research that is appropriate for real applications.
Conclusion & Future Work • Despite the considerable progress of academic research in video retrieval, content-based video retrieval research has had relatively little impact on commercial applications, with some niche exceptions such as video segmentation. • Choosing features that reflect real human interest remains an open issue. One promising approach is to use meta-learning. • Low-to-high-level semantic gap: visual-feature-based techniques at the low level of abstraction, mostly contributed by the signal processing and computer vision communities, have been explored in the literature. • Current research efforts are more inclined towards high-level description and retrieval of visual content. • Techniques that bridge this semantic gap between pixels and predicates are a field of growing interest. • Intelligent systems are needed that take a low-level feature representation of the visual media and provide a model for the high-level object representation of the content.
References • http://research.microsoft.com/en-us/um/people/yongrui/ps/sigproc06.pdf • Day, Y.F.; Dagtas, S.; Iino, M.; Khokhar, A.; Ghafoor, A., "Spatio-temporal modeling of video data for on-line object-oriented query processing," Proceedings of the International Conference on Multimedia Computing and Systems, pp. 98-105, 15-18 May 1995 • Hang-Bong Kang, "Spatio-temporal feature extraction from compressed video data," TENCON 99. Proceedings of the IEEE Region 10 Conference, vol. 2, pp. 1339-1342, Dec 1999 • Sze-Man Chan, S.; Li, Qing, "VideoMAP*: a Web-based architecture for a spatio-temporal video database management system," Proceedings of the First International Conference on Web Information Systems Engineering, vol. 1, pp. 393-400, 2000 • Xia, J.; Wang, Y., "A spatio-temporal video analysis system for object segmentation," ISPA 2003. Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, vol. 2, pp. 812-815, 18-20 Sept. 2003 • Bo Geng; Hong Lu; Xiangyang Xue, "Incremental Spatio-Temporal Feature Extraction and Retrieval for Large Video Database," ISCAS 2007. IEEE International Symposium on Circuits and Systems, pp. 961-964, 27-30 May 2007 • Velusamy, S.; Bhatnagar, S.; Basavaraja, S. V.; Sridhar, V., "SPSA based feature relevance estimation for video retrieval," 2008 IEEE 10th Workshop on Multimedia Signal Processing, pp. 598-603, 8-10 Oct. 2008 • Xin Chen; Chengcui Zhang, "An Interactive Semantic Video Mining and Retrieval Platform--Application in Transportation Surveillance Video for Incident Detection," ICDM '06. Sixth International Conference on Data Mining, pp. 129-138, 18-22 Dec. 2006 • Mehmet Emin Dönderler; Özgür Ulusoy; Ugur Güdükbay, "Rule-based spatiotemporal query processing for video databases," The VLDB Journal - The International Journal on Very Large Data Bases, vol. 13, no. 1, pp. 86-103, January 2004 • Fudong Sun; Minyong Shi; Weiguo Lin, "Feature Label Extraction of Online Video," 2012 International Conference on Computer Science and Electronics Engineering (ICCSEE), vol. 3, pp. 211-214, 23-25 March 2012 • Divakaran, A.; Vetro, A.; Asai, K.; Nishikawa, H., "Video browsing system based on compressed domain feature extraction," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 637-644, Aug 2000 • Al-Salih, A.A.M.; Ahson, S.I., "Object detection and features extraction in video frames using direct thresholding," IMPACT '09. International Multimedia, Signal Processing and Communication Technologies, pp. 221-224, 14-16 March 2009 • Sifei Lu; Li, R.M.; Tjhi, W.-C.; Kee Khoon Lee; Long Wang; Xiaorong Li; Di Ma, "A Framework for Cloud-Based Large-Scale Data Analytics and Visualization: Case Study on Multiscale Climate Data," 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), pp. 618-622, Nov. 29-Dec. 1, 2011