A Framework for the Representation and Integration of Multimedia Content and Context Information

Presentation Transcript


  1. A Framework for the Representation and Integration of Multimedia Content and Context Information. Radu S. Jasinschi, Philips Research. ECE-CMU, April 29, 2002

  2. Overview • Introduction • Related work • Problem statement • Proposed formalism • Representation: content and context • Multimodal integration: • Bayesian networks: content information • Hierarchical priors: context information • Application: Video Scout • Experiments • Conclusion

  3. Introduction • Market facts: • Digital video consumption: 300+ channels • Personalized Video Recorders: next wave • Web search engines: exponential growth in multimedia information • Research needs: • Content-based video analysis and retrieval of multimedia information • High-level video content information indexing • Proposed framework: • Content and context information • Structured representation • Probabilistic integration

  4. Related Work • Video databases • QBIC (IBM) • Informedia (CMU) • Virage • VideoQ (Columbia University) • Probabilistic methods • M. Naphade (IBM) • N. Vasconcelos (COMPAQ) • Speech-driven applications • C. Neti (IBM) • T. Chen (CMU)

  5. Problem Statement • How do we segment, index, and store many hours of video from 300+ TV channels? • How do we represent and integrate multimodal information?

  6. Overview • Introduction • Related work • Problem statement • Proposed formalism: • Representation: content and context • Multimodal integration: • Bayesian networks: content information • Hierarchical priors: context information • Application: Video Scout • Experiments • Conclusion

  7. Proposed Formalism • Structured multimedia representation • Content information • Granularity • Abstraction • Context information • Probabilistic method of multimodal information integration • Bayesian networks • Hierarchical priors

  8. Multimedia Content Information • Multimedia content: objects • Three modalities • Visual: shots, faces, trees, etc. • Audio: speech, music, etc. • Text: transcript, keywords, etc. • Structured content representation • Levels of granularity and abstraction • Allows for the consistent representation and integration of multimedia content information

  9. Structured Content Representation • Content granularity: levels of detail • Content abstraction: semantic information

  10. Multimedia Context Information • Context information • Underlying structure, signature, or patterns • Supports an interpretation but is not an interpretation itself • Can be used to constrain the content information, reducing the number of possible interpretations • Content versus context information

  11. Multimedia Context Taxonomy

  12. Semantic (Textual) Context • Formalized in the linguistic domain • Example: the proposition P (“Holmes is a detective”) is ambiguous on its own • Knowledge of its semantic context, in this case the Sherlock Holmes stories, disambiguates the statement • Formalization: ist(context-of(“Sherlock Holmes stories”), “Holmes is a detective”) • General structure: ist(C, P), where C is the context and P is the proposition
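The ist(C, P) structure can be illustrated with a toy lookup. This is a minimal sketch and not the formalism used in the talk: the `CONTEXTS` table, the `context_of` helper, and the `ist` function are hypothetical names, the second context is an invented contrast case, and a context is modelled simply as the set of propositions that hold in it.

```python
# Toy model of ist(C, P): "proposition P is true in context C".
# CONTEXTS, context_of, and ist are hypothetical names for illustration only.

CONTEXTS = {
    "Sherlock Holmes stories": {"Holmes is a detective"},
    # Invented contrast case: a different context makes a different reading true.
    "US Supreme Court history": {"Holmes is a judge"},
}


def context_of(name: str) -> frozenset:
    """Return the set of propositions that hold in the named context."""
    return frozenset(CONTEXTS.get(name, ()))


def ist(context: frozenset, proposition: str) -> bool:
    """Return True when the proposition holds in the given context."""
    return proposition in context


if __name__ == "__main__":
    stories = context_of("Sherlock Holmes stories")
    print(ist(stories, "Holmes is a detective"))
    print(ist(stories, "Holmes is a judge"))
```

The point of the sketch is only that the same sentence is evaluated relative to a context argument, which is what lets context narrow the set of admissible interpretations.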

  13. Multimedia Context • Visual context taxonomy

  14. Multimedia Context • Audio context taxonomy

  15. Multimodal Integration • Combine evidence: robustness • Use all modalities: visual, audio, text • Integrate content information • Integrate content and context

  16. Probabilistic Framework • Bayesian network • Integrate content information at the same granularity level • Intra-modality: same mode, different attributes • Inter-modality: different modes and attributes • Link different levels of granularity • Hierarchical priors • Integrate content and context • Context used as “prior” information for content

  17. Bayesian Network: Example

  18. Bayesian Network: Elements • Directed acyclic graph • Conditional probability • Joint probability distribution
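The three elements named on this slide are tied together by the standard factorization of a Bayesian network, given here as a reminder. The symbols $x_i$ and $\mathrm{pa}(x_i)$ (the parents of node $x_i$ in the directed acyclic graph) are notation assumed for this note; the slide itself does not show a formula.

```latex
P(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
```

Each factor is one conditional probability attached to a node, and the graph structure determines which variables appear in the conditioning set.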

  19. Hierarchical Priors: Example

  20. Hierarchical Priors: Elements • Chapman-Kolmogorov equation • Nested priors
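For reference, the Chapman-Kolmogorov equation in its usual form marginalizes over an intermediate variable (assuming $x_3$ is conditionally independent of $x_1$ given $x_2$), and a nested prior follows the same pattern by integrating out a hyperparameter. The symbols $x_1, x_2, x_3, \theta, \lambda$ are assumed for this note; the slide does not give the exact form used in the talk.

```latex
p(x_3 \mid x_1) \;=\; \int p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, dx_2
\qquad\qquad
p(\theta) \;=\; \int p(\theta \mid \lambda)\, p(\lambda)\, d\lambda
```

Read this way, context information plays the role of $\lambda$: it shapes the prior over the content variables without being an interpretation itself.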

  21. Content and Context Layers • Combine Bayesian networks and hierarchical priors

  22. Overview • Introduction • Related work • Problem statement • Proposed formalism: • Representation: content and context • Multimodal integration: • Bayesian networks: content information • Hierarchical priors: context information • Application: Video Scout • Experiments • Conclusion

  23. Application: Video Scout • End-to-end system prototype of a personal video recorder • Input • Broadcast TV program video • Electronic program guide (EPG) • Personal profiles: program (PPP) and content (CPP) • Output • TV program segments, segmented and indexed by topic

  24. Video Scout: Overview

  25. Content and Context Layers

  26. TV Programs • Domain structure • Commercials versus program parts • Commercials: short (~30 sec.), fast pace • Program: long (> 5 min.), specific structure • Multimodal (visual, audio, and transcript) information • Structural correlation • Stochastic nature of multimedia information

  27. TV Program Structure • [Figure: a TV program divided into program segments (PS 1 to PS 4) separated by commercials (COMM); each program segment is divided into program sub-segments (PSS 11, PSS 12, ..., PSS 1N1), which in turn consist of frames]

  28. Overview • Introduction • Related work • Problem statement • Proposed formalism: • Representation: content and context • Multimodal integration: • Bayesian networks: content information • Hierarchical priors: context information • Application: Video Scout • Experiments • Conclusion

  29. Experiments • Input • 9 TV programs (~6 hrs.) • Financial news and talk shows • Features • Visual: keyframes, visual text, faces • Audio: noise (No), speech (Sp), music (Mu), Sp+No, Sp+Sp, and Sp+Mu • Transcript (closed captioning): 20 categories • Output: TV program segments and their classification according to topics

  30. Algorithm for Segmentation and Indexing 1. Commercial segmentation 2. Program sub-segment (PSS) generation 3. Frame-based probability generation 4. PSS probability computation: P_AUDIO_FIN, P_AUDIO_TALK, P_CC_FIN, P_CC_TALK, P_FACETEXT_FIN, P_FACETEXT_TALK 5. Combine PSS with context probabilities 6. Compute joint probabilities: P_FIN_TOPIC, P_TALK_TOPIC

  31. Example: Letterman • CC Categories

  32. Example: Letterman PSS # 12 • CC categories • Mid-level audio probabilities • Mid-level visual features’ probabilities

  33. PSS Content Probabilities • PSS # = 12, start_time = 23614, end_time = 24727 (frames) • Visual • P_V_FACE = 0.91, P_V_TEXT = 0.09 • Audio • P_NOISE = 0.11, P_SPEECH = 0.74, P_MUSIC = 0.00, P_SPEECH+NOISE = 0.00, P_SPEECH+SPEECH = 0.00, P_SPEECH+MUSIC = 0.15 • Transcript (Closed Captions) • P_CC_WEATHER = 0.20, P_CC_INTERNATIONAL = 0.20, P_CC_CRIME = 0.00, P_CC_SPORT = 0.20, P_CC_MOVIE = 0.20, P_CC_FASHION = 0.00, P_CC_TECH_STOCK = 0.00, P_CC_MUSIC = 0.00, P_CC_AUTOMOBILE = 0.00, P_CC_WAR = 0.00, P_CC_ECONOMY = 0.20, P_CC_ENERGY = 0.00, P_CC_STOCK = 0.00, P_CC_VIOLENCE = 0.00, P_CC_FINANCIAL = 0.00, P_CC_NATIONAL = 0.00, P_CC_BIOTECH = 0.00, P_CC_DISASTER = 0.00, P_CC_ART = 0.00, P_CC_POLITICS = 0.00
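Each modality's values for PSS 12 behave like a probability distribution over that modality's categories. The sketch below checks this and picks the dominant audio category. The numbers are copied from the slide; the dictionary names and the `is_distribution` helper are hypothetical, and only the five non-zero caption categories are listed because the remaining fifteen are 0.00.

```python
import math

# Values copied from the PSS 12 slide; variable names are hypothetical.
visual = {"FACE": 0.91, "TEXT": 0.09}
audio = {
    "NOISE": 0.11,
    "SPEECH": 0.74,
    "MUSIC": 0.00,
    "SPEECH+NOISE": 0.00,
    "SPEECH+SPEECH": 0.00,
    "SPEECH+MUSIC": 0.15,
}
# Only the non-zero caption categories; the other 15 categories are 0.00.
cc_nonzero = {
    "WEATHER": 0.20,
    "INTERNATIONAL": 0.20,
    "SPORT": 0.20,
    "MOVIE": 0.20,
    "ECONOMY": 0.20,
}


def is_distribution(probs: dict) -> bool:
    """Return True if all values are non-negative and sum to 1 (within tolerance)."""
    non_negative = all(p >= 0.0 for p in probs.values())
    return non_negative and math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9)


# The category with the largest probability in the audio modality.
dominant_audio = max(audio, key=audio.get)

if __name__ == "__main__":
    for name, probs in (("visual", visual), ("audio", audio), ("cc", cc_nonzero)):
        print(name, is_distribution(probs))
    print("dominant audio category:", dominant_audio)
```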

  34. Audio Genre Context Probabilities

  35. Visual Genre Context Probabilities

  36. Audio Genre Context Extraction 1. Select TV programs of a known genre 2. Segment commercials 3. Tessellate the program part into units, such as the PSS based on closed captions 4. Determine a probability for each PSS based on the vote/probability table 5. Sum up the votes for each vote/probability 6. Select the best vote/probability: context (probability) pattern
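Steps 4 to 6 can be sketched as a simple voting scheme. The slide does not show the vote/probability table, so this is one assumed reading rather than the procedure used in the talk: each PSS casts a single vote for its most probable audio category, votes are summed over all PSS of a known-genre program, and the normalized counts serve as the context pattern. The function name `genre_context_pattern` and the sample data are hypothetical.

```python
from collections import Counter


def genre_context_pattern(pss_probabilities):
    """Derive a context pattern from per-PSS category probabilities.

    pss_probabilities: list of dicts mapping category name -> probability,
    one dict per program sub-segment (PSS) of a known-genre program.
    Returns (best_category, pattern), where pattern maps each voted
    category to its share of the votes.
    Assumption: one vote per PSS, cast for its most probable category.
    """
    if not pss_probabilities:
        raise ValueError("at least one PSS is required")

    votes = Counter()
    for probs in pss_probabilities:
        winner = max(probs, key=probs.get)  # step 4: vote of this PSS
        votes[winner] += 1                  # step 5: sum up the votes

    total = sum(votes.values())
    pattern = {category: count / total for category, count in votes.items()}
    best = max(pattern, key=pattern.get)    # step 6: select the best vote
    return best, pattern


# Made-up sample data for three sub-segments of a hypothetical talk show.
sample = [
    {"SPEECH": 0.7, "MUSIC": 0.2, "NOISE": 0.1},
    {"SPEECH": 0.6, "MUSIC": 0.3, "NOISE": 0.1},
    {"SPEECH": 0.2, "MUSIC": 0.7, "NOISE": 0.1},
]

if __name__ == "__main__":
    best, pattern = genre_context_pattern(sample)
    print(best, pattern)
```

A weighted variant that sums the probabilities themselves instead of casting one vote per PSS would be an equally plausible reading of "vote/probability".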

  37. Vote/Probability Table and Results

  38. Vote/Probability Results: News

  39. Combining Content & Context • Final multimodal joint probabilities: P_FACETEXT_FIN = 0.0, P_FACETEXT_TALK = 1.0; P_AUDIO_FIN = 0.0, P_AUDIO_TALK = 1.0; P_CC_CAT_FIN = 0.5, P_CC_CAT_TALK = 0.5 • Final joint topic probabilities: P_FIN-TOPICS = 0.0, P_TALK-TOPICS = 0.5 • Accumulated classification results for first 12 segments: Class.: # of FIN SEGS = 2, # of TALK SEGS = 10, Comm. = 0
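The figures on this slide are consistent with taking the product of the three per-modality probabilities for each topic class (0.0 × 0.0 × 0.5 = 0.0 for financial, 1.0 × 1.0 × 0.5 = 0.5 for talk). The sketch below reproduces that arithmetic. Treating the modalities as independent factors is an assumed reading of the slide, not a confirmed description of the method, and the function and variable names are hypothetical.

```python
import math

# Per-modality probabilities copied from the slide (face/text, audio, captions).
fin_probs = {"FACETEXT": 0.0, "AUDIO": 0.0, "CC_CAT": 0.5}
talk_probs = {"FACETEXT": 1.0, "AUDIO": 1.0, "CC_CAT": 0.5}


def joint_topic_probability(modal_probs: dict) -> float:
    """Multiply the per-modality probabilities (assumes independent modalities)."""
    return math.prod(modal_probs.values())


p_fin_topics = joint_topic_probability(fin_probs)
p_talk_topics = joint_topic_probability(talk_probs)

# Label the segment with the class that has the larger joint probability.
label = "TALK" if p_talk_topics > p_fin_topics else "FIN"

if __name__ == "__main__":
    print("P_FIN-TOPICS =", p_fin_topics)
    print("P_TALK-TOPICS =", p_talk_topics)
    print("label:", label)
```

Under a product rule a single modality reporting 0.0 vetoes a class outright, which matches the financial-topic result here.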

  40. Classification Results: Content and Context Integration • Precision: 91.4%, Recall: 85.7% • Precision: 81.1%, Recall: 86.9%
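For readers unfamiliar with the two measures, the standard definitions are sketched below. The counts in the example are invented for illustration and are not the counts behind the percentages on the slide, which the transcript does not provide.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of segments assigned to a class that truly belong to it."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of segments truly in a class that were assigned to it."""
    return true_positives / (true_positives + false_negatives)


if __name__ == "__main__":
    # Invented counts: 8 correct, 2 spurious, 2 missed.
    print(f"precision = {precision(8, 2):.1%}")
    print(f"recall    = {recall(8, 2):.1%}")
```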

  41. Classification Results: Financial News with and without Integration • With context/content integration • No context/content integration

  42. Classification Results: Talk Show with and without Integration • With context/content integration • No context/content integration

  43. Conclusion • Novel multimedia framework: • Representation: • Content data tessellation: granularity • Content semantic structure: abstraction • Multimedia context • Multilayered content/context structure • Multimodal integration: • Content and context • Probabilistic method: • Bayesian networks • Hierarchical priors • Video Scout: beyond the TiVo paradigm

  44. Achievements • Exhibitions • Philips Corporate Research Exhibition (CRE) 2001 • ICME 2000 Exhibition • ACM 2000 Exhibition • Customer presentation 2000 • 7 papers • 5 International conferences (presented) • 1 International conference (accepted) • 1 Journal paper (submitted) • 14 Patents (filed)

  45. Acknowledgement • CIM team that collaborated in this work: • Nevenka Dimitrova • Lalitha Agnihotri • Jennifer Louie • Thomas McGee • Radu Jasinschi • Dongge Li • Mei Shi • John Zimmerman
