
Multimedia Content Analysis via Computational Human Visual Cognition






Presentation Transcript


  1. Multimedia Content Analysis via Computational Human Visual Cognition Shenghua ZHONG Department of Computing The Hong Kong Polytechnic University

  2. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  3. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  4. Multimedia Content Analysis Definition of multimedia content analysis: the computerized understanding of the semantic meaning of a multimedia document [Wang et al, SPM, 2000]. Difficulty in multimedia content analysis: the semantic gap is the well-known challenge [Jiang et al, ACMMM, 2009], i.e. the gap between low-level features computable by computers and high-level concepts understandable by humans. Typical multimedia content analysis tasks [Amit, MIT, 2002] [Liu et al, HCI, 2001]: quality assessment, object detection and recognition, indexing and annotation, classification and retrieval.

  5. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  6. Introduction to Human Visual System • Definition of cognitive science • The science of mind, concisely defined as the study of the nature of intelligence, mainly the nature of the human mind [Eckardt, MIT, 1995] • Definition of the human visual system • One of the research foci of cognitive science • The part of the central nervous system that enables organisms to process visual information • Four processes of the human visual system • Formation of the image on the retina • Visual processing in the visual cortex • Attentional guidance • Perception processing

  7. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition [Zhong et al, ICMR, 2011] Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  8. Moment Invariants to Motion Blur for Water Reflection Detection and Recognition Fig. The proposed work on water reflection detection and recognition, highlighted with a purple background.

  9. Introduction to Water Reflection What is water reflection: the change in direction of a wavefront at an interface between two different media so that the wavefront returns into the medium from which it originated; a special case of imperfect symmetry. Water reflection is very important to multimedia content analysis. Fig. Example of the influence of the water reflection part. (a) An image with water reflection. (b) The correct segmentation result of (a). (c) The actual segmentation result of (a). (d) The color histogram of the whole image (a). (e) The color histogram of the object part of (a).

  10. Ineffectiveness of Existing Symmetry Techniques Example based on the scale-invariant feature transform (SIFT) descriptor in [Loy et al, ECCV, 2006]. Fig. Example of the ineffectiveness of local features in images with water reflection. (a) The correct result; (b) the SIFT descriptor matching result. Red circles denote SIFT detector results; green lines denote matched SIFT descriptor pairs. The SIFT method is clearly ineffective for water reflection detection and recognition.
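
A minimal sketch of the kind of SIFT-based symmetry matching evaluated here (in the spirit of [Loy et al, ECCV, 2006], not the talk's own method): descriptors are matched between the image and its vertically flipped copy, and each match votes for a horizontal reflection axis. The function name and the voting scheme are illustrative; motion blur in the reflection part is exactly what makes such local-feature matching unreliable on water reflection images.

```python
import cv2
import numpy as np

def reflection_axis_by_sift(img_gray):
    """Vote for a horizontal reflection axis from SIFT matches between the
    image and its vertically flipped copy (a simplified stand-in for the
    matching scheme of Loy et al.). Requires a SIFT-enabled OpenCV build."""
    h = img_gray.shape[0]
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_gray, None)
    kp2, des2 = sift.detectAndCompute(cv2.flip(img_gray, 0), None)
    if des1 is None or des2 is None:
        return None
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    votes = []
    for m in matches:
        y1 = kp1[m.queryIdx].pt[1]
        y2 = h - 1 - kp2[m.trainIdx].pt[1]   # map back to original coordinates
        votes.append(0.5 * (y1 + y2))        # midpoint row proposed as the axis
    if not votes:
        return None
    hist, edges = np.histogram(votes, bins=50, range=(0, h))
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])
```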

  11. Ineffectiveness of Existing Water Reflection Detection Techniques Example based on the flip-invariant shape detector in [Zhang et al, ICPR, 2010]. Fig. Examples of shape detection results of the flip-invariant shape technique.

  12. Basic Idea • Characteristics of image formation • Human eyes capture pictures around 30 times per second • Relative motion between the sensor and the scene within this exposure time leads to motion blur [Flusser et al, CS, 1996] • Definition and influence of motion blur • A well-known degradation factor due to motion; it changes the image features needed by feature-based recognition techniques

  13. Moment Invariants to Motion Blur The geometric moment, central moments, the centroid moment, and the complex moments.
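
The moment definitions behind the blur-invariant features can be written down directly; below is a small NumPy sketch of the standard geometric, central, and complex moments of a grayscale image. The specific blur-invariant combinations used in the talk (following [Flusser et al, CS, 1996]) are not reproduced here.

```python
import numpy as np

def geometric_moment(f, p, q):
    """Raw geometric moment m_pq = sum_x sum_y x^p * y^q * f(x, y)."""
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum((x ** p) * (y ** q) * f)

def centroid(f):
    """Image centroid (xc, yc) from the first-order geometric moments."""
    m00 = geometric_moment(f, 0, 0)
    return geometric_moment(f, 1, 0) / m00, geometric_moment(f, 0, 1) / m00

def central_moment(f, p, q):
    """Central moment mu_pq, taken about the centroid (translation invariant)."""
    xc, yc = centroid(f)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum(((x - xc) ** p) * ((y - yc) ** q) * f)

def complex_moment(f, p, q):
    """Complex moment c_pq = sum (x + iy)^p (x - iy)^q f(x, y),
    with coordinates measured from the centroid."""
    xc, yc = centroid(f)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    z = (x - xc) + 1j * (y - yc)
    return np.sum((z ** p) * (np.conj(z) ** q) * f)
```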

  14. High-Frequency Energy Decay in Water Reflection (a) Original water reflection image (b) High-frequency information part Fig. Decay of the information and energy in the high-frequency band due to motion blur.
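
The decay illustrated on this slide can be checked numerically: motion blur acts as a low-pass filter, so the reflection half of the image should carry a smaller fraction of high-frequency spectral energy than the object half. The sketch below is an illustrative check, with the cutoff fraction chosen arbitrarily.

```python
import numpy as np

def high_frequency_energy(patch, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff
    (cutoff given as a fraction of the Nyquist frequency)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = patch.shape
    fy, fx = np.mgrid[:h, :w]
    r = np.hypot((fy - h / 2) / (h / 2), (fx - w / 2) / (w / 2))
    return spec[r > cutoff].sum() / spec.sum()

def reflection_is_blurrier(img_gray, axis_row):
    """True if the part below the axis (candidate reflection) carries less
    high-frequency energy than the part above it (candidate object)."""
    top, bottom = img_gray[:axis_row, :], img_gray[axis_row:, :]
    return high_frequency_energy(bottom) < high_frequency_energy(top)
```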

  15. Flowchart

  16. Experiments and Results on Detection of the Reflection Axis Database: 100 natural images with water reflection from Google. Compared algorithms: matching of SIFT descriptors to detect the reflection axis [Loy et al, ECCV, 2006]; matching based on the flip-invariant shape detector [Zhang et al, ICPR, 2010]. Results: the accuracy of axis detection is 29% for [Loy et al, ECCV, 2006], 46% for [Zhang et al, ICPR, 2010], and 87% for our algorithm.

  17. Detection Results of the Compared Algorithms SIFT algorithm result; shape algorithm result; our algorithm result. Fig. Thumbnails of some comparison example images with reflection symmetry detection results. Fig. Thumbnails of some example images with and without water reflection in Experiment I. The first two rows are images with water reflection and the last rows are natural images without water reflection.

  18. Distinguishing the Object Part and the Reflection Part (a) Reversed water reflection image. (b) Positive curvelet coefficients of the object part (left) and the reflection part (right). Fig. Object part and reflection part determined by curvelet coefficients.

  19. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition [Zhong et al, ICMR, 2011] Top-down and bottom-up saliency map for no-reference image quality assessment [Zhong et al, ICIP, 2010] Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  20. Top-down and Bottom-up Saliency Map for No-Reference Image Quality Assessment Fig. The proposed work on no-reference image quality assessment, highlighted with a purple background.

  21. Introduction to No-Reference Image Quality Assessment • Definition of no-reference image quality assessment • A predefined correct (reference) image is not available • Mainly aims to measure sharpness/blurriness • Difficulty • How to assess quality in agreement with human judgement • Limitation of existing work • Ignores that cognitive understanding influences the perceived quality [Wang et al, TIP, 2004] • Fig. Example of image quality influenced by cognitive understanding. (a) Image without distortion; (b) blurriness mainly on the girl; (c) blurriness mainly on the apple

  22. Basic Idea • Combine semantic information from prior knowledge to build the saliency map • Existing bottom-up saliency maps do not match actual eye movements • Measure sharpness based on top-down and bottom-up saliency map modeling

  23. Target Information Acquisition in the Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is target information acquisition.

  24. Target Information Acquisition • Tags: People, New York • Remove: New York (does not belong to a physical entity) • Remain: People • Target information acquisition
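
One plausible way to implement this tag filtering is a WordNet hypernym test; the exact ontology rule used in the talk is not given, so the rule below (keep noun senses that descend from physical_entity but not through location, which drops place names such as "New York") is only an illustrative assumption.

```python
from nltk.corpus import wordnet as wn  # assumes the NLTK WordNet corpus is installed

PHYSICAL = wn.synset('physical_entity.n.01')
LOCATION = wn.synset('location.n.01')

def is_target_candidate(tag):
    """Illustrative filter: keep a tag if some noun sense of it reaches
    physical_entity.n.01 without passing through location.n.01."""
    for sense in wn.synsets(tag.replace(' ', '_'), pos=wn.NOUN):
        for path in sense.hypernym_paths():   # hypernym chains between root and sense
            if PHYSICAL in path and LOCATION not in path:
                return True
    return False

# collective tags such as 'people' may first need mapping to a singular
# sense (e.g. person.n.01) before this hypernym test applies
print([t for t in ['person', 'New York'] if is_target_candidate(t)])
```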

  25. Saliency Map Model Learning in the Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is the saliency map model learning.

  26. Flowchart of Top-Down & Bottom-Up Saliency Map Model Learning Fig. Flowchart of the proposed top-down & bottom-up model algorithm.

  27. Top-Down & Bottom-Up Saliency Map Model Learning • Learning the saliency map model with a binary SVM • Ground truth map • Created by convolving the contrast sensitivity function [Wang et al, TIP, 2001] over the fixation locations [Judd et al, ICCV, 2009] • An example of a ground truth map • Function of contrast sensitivity • Notations • e: half-resolution eccentricity constant • L: image width • v: viewing distance • N: number of fixation locations • M: number of users • In the example, N = 15 and M = 6. (a) Original image; (b) eye-fixation locations; (c) ground truth map
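
A minimal sketch of building such a ground-truth map: impulses at the recorded fixation locations are smoothed by a spatial falloff. A Gaussian is used here as a simple stand-in for the foveated contrast-sensitivity function of [Wang et al, TIP, 2001]; sigma is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ground_truth_map(shape, fixations, sigma=25.0):
    """Ground-truth saliency map: one impulse per fixation (over all users),
    blurred with a Gaussian as a stand-in for the contrast-sensitivity falloff."""
    gt = np.zeros(shape, dtype=float)
    for x, y in fixations:  # fixation coordinates in pixels
        r = int(np.clip(round(y), 0, shape[0] - 1))
        c = int(np.clip(round(x), 0, shape[1] - 1))
        gt[r, c] += 1.0
    gt = gaussian_filter(gt, sigma)
    return gt / gt.max() if gt.max() > 0 else gt
```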

  28. Image Quality Assessment Based on the Proposed Saliency Map Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is the image quality assessment based on the proposed saliency map.

  29. Experiment Setting of the Top-Down and Bottom-Up Saliency Map Model • Dataset • From the eye-tracking database of [Judd et al, ICCV, 2009] • Training: 200 images • Testing: 64 images • Training samples from the ground truth map • Positively labelled data • Randomly choose 30 pixels from the 10% most salient locations • Negatively labelled data • Randomly choose 30 pixels from the 10% least salient locations
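
A sketch of how the positive and negative training pixels could be drawn from each ground-truth map; the per-pixel feature array, its layout, and the SVM settings are assumptions rather than the talk's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def sample_training_pixels(gt_map, features, n_pos=30, n_neg=30, seed=None):
    """Positives: pixels from the 10% most salient locations of the ground
    truth map; negatives: from the 10% least salient. `features` is assumed
    to be an (H, W, D) array of per-pixel descriptors."""
    rng = np.random.default_rng(seed)
    order = np.argsort(gt_map.ravel())                    # ascending saliency
    k = max(n_pos, n_neg, int(0.10 * order.size))
    neg_idx = rng.choice(order[:k], n_neg, replace=False)
    pos_idx = rng.choice(order[-k:], n_pos, replace=False)
    feats = features.reshape(-1, features.shape[-1])
    X = np.vstack([feats[pos_idx], feats[neg_idx]])
    y = np.hstack([np.ones(n_pos), np.zeros(n_neg)])
    return X, y

# samples pooled over the 200 training images would then be fed to a
# binary SVM, e.g. SVC(kernel='linear', probability=True).fit(X, y)
```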

  30. Comparison Results of the Two Saliency Map Models (a) Original image (b) Eye-fixation locations (c) Fixation points covered by the bottom-up saliency model (d) Fixation points covered by our saliency model Fig. An example comparing the coverage of fixation points by different saliency models. Table. Evaluation of the proposed saliency map

  31. Experiment Setting and Results of Image Quality Assessment • Database • 160 images downloaded from Flickr, blurred with eight different Gaussian masks • Subjective image quality assessment • Fourteen subjects rated the images • Ratings from 1 to 5 corresponding to "very annoying", "annoying", "slightly annoying", "perceptible but not annoying", and "imperceptible" • Evaluation results

  32. A Better Saliency Map in an Unsupervised Learning Fashion • Difference-of-Gaussian transform • Lacks scale-space information • Wavelet transform • Lacks directional information • Proposed feature extraction transforms [Zhong et al, ICIMCS, 2011] • 2D Gabor transform • Frequency and orientation representations are similar to those of the human visual system • Good ability to represent spatial localization information • Curvelet transform • Finer capability to resolve directional features than the wavelet transform • Improved ability to represent edges and other singularities along curves
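
For the 2D Gabor channel, one common implementation is a small bank of Gabor kernels at several scales and orientations; the sketch below uses OpenCV's getGaborKernel with heuristic parameters. The curvelet channel requires a dedicated curvelet toolbox and is omitted here.

```python
import cv2
import numpy as np

def gabor_responses(img_gray, n_orientations=4, n_scales=3):
    """Filter the image with a small Gabor bank (orientation- and
    scale-selective channels); parameter choices are illustrative."""
    img = img_gray.astype(np.float32)
    responses = []
    for s in range(n_scales):
        lambd = 4.0 * (2 ** s)        # wavelength grows with scale
        sigma = 0.56 * lambd          # common bandwidth heuristic
        for o in range(n_orientations):
            theta = np.pi * o / n_orientations
            kernel = cv2.getGaborKernel((31, 31), sigma, theta, lambd, 0.5)
            responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))
    return responses
```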

  33. Bottom-up Saliency Detection • Preattentive features • Red/green (RG) color • Blue/yellow (BY) color • Intensity (I) • Feature maps • Gabor channel • Curvelet channel • Activation maps • The feature maps are combined across scales and normalized into activation maps
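
A sketch of the preattentive color and intensity channels in the Itti style; the normalization and the multi-scale center-surround combination that produce the final activation maps are simplified away here.

```python
import numpy as np

def preattentive_channels(img_rgb):
    """Per-pixel intensity, red/green, and blue/yellow opponency channels
    from an RGB image (a simplified version of Itti et al.'s features)."""
    img = img_rgb.astype(float) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    intensity = (r + g + b) / 3.0
    # broadly tuned color channels
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b
    rg = R - G          # red/green opponency
    by = B - Y          # blue/yellow opponency
    return intensity, rg, by
```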

  34. Framework of the Gabor & Curvelet based Saliency Map (GCSMP)

  35. Bottom-up Saliency Detection • Dataset • 779 landscape images and 228 portrait images with eye-tracking data from 15 subjects [Judd et al, ICCV, 2009] • ROC area comparison • ROC curves are plotted as hit rate vs. false positive rate • The ROC area is the area under the ROC curve and summarizes the overall performance of a saliency model • Table. ROC area comparison based on bottom-up models • Basic Itti [Itti et al, Vision Res., 2000], Graph Gabor [Harel et al, NIPS, 2006], DoG [Achanta et al, CVPR, 2009], Log-Gabor [Wang et al, ACMMM, 2010]
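
The ROC area used in this comparison can be computed at the pixel level: fixated pixels are treated as positives, all other pixels as negatives, and the saliency value is the score. The sketch below uses scikit-learn; the paper's exact sampling or thresholding of the maps may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def saliency_roc_area(saliency_map, fixation_map):
    """Pixel-level ROC area of a saliency map against a binary fixation map.
    Both classes (fixated and non-fixated pixels) must be present."""
    labels = (fixation_map.ravel() > 0).astype(int)
    scores = saliency_map.ravel().astype(float)
    return roc_auc_score(labels, scores)
```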

  36. Experiment Results Integrated with Center Bias • Performance comparison with center bias • Average ROC curve comparison Table. The comparison of ROC areas with center bias. Figure. The ROC curves of our model and the other models.

  37. Experiment Results Integrated with Object Information • Face as an object information channel • "People" and its hyponyms are among the most popular tags on Flickr • The inferior temporal cortex (IT) fires when a face appears in its receptive field • Performance integrated with the face detection channel • Images with faces: the ROC area increases from 0.8180 to 0.8281 • Images without faces: the ROC area decreases from 0.8086 to 0.7999
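
A sketch of one way to fold a face channel into the bottom-up map: detected face boxes are rendered as a mask, blurred, and blended with the saliency map. The detector, blur width, and blending weight here are illustrative choices rather than the talk's settings.

```python
import cv2
import numpy as np

def add_face_channel(img_bgr, bottom_up_map, weight=0.3):
    """Blend a face-detection channel into a bottom-up saliency map."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    face_map = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in faces:
        face_map[y:y + h, x:x + w] = 1.0
    face_map = cv2.GaussianBlur(face_map, (0, 0), sigmaX=15)
    combined = (1 - weight) * bottom_up_map + weight * face_map
    return combined / combined.max() if combined.max() > 0 else combined
```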

  38. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation [Zhong et al, ICIMCS, 2010] Bilinear deep learning for image classification

  39. Fuzzy-Based Contextual Cueing for Region-Level Annotation Fig. The proposed work on region-level annotation, highlighted with a purple background.

  40. Introduction to Region-Level Annotation • Region-level annotation • Automatic assignment of the given image-level annotations to the precise regions • Basic idea of representative work: regions with a common annotation are more likely to have similar visual features [Li et al, CVPR, 2009], [Liu et al, ACMMM, 2009] • Visual similarity does not work for all cases Fig. Example of the difficulty of distinguishing sky and sea based on visual features. (a) The original image. (b) The original image with 200 data points. (c)–(f) 128-dimensional local features of four random points selected from the sky and the sea.

  41. Contextual Cueing in Perception Processing • Contextual cueing • The human brain gathers information through incidentally learned associations between spatial configurations and target locations [Chun, CP, 1998] • Spatial invariants [Biederman et al, CoP, 1982]: probability, co-occurrence, size, position, and spatial topological relationship • Model the spatial invariants with fuzzy theory

  42. Contextual Cueing Modeling by Fuzzy Theory • The difficulty of modeling contextual cueing • Classical bivalent set theory causes serious semantic loss • Examples: imprecise position and ambiguous topological relationships • Fuzzy theory • Measures the degree of truth • Fuzzy membership quantifies the degree of truth • Fuzzy logic allows decision making with imprecise information (a) Example of an ambiguous topological relationship (b) Example of a topological relationship for object recognition
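
As a concrete example of a fuzzy membership for an imprecise spatial relation, the sketch below scores the degree to which one region lies "above" another from the angle between their centroids. This is one simple membership function; the memberships actually used for probability, co-occurrence, size, and position are not reproduced here.

```python
import numpy as np

def fuzzy_above(mask_a, mask_b):
    """Fuzzy degree to which region A lies above region B: 1 when A is
    straight above B, falling linearly to 0 at 90 degrees off the vertical.
    mask_a and mask_b are binary region masks."""
    ya, xa = np.argwhere(mask_a).mean(axis=0)   # centroid of A (row, col)
    yb, xb = np.argwhere(mask_b).mean(axis=0)   # centroid of B (row, col)
    angle = np.arctan2(yb - ya, abs(xb - xa))   # image rows grow downward
    return float(np.clip(angle / (np.pi / 2), 0.0, 1.0))
```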

  43. Flowchart

  44. Illustration of Fuzzy-Based Contextual Cueing Label Propagation Annotations: sky, water, beach, boat. (a) Original image with the given image-level annotations (b) Over-segmentation result (c) Label propagation across images (d) Label propagation using fuzzy-based contextual cueing

  45. Experiment on the MSRC Dataset Table 1. Label-to-region assignment accuracy comparison. • MSRC dataset • 380 images with 18 categories • Including building, grass, tree, cow, boat, sheep, sky, mountain, aeroplane, water, bird, book, road, car, flower, cat, sign, and dog • Comparison methods • Four baseline methods implemented with a binary SVM and different maximal patch sizes • SVM1: 150 pixels, SVM2: 200 pixels, SVM3: 400 pixels, and SVM4: 600 pixels • Two recent techniques [Liu et al, ACMMM, 2009] • Label propagation with one-layer sparse coding • Label propagation with bi-layer sparse coding • Experimental result

  46. Experiment Analysis on the MSRC Dataset Annotations: sky, building, tree, road. (a) An image with annotations (b) Bi-layer result (c) FCLP result. Annotations: sky, building, tree, road, car. (d) An image with annotations (e) Bi-layer result (f) FCLP result

  47. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification [Zhong et al, submitted to ACMMM, 2011]

  48. Diagram of Deep Learning Fig. The proposed work on deep learning for multimedia content analysis, highlighted with a purple background.

  49. Outline of the Bilinear Deep Belief Network (BDBN) • Introduction • Research progress on deep learning • Proposed algorithm • Architecture of BDBN • Learning stages of BDBN • Bilinear discriminant initialization • Greedy layer-wise reconstruction • Global fine-tuning • Experiments and results • Experiment on the handwriting dataset MNIST • Experiment on the complicated object dataset Caltech 101 • Experiments on the Urban & Natural Scene dataset • Experiments on the face dataset CMU PIE • Conclusion and future work
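
To make the "greedy layer-wise reconstruction" stage concrete, here is a minimal restricted Boltzmann machine trained with one-step contrastive divergence, the standard building block for stacking a deep belief network layer by layer. This is a plain RBM for illustration only; the bilinear projections and discriminant initialization that define BDBN are not reproduced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=10, seed=0):
    """Train one RBM layer with CD-1. V is an (n_samples, n_visible) array
    of inputs scaled to [0, 1]; returns weights and biases."""
    rng = np.random.default_rng(seed)
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase: hidden probabilities given the data
        h_prob = sigmoid(V @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step to reconstruct the visibles
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # contrastive divergence updates, averaged over the batch
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_v, b_h

# stacking: the hidden activations of a trained layer become the visible
# data for the next layer, followed by a supervised global fine-tuning pass
```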

  50. Introduction • Definition of image classification • A classical problem in multimedia content analysis that aims to understand the semantic meaning of visual information and determine the category of an image according to predefined criteria • Related work on image classification • Parametric classifiers • Require an intensive training phase for the classifier parameters • SVM [Kumar et al, ICCV, 2007], boosting [Opelt et al, ECCV, 2004], decision trees [Bosch et al, ICCV, 2007], web graphs [Mahajan et al, ACMMM, 2010] • Nonparametric classifiers • Make classification decisions directly on the data and require no parameter training [Boiman et al, CVPR, 2008]
