
Multimedia Content Analysis via Computational Human Visual Model





  1. JHU June 25, 2012 Multimedia Content Analysis via Computational Human Visual Model Shenghua ZHONG Department of Computing The Hong Kong Polytechnic University www.comp.polyu.edu.hk/~csshzhong

  2. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  3. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  4. Multimedia Content Analysis • Definition of multimedia content analysis • Computerized understanding of the semantic meanings in a multimedia document [Wang et al, SPM, 2000] • Difficulty in multimedia content analysis • The semantic gap is the well-known challenge [Jiang et al, ACMMM, 2009]: low-level features computable by computers vs. high-level concepts understandable by humans • Typical multimedia content analysis tasks [Amit, MIT, 2002] [Liu et al, HCI, 2001] • Quality assessment • Object detection and recognition • Indexing and annotation • Classification and retrieval

  5. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  6. Introduction to Human Visual System • Definition of cognitive science • The science of mind, which may be concisely defined as the study of the nature of intelligence, mainly the nature of the human mind [Eckardt, MIT, 1995] • Definition of the human visual system • One of the research focuses of cognitive science • The part of the central nervous system that enables organisms to process visual information • Four processes of the human visual system • Formation of the image on the retina • Visual processing in the visual cortex • Attentional allocation • Perceptual processing

  7. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  8. Moment Invariants to Motion Blur for Water Reflection Detection and Recognition Fig. The proposed work on water reflection detection and recognition, highlighted with a purple background.

  9. Introduction to Water Reflection • Definition • The change in direction of a wavefront at an interface between two different media such that the wavefront returns into the original medium • A special case of imperfect symmetry • Importance • Fig. Example of the influence of the water reflection part. (a) An image with water reflection. (b) The correct segmentation result of (a). (c) The actual segmentation result of (a). (d) The color histogram of the whole image (a). (e) The color histogram of the object part.

  10. Ineffectiveness of Existing Symmetry Technology • Failure of the Scale-Invariant Feature Transform (SIFT) descriptor [Loy et al, ECCV, 2006] Fig. Examples of the ineffectiveness of local features in images with water reflection. (a) is the correct result; (b) is the SIFT descriptor matching result. Red circles denote SIFT detector results; green lines denote matched SIFT descriptor pairs. It is easy to see that the SIFT method is ineffective for water reflection detection and recognition.

  11. Ineffectiveness of Existing Water Reflection Detection Technology • Failure of the flip invariant shape detector [Zhang et al, ICPR, 2010] Fig. Examples of shape detection results of the flip invariant shape technique.

  12. Basic Idea • Definition and influence of motion blur • Blur caused by the relative motion of the sensor and the scene during the exposure time [Flusser et al, CS, 1996] • A well-known degradation factor: motion changes the image features needed for feature-based recognition techniques
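As an aside, linear motion blur is commonly modeled as convolution with a 1-D averaging kernel along the motion direction. A minimal simulation sketch follows; the kernel length and the horizontal direction are arbitrary illustrative choices, not values from the thesis.

```python
import numpy as np
from scipy.ndimage import convolve

def motion_blur(image, length=9):
    """Simulate horizontal linear motion blur by convolving the image
    with a 1-D averaging kernel of the given length (in pixels)."""
    kernel = np.zeros((1, length))
    kernel[0, :] = 1.0 / length          # uniform weights along the motion path
    return convolve(image.astype(float), kernel, mode='nearest')

# Usage: blurred = motion_blur(gray_image, length=15)
```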

  13. Moment Invariants to Motion Blur • The geometric moments • Central moments • The centroid • The complex moments
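The slide's formulas did not survive the transcript. For reference, the standard definitions used in the blur-invariant literature (e.g., Flusser's work), for an image $f(x,y)$, are:

```latex
% Standard moment definitions; the slide's own formulas were lost in
% transcription, so these follow the usual blur-invariant notation.
\begin{align}
  m_{pq}   &= \sum_{x}\sum_{y} x^{p} y^{q} f(x,y)
            && \text{geometric moments} \\
  \bar{x}  &= \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}
            && \text{centroid} \\
  \mu_{pq} &= \sum_{x}\sum_{y} (x-\bar{x})^{p} (y-\bar{y})^{q} f(x,y)
            && \text{central moments} \\
  c_{pq}   &= \sum_{x}\sum_{y} (x+iy)^{p} (x-iy)^{q} f(x,y)
            && \text{complex moments}
\end{align}
```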

  14. High Frequency Energy Decay in Water Reflection (a) Original water reflection image (b) High-frequency information part Fig. Decay of the information and energy in the high-frequency band due to motion blur
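As a rough illustration of this cue (not the thesis' exact measure), the high-frequency energy of the object half and the reflection half can be compared with an FFT high-pass; the `cutoff` fraction here is an invented parameter.

```python
import numpy as np

def high_freq_energy(patch, cutoff=0.25):
    """Energy of the patch outside a centered low-frequency square.
    cutoff is the fraction of the spectrum treated as low frequency."""
    F = np.fft.fftshift(np.fft.fft2(patch.astype(float)))
    h, w = F.shape
    ch, cw = int(h * cutoff), int(w * cutoff)
    F[h//2 - ch:h//2 + ch, w//2 - cw:w//2 + cw] = 0  # suppress low frequencies
    return np.sum(np.abs(F) ** 2)

# The blurred reflection half should score lower than the object half:
# top, bottom = gray[:gray.shape[0]//2], gray[gray.shape[0]//2:]
# print(high_freq_energy(top), high_freq_energy(bottom))
```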

  15. Flowchart

  16. Experiments and Results on Detection of Reflection Axis • Database • 100 natural images with water reflection from Google • Compared algorithms • Matching of SIFT descriptors to detect the reflection axis [Loy et al, ECCV, 2006] • Matching based on the flip invariant shape detector [Zhang et al, ICPR, 2010] • Results • The accuracy of axis detection is 29% [Loy et al, ECCV, 2006] • The accuracy of axis detection is 46% [Zhang et al, ICPR, 2010] • The accuracy of axis detection of our algorithm is 87%

  17. Detection Results of Two Algorithms Fig. Thumbnails of some comparison example images with reflection symmetry detection results (SIFT algorithm result, shape algorithm result, and our algorithm result). Fig. 4. Thumbnails of some example images with and without water reflection in experiment I. The first two rows are images with water reflection; the last rows are natural images without water reflection.

  18. Distinguish Object Part and Reflection Part (a) Reversed water reflection image. (b) Positive Curvelet coefficients of the object part (left) and the reflection part (right). Fig. Object part and reflection part determined by Curvelet coefficients

  19. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  20. Top-down and Bottom-up Saliency Map for No-Reference Image Quality Assessment Fig. The proposed work on no-reference image quality assessment, highlighted with a purple background.

  21. Introduction to No-Reference Image Quality Assessment • Definition of no-reference image quality assessment • A predefined correct (reference) image is not available • Mainly aims to measure sharpness/blurriness • Difficulty • How to assess quality in agreement with human judgment • Limitation of existing work • Ignores that cognitive understanding influences perceived quality [Wang et al, TIP, 2004] • Fig. Example of image quality influenced by cognitive understanding. (a) Image without distortion (b) Blurriness mainly on the girl (c) Blurriness mainly on the apple

  22. Basic Idea • Combine semantic information from prior knowledge to build the saliency map • Existing bottom-up saliency maps do not match actual eye movements • Measure sharpness based on top-down and bottom-up saliency map modeling

  23. Target Information Acquisition in Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is target information acquisition.

  24. Target Information Acquisition • Tags: People, New York • Remove: New York (does not refer to a physical entity) • Remain: People • Target information acquisition
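The transcript does not say how the "physical entity" test is performed; one plausible mechanism, sketched below as an assumption rather than the thesis' actual tag filter, is a WordNet hypernym lookup.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

PHYSICAL = wn.synset('physical_entity.n.01')

def is_physical_entity(tag):
    """Check whether any noun sense of the tag descends from
    'physical_entity' in the WordNet hypernym hierarchy."""
    for sense in wn.synsets(tag.replace(' ', '_'), pos=wn.NOUN):
        ancestors = {s for path in sense.hypernym_paths() for s in path}
        if PHYSICAL in ancestors:
            return True
    return False

print(is_physical_entity('cow'))        # True: cow -> ... -> physical_entity
print(is_physical_entity('happiness'))  # False: an abstraction, not physical
```

Note that WordNet's own taxonomy would not reproduce the slide's People/New York split exactly, so the thesis presumably used a related but stricter rule.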

  25. Saliency Map Model Learning in Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is the saliency map model learning.

  26. Flowchart of Top-Down & Bottom-Up Saliency Map Model Learning Fig. Flowchart of the proposed top-down & bottom-up model algorithm.

  27. Top-Down & Bottom-Up Saliency Map Model Learning • Learning the saliency map model by SVM • Ground truth map • Created by convolving the contrast sensitivity function [Wang et al, TIP, 2001] over the fixation locations [Judd et al, ICCV, 2009] • An example of a ground truth map • Function of contrast sensitivity • Notations • e: half-resolution eccentricity constant • L: image width • v: viewing distance • N: number of fixation locations • M: number of users • In the example, N = 15 and M = 6. (a) Original image (b) Eye-fixation locations (c) Ground truth map
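The exact contrast sensitivity kernel is given in [Wang et al, TIP, 2001]; as a simplified stand-in, the common practice of blurring impulses at fixation points with a Gaussian can be sketched as follows (the Gaussian and its sigma are illustrative substitutes, not the thesis' kernel).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ground_truth_map(shape, fixations, sigma=25):
    """Build a ground-truth saliency map by dropping an impulse at each
    recorded fixation and blurring; the Gaussian here stands in for the
    contrast-sensitivity falloff used in the original method."""
    impulse = np.zeros(shape, dtype=float)
    for (row, col) in fixations:           # N fixations pooled over M users
        impulse[row, col] += 1.0
    gt = gaussian_filter(impulse, sigma)
    return gt / gt.max()                   # normalize to [0, 1]

# Usage: gt = ground_truth_map((480, 640), fixations=[(120, 300), (200, 410)])
```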

  28. Image Quality Assessment Based on Proposed Saliency Map Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is image quality assessment based on the proposed saliency map.
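The transcript does not preserve the pooling formula, but the general idea, a sharpness measure pooled under the learned saliency map, can be sketched as follows; the gradient-energy proxy and the function name `weighted_sharpness` are illustrative choices, not the thesis' metric.

```python
import numpy as np

def weighted_sharpness(gray, saliency):
    """Saliency-weighted sharpness score: local gradient energy pooled
    under the learned saliency map (a sketch of the general idea only)."""
    gy, gx = np.gradient(gray.astype(float))
    energy = gx ** 2 + gy ** 2                # simple local sharpness proxy
    w = saliency / (saliency.sum() + 1e-12)   # saliency as pooling weights
    return float((w * energy).sum())
```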

  29. Experiment Setting of Top-Down and Bottom-Up Saliency Map Model • Dataset • From the eye-tracking database [Judd et al, ICCV, 2009] • Training: 200 images • Test: 64 images • Training samples from the ground truth map • Positively labelled data • Randomly choose 30 pixels from the 10% most salient locations • Negatively labelled data • Randomly choose 30 pixels from the 10% least salient locations
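A sketch of this sampling scheme under stated assumptions: the slide only specifies the 30-pixel / 10% rule, so the feature array layout (H, W, D) and the linear-kernel SVM are invented for illustration.

```python
import numpy as np
from sklearn import svm

def sample_training_pixels(gt_map, features, n=30, frac=0.10):
    """Draw n positive pixels from the top `frac` most salient locations
    of the ground truth map and n negatives from the bottom `frac`."""
    order = np.argsort(gt_map.ravel())
    k = int(frac * order.size)
    pos = np.random.choice(order[-k:], n, replace=False)   # most salient 10%
    neg = np.random.choice(order[:k], n, replace=False)    # least salient 10%
    X = features.reshape(-1, features.shape[-1])[np.concatenate([pos, neg])]
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return X, y

# clf = svm.SVC(kernel='linear').fit(*sample_training_pixels(gt, feats))
```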

  30. Compare Results of Two Saliency Map Models (a) Original image (b) Eye-fixation locations (c) Fixation points covered by the bottom-up saliency model (d) Fixation points covered by our saliency model Fig. An example comparing the coverage of fixation points by different saliency models. Table. Evaluation of the proposed saliency map

  31. Experiment Setting and Result of Image Quality Assessment • Database • 160 images downloaded from Flickr, blurred with eight different levels of Gaussian blur • Subjective image quality assessment • Rated from 1 to 5, corresponding to "very annoying", "annoying", "slightly annoying", "perceptible but not annoying", and "imperceptible", by 14 subjects • Evaluation results

  32. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  33. Fuzzy-Based Contextual Cueing for Region Level Annotation Fig. The proposed work on region level annotation, highlighted with a purple background.

  34. Introduction to Region Level Annotation Annotation: Water, Cow, Grass (a) An image with given image-level annotations (b) An image with automatic region-level annotation • Definition of region level annotation • Segment the image into semantic regions • Assign the given image-level annotations to the precise regions • Motivation of automatic region level annotation • Helpful for achieving reliable content-based image retrieval [Liu, ACM MM 09'] • Replaces tedious manual region-level annotation

  35. Representative Work on Region Level Annotation • Early work on region level annotation • Known as simultaneous object recognition and image segmentation • Unsupervised learning • Handles images with a single major object or a clean background [Cao, ICCV 07'] • Supervised learning • Focuses on specific object recognition or specific domains [Li, CVPR 09'] • Latest work for real-world applications • Label propagation by bi-layer sparse coding [Liu, ACM MM 09'] • Common annotations are more likely to have similar visual features in the corresponding regions • Shows impressive results on natural images

  36. Limitation of Visual Similarity in Region Level Annotation Fig. Example of the difficulty of distinguishing sky and sea based on visual features. (a) The original image. (b) The original image with 200 data points. (c)–(f) The 128-dimensional local features of four random points selected from the sky and the sea.

  37. Contextual Cueing in Perception Processing • Contextual cueing • Human brains gather information by incidentally learned associations between spatial configurations and target locations [Chun, CP, 1998] • Spatial invariants [Biederman et al, CoP, 1982]: probability; co-occurrence; size; position; spatial topological relationship

  38. Contextual Cueing Modeling by Fuzzy Theory • The difficulty of modeling contextual cueing • Classical bivalent set theory causes serious semantic loss • Example: imprecise positions and ambiguous topological relationships • Fuzzy theory • Measures the degree of truth • Fuzzy membership quantifies the degree of truth • Fuzzy logic allows decision making with imprecise information (a) Example of an ambiguous topological relationship (b) Example of a topological relationship for object recognition
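The slides do not give the actual membership functions. As an illustration of how fuzzy membership can grade an ambiguous spatial relation, here is a hypothetical "A above B" membership based on region centroid rows; the logistic form, the `scale` parameter, and the function name are invented for illustration.

```python
import numpy as np

def fuzzy_above(region_a, region_b, scale=0.1):
    """Fuzzy degree to which region A lies above region B, computed from
    the vertical distance between region centroids. A sharp yes/no
    predicate would lose exactly the gradations that fuzzy sets keep."""
    ya = np.mean([p[0] for p in region_a])   # mean row of A's pixels
    yb = np.mean([p[0] for p in region_b])
    # logistic membership: ~1 when A is clearly above B, ~0 when below
    # (image rows grow downward, so "A above B" means ya < yb)
    return 1.0 / (1.0 + np.exp(-(yb - ya) * scale))

# e.g. a 'sky' region far above a 'water' region yields membership near 1;
# two vertically overlapping regions yield a value near 0.5.
```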

  39. Flowchart

  40. Illustration of Fuzzy-Based Contextual Cueing Label Propagation Annotation: Sky, Water, Beach, Boat (a) Original image with given image-level annotations (b) Over-segmentation result (c) Label propagation across images (d) Label propagation using fuzzy-based contextual cueing

  41. Experiment on MSRC Dataset Table 1. Label-to-region assignment accuracy comparison. • MSRC Dataset • 380 images with 18 categories • Including building, grass, tree, cow, boat, sheep, sky, mountain, aeroplane, water, bird, book, road, car, flower, cat, sign, and dog • Comparison methods • Four baseline methods implemented with binary SVMs using different maximal patch sizes • SVM1: 150 pixels, SVM2: 200 pixels, SVM3: 400 pixels, and SVM4: 600 pixels • Two recent techniques [Liu et al, ACM MM 09'] • Label propagation with one-layer sparse coding • Label propagation with bi-layer sparse coding • Experimental results

  42. Experiment Analysis on MSRC Dataset Annotation: Sky, Building, Tree, Road (a) An image with annotations (b) Bi-layer result (c) FCLP result Annotation: Sky, Building, Tree, Road, Car (d) An image with annotations (e) Bi-layer result (f) FCLP result

  43. Outline • Introduction to multimedia content analysis • Introduction to human visual system • Moment invariants to motion blur for water reflection detection and recognition • Top-down and bottom-up saliency map for no-reference image quality assessment • Fuzzy-based contextual cueing for region level annotation • Proposed deep learning for multimedia content analysis

  44. Diagram of Deep Learning Fig. The proposed work on deep learning for multimedia content analysis, highlighted with a purple background.

  45. Outline of Proposed Deep Learning Model • Introduction • Deep learning • Proposed algorithm • Bilinear deep belief networks • Experiments and results • Experiment on the Handwritten Digit Dataset MNIST • Experiment on the Complicated Object Dataset Caltech 101 • Experiments on the Urban & Natural Scene Dataset • Experiments on the Face Dataset CMU PIE • Field effect bilinear deep belief networks

  46. Introduction • Image classification is a classical problem • Aims to understand the semantic meaning of visual information • Determines the category of an image according to predefined criteria • Image classification remains a well-known challenge after more than fifteen years of extensive research • Humans, by contrast, have little difficulty classifying images • Aim of this work • Provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception • Deep architectures are a representative paradigm that has achieved notable success in modeling the human visual system

  47. Outline of Proposed Deep Learning Model • Introduction • Deep learning • Proposed algorithm • Bilinear deep belief networks • Experiments and results • Experiment on the Handwritten Digit Dataset MNIST • Experiment on the Complicated Object Dataset Caltech 101 • Experiments on the Urban & Natural Scene Dataset • Experiments on the Face Dataset CMU PIE • Field effect bilinear deep belief networks

  48. Research on Deep Learning • Definition of deep learning • Models learning tasks using deep architectures composed of multiple layers of nonlinear modules • Deep belief network (DBN) • Densely connected between layers • Utilizes the restricted Boltzmann machine (RBM) as the basic building block • Two stages: abstract the input information layer by layer, then fine-tune the whole deep network to the ultimate learning target [Hinton et al, NC, 2006] • Research progress • Deep architectures are thought to be best exemplified by neural networks [Cottrell, Science, 2006] • DBNs exhibit notable performance in different tasks, such as dimensionality reduction [Hinton et al, Science, 2006] and classification [Salakhutdinov et al, AISTATS, 2007]

  49. Architecture of Deep Belief Network 1. The initial weighted connections are randomly constructed. 2. The size of every layer is determined based on intuition. 3. The parameter space is refined by greedy layer-wise information reconstruction. 4. Stages 1–3 are repeated until the parameter space in all layers is constructed. 5. The whole model is fine-tuned to minimize the classification error via backpropagation. Fig. Structure of the deep belief network (DBN).
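Stage 3's greedy layer-wise reconstruction is normally trained with contrastive divergence [Hinton et al, NC, 2006]. A minimal CD-1 sketch for one binary RBM layer follows; the NumPy layout and the learning rate are illustrative, not the thesis' settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.1):
    """One contrastive-divergence (CD-1) step for a binary RBM.
    W: (visible, hidden) weights; b, c: visible/hidden biases;
    v0: a (batch, visible) array of training vectors."""
    h0 = sigmoid(v0 @ W + c)                              # up-pass probabilities
    h_sample = (np.random.rand(*h0.shape) < h0).astype(float)  # sample hidden states
    v1 = sigmoid(h_sample @ W.T + b)                      # one-step reconstruction
    h1 = sigmoid(v1 @ W + c)
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)           # approximate gradient
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0 - h1).mean(axis=0)
    return W, b, c
```

Stacking layers then amounts to training one RBM, pushing the data through its hidden probabilities, and repeating on the next layer before the supervised fine-tuning pass.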

  50. Outline of Proposed Deep Learning Model • Introduction • Deep learning • Proposed algorithm • Bilinear deep belief networks • Experiments and results • Experiment on the Handwritten Digit Dataset MNIST • Experiment on the Complicated Object Dataset Caltech 101 • Experiments on the Urban & Natural Scene Dataset • Experiments on the Face Dataset CMU PIE • Field effect bilinear deep belief networks
