
Multimedia Content Analysis via Computational Human Visual Cognition






Presentation Transcript


  1. Multimedia Content Analysis via Computational Human Visual Cognition Shenghua ZHONG Department of Computing The Hong Kong Polytechnic University

  2. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  3. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  4. Multimedia Content Analysis Definition of multimedia content analysis: the computerized understanding of the semantic meaning of a multimedia document [Wang et al, SPM, 2000]. Difficulty in multimedia content analysis: the semantic gap is the well-known challenge [Jiang et al, ACMMM, 2009], i.e. the gap between low-level features computable by computers and high-level concepts understandable by humans. Typical multimedia content analysis tasks [Amit, MIT, 2002] [Liu et al, HCI, 2001]: quality assessment, object detection and recognition, indexing and annotation, classification and retrieval.

  5. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  6. Introduction to Human Visual System • Definition of cognitive science • The science of mind, concisely defined as the study of the nature of intelligence, mainly the nature of the human mind [Eckardt, MIT, 1995] • Definition of the human visual system • One of the research foci of cognitive science • The part of the central nervous system that enables organisms to process visual information • Four processes of the human visual system • Formation of the image on the retina • Visual processing in the visual cortex • Attentional guidance • Perception processing

  7. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition [Zhong et al, ICMR, 2011] Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  8. Moment Invariants to Motion Blur for Water Reflection Detection and Recognition Fig. The proposed work on water reflection detection and recognition, highlighted with a purple background.

  9. Introduction to Water Reflection What is water reflection: the change in direction of a wavefront at an interface between two different media so that the wavefront returns into the medium from which it originated; a special case of imperfect symmetry. Water reflection is very important to multimedia content analysis. Fig. Example of the influence of the water reflection part. (a) An image with water reflection. (b) The correct segmentation result of (a). (c) The actual segmentation result of (a). (d) The color histogram of the whole image (a). (e) The color histogram of the object part of (a).

  10. Ineffectiveness of Existing Symmetry Techniques Example based on the scale-invariant feature transform (SIFT) descriptor in [Loy et al, ECCV, 2006]. Fig. Example of the ineffectiveness of local features in images with water reflection. (a) The correct result; (b) the SIFT descriptor matching result. Red circles denote SIFT detector results; green lines denote matched SIFT descriptor pairs. The SIFT method is clearly ineffective for water reflection detection and recognition.
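
A minimal sketch of the kind of SIFT-based symmetry matching evaluated here (in the spirit of [Loy et al, ECCV, 2006], not the talk's own method): descriptors are matched between the image and its vertically flipped copy, and each match votes for a horizontal reflection axis. The function name and the voting scheme are illustrative; motion blur in the reflection part is exactly what makes such local-feature matching unreliable on water reflection images.

```python
import cv2
import numpy as np

def reflection_axis_by_sift(img_gray):
    """Vote for a horizontal reflection axis from SIFT matches between the
    image and its vertically flipped copy (a simplified stand-in for the
    matching scheme of Loy et al.). Requires a SIFT-enabled OpenCV build."""
    h = img_gray.shape[0]
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_gray, None)
    kp2, des2 = sift.detectAndCompute(cv2.flip(img_gray, 0), None)
    if des1 is None or des2 is None:
        return None
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    votes = []
    for m in matches:
        y1 = kp1[m.queryIdx].pt[1]
        y2 = h - 1 - kp2[m.trainIdx].pt[1]   # map back to original coordinates
        votes.append(0.5 * (y1 + y2))        # midpoint row proposed as the axis
    if not votes:
        return None
    hist, edges = np.histogram(votes, bins=50, range=(0, h))
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])
```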

  11. Ineffectiveness of Existing Water Reflection Detection Techniques Example based on the flip-invariant shape detector in [Zhang et al, ICPR, 2010]. Fig. Examples of shape detection results of the flip-invariant shape technique.

  12. Basic Idea • Characteristics of image formation • Human eyes capture pictures around 30 times per second • Relative motion between the sensor and the scene within this exposure time leads to motion blur [Flusser et al, CS, 1996] • Definition and influence of motion blur • A well-known degradation factor due to motion; it changes the image features needed by feature-based recognition techniques

  13. Moment Invariants to Motion Blur The geometric moment, central moments, the centroid moment, and the complex moments.
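
The moment definitions behind the blur-invariant features can be written down directly; below is a small NumPy sketch of the standard geometric, central, and complex moments of a grayscale image. The specific blur-invariant combinations used in the talk (following [Flusser et al, CS, 1996]) are not reproduced here.

```python
import numpy as np

def geometric_moment(f, p, q):
    """Raw geometric moment m_pq = sum_x sum_y x^p * y^q * f(x, y)."""
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum((x ** p) * (y ** q) * f)

def centroid(f):
    """Image centroid (xc, yc) from the first-order geometric moments."""
    m00 = geometric_moment(f, 0, 0)
    return geometric_moment(f, 1, 0) / m00, geometric_moment(f, 0, 1) / m00

def central_moment(f, p, q):
    """Central moment mu_pq, taken about the centroid (translation invariant)."""
    xc, yc = centroid(f)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return np.sum(((x - xc) ** p) * ((y - yc) ** q) * f)

def complex_moment(f, p, q):
    """Complex moment c_pq = sum (x + iy)^p (x - iy)^q f(x, y),
    with coordinates measured from the centroid."""
    xc, yc = centroid(f)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    z = (x - xc) + 1j * (y - yc)
    return np.sum((z ** p) * (np.conj(z) ** q) * f)
```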

  14. High-Frequency Energy Decay in Water Reflection (a) Original water reflection image (b) High-frequency information part Fig. Decay of the information and energy in the high-frequency band due to motion blur.
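
The decay illustrated on this slide can be checked numerically: motion blur acts as a low-pass filter, so the reflection half of the image should carry a smaller fraction of high-frequency spectral energy than the object half. The sketch below is an illustrative check, with the cutoff fraction chosen arbitrarily.

```python
import numpy as np

def high_frequency_energy(patch, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff
    (cutoff given as a fraction of the Nyquist frequency)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = patch.shape
    fy, fx = np.mgrid[:h, :w]
    r = np.hypot((fy - h / 2) / (h / 2), (fx - w / 2) / (w / 2))
    return spec[r > cutoff].sum() / spec.sum()

def reflection_is_blurrier(img_gray, axis_row):
    """True if the part below the axis (candidate reflection) carries less
    high-frequency energy than the part above it (candidate object)."""
    top, bottom = img_gray[:axis_row, :], img_gray[axis_row:, :]
    return high_frequency_energy(bottom) < high_frequency_energy(top)
```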

  15. Flowchart

  16. Experiments and Results on Detection of the Reflection Axis Database: 100 natural images with water reflection from Google. Compared algorithms: matching of SIFT descriptors to detect the reflection axis [Loy et al, ECCV, 2006]; matching based on the flip-invariant shape detector [Zhang et al, ICPR, 2010]. Results: the accuracy of axis detection is 29% for [Loy et al, ECCV, 2006], 46% for [Zhang et al, ICPR, 2010], and 87% for our algorithm.

  17. Detection Results of the Compared Algorithms SIFT algorithm result; shape algorithm result; our algorithm result. Fig. Thumbnails of some comparison example images with reflection symmetry detection results. Fig. Thumbnails of some example images with and without water reflection in Experiment I. The first two rows are images with water reflection and the last rows are natural images without water reflection.

  18. Distinguishing the Object Part and the Reflection Part (a) Reversed water reflection image. (b) Positive curvelet coefficients of the object part (left) and the reflection part (right). Fig. Object part and reflection part determined by curvelet coefficients.

  19. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition [Zhong et al, ICMR, 2011] Top-down and bottom-up saliency map for no-reference image quality assessment [Zhong et al, ICIP, 2010] Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification

  20. Top-down and Bottom-up Saliency Map for No-Reference Image Quality Assessment Fig. The proposed work on no-reference image quality assessment, highlighted with a purple background.

  21. Introduction to No-Reference Image Quality Assessment • Definition of no-reference image quality assessment • A predefined correct (reference) image is not available • Mainly aims to measure sharpness/blurriness • Difficulty • How to assess quality in agreement with human judgement • Limitation of existing work • Ignores that cognitive understanding influences the perceived quality [Wang et al, TIP, 2004] • Fig. Example of image quality influenced by cognitive understanding. (a) Image without distortion; (b) blurriness mainly on the girl; (c) blurriness mainly on the apple

  22. Basic Idea • Combine semantic information from prior knowledge to build the saliency map • Existing bottom-up saliency maps do not match actual eye movements • Measure sharpness based on top-down and bottom-up saliency map modeling

  23. Target Information Acquisition in the Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is target information acquisition.

  24. Target Information Acquisition • Tags: People, New York • Remove: New York (does not belong to a physical entity) • Remain: People • Target information acquisition
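
One plausible way to implement this tag filtering is a WordNet hypernym test; the exact ontology rule used in the talk is not given, so the rule below (keep noun senses that descend from physical_entity but not through location, which drops place names such as "New York") is only an illustrative assumption.

```python
from nltk.corpus import wordnet as wn  # assumes the NLTK WordNet corpus is installed

PHYSICAL = wn.synset('physical_entity.n.01')
LOCATION = wn.synset('location.n.01')

def is_target_candidate(tag):
    """Illustrative filter: keep a tag if some noun sense of it reaches
    physical_entity.n.01 without passing through location.n.01."""
    for sense in wn.synsets(tag.replace(' ', '_'), pos=wn.NOUN):
        for path in sense.hypernym_paths():   # hypernym chains between root and sense
            if PHYSICAL in path and LOCATION not in path:
                return True
    return False

# collective tags such as 'people' may first need mapping to a singular
# sense (e.g. person.n.01) before this hypernym test applies
print([t for t in ['person', 'New York'] if is_target_candidate(t)])
```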

  25. Saliency Map Model Learning in the Whole Flowchart Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is the saliency map model learning.

  26. Flowchart of Top-Down & Bottom-Up Saliency Map Model Learning Fig. Flowchart of the proposed top-down & bottom-up model algorithm.

  27. Top-Down & Bottom-Up Saliency Map Model Learning • Learning the saliency map model with a binary SVM • Ground truth map • Created by convolving the contrast sensitivity function [Wang et al, TIP, 2001] over the fixation locations [Judd et al, ICCV, 2009] • An example of a ground truth map • Function of contrast sensitivity • Notations • e: half-resolution eccentricity constant • L: image width • v: viewing distance • N: number of fixation locations • M: number of users • In the example, N = 15 and M = 6. (a) Original image; (b) eye-fixation locations; (c) ground truth map
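
A minimal sketch of building such a ground-truth map: impulses at the recorded fixation locations are smoothed by a spatial falloff. A Gaussian is used here as a simple stand-in for the foveated contrast-sensitivity function of [Wang et al, TIP, 2001]; sigma is an arbitrary choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ground_truth_map(shape, fixations, sigma=25.0):
    """Ground-truth saliency map: one impulse per fixation (over all users),
    blurred with a Gaussian as a stand-in for the contrast-sensitivity falloff."""
    gt = np.zeros(shape, dtype=float)
    for x, y in fixations:  # fixation coordinates in pixels
        r = int(np.clip(round(y), 0, shape[0] - 1))
        c = int(np.clip(round(x), 0, shape[1] - 1))
        gt[r, c] += 1.0
    gt = gaussian_filter(gt, sigma)
    return gt / gt.max() if gt.max() > 0 else gt
```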

  28. Image Quality Assessment Based on the Proposed Saliency Map Fig. Flowchart of the proposed image sharpness assessment metric. The orange part is the image quality assessment based on the proposed saliency map.

  29. Experiment Setting of the Top-Down and Bottom-Up Saliency Map Model • Dataset • From the eye-tracking database of [Judd et al, ICCV, 2009] • Training: 200 images • Testing: 64 images • Training samples from the ground truth map • Positively labelled data • Randomly choose 30 pixels from the 10% most salient locations • Negatively labelled data • Randomly choose 30 pixels from the 10% least salient locations
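
A sketch of how the positive and negative training pixels could be drawn from each ground-truth map; the per-pixel feature array, its layout, and the SVM settings are assumptions rather than the talk's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC

def sample_training_pixels(gt_map, features, n_pos=30, n_neg=30, seed=None):
    """Positives: pixels from the 10% most salient locations of the ground
    truth map; negatives: from the 10% least salient. `features` is assumed
    to be an (H, W, D) array of per-pixel descriptors."""
    rng = np.random.default_rng(seed)
    order = np.argsort(gt_map.ravel())                    # ascending saliency
    k = max(n_pos, n_neg, int(0.10 * order.size))
    neg_idx = rng.choice(order[:k], n_neg, replace=False)
    pos_idx = rng.choice(order[-k:], n_pos, replace=False)
    feats = features.reshape(-1, features.shape[-1])
    X = np.vstack([feats[pos_idx], feats[neg_idx]])
    y = np.hstack([np.ones(n_pos), np.zeros(n_neg)])
    return X, y

# samples pooled over the 200 training images would then be fed to a
# binary SVM, e.g. SVC(kernel='linear', probability=True).fit(X, y)
```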

  30. Comparison Results of the Two Saliency Map Models (a) Original image (b) Eye-fixation locations (c) Fixation points covered by the bottom-up saliency model (d) Fixation points covered by our saliency model Fig. An example comparing the coverage of fixation points by different saliency models. Table. Evaluation of the proposed saliency map

  31. Experiment Setting and Results of Image Quality Assessment • Database • 160 images downloaded from Flickr, blurred with eight different Gaussian masks • Subjective image quality assessment • Fourteen subjects rated the images • Ratings from 1 to 5 corresponding to "very annoying", "annoying", "slightly annoying", "perceptible but not annoying", and "imperceptible" • Evaluation results

  32. A Better Saliency Map in an Unsupervised Learning Fashion • Difference-of-Gaussian transform • Lacks scale-space information • Wavelet transform • Lacks directional information • Proposed feature extraction transforms [Zhong et al, ICIMCS, 2011] • 2D Gabor transform • Frequency and orientation representations are similar to those of the human visual system • Good ability to represent spatial localization information • Curvelet transform • Finer capability to resolve directional features than the wavelet transform • Improved ability to represent edges and other singularities along curves
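
For the 2D Gabor channel, one common implementation is a small bank of Gabor kernels at several scales and orientations; the sketch below uses OpenCV's getGaborKernel with heuristic parameters. The curvelet channel requires a dedicated curvelet toolbox and is omitted here.

```python
import cv2
import numpy as np

def gabor_responses(img_gray, n_orientations=4, n_scales=3):
    """Filter the image with a small Gabor bank (orientation- and
    scale-selective channels); parameter choices are illustrative."""
    img = img_gray.astype(np.float32)
    responses = []
    for s in range(n_scales):
        lambd = 4.0 * (2 ** s)        # wavelength grows with scale
        sigma = 0.56 * lambd          # common bandwidth heuristic
        for o in range(n_orientations):
            theta = np.pi * o / n_orientations
            kernel = cv2.getGaborKernel((31, 31), sigma, theta, lambd, 0.5)
            responses.append(cv2.filter2D(img, cv2.CV_32F, kernel))
    return responses
```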

  33. Bottom-up Saliency Detection • Preattentive features • Red/green (RG) color • Blue/yellow (BY) color • Intensity (I) • Feature maps • Gabor channel • Curvelet channel • Activation maps • The feature maps are combined across scales and normalized into activation maps
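
A sketch of the preattentive color and intensity channels in the Itti style; the normalization and the multi-scale center-surround combination that produce the final activation maps are simplified away here.

```python
import numpy as np

def preattentive_channels(img_rgb):
    """Per-pixel intensity, red/green, and blue/yellow opponency channels
    from an RGB image (a simplified version of Itti et al.'s features)."""
    img = img_rgb.astype(float) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    intensity = (r + g + b) / 3.0
    # broadly tuned color channels
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b
    rg = R - G          # red/green opponency
    by = B - Y          # blue/yellow opponency
    return intensity, rg, by
```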

  34. Framework of the Gabor & Curvelet based Saliency Map (GCSMP)

  35. Bottom-up Saliency Detection • Dataset • 779 landscape images and 228 portrait images with eye-tracking data from 15 subjects [Judd et al, ICCV, 2009] • ROC area comparison • ROC curves are plotted as hit rate vs. false positive rate • The ROC area is the area under the ROC curve and summarizes the overall performance of a saliency model • Table. ROC area comparison based on bottom-up models • Basic Itti [Itti et al, Vision Res., 2000], Graph Gabor [Harel et al, NIPS, 2006], DoG [Achanta et al, CVPR, 2009], Log-Gabor [Wang et al, ACMMM, 2010]
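
The ROC area used in this comparison can be computed at the pixel level: fixated pixels are treated as positives, all other pixels as negatives, and the saliency value is the score. The sketch below uses scikit-learn; the paper's exact sampling or thresholding of the maps may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def saliency_roc_area(saliency_map, fixation_map):
    """Pixel-level ROC area of a saliency map against a binary fixation map.
    Both classes (fixated and non-fixated pixels) must be present."""
    labels = (fixation_map.ravel() > 0).astype(int)
    scores = saliency_map.ravel().astype(float)
    return roc_auc_score(labels, scores)
```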

  36. Experiment Results Integrated with Center Bias • Performance comparison with center bias • Average ROC curve comparison Table. The comparison of ROC areas with center bias. Figure. The ROC curves of our model and the other models.

  37. Experiment Results Integrated with Object Information • Face as an object information channel • "People" and its hyponyms are among the most popular tags on Flickr • The inferior temporal cortex (IT) fires when a face appears in its receptive field • Performance integrated with the face detection channel • Images with faces: the ROC area increases from 0.8180 to 0.8281 • Images without faces: the ROC area decreases from 0.8086 to 0.7999
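
A sketch of one way to fold a face channel into the bottom-up map: detected face boxes are rendered as a mask, blurred, and blended with the saliency map. The detector, blur width, and blending weight here are illustrative choices rather than the talk's settings.

```python
import cv2
import numpy as np

def add_face_channel(img_bgr, bottom_up_map, weight=0.3):
    """Blend a face-detection channel into a bottom-up saliency map."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    face_map = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in faces:
        face_map[y:y + h, x:x + w] = 1.0
    face_map = cv2.GaussianBlur(face_map, (0, 0), sigmaX=15)
    combined = (1 - weight) * bottom_up_map + weight * face_map
    return combined / combined.max() if combined.max() > 0 else combined
```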

  38. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation [Zhong et al, ICIMCS, 2010] Bilinear deep learning for image classification

  39. Fuzzy-Based Contextual Cueing for Region-Level Annotation Fig. The proposed work on region-level annotation, highlighted with a purple background.

  40. Introduction to Region-Level Annotation • Region-level annotation • Automatic assignment of the given image-level annotations to the precise regions • Basic idea of representative work: regions with a common annotation are more likely to have similar visual features [Li et al, CVPR, 2009], [Liu et al, ACMMM, 2009] • Visual similarity does not work for all cases Fig. Example of the difficulty of distinguishing sky and sea based on visual features. (a) The original image. (b) The original image with 200 data points. (c)–(f) 128-dimensional local features of four random points selected from the sky and the sea.

  41. Contextual Cueing in Perception Processing • Contextual cueing • The human brain gathers information through incidentally learned associations between spatial configurations and target locations [Chun, CP, 1998] • Spatial invariants [Biederman et al, CoP, 1982]: probability, co-occurrence, size, position, and spatial topological relationship • Model the spatial invariants with fuzzy theory

  42. Contextual Cueing Modeling by Fuzzy Theory • The difficulty of modeling contextual cueing • Classical bivalent set theory causes serious semantic loss • Examples: imprecise position and ambiguous topological relationships • Fuzzy theory • Measures the degree of truth • Fuzzy membership quantifies the degree of truth • Fuzzy logic allows decision making with imprecise information (a) Example of an ambiguous topological relationship (b) Example of a topological relationship for object recognition
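
As a concrete example of a fuzzy membership for an imprecise spatial relation, the sketch below scores the degree to which one region lies "above" another from the angle between their centroids. This is one simple membership function; the memberships actually used for probability, co-occurrence, size, and position are not reproduced here.

```python
import numpy as np

def fuzzy_above(mask_a, mask_b):
    """Fuzzy degree to which region A lies above region B: 1 when A is
    straight above B, falling linearly to 0 at 90 degrees off the vertical.
    mask_a and mask_b are binary region masks."""
    ya, xa = np.argwhere(mask_a).mean(axis=0)   # centroid of A (row, col)
    yb, xb = np.argwhere(mask_b).mean(axis=0)   # centroid of B (row, col)
    angle = np.arctan2(yb - ya, abs(xb - xa))   # image rows grow downward
    return float(np.clip(angle / (np.pi / 2), 0.0, 1.0))
```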

  43. Flowchart

  44. Illustration of Fuzzy-Based Contextual Cueing Label Propagation Annotations: sky, water, beach, boat. (a) Original image with the given image-level annotations (b) Over-segmentation result (c) Label propagation across images (d) Label propagation using fuzzy-based contextual cueing

  45. Experiment on the MSRC Dataset Table 1. Label-to-region assignment accuracy comparison. • MSRC dataset • 380 images with 18 categories • Including building, grass, tree, cow, boat, sheep, sky, mountain, aeroplane, water, bird, book, road, car, flower, cat, sign, and dog • Comparison methods • Four baseline methods implemented with a binary SVM and different maximal patch sizes • SVM1: 150 pixels, SVM2: 200 pixels, SVM3: 400 pixels, and SVM4: 600 pixels • Two recent techniques [Liu et al, ACMMM, 2009] • Label propagation with one-layer sparse coding • Label propagation with bi-layer sparse coding • Experimental result

  46. Experiment Analysis on the MSRC Dataset Annotations: sky, building, tree, road. (a) An image with annotations (b) Bi-layer result (c) FCLP result. Annotations: sky, building, tree, road, car. (d) An image with annotations (e) Bi-layer result (f) FCLP result

  47. Outline Introduction to multimedia content analysis Introduction to human visual system Moment invariants to motion blur for water reflection detection and recognition Top-down and bottom-up saliency map for no-reference image quality assessment Fuzzy-based contextual cueing for region-level annotation Bilinear deep learning for image classification [Zhong et al, submitted to ACMMM, 2011]

  48. Diagram of Deep Learning Fig. The proposed work on deep learning for multimedia content analysis, highlighted with a purple background.

  49. Outline of the Bilinear Deep Belief Network (BDBN) • Introduction • Research progress on deep learning • Proposed algorithm • Architecture of BDBN • Learning stages of BDBN • Bilinear discriminant initialization • Greedy layer-wise reconstruction • Global fine-tuning • Experiments and results • Experiment on the handwriting dataset MNIST • Experiment on the complicated object dataset Caltech 101 • Experiments on the Urban & Natural Scene dataset • Experiments on the face dataset CMU PIE • Conclusion and future work
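
To make the "greedy layer-wise reconstruction" stage concrete, here is a minimal restricted Boltzmann machine trained with one-step contrastive divergence, the standard building block for stacking a deep belief network layer by layer. This is a plain RBM for illustration only; the bilinear projections and discriminant initialization that define BDBN are not reproduced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.05, epochs=10, seed=0):
    """Train one RBM layer with CD-1. V is an (n_samples, n_visible) array
    of inputs scaled to [0, 1]; returns weights and biases."""
    rng = np.random.default_rng(seed)
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase: hidden probabilities given the data
        h_prob = sigmoid(V @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step to reconstruct the visibles
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # contrastive divergence updates, averaged over the batch
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_v, b_h

# stacking: the hidden activations of a trained layer become the visible
# data for the next layer, followed by a supervised global fine-tuning pass
```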

  50. Introduction • Definition of image classification • A classical problem in multimedia content analysis that aims to understand the semantic meaning of visual information and determine the category of an image according to predefined criteria • Related work on image classification • Parametric classifiers • Require an intensive training phase for the classifier parameters • SVM [Kumar et al, ICCV, 2007], boosting [Opelt et al, ECCV, 2004], decision trees [Bosch et al, ICCV, 2007], web graphs [Mahajan et al, ACMMM, 2010] • Nonparametric classifiers • Make classification decisions directly on the data and require no parameter training [Boiman et al, CVPR, 2008]
