Sensible Visual Search Shih-Fu Chang Digital Video and Multimedia Lab Columbia University www.ee.columbia.edu/dvmm June 2008 (Joint Work with Eric Zavesky and Lyndon Kennedy)
User Expectation for Web Search • Keyword search is still the primary search method • Straightforward extension to visual search “…type in a few words at most, then expect the engine to bring back the perfect results. More than 95 percent of us never use the advanced search features most engines include, …” – The Search, J. Battelle, 2003
Web Image Search • Text query: “Manhattan Cruise” over Google Image Search • What is in the results? • Why were these images returned? • How to choose better search terms?
Minor Changes in Keywords, Big Difference • Text query: “Cruise around Manhattan”
When metadata are unavailable: Automatic Image Classification • Semantic indexes (+/-): Anchor, Snow, Soccer, Building, Outdoor • Audio-visual features • Geo, social features • Statistical models: SVM or graph models • Context fusion • Rich semantic description based on content analysis
A few good detectors for LSCOM concepts: waterfront, bridge, crowd, explosion/fire, US flag, military personnel. Remember there are many not-so-good detectors.
Keyword Search over Statistical Detector Scores • Columbia374: objects, people, locations, scenes, events, etc. • Concepts defined by expert analysts over news video • www.ee.columbia.edu/cuvidsearch
Query “car crash snow” over TRECVID video using LSCOM concepts • How are keywords mapped to concepts? • What classifiers work? What don’t? • How to improve the search terms?
Frustration of Uninformed Users of Keyword Search • Difficult to choose meaningful words/concepts without in-depth knowledge of entire vocabulary
Pains of Uninformed Users • Forced to take “one shot” searches, iterating queries with a trial-and-error approach...
Challenge: user frustration in visual search • A lot of work on content analytics • Research needed to address user frustration
Proposal: Sensible Search (make the search experience more sensible) • Help users stay “informed” • in selecting effective keywords/concepts • in understanding the search results • in manipulating the search criteria rapidly and flexibly • Keep users engaged • Instant feedback with minimal disruption, as opposed to “trial-and-error”
A prototype CuZero: Zero-Latency Informed Search & Navigation
Informed User for Visual Search: Instant Concept Suggestion via query-time concept mining
Lexical mapping (LSCOM): mapping keywords to concept definitions, synonyms, sense context, etc.
Co-occurring concepts: images of “car” frequently co-occur with “road”; concepts are also mined from surrounding text (example on slide: machine-translated sports-news transcripts about basketball games)
Query-Time Concept Mining: visual mining of the result set surfaces dominant concepts (e.g., person, suits)
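As a minimal sketch of the idea above (not the authors' implementation; the function name and data layout are hypothetical), query-time concept mining can be approximated by counting which detected concepts dominate the top-ranked results and suggesting the most frequent ones:

```python
from collections import Counter

def suggest_concepts(result_concepts, top_k=3):
    """Given the detected concept labels for each top-ranked result,
    return the most frequent (dominant) concepts as suggestions.
    `result_concepts` is a list of per-image concept sets."""
    counts = Counter()
    for concepts in result_concepts:
        counts.update(set(concepts))  # count each concept once per image
    return [c for c, _ in counts.most_common(top_k)]

# toy result set for a query whose top hits show people in suits
results = [
    {"person", "suit", "indoor"},
    {"person", "suit"},
    {"person", "outdoor"},
]
print(suggest_concepts(results, top_k=2))  # ['person', 'suit']
```

In the real system the per-image concepts would come from the Columbia374 detector scores rather than hand-labeled sets.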
CuZero Real-Time Query Interface (demo) • Auto-complete over speech transcripts • Instant Concept Suggestion
A prototype CuZero: Zero-Latency Informed Search & Navigation (Zavesky and Chang, MIR2008)
Informed User: Intuitive Exploration of Results • CMU Informedia Concept Filter (e.g., only outdoor, only people) • A linear browser restricts inspection flexibility
Informed User: Rapid Exploration of Results • MediaMill RotorBrowser
Revisit the user struggle… Query: {car, snow, car_crash} • Car detector, snow detector, car-crash detector • How did each concept influence the results?
CuZero: Real-Time Multi-Concept Navigation Map (“sky”, “water”, “boat”) • Create a multi-concept gradient map • Direct user control: “nearness” = “more influence” • Instant display for each location, without a new query
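The “nearness = more influence” rule can be sketched as follows. This is a hypothetical illustration, not CuZero's actual weighting scheme: it assumes each concept has an anchor position on the map and weights concepts by inverse distance to the cursor, then re-scores images with those weights:

```python
import math

def concept_weights(cursor, anchors):
    """Map a cursor position on the navigation map to per-concept
    weights: the nearer an anchor, the larger its influence.
    `anchors` maps concept name -> (x, y) position on the map."""
    raw = {}
    for name, (ax, ay) in anchors.items():
        d = math.hypot(cursor[0] - ax, cursor[1] - ay)
        raw[name] = 1.0 / (d + 1e-6)  # inverse-distance influence
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}  # normalize to 1

def score_image(detector_scores, weights):
    """Combine per-concept detector scores with the current weights."""
    return sum(weights.get(c, 0.0) * s for c, s in detector_scores.items())

anchors = {"sky": (0.0, 0.0), "water": (1.0, 0.0), "boat": (0.5, 1.0)}
w = concept_weights((0.1, 0.1), anchors)  # cursor placed near "sky"
print(max(w, key=w.get))  # sky
```

Because re-scoring is just a weighted sum over precomputed detector scores, every cursor position can be rendered instantly without issuing a new query, which is the point the slide makes.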
Achieve Breadth-Depth Flexibility by Dual-Space Navigation (demo) • Breadth: quick scan of many query permutations • Depth: instant exploration of results under a single permutation with fixed weights
Latency Analysis: Workflow Pipeline 1. Execute query and download ranked concept list 2. Package results with scores 3. Transmit to client 4. Unpackage results at the interface 5. Score images by concept weights; guarantee unique positions 6. Download images to the interface in cached mode • Execution time is spent very unevenly across these stages!
Pipelined processing for low latency • Overlap concept formulation (“car”, “snow”) with map rendering • Hide rendering latency during user interaction • Coarse-to-fine concept-map planning/rendering • Speed optimization ongoing…
Challenge: user frustration in visual search • Sensible search: (1) query, (2) visualize, (3) analyze
Help Users Make Sense of Image Trends Query: “John Kennedy” • Much re-used content found • How did it occur? • What manipulations? • What distribution path? • Correlation with perspective change?
Manipulation correlated with Perspective • Original: “Raising the Flag on Iwo Jima,” Joe Rosenthal, 1945 • Derived: anti-Vietnam War image, Ronald and Karen Bowen, 1969
Question for Sensible Search: Insights from Plain Search Results? 1. Issue a text query 2. Get top 1000 results from a web search engine 3. Find duplicate images, merge into clusters 4. Rank clusters (by size? original rank?) 5. Explore history/trend
Duplicate Clusters Reveal Image Provenance Biggest Clusters Contain Iconic Images Smallest Clusters Contain Marginal Images
Deeper Analysis of Search Results: from Duplicate Clusters to the Visual Migration Map (VMM) (Kennedy and Chang, ACM Multimedia 2008)
Visual Migration Map (VMM) • “Most original” image at the root; “most divergent” images at the leaves • Images derived through a series of manipulations • The VMM uncovers the history of image manipulation and plausible dissemination paths among content owners and users.
Ground-truth VMMs are hard to get • Hypothesis: an approximation of the history is feasible by visual analysis • Detect manipulation types between pairs of images • Derive a large-scale history over a large image set
Basic Image Manipulation Operators: original, scaled, cropped, grayscale, overlay, insertion • Each is observable by inspecting the pair • Each implies a direction (one image derived from the other) • Other possible manipulations: color correction, multiple compression, sharpening, blurring
Detecting Near-Duplicates • Duplicate detection is very useful and relatively reliable: graph matching [Zhang & Chang, 2004], matching SIFT points [Lowe, 1999] • Remaining challenges: scalability/speed; video duplicates; object (sub-image) duplicates (TRECVID 2008)
Scale Detection • Draw a bounding box around the matching points in each image • Compare the heights/widths of the two boxes • The relative difference in box size can be used to normalize scales
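The bounding-box comparison above can be sketched in a few lines. This is a minimal illustration (function names are hypothetical, and it assumes point correspondences, e.g. SIFT matches, are already available):

```python
def bounding_box(points):
    """Width and height of the axis-aligned box around matched points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)

def relative_scale(matched_a, matched_b):
    """Estimate the scale factor between two images from the bounding
    boxes of their matched interest points.
    Returns size_b / size_a; ~1.0 means no rescaling."""
    wa, ha = bounding_box(matched_a)
    wb, hb = bounding_box(matched_b)
    # average the width and height ratios for a little robustness
    return 0.5 * (wb / wa + hb / ha)

# image B is a half-size copy of image A
pts_a = [(10, 10), (110, 10), (10, 210), (110, 210)]
pts_b = [(5, 5), (55, 5), (5, 105), (55, 105)]
print(relative_scale(pts_a, pts_b))  # 0.5
```

The estimated factor is then used to resample one image before the pixel-level comparisons on the following slides.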
Color Removal • Simple case: the image is stored in a single-channel file • Other cases: the image is grayscale but stored in a 3-channel file • Expect little difference between the channel values within each pixel
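The 3-channel grayscale check reduces to a per-pixel test. A minimal sketch (hypothetical helper name; a small tolerance absorbs compression noise):

```python
def is_stored_grayscale(pixels, tol=2):
    """Detect a grayscale image stored in a 3-channel file: within each
    pixel, the R, G, B values should differ by (almost) nothing.
    `pixels` is a flat list of (r, g, b) tuples; `tol` allows for
    small channel deviations introduced by lossy compression."""
    return all(max(p) - min(p) <= tol for p in pixels)

gray = [(120, 120, 121), (40, 41, 40)]
color = [(200, 30, 30), (10, 180, 25)]
print(is_stored_grayscale(gray))   # True
print(is_stored_grayscale(color))  # False
```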
More Challenging: Overlay Detection • Given two images, we can observe that a region differs between the two • But how do we know which is the original?
Cropping or Insertion? • We can find differences in image area • But is the smaller area due to a crop, or is the larger area due to an insertion? (figure: original, cropping, insertion)
Use Context from Many Duplicates • Normalize scales and positions • Get the average value for each pixel • Result: a “composite” image
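The composite is just a per-pixel average over the aligned duplicates. A minimal sketch (hypothetical name; images are plain nested lists of intensities, already scale- and position-normalized):

```python
def composite(images):
    """Per-pixel average over a set of scale- and position-normalized
    duplicates; `images` is a list of equally sized 2-D intensity grids."""
    h, w = len(images[0]), len(images[0][0])
    n = len(images)
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]

imgs = [
    [[100, 100], [100, 100]],
    [[100, 100], [100, 0]],   # one duplicate has a divergent region
]
print(composite(imgs))  # [[100.0, 100.0], [100.0, 50.0]]
```

Pixels where most duplicates agree dominate the composite, so divergent regions in any single duplicate stand out when compared against it, which is what the next three slides exploit.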
Cropping Detection with Context • In cropping, we expect the content outside the crop area to be consistent with the composite image (figure: Image A/B, Composite A/B, Residue A/B)
Overlay Detection with Context • Comparing images against the composite image reveals portions that differ from the typical content • An image with divergent content may contain an overlay (figure: Image A/B, Composite A/B, Residue A/B)
Insertion Detection with Context • In insertion, we expect the area outside the crop region to differ from the typical content (figure: Image A/B, Composite A/B, Residue A/B)
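The three context-dependent tests above all reduce to measuring residue (absolute difference) against the composite over a region of interest. A minimal sketch, with hypothetical names and toy 2×2 images, of how a low residue supports "crop" while a high residue supports "insertion/overlay":

```python
def residue(image, comp):
    """Absolute per-pixel difference between an image and the composite."""
    return [[abs(image[y][x] - comp[y][x]) for x in range(len(comp[0]))]
            for y in range(len(comp))]

def mean_residue(image, comp, region):
    """Average residue over a region given as ((y0, y1), (x0, x1))."""
    (y0, y1), (x0, x1) = region
    r = residue(image, comp)
    vals = [r[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)

comp = [[100, 100], [100, 100]]          # composite from many duplicates
cropped_src = [[100, 100], [100, 100]]   # larger image matches composite
inserted = [[100, 100], [100, 0]]        # extra area diverges from composite
outside = ((1, 2), (0, 2))               # region outside the shared area
print(mean_residue(cropped_src, comp, outside))  # 0.0  -> crop is plausible
print(mean_residue(inserted, comp, outside))     # 50.0 -> insertion/overlay
```

A threshold on the mean residue (tuned on labeled pairs) would turn these scores into the context-dependent detectors evaluated on the next slide.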
Evaluation: Manipulation Detection (Context-Free vs. Context-Dependent) • Context-free detectors have near-perfect performance • Context-dependent detectors still make errors • Consistency checking can further improve the accuracy • Are these error-prone results sufficient to build manipulation histories?
Inferring Direction from Consistency: this direction is not plausible
Inferring Manipulation Direction from Consistency: this direction is plausible