Learning Spatial Context: Using stuff to find things

Learning Spatial Context: Using stuff to find things
Geremy Heitz, Daphne Koller
Stanford University
October 13, 2008, ECCV 2008

Things vs. Stuff
From: Forsyth et al. Finding pictures of objects in large collections of images. Object Representation in Computer Vision, 1996.
• Thing (n): An object with a specific size and shape.
• Stuff (n): Material defined by a homogeneous or repetitive pattern of fine-scale properties, but has no specific or distinctive spatial extent or shape.

Finding Things Context is key!

Outline • What is Context? • The Things and Stuff (TAS) model • Results

Satellite Detection Example
[Figure: two candidate windows, each with detector score D(W) = 0.8]

Error Analysis
Typically…
• True positives are IN CONTEXT
• False positives are OUT OF CONTEXT
We need to look outside the bounding box!

Types of Context
• Thing-Thing
• Scene-Thing (e.g., from the scene gist: car “likely”, keyboard “unlikely”)
• Stuff-Stuff
[ Torralba et al., LNCS 2005 ] [ Gould et al., IJCV 2008 ] [ Rabinovich et al., ICCV 2007 ]

Types of Context
• Stuff-Thing:
• Based on spatial relationships
• Intuition: Road = cars here; Trees = no cars; Houses = cars nearby
“Cars drive on roads” “Cows graze on grass” “Boats sail on water”
• Goal: learn these relationships unsupervised

Things • Detection “candidates” • Low detector threshold -> “over-detect” • Each candidate has a detector score

Things
• Candidate detections
• Image window Wi + score
• Boolean random variable Ti
• Ti = 1: candidate is a positive detection
• Thing model: Wi → Ti

Stuff
• Coherent image regions
• Coarse “superpixels”
• Feature vector Fj in Rn
• Cluster label Sj in {1…C}
• Stuff model: Naïve Bayes, Sj → Fj
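A minimal sketch of inference in the stuff model, assuming (our assumption; the slide only says Naïve Bayes) that each dimension of Fj is Gaussian given the cluster label Sj. The function names and the Gaussian choice are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def stuff_posterior(f: Sequence[float], prior: Sequence[float],
                    means: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[float]]) -> List[float]:
    """P(S_j = c | F_j = f) under naive Bayes: feature dimensions are
    conditionally independent given the cluster label (assumed Gaussian)."""
    log_scores = []
    for c in range(len(prior)):
        s = math.log(prior[c])
        for d, x in enumerate(f):
            s += log_gaussian(x, means[c][d], variances[c][d])
        log_scores.append(s)
    m = max(log_scores)  # log-sum-exp trick for numerical stability
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]
```

A region whose features sit on cluster 0's mean gets almost all of its posterior mass on cluster 0; a region exactly halfway between two equal-variance clusters gets 0.5 on each.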

Relationships
• Descriptive relations: “Near”, “Above”, “In front of”, etc.
• Choose a set R = {r1…rK}
• Rijk = 1: detection i and region j have relation k
• Relationship model: Ti → Rijk ← Sj
[Figure: candidate T1 with regions S72 = Trees, S10 = Road, S4 = Houses; e.g., R1,10,in = 1]

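The TAS Model
• Wi: Window
• Ti: Object presence
• Sj: Region label
• Fj: Region features
• Rijk: Relationship
[Figure: plate model with Wi → Ti over N candidates, Sj → Fj over J regions, and Ti, Sj → Rijk over K relationships; legend: always observed, always hidden, supervised in training set]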
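One way to read the graphical model above is as a factorization of the joint distribution: P(T, S, F, R | W) = ∏j P(Sj) P(Fj | Sj) · ∏i P(Ti | Wi) · ∏i,j,k P(Rijk | Ti, Sj). The sketch below evaluates its logarithm. The data layout (theta[k][t][c] = P(Rijk = 1 | Ti = t, Sj = c), precomputed region log-likelihoods, and a per-candidate P(Ti = 1 | Wi)) is a hypothetical choice for illustration.

```python
import math
from typing import Sequence


def tas_log_joint(T: Sequence[int], S: Sequence[int], R,
                  prior_s: Sequence[float],
                  log_f_given_s: Sequence[Sequence[float]],
                  p_t1_given_w: Sequence[float], theta) -> float:
    """log P(T, S, F, R | W) for the factorization
    prod_j P(S_j) P(F_j | S_j) * prod_i P(T_i | W_i) * prod_ijk P(R_ijk | T_i, S_j).

    R[i][j][k] in {0, 1}; theta[k][t][c] = P(R_ijk = 1 | T_i = t, S_j = c);
    log_f_given_s[j][c] = log P(F_j | S_j = c) from any region model.
    """
    lp = 0.0
    # Region terms: label prior and feature likelihood.
    for j, c in enumerate(S):
        lp += math.log(prior_s[c]) + log_f_given_s[j][c]
    # Candidate terms: P(T_i | W_i), e.g. derived from the detector score.
    for i, t in enumerate(T):
        p1 = p_t1_given_w[i]
        lp += math.log(p1 if t == 1 else 1.0 - p1)
    # Relationship terms couple every candidate with every region.
    for i, t in enumerate(T):
        for j, c in enumerate(S):
            for k, r in enumerate(R[i][j]):
                p = theta[k][t][c]
                lp += math.log(p if r == 1 else 1.0 - p)
    return lp
```

With one candidate, one region, and one relationship, the result is simply the log of the product of the four corresponding factors.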

Unrolled Model
[Figure: candidate windows T1 to T3 linked to image regions S1 to S5 through relationship variables, e.g., R1,1,left = 1, R2,1,above = 0, R3,1,left = 1, R1,3,near = 0, R3,3,in = 1]

Learning the Parameters
• Assume we know R
• Sj is hidden
• Everything else is observed
• Expectation-Maximization
• “Contextual clustering”
• Parameters are readily interpretable
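A toy sketch of EM for this setting, assuming R and the training labels T are observed and only the Sj are hidden. To keep it short, each region feature is a single number with a Gaussian per cluster (a hypothetical simplification of Fj in Rn), and theta gets Laplace smoothing; neither choice comes from the slides. Because T is observed, the E-step posterior over each Sj factorizes across regions.

```python
import math
from typing import Optional, Sequence


def em_contextual_clustering(F: Sequence[float], T: Sequence[int], R, C: int,
                             iters: int = 50,
                             seed_means: Optional[Sequence[float]] = None):
    """Toy EM with hidden region labels S_j.

    F[j]: hypothetical 1-D stand-in for the feature vector F_j.
    T[i] in {0, 1}: training labels; R[i][j][k] in {0, 1}: relationships.
    Returns (prior, means, variances, theta, q) with q[j][c] = P(S_j = c | data)
    and theta[k][t][c] = P(R_ijk = 1 | T_i = t, S_j = c).
    """
    J, N, K = len(F), len(T), len(R[0][0])
    lo, hi = min(F), max(F)
    means = list(seed_means) if seed_means is not None else [
        lo + (hi - lo) * (c + 0.5) / C for c in range(C)]  # spread initial means
    prior = [1.0 / C] * C
    variances = [1.0] * C
    theta = [[[0.5] * C for _ in range(2)] for _ in range(K)]
    q = []
    for _ in range(iters):
        # E-step: with T observed, the S_j are independent a posteriori.
        q = []
        for j in range(J):
            logs = []
            for c in range(C):
                s = math.log(prior[c]) - 0.5 * (
                    math.log(2 * math.pi * variances[c])
                    + (F[j] - means[c]) ** 2 / variances[c])
                for i in range(N):
                    for k in range(K):
                        p = theta[k][T[i]][c]
                        s += math.log(p if R[i][j][k] else 1.0 - p)
                logs.append(s)
            m = max(logs)
            w = [math.exp(s - m) for s in logs]
            z = sum(w)
            q.append([x / z for x in w])
        # M-step: responsibility-weighted maximum likelihood, lightly smoothed.
        for c in range(C):
            nc = sum(q[j][c] for j in range(J))
            prior[c] = (nc + 1e-3) / (J + C * 1e-3)
            means[c] = sum(q[j][c] * F[j] for j in range(J)) / max(nc, 1e-12)
            variances[c] = sum(q[j][c] * (F[j] - means[c]) ** 2
                               for j in range(J)) / max(nc, 1e-12) + 1e-3
            for k in range(K):
                for t in (0, 1):
                    num = sum(q[j][c] * R[i][j][k]
                              for i in range(N) if T[i] == t for j in range(J))
                    den = sum(q[j][c]
                              for i in range(N) if T[i] == t for j in range(J))
                    # Laplace smoothing keeps theta strictly inside (0, 1).
                    theta[k][t][c] = (num + 1.0) / (den + 2.0)
    return prior, means, variances, theta, q
```

The learned theta is what makes the clusters interpretable. If true detections tend to be "near" one cluster, theta[k][1][c] is high for that cluster.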

Learned Satellite Clusters

Which Relationships to Use?
• Rijk = spatial relationship between candidate i and region j
Rij1 = candidate in region
Rij2 = candidate closer than 2 bounding boxes (BBs) to region
Rij3 = candidate closer than 4 BBs to region
Rij4 = candidate farther than 8 BBs from region
Rij5 = candidate 2 BBs left of region
Rij6 = candidate 2 BBs right of region
Rij7 = candidate 2 BBs below region
Rij8 = candidate more than 2 and less than 4 BBs from region
…
RijK = candidate near region boundary
How do we avoid overfitting?
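A sketch that computes a few of these indicators for one candidate and one region. It assumes a region is given as a set of pixel coordinates, uses the distance from the candidate box's center to the nearest region pixel, and takes "one BB" to mean the box's larger side. All of these conventions, and the function name, are hypothetical. Directional and boundary relationships are omitted.

```python
import math
from typing import Dict, Set, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)
Pixel = Tuple[int, int]


def relationships(box: Box, region: Set[Pixel]) -> Dict[str, int]:
    """Binary R_ijk indicators for a subset of the relationships listed above."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    unit = max(w, h)  # hypothetical: one "bounding box" of distance
    dist = min(math.hypot(px - cx, py - cy) for (px, py) in region)
    d = dist / unit  # distance measured in bounding boxes
    return {
        "in": int((int(cx), int(cy)) in region),  # box center lies in region
        "closer_than_2": int(d < 2),
        "closer_than_4": int(d < 4),
        "farther_than_8": int(d > 8),
    }
```

The thresholds overlap on purpose: a candidate closer than 2 BBs is also closer than 4. This is one reason to learn which relationships to keep rather than using them all.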

Learning the Relationships
• Intuition: a “detached” Rijk is an inactive relationship
• Structural EM iterates:
• Learn parameters
• Decide which edge to toggle
• Evaluate with l(T|F,W,R), which requires inference
• Gives better results than using the standard E[l(T,S,F,W,R)]
[Figure: Ti and Sj connected to Rij1, Rij2, …, RijK; Sj → Fj]
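The outer loop of the structure search can be sketched as greedy edge toggling. In this sketch, `score` is a hypothetical user-supplied function. In TAS it would refit the parameters with EM for the given set of active relationships and return l(T|F,W,R), which is not reproduced here.

```python
from typing import Callable, FrozenSet, Iterable


def greedy_toggle(all_relations: Iterable[int],
                  score: Callable[[FrozenSet[int]], float],
                  max_rounds: int = 100) -> FrozenSet[int]:
    """Start with every relationship detached, then repeatedly apply the single
    toggle (attach or detach one relationship) that most improves the score.
    Stop when no toggle helps."""
    rels = list(all_relations)
    active: FrozenSet[int] = frozenset()
    best = score(active)
    for _ in range(max_rounds):
        candidates = [active ^ {k} for k in rels]  # toggle one edge each
        if not candidates:
            break
        new_best, new_active = max(((score(c), c) for c in candidates),
                                   key=lambda sc: sc[0])
        if new_best <= best:
            break
        best, active = new_best, new_active
    return active
```

With a score that rewards relationships 1 and 3 and penalizes the rest, the search attaches exactly those two and then stops.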

Inference
• Goal: compute P(T | F, W, R)
• Block Gibbs sampling
• Easy to sample the Ti’s given the Sj’s and vice versa
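A sketch of block Gibbs sampling for this model. Given all Sj, the Ti are conditionally independent, and given all Ti, the Sj are too. That independence is why each block can be resampled in one sweep. The parameter layout (theta[k][t][c] = P(Rijk = 1 | Ti = t, Sj = c), precomputed region log-likelihoods, 0 < P(Ti = 1 | Wi) < 1) is a hypothetical choice. The estimate of P(Ti = 1 | F, W, R) is the fraction of post-burn-in samples with Ti = 1.

```python
import math
import random


def _sample_from_logs(logs, rng):
    """Draw an index with probability proportional to exp(logs)."""
    m = max(logs)
    weights = [math.exp(x - m) for x in logs]
    return rng.choices(range(len(logs)), weights=weights)[0]


def gibbs_tas(R, prior_s, log_f_given_s, p_t1_given_w, theta,
              n_samples=2000, burn_in=200, seed=0):
    """Estimate P(T_i = 1 | F, W, R) for every candidate i."""
    rng = random.Random(seed)
    N, J, K, C = len(R), len(R[0]), len(R[0][0]), len(prior_s)

    def log_rel(i, j, t, c):
        # Sum over k of log P(R_ijk | T_i = t, S_j = c).
        s = 0.0
        for k in range(K):
            p = theta[k][t][c]
            s += math.log(p if R[i][j][k] else 1.0 - p)
        return s

    T = [int(p > 0.5) for p in p_t1_given_w]
    S = [0] * J
    counts = [0] * N
    for it in range(burn_in + n_samples):
        # Block 1: resample every T_i given the current S.
        for i in range(N):
            logs = [math.log(1.0 - p_t1_given_w[i]), math.log(p_t1_given_w[i])]
            for t in (0, 1):
                logs[t] += sum(log_rel(i, j, t, S[j]) for j in range(J))
            T[i] = _sample_from_logs(logs, rng)
        # Block 2: resample every S_j given the current T.
        for j in range(J):
            logs = [math.log(prior_s[c]) + log_f_given_s[j][c]
                    + sum(log_rel(i, j, T[i], c) for i in range(N))
                    for c in range(C)]
            S[j] = _sample_from_logs(logs, rng)
        if it >= burn_in:
            for i in range(N):
                counts[i] += T[i]
    return [c / n_samples for c in counts]
```

In a one-candidate, one-region example where the region almost surely has a label that makes the observed relationship likely under Ti = 1, enumeration gives P(T = 1 | F, W, R) = 0.896, and the sampler's estimate lands close to it.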

Base Detector - HOG
• HOG Detector: [ Dalal & Triggs, CVPR 2005 ]
• Feature vector X → SVM classifier
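The core of a HOG feature is a per-cell histogram of gradient orientations. The sketch below computes one such histogram with centered [-1, 0, 1] gradients and 9 unsigned orientation bins, then L2-normalizes it. It is a simplification: the full descriptor also uses bilinear vote interpolation and overlapping block normalization, and it concatenates many cells into X before the SVM. None of that is shown.

```python
import math
from typing import List, Sequence


def cell_histogram(patch: Sequence[Sequence[float]], n_bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0 to 180 degrees) over the interior pixels of a grayscale patch."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]  # centered [-1, 0, 1] filter
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned
            hist[int(angle // bin_width) % n_bins] += mag
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # avoid dividing by 0
    return [v / norm for v in hist]
```

A vertical edge puts all its weight in the 0-degree bin, and a horizontal edge puts it in the bin containing 90 degrees.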

Results - Satellite
[Figure panels: Prior: Detector Only; Posterior: Detections; Posterior: Region Labels]

Results - Satellite
[Plot: recall rate vs. false positives per image (0 to 160) for the TAS Model and the Base Detector]
~10% improvement in recall at 40 fppi

PASCAL VOC Challenge
• 2005 Challenge
• 2232 images split into {train, val, test}
• Cars, Bikes, People, and Motorbikes
• 2006 Challenge
• 5304 images split into {train, test}
• 12 classes; we use Cows and Sheep

Base Detector Error Analysis: Cows

Discovered Context - Bicycles
[Figure: Bicycles, Cluster #3]

TAS Results – Bicycles
• Examples
• Discover “true positives”
• Remove “false positives”

Results – VOC 2005

Results – VOC 2006

Conclusions
• Detectors can benefit from context
• The TAS model captures an important type of context
• We can improve any sliding window detector using TAS
• The TAS model can be interpreted and matches our intuitions
• We can learn which relationships to use

Thank you!

Object Detection
• Task: Find the things
• Example: Find all the cars in this image
• Return a “bounding box” for each
• Evaluation:
• Maximize true positives
• Minimize false positives

Sliding Window Detection
• Consider every bounding box
• All shifts
• All scales
• Possibly all rotations
• Each such window gets a score: D(W)
• Detections: local peaks in D(W)
• Pros:
• Covers the entire image
• Flexible enough to allow a variety of D(W)’s
• Cons:
• Brute force: can be slow
• Only considers features inside the box
[Figure: example windows with D = 1.5 and D = -0.3]
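A minimal sketch of sliding-window candidate generation. It assumes axis-aligned windows at a fixed list of sizes (no rotations) and a user-supplied function standing in for D(W). A low threshold deliberately over-detects, as the Things slide describes. "Local peak" is our simplification: no overlapping window of the same size scores higher. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, sizes: List[Tuple[int, int]],
                    stride: int) -> List[Window]:
    """Enumerate every window of each size at every stride-aligned shift."""
    windows = []
    for (w, h) in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                windows.append((x, y, w, h))
    return windows


def candidates(windows: List[Window], score: Callable[[Window], float],
               threshold: float) -> List[Tuple[Window, float]]:
    """Keep windows whose score D(W) exceeds the (low) threshold and is a
    local peak among overlapping windows of the same size."""
    scored = [(win, score(win)) for win in windows]
    result = []
    for win, s in scored:
        if s <= threshold:
            continue
        x, y, w, h = win
        is_peak = all(
            s >= s2
            for (x2, y2, w2, h2), s2 in scored
            if (w2, h2) == (w, h) and abs(x2 - x) < w and abs(y2 - y) < h
        )
        if is_peak:
            result.append((win, s))
    return result
```

The brute-force cost noted under Cons is visible here. The peak test compares every window with every other window, which is quadratic in the number of windows.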

Sliding Window Results
PASCAL Visual Object Classes Challenge, Cows 2006
• Detections: windows with D(W) > T
• Recall(T) = TP / (TP + FN)
• Precision(T) = TP / (TP + FP)
• Overlap of detection A with ground truth B: score(A,B) = |A ∩ B| / |A ∪ B|
• score(A,B) > 0.5: TRUE POSITIVE
• score(A,B) ≤ 0.5: FALSE POSITIVE
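The overlap criterion and the recall/precision formulas above can be sketched as follows. Boxes here are (x1, y1, x2, y2) corners, which is an assumed convention. Each ground-truth box is matched at most once, highest-scoring detection first, so duplicate detections of the same object count as false positives. That matching rule follows common PASCAL practice and is not stated on the slide.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """score(A, B) = |A ∩ B| / |A ∪ B| for axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_precision(detections: List[Tuple[Box, float]], gt: List[Box],
                     threshold: float):
    """Keep detections with D(W) > threshold, match them greedily to
    unmatched ground truth with overlap > 0.5, and count TP/FP/FN."""
    kept = sorted((d for d in detections if d[1] > threshold),
                  key=lambda d: -d[1])
    matched = set()
    tp = fp = 0
    for box, _ in kept:
        best_j, best_o = -1, 0.5  # overlap must strictly exceed 0.5
        for j, g in enumerate(gt):
            o = iou(box, g)
            if j not in matched and o > best_o:
                best_j, best_o = j, o
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
        else:
            fp += 1
    fn = len(gt) - tp
    recall = tp / (tp + fn) if gt else 0.0
    precision = tp / (tp + fp) if kept else 0.0
    return recall, precision, tp, fp
```

Lowering the threshold T admits more detections. Recall can only grow, while precision usually falls.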

Quantitative Evaluation
[Plot: recall rate vs. false positives per image (0 to 160)]
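A recall vs. false-positives-per-image curve like this one comes from sweeping the detector threshold from high to low. The sketch assumes every detection over the test set has already been labeled as a true or false positive. It admits detections one at a time, so tied scores are not grouped; that simplification is ours.

```python
from typing import List, Tuple


def recall_vs_fppi(scored: List[Tuple[float, bool]], n_gt: int,
                   n_images: int) -> List[Tuple[float, float]]:
    """scored holds (D(W), is_true_positive) for every detection in the test
    set. Returns (false positives per image, recall) after admitting each
    detection in order of decreasing score."""
    points = []
    tp = fp = 0
    for _, is_tp in sorted(scored, key=lambda s: -s[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_images, tp / n_gt))
    return points


def recall_at_fppi(points: List[Tuple[float, float]], max_fppi: float) -> float:
    """Best recall reachable without exceeding max_fppi (e.g. 40 fppi)."""
    return max((r for f, r in points if f <= max_fppi), default=0.0)
```

Comparing two detectors at a fixed operating point, such as recall at 40 fppi, means calling `recall_at_fppi` on each detector's curve.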

Detections in Context
Task: Identify all cars in the satellite image
Idea: The surrounding context adds information to the local window detector
[Figure: detector output + region labels (Houses, Road) = context-aware detections; panels: Prior: Detector Only, Posterior: TAS Model]

Equations

Features: Haar wavelets
• Haar filters and integral image [ Viola and Jones, ICCV 2001 ]
• The average intensity in a block is computed from four values of the integral image, independently of the block size
• BOOSTING!
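The four-value trick works like this. After one pass to build the integral image, any rectangle's sum is ii[bottom-right] - ii[top-right] - ii[bottom-left] + ii[top-left]. The sketch pads the integral image with a zero row and column so blocks touching the image border need no special cases. Function names are hypothetical.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """ii[y][x] = sum of img[0..y-1][0..x-1]; the extra zero row and column
    remove edge cases at the image border."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0  # running sum of the current row
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii


def box_mean(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Mean of the w-by-h block with top-left corner (x, y): four lookups,
    whatever the block size."""
    total = ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
    return total / (w * h)
```

A two-rectangle Haar feature is then just the difference of two `box_mean` (or box-sum) calls on adjacent blocks.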

Features: Edge fragments
• [ Opelt, Pinz, Zisserman, ECCV 2006 ]
• Weak detector = match of edge chain(s) from a training image to the edge map of the test image
• BOOSTING!
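One standard way to score how well an edge fragment matches a test image's edge map is the chamfer distance: the mean distance from each fragment point to its nearest edge pixel. The sketch uses it as an illustration only. Whether it matches the cited paper's weak detector exactly is an assumption here. It also searches offsets by brute force, where practical systems use a distance transform.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

Point = Tuple[int, int]


def chamfer_score(fragment: Sequence[Point], edges: Set[Point],
                  offset: Point = (0, 0)) -> float:
    """Mean distance from each shifted fragment point to the nearest edge
    pixel; 0 means every fragment point lands on an edge."""
    dx, dy = offset
    total = 0.0
    for x, y in fragment:
        px, py = x + dx, y + dy
        total += min(math.hypot(px - ex, py - ey) for ex, ey in edges)
    return total / len(fragment)


def best_offset(fragment: Sequence[Point], edges: Set[Point],
                search: Iterable[Point]) -> Tuple[float, Point]:
    """Slide the fragment over candidate offsets; return (score, offset)
    of the best match."""
    return min((chamfer_score(fragment, edges, off), off) for off in search)
```

A boosted detector would then threshold such match scores, one weak learner per fragment.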

Histograms of oriented gradients
• SIFT, D. Lowe, ICCV 1999
• Dalal & Triggs, CVPR 2005
• SVM!
