Visual Scene Understanding (CS 598)

Course Number: 46411
Instructor: Derek Hoiem
Room: Siebel Center 1109
Class Time: Tuesday and Thursday 11:00am – 12:15pm
Office Hours: Tuesday and Thursday 12:15 – 1:00pm; by appointment
Contact: dhoiem@uiuc.edu, Siebel 3312

Today
• Introductions
• Overview of logistics
• Overview of class material

Vision: What is it good for?
Biological (Humans): ten items (erased; see note)
Technological (Computers): ten items (erased; see note)
Note: Unfortunately, these got erased when my computer crashed.

Course Logistics

Class Content Overview
• Tutorials and Perspectives
• Paper reading
  • Spatial Inference
  • Objects
  • Actions
  • Context and Integration

Visual Scene Understanding
Visual scene understanding is the ability to infer general principles and current situations from imagery in a way that helps achieve goals.

I. Spatial Inference

Getting Around

Spatial Inference: applications
• Automated Vehicles
• Household Robots
• Graphics Applications
• Predict object size/position

Spatial Inference: open questions
• How do we represent space?
  • Surface orientations, depth maps, voxels?
• How do we infer it from available sensory data (image, stereo, motion, laser range finder)?
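To make the representation question concrete, here is a minimal sketch of how one representation (a depth map) can be converted into another (a sparse voxel set). It assumes a simple pinhole camera; the focal length, principal point, voxel size, and the tiny depth map are made-up illustrative values, and the function names are hypothetical rather than taken from any course material.

```python
from math import floor

def depth_to_points(depth, focal, cx, cy):
    """Back-project a depth map (rows of depths, e.g. in meters) to 3D points.

    Assumes a pinhole camera: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # treat non-positive depth as "no measurement"
                points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return points

def points_to_voxels(points, voxel_size):
    """Quantize 3D points into a sparse set of occupied voxel indices."""
    return {tuple(floor(c / voxel_size) for c in p) for p in points}

# Toy 2x3 depth map; 0.0 marks a pixel with no depth reading (assumed convention).
depth = [
    [2.0, 2.0, 0.0],
    [2.0, 4.0, 4.0],
]
points = depth_to_points(depth, focal=2.0, cx=1.0, cy=0.0)
voxels = points_to_voxels(points, voxel_size=1.0)
print(len(points), "points ->", len(voxels), "occupied voxels")
```

The depth map is dense in the image but only describes visible surfaces; the voxel set is viewpoint-independent but loses resolution at the chosen voxel size. That trade-off is one reason the choice of representation remains an open question.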

II. Objects

Finding Things and Observing Them
Image classification: Are there any dogs?
Photo credit: iansand – flickr.com

Finding Things and Observing Them
Object Localization: Where are the dog(s)?

Finding Things and Observing Them
Verification: Is this a dog?

Finding Things and Observing Them
Description: Furry, small, nice, side view

Finding Things and Observing Them
Identification: My friend Sally?

Recognizing Stuff
SKY, WATER, SAND

Object Recognition: applications
• Photo Search
• Security
• Robots

Object Recognition: open questions
• How many examples does it take to learn one category well?
• How many examples does it take to learn 100 categories well?
• How do these answers depend on the level of supervision?
• Can recognition be solved with simple methods and massive amounts of data?
• How can we quickly recognize an object?
• How can we scale up to deal with thousands of categories?

III. Actions

Taking Action [Saxena et al. 2008]

Recognizing Actions
KTH Dataset
Figure from Laptev et al. 2008

Recognizing Actions
Figure from Laptev et al. 2008

Reading Emotions
Photo credit: Comstock

Actions: applications
• Video Search
• Security

Actions: open questions
• How are actions defined?
• Does it make sense to categorize them?
• If not, how do we recognize them?
• What are good visual representations for inferring actions?
• How can we recognize activities?

IV. Context and Integration [Hoiem et al. 2008]

Context and Integration
• Objects + scene categories → better detection
• Movement + objects → action/activity recognition
• Space + objects → navigation
[Hoiem et al. 2008]
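As a rough illustration of "objects + scene categories → better detection", the sketch below re-weights a detector's output with a scene-conditioned prior using Bayes' rule in odds form. This is not the method of Hoiem et al. 2008; it is a generic prior-swapping sketch that assumes the detector output is a calibrated probability under some neutral prior. The function name and all numbers (detector probability, priors for a street or kitchen scene) are hypothetical.

```python
def posterior_with_context(detector_prob, neutral_prior, scene_prior):
    """Swap the detector's implicit prior for a scene-conditioned prior.

    Assumes detector_prob is a calibrated P(object | appearance) under
    neutral_prior, and that all three inputs lie strictly between 0 and 1.
    """
    # Divide out the neutral prior odds to get an appearance likelihood ratio.
    det_odds = detector_prob / (1.0 - detector_prob)
    neutral_odds = neutral_prior / (1.0 - neutral_prior)
    likelihood_ratio = det_odds / neutral_odds
    # Multiply by the scene-conditioned prior odds and convert back to a probability.
    scene_odds = scene_prior / (1.0 - scene_prior)
    post_odds = likelihood_ratio * scene_odds
    return post_odds / (1.0 + post_odds)

# Same hypothetical "car" detection, scored in two assumed scene categories.
street = posterior_with_context(detector_prob=0.6, neutral_prior=0.5, scene_prior=0.8)
kitchen = posterior_with_context(detector_prob=0.6, neutral_prior=0.5, scene_prior=0.05)
print(f"car | street: {street:.2f}, car | kitchen: {kitchen:.2f}")
```

The same appearance evidence becomes more convincing in a scene where the object is likely and less convincing where it is rare, which is the intuition behind using scene category as context.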

Context and Integration: applications
Everything that vision is good for

Context and Integration: open questions
• Should context be explicit (e.g., “cars drive on the road”) or implicit (feature-based)?
• How do we model and learn the interactions between different processes and scene characteristics?
• How do we deal with the growing complexity as more and more pieces are put together?

General Problems in Computer Vision
• Better understanding of limitations and their sources
  • Need new experimental paradigms
• Improve generalization
  • Aim to generalize across datasets, categories, and tasks
  • Work on knowledge sharing and transfer
• Vision as a way of learning about the world
  • Integration into AI
  • Systems that acquire knowledge over time

Successes of Computer Vision
• Point matching (e.g. 2d3)
• Tracking
• Structure from motion
• Stitching
• Product inspection
• Multiview 3D reconstruction
• Face recognition and modeling
• Object recognition on pre-2000 datasets
• Interactive segmentation (ongoing)

To Do
• Register on bulletin board
• Post comments on Thursday's reading (due tomorrow)
• Look over schedule and decide which days to present (due next Tues)
• Start thinking about projects
• Let me know if you want a specific pairing (due Tues)

Questions?

Goals
• Make you a better researcher (esp. in vision)
  • More knowledge
  • Better critical thinking skills
  • Improved communication skills
  • Improved research skills

Grades
• Participation: 25%
  • Posting
  • Class discussion
• Presentation: 25%
• Projects: 50%
  • Proposal, progress report, final paper, and oral

Policies
• Attendance required (see syllabus)
• Give credit where due
• No formal prerequisites
• Everything needs to be on time

Reading
• Read well
• Post comments to bulletin board at least 24 hours before class

Presentations
• Presenter
  • Everyone does two
  • Good quality coverage of topic (40 min)
  • See syllabus for guidelines
  • Sign up by next Tuesday (at latest)
  • TBAs are your choice (decide at least 4 weeks in advance)
• Demonstrator
  • If all days are taken, pair up
  • One person’s job will be to demonstrate some aspect of the algorithm (e.g., where it succeeds and fails) by running it on many examples
  • May require implementation
• Note taker

Projects
• Timeline
  • Proposal: Feb 12 (3 ½ weeks!)
  • Progress report: Mar 19
  • Presentation: paper May 5, oral later
• Progress report
• Presentation
  • Paper
  • Oral
• In pairs
  • Can choose partner or be randomly paired
• Suggestions on web
• Potentially will lead to publication (e.g. NIPS)

