Foundations & Core in Computer Vision: A System Perspective

Ce Liu, Microsoft Research New England

Vision vs. Learning
• Computer vision: visual application of machine learning?
• Data → features → algorithms → data
• ML: design algorithms given input and output data
• CV: find the best input and output data given available algorithms

Theoretical vs. Experimental
• Theoretical analysis of a visual system
• Best & worst cases
• Average performance
• Theoretical analysis is challenging as many visual distributions are hard to model (signal processing: 2nd order processes, machine learning: exponential families)
• Experimental approach: full spectrum of system performance as a function of the amount of data, annotation, number of categories, noise, and other conditions
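The experimental approach above can be sketched as a parameter sweep. The following is a minimal illustration, not a method from the talk: it assumes a toy 1-D two-class problem and a nearest-centroid classifier (both hypothetical stand-ins for a real vision system) and measures accuracy as a function of training-set size and noise level. All function names are made up for this example.

```python
import random


def make_data(n, noise, rng):
    """Draw n labeled 1-D points: class 0 centered at -1, class 1 at +1.

    Labels alternate so both classes are present whenever n >= 2.
    """
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center, noise), label))
    return data


def fit_centroids(train):
    """Toy 'model': the mean of each class (nearest-centroid classifier)."""
    sums, counts = [0.0, 0.0], [0, 0]
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return [sums[c] / counts[c] for c in (0, 1)]


def accuracy(centroids, test):
    """Fraction of test points assigned to the correct class."""
    correct = 0
    for x, y in test:
        pred = 1 if abs(x - centroids[1]) < abs(x - centroids[0]) else 0
        correct += int(pred == y)
    return correct / len(test)


def sweep(train_sizes, noise_levels, n_test=2000, seed=0):
    """Measure accuracy over a grid of (training size, noise level)."""
    rng = random.Random(seed)
    results = {}
    for noise in noise_levels:
        test = make_data(n_test, noise, rng)
        for n in train_sizes:
            model = fit_centroids(make_data(n, noise, rng))
            results[(n, noise)] = accuracy(model, test)
    return results


if __name__ == "__main__":
    table = sweep([2, 10, 100, 1000], [0.25, 1.0, 2.0])
    for (n, noise), acc in sorted(table.items()):
        print(f"train={n:5d} noise={noise:4.2f} accuracy={acc:.3f}")
```

For a real system the same loop would extend to the other axes named on the slide (annotation quality, number of categories), with the toy data generator and classifier swapped for the actual pipeline.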

Quality vs. Speed
• HD videos, billions of images to index
• Real time & 90% vs. one hour per frame & 95%?
• Mechanism to balance quality and speed in modeling
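One simple way to reason about the quality/speed balance is to treat each system configuration as an operating point and choose among them under a time budget. The sketch below is an assumption-laden illustration, not a mechanism proposed in the talk: the two main points come from the slide (taking "real time" to mean 1/30 s per frame, which is an assumption), and the third point is purely hypothetical, added only to show a dominated configuration being filtered out.

```python
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    name: str
    seconds_per_frame: float
    accuracy: float


def pareto_front(points):
    """Keep points that no other point matches or beats on both speed and accuracy."""
    front = []
    for p in points:
        dominated = any(
            q.seconds_per_frame <= p.seconds_per_frame
            and q.accuracy >= p.accuracy
            and (q.seconds_per_frame < p.seconds_per_frame or q.accuracy > p.accuracy)
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.seconds_per_frame)


def best_within_budget(points, budget_seconds):
    """Most accurate point whose per-frame cost fits the budget, or None."""
    feasible = [p for p in points if p.seconds_per_frame <= budget_seconds]
    return max(feasible, key=lambda p: p.accuracy) if feasible else None


POINTS = [
    # From the slide; 1/30 s per frame for "real time" is an assumption.
    OperatingPoint("real_time", 1.0 / 30.0, 0.90),
    OperatingPoint("offline", 3600.0, 0.95),
    # Hypothetical point, for illustration only: slower AND less accurate
    # than "real_time", so it is dominated.
    OperatingPoint("hypothetical_mid", 60.0, 0.89),
]

if __name__ == "__main__":
    print([p.name for p in pareto_front(POINTS)])
    print(best_within_budget(POINTS, budget_seconds=1.0))
```

Selecting from a fixed set of measured points is the crudest form of such a mechanism; a model with a built-in quality/speed knob would generate the points rather than enumerate them.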

Automatic vs. Semi-automatic
• Common review feedback: parameters are hand-tuned; not clear how to set the parameters
• Vision system user feedback: I don’t know how to tweak parameters!
• Computer-oriented vs. human-oriented representations
• Human-in-the-loop (collaborative) vision
• How to optimally use humans (what, which and how accurate) beyond traditional active learning
• Model design by crowd-sourcing
• Learning by subtraction
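As a baseline for the "traditional active learning" mentioned above, here is a minimal sketch of pool-based uncertainty sampling. It assumes a toy 1-D threshold classifier and a simulated annotator whose `accuracy` parameter loosely stands in for the "how accurate" question; every name and number here is hypothetical and chosen for illustration, not taken from the talk.

```python
import random


def noisy_annotator(x, true_threshold, accuracy, rng):
    """Simulated human: returns the correct label with probability `accuracy`."""
    correct = int(x > true_threshold)
    return correct if rng.random() < accuracy else 1 - correct


def fit_threshold(labeled, default=0.5):
    """Toy model: midpoint between the largest negative and smallest positive."""
    neg = [x for x, y in labeled.items() if y == 0]
    pos = [x for x, y in labeled.items() if y == 1]
    if not neg or not pos:
        return default
    return (max(neg) + min(pos)) / 2


def query_most_uncertain(pool, labeled, threshold):
    """Uncertainty sampling: the unlabeled point closest to the decision boundary."""
    candidates = [x for x in pool if x not in labeled]
    if not candidates:
        return None
    return min(candidates, key=lambda x: abs(x - threshold))


def active_learn(pool, annotate, n_queries):
    """Seed with the two extreme points, then query the most uncertain point each round."""
    labeled = {x: annotate(x) for x in (min(pool), max(pool))}
    for _ in range(n_queries):
        x = query_most_uncertain(pool, labeled, fit_threshold(labeled))
        if x is None:
            break
        labeled[x] = annotate(x)
    return fit_threshold(labeled), len(labeled)


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [i / 100 for i in range(101)]
    estimate, n_labels = active_learn(
        pool, lambda x: noisy_annotator(x, 0.305, 1.0, rng), n_queries=10
    )
    print(f"estimated threshold={estimate:.3f} using {n_labels} labels")
```

Lowering `accuracy` below 1.0 shows how quickly this simple scheme degrades with unreliable labels, which is one reason the slide asks what lies beyond it.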

Algorithms vs. Sensors
• Two approaches to solving a vision problem
• Look at images, design algorithms, experiment, improve…
• Look at cameras, design new/better sensors, …
• Cameras for full-spectrum, high res, low noise, depth, motion, occluding boundary, object, …
• What’s the optimal sensor/device for solving a vision problem?
• What’s the limit of sensors?

Thank you! Ce Liu Microsoft Research New England
