Explore a faster detection algorithm for recognizing a variety of objects in images with feasibility on consumer hardware. This talk presents initial experiments suggesting scalability and sublinear time complexity. Past work includes part-based detection methods and a proposed algorithm with promising results. Discover ways to combine sublinear appearance correlation with shape searching for further optimization.
Towards Sublinear Time Multiclass Object Detection Sam Davies
The Challenge • Recognize objects in images • Many object classes • Many 3D views • Feasible on consumer hardware
Applications • Cars that drive themselves • Other robots… • Assistive devices for the blind
This Talk • Use an existing object representation [Crandall ’05] • Propose a faster detection algorithm • equivalent accuracy • Present initial experiments that suggest • It scales well with #classes x #views • Empirically sublinear
Talk Overview • Past Work • Part-based detection • 1-Fan/Star Model • Proposed Algorithm • Results • Next Steps • Feature Sharing
Past Work: State of the Art • Part-based • Shape • Appearance • Relatively high accuracy • (for this presentation, assume good enough) • Mostly single view, single class • Linear running time in C (#classes x #views) • (or parallelize with N processors -- $$$!) • Multiclass part sharing [Torralba 2004] • Improve running time – empirically O(log C) • Restricted shape model
Past Work: Part-Based Detection • Rigid pieces held together by “springs.” • The springs joining the rigid pieces • Constrain relative movement • Measure the cost of the movement • Cost of an embedding: • Measure the “tension” on each spring, and • A local evaluation of how well each coherent piece is embedded [Fischler, Elschlager 1973]
Past Work: Part-Based Detection • Global measurement (shape) • Constellation / arrangement of part positions • Spring stretching / compressing • Cost / energy associated with relative positions of pairs of parts • Local measurement (appearance) • Rigid local part from image information • Independently measured for each part
Past Work: Part-Based Detection • Find best location of all the parts (highest sum of weighted votes) • minimize spring tension and part matching energies • MAP estimation: maximum probability of part locations for a test image
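The minimization on this slide can be written out as a worked equation. This is a sketch in assumed notation that does not appear in the slides: l_i is the image location of part i, m_i(l_i) is the appearance (part matching) energy, d_ij(l_i, l_j) is the spring energy between connected parts, and E is the set of springs.

```latex
L^{*} = \arg\min_{l_1, \dots, l_P}
  \left[ \sum_{i=1}^{P} m_i(l_i) \;+\; \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right]
```

If the energies are read as negative log-probabilities, minimizing this sum is the same as the MAP estimate of the part locations.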
Past Work: 1-Fan/Star Model • Restrict all parts to only be connected to the center part
Past Work: 1-Fan/Star Model • Restrict all parts to only be connected to the center part • More efficient detection (dynamic programming) • Shown to be reasonably accurate [Crandall 2005, Fergus 2005]
Past Work: 1-Fan/Star Model • Hough Transform • Each part “votes” for location of the center part • Votes are weighted according to spring definitions
Past Work: 1-Fan/Star Model Use Gaussians for shape models [Crandall 2005, Fergus 2005]
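A minimal sketch of star-model matching with Gaussian springs, written with costs (lower is better) rather than votes. The function name `star_model_detect`, the inputs `app_cost` and `offsets`, and the single shared `sigma` are all illustrative assumptions, not taken from the slides. The inner loop is a brute-force search that is quadratic in the number of pixels per part; a distance transform would replace it in a practical detector.

```python
import numpy as np

def star_model_detect(app_cost, offsets, sigma):
    """Brute-force star-model matching on a small grid (illustrative only).

    app_cost: array (P, H, W), appearance cost of each part at each pixel
              (hypothetical input, lower is better).
    offsets:  array (P, 2), ideal (dy, dx) of each part relative to the center.
    sigma:    Gaussian standard deviation shared by all springs (assumption).
    Returns (best_center, total_cost_map).
    """
    P, H, W = app_cost.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = np.zeros((H, W))
    for p in range(P):
        votes = np.empty((H, W))
        for cy in range(H):
            for cx in range(W):
                # Quadratic spring cost: negative log of a Gaussian shape model
                # centered on the part's ideal position for this center.
                dy = ys - (cy + offsets[p, 0])
                dx = xs - (cx + offsets[p, 1])
                spring = (dy ** 2 + dx ** 2) / (2.0 * sigma ** 2)
                # Cheapest placement of part p for this candidate center.
                votes[cy, cx] = np.min(app_cost[p] + spring)
        # Each part contributes its best cost to every candidate center.
        total += votes
    best = np.unravel_index(np.argmin(total), total.shape)
    return (int(best[0]), int(best[1])), total
```

Because each part depends only on the center, the parts are handled one at a time and their contributions are simply added, which is what makes the star model cheaper than a fully connected one.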
Past Work: 1-Fan/Star Model • Running time (annotations from the slide figure) • O(N) • O(N) + O(N^2) • O(N) • O(N) x O(P) = O(PN) • O(PN) + O(PN) (sum) + O(N) (max) • O(PN) x O(C) = O(CPN) • N: # pixels • P: # parts • C: # classes x # views
Proposed Algorithm • Idea: • Run max, sum, distance transform computations all together • Adaptively • Divide into image pyramids
Proposed Algorithm • Key observation: • We can quickly calculate an upper bound of the distance transform in a desired image pyramid cell • Then refine in the most promising areas
Proposed Algorithm • Start with a coarse approximation • Ignore shape information altogether • Think: largest cell in the image pyramid groups all pixels into one • Equivalent to bag-of-words (0-fan)
Proposed Algorithm • For the object that looks most promising, descend to a finer resolution in the hierarchy and re-estimate the distance transform. • Based on a hierarchical A* framework [McAllester ’07] • Admissible heuristic based on upper-bound estimates at coarse resolutions
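The search order described on the last few slides can be sketched as a best-first search over pyramid cells. This is not the talk's implementation: the names `build_max_pyramid` and `best_first_detect` are hypothetical, scores are "higher is better", maps are assumed square with a power-of-two side, and the upper bounds come from a precomputed max-pyramid. Building that pyramid is itself linear in the number of classes, so the sketch shows only the refinement order, not the sublinear bound computation.

```python
import heapq
import numpy as np

def build_max_pyramid(score):
    """Return [finest, ..., coarsest]; each level is a 2x2 max-pool of the previous."""
    levels = [score]
    while levels[-1].shape[0] > 1:
        s = levels[-1]
        h, w = s.shape
        levels.append(s.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3)))
    return levels

def best_first_detect(score_maps):
    """Find the best (class, y, x) over a list of square power-of-two score maps.

    Returns (class_index, y, x, score, cells_expanded).
    """
    pyramids = [build_max_pyramid(s) for s in score_maps]
    heap = []
    for c, pyr in enumerate(pyramids):
        top = len(pyr) - 1
        # heapq is a min-heap, so bounds are stored negated.
        heapq.heappush(heap, (-float(pyr[top][0, 0]), c, top, 0, 0))
    expanded = 0
    while heap:
        neg_bound, c, level, y, x = heapq.heappop(heap)
        if level == 0:
            # A single pixel's bound is exact, and every cell still queued
            # has an upper bound no larger than it, so this is the optimum.
            return c, y, x, -neg_bound, expanded
        expanded += 1
        finer = pyramids[c][level - 1]
        # Refine only the most promising cell into its four children.
        for dy in (0, 1):
            for dx in (0, 1):
                cy, cx = 2 * y + dy, 2 * x + dx
                heapq.heappush(heap, (-float(finer[cy, cx]), c, level - 1, cy, cx))
    raise ValueError("score_maps must be non-empty")
```

The cell maximum never underestimates the best score inside a cell, which is the admissibility condition that lets the search stop at the first single-pixel cell it pops. Classes whose coarse bound is low are never refined.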
Next Steps • Recall: • Appearance correlation is still O(PC) • P = # parts, C = #classes x # views • Even if shape matching is sublinear, we still have: O(PC) + o(C) = O(PC) • Need to make correlation sublinear as well.
Past Work: Feature Sharing [Torralba 2004]
Past Work: Feature Sharing empirically “O(log(C))”
Next Steps • Combine • Sublinear appearance correlation (via feature sharing) with • Sublinear shape searching (described here) • We get: • o(C) + o(C) = o(C)