
The SUN Database

Explore the SUN Database, a large collection of scene categories and images built to support scene understanding and classification. The presentation covers human performance, image feature benchmarking, and sub-scene detection.





Presentation Transcript


  1. The SUN Database • Slides by Jennifer Baulier

  2. What is the SUN database? • Scene Understanding Database • Scene = a place humans could act within • Full database as of paper: 899 categories and 130,519 images

  3. Size Representation of the Categories in SUN 908 • Many images in bedroom, kitchen, and living room, but few in grotto, launch pad, sinkhole, signal box, sunken garden, etc.

  4. Motivation • Huge databases available for objects • Largest scene database had 15 categories • Many questions to answer

  5. Four Objectives • Determine categories with some objectivity • Judge human performance • Feature benchmark • Sub-scene detection

  6. Making The Database

  7. Choosing Scenes for the Database • Terms from the Tiny Images dataset • Differing visual identities, navigable, and not proper nouns • 2500 candidate terms, narrowed down to 899 categories

  8. Getting Images • Uncontrolled, from search engines • Full color, 200 x 200 pixels or larger • Checked for correctness • No duplicates

  9. Examples

  10. Human Performance

  11. Human Recognition Task • Test category overlap • Comparison with other tests • Avoid too much training

  12. Human Experiment Setup • 397 categories, 20 scenes per category • Labels in a 3-level tree • First level: indoor, outdoor natural, outdoor man-made • Used Amazon Mechanical Turk • Average: 61 seconds per HIT, 58.6% accuracy • “Good workers” (100+ HITs): 68.5% accuracy

  13. Some Easy Categories

  14. Some Confusing Categories (and what people think they are)

  15. Computer Performance Benchmark

  16. Image Feature Comparison Experimental Setup • 1-vs-all SVMs • Both datasets • 12 feature types • “All features” = a weighted sum
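
The two ideas on this slide can be sketched in plain Python. This is only an illustration, assuming the weighted sum is taken over per-feature kernel matrices; the weights, kernel values, classifier scores, and function names are made up for the example and are not from the slides.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices (lists of lists)."""
    n_rows, n_cols = len(kernels[0]), len(kernels[0][0])
    combined = [[0.0] * n_cols for _ in range(n_rows)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n_rows):
            for j in range(n_cols):
                combined[i][j] += weight * kernel[i][j]
    return combined


def one_vs_all_predict(scores_by_class):
    """Return the label whose 1-vs-all classifier gives the highest score."""
    return max(scores_by_class, key=scores_by_class.get)


# Toy 2x2 kernels for two hypothetical feature types.
k_gist = [[1.0, 0.2], [0.2, 1.0]]
k_hog = [[1.0, 0.6], [0.6, 1.0]]
k_all = combine_kernels([k_gist, k_hog], weights=[0.25, 0.75])

# Toy decision values from three hypothetical 1-vs-all SVMs.
label = one_vs_all_predict({"bedroom": -0.3, "kitchen": 0.8, "grotto": -1.2})
```

In a real pipeline the combined kernel would be passed to an SVM trainer, with one binary classifier trained per category.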

  17. GIST • Estimate perceptual dimensions: Naturalness, openness, roughness, expansion, ruggedness • Output energy of 24 filters tuned to 8 orientations at 4 scales • Averaged on a 4x4 grid
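
The last bullet, averaging each filter's output energy on a 4x4 grid, might look like the sketch below. The filter bank itself is not implemented; `toy_energy` is a stand-in for one filter's squared response, and `grid_average` is a hypothetical helper name.

```python
def grid_average(energy, grid=4):
    """Average a 2D map (list of rows) over a grid x grid layout of cells.

    Assumes the map is at least grid pixels tall and wide.
    """
    height, width = len(energy), len(energy[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            values = [energy[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(values) / len(values))
    return cells


# Toy 8x8 "filter energy" map: each value equals its row index.
toy_energy = [[float(y)] * 8 for y in range(8)]
cell_means = grid_average(toy_energy)

# With 24 filters and 16 cells per filter, the stacked descriptor has 384 values.
descriptor_length = 24 * len(cell_means)
```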

  18. HOG (2x2) • Histogram of Oriented Gradients (31 bins) per cell • Cells have 8 pixel steps • Stacked 2x2 neighbors into 124 dimensions • Quantized into 300 visual words using k-means
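
The 124 dimensions come from concatenating four 31-bin cell descriptors (4 x 31 = 124). A minimal sketch of that stacking step follows, assuming overlapping 2x2 blocks taken at every cell position; the toy cell values and the name `stack_2x2` are illustrative only.

```python
def stack_2x2(cell_grid):
    """Concatenate each 2x2 block of neighboring cell descriptors."""
    rows, cols = len(cell_grid), len(cell_grid[0])
    stacked = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            block = (cell_grid[r][c] + cell_grid[r][c + 1]
                     + cell_grid[r + 1][c] + cell_grid[r + 1][c + 1])
            stacked.append(block)
    return stacked


# Toy 3x3 grid of 31-bin cell descriptors; each cell is filled with its index.
cells = [[[float(r * 3 + c)] * 31 for c in range(3)] for r in range(3)]
descriptors = stack_2x2(cells)
```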

  19. SIFT (General) • Scale Invariant Feature Transform • A detector for attribute regions • Semi-invariant to viewpoint, illumination, etc. • A descriptor for the appearance

  20. Dense SIFT • Extract SIFT features with both 4 and 8 pixel radii • Stack descriptors of the Hue, Saturation, and Value color channels • 300 visual words

  21. Sparse SIFT • Hessian-affine and MSER interest points • Cluster both sets into 1000 visual words • 2 histograms
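
The HOG and SIFT slides all end by turning descriptors into a histogram of visual words. The sketch below shows that quantization step only, assuming the vocabulary (for example, k-means centroids) has already been learned; the 2-D toy vocabulary and the function names are placeholders.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to descriptor (squared Euclidean)."""
    def sq_dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def word_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


# Toy 2-D vocabulary with 3 words (the slides use 300 or 1000 words).
vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [4.0, 6.0], [1.1, 0.8]]
histogram = word_histogram(descriptors, vocabulary)
```

For sparse SIFT, the same routine would be run once per interest-point type, giving the two histograms the slide mentions.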

  22. LBP • Histogram of local binary patterns • Texture recognition • Rotation invariant
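
As a rough illustration of how a local binary pattern and its rotation-invariant form can be computed, here is a sketch for a single 3x3 patch. It assumes the basic 8-neighbor variant with a "greater than or equal to center" test; the variant used in the benchmark may differ.

```python
def lbp_code(patch):
    """8-bit local binary pattern of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors visited clockwise, starting at the top-left pixel.
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(ring):
        if value >= center:
            code |= 1 << bit
    return code


def rotation_invariant(code):
    """Smallest value among the 8 circular bit rotations of an 8-bit code."""
    rotations = [((code >> k) | (code << (8 - k))) & 0xFF for k in range(8)]
    return min(rotations)


patch = [[9, 9, 1],
         [1, 5, 1],
         [1, 1, 1]]
rotated = [[1, 9, 9],   # same pattern shifted one step around the ring
           [1, 5, 1],
           [1, 1, 1]]
```

The two patches get different raw codes but the same rotation-invariant code; the image feature is then a histogram of these codes over all pixels.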

  23. SSIM • Self Similarity Descriptors compare small patches to neighbors • Grid of 5x5 patches • Radius = 3 bins, Angles = 10 bins, 30 dimensional descriptor per patch • 300 visual words • How can you tell that these are the same shape?

  24. Tiny Image • Most basic • Greatly scale down both images • One long array

  25. Line Features • Straight lines from Canny edges • 2 unnormalized histograms: lengths and angles
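
The two histograms could be built as in the sketch below, assuming straight line segments have already been extracted from the Canny edge map. The bin edges, the number of angle bins, and the name `line_histograms` are illustrative choices, not values from the slides.

```python
import math


def line_histograms(segments, length_edges, angle_bins):
    """Unnormalized histograms of segment lengths and orientations.

    segments: list of ((x0, y0), (x1, y1)) endpoints.
    length_edges: ascending lower bounds that separate the length bins.
    angle_bins: number of equal bins over [0, 180) degrees.
    """
    length_hist = [0] * (len(length_edges) + 1)
    angle_hist = [0] * angle_bins
    for (x0, y0), (x1, y1) in segments:
        length = math.hypot(x1 - x0, y1 - y0)
        # Lines have no direction, so fold the angle into [0, 180).
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        length_bin = sum(1 for edge in length_edges if length >= edge)
        angle_bin = min(int(angle / (180.0 / angle_bins)), angle_bins - 1)
        length_hist[length_bin] += 1
        angle_hist[angle_bin] += 1
    return length_hist, angle_hist


segments = [((0, 0), (10, 0)),    # horizontal, length 10
            ((0, 0), (1, 3)),     # steep, length about 3.2
            ((0, 0), (-20, 40))]  # leaning left, length about 44.7
length_hist, angle_hist = line_histograms(segments, [5, 20], angle_bins=4)
```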

  26. Texton Histogram • “the basic elements in early (pre-attentive) visual perception.” - ucla.edu • Responses to a bank of filters with 8 orientations, 2 scales, and 2 elongations • Defined 512 textons

  27. Color Histogram • Used CIE L*a*b* color space • L* = lightness • a* = red to green • b* = yellow to blue • Histograms: 4 x 14 x 14 bins
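
A 4 x 14 x 14 joint histogram has 784 bins. The sketch below bins already-converted L*a*b* pixels; the value ranges (L* in [0, 100], a* and b* in [-128, 128)) are an assumption made for the example, and the RGB-to-Lab conversion is left out.

```python
def lab_histogram(lab_pixels, bins=(4, 14, 14)):
    """Joint histogram over (L*, a*, b*) triples, flattened to one list.

    Assumed ranges: L* in [0, 100], a* and b* in [-128, 128).
    """
    ranges = ((0.0, 100.0), (-128.0, 128.0), (-128.0, 128.0))
    histogram = [0] * (bins[0] * bins[1] * bins[2])
    for pixel in lab_pixels:
        index = 0
        for value, (low, high), n_bins in zip(pixel, ranges, bins):
            position = int((value - low) / (high - low) * n_bins)
            position = min(max(position, 0), n_bins - 1)  # clamp out-of-range values
            index = index * n_bins + position
        histogram[index] += 1
    return histogram


pixels = [(50.0, 0.0, 0.0), (50.0, 0.0, 0.0), (99.0, 100.0, -100.0)]
histogram = lab_histogram(pixels)
```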

  28. Geometric Probability Map • Geometric probability: chance of a point in a region falling into a sub region • Consider four classes: ground, vertical, porous, and sky • Probability map for each class -> 8x8 grid

  29. Geometry Specific Histogram • Color & texton histograms for the 4 classes • Every sample adds to a histogram for each class • Weighted by the likelihood of belonging to that class
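
The weighting described here can be sketched as follows: every sample contributes to all four class histograms, scaled by its probability of belonging to each class. The sample data and the name `geometry_histograms` are hypothetical, and the bin index stands in for a color or texton bin.

```python
def geometry_histograms(samples, n_bins,
                        classes=("ground", "vertical", "porous", "sky")):
    """Build one histogram per geometric class.

    samples: list of (bin_index, {class_name: probability}) pairs, where
    bin_index is the sample's color or texton bin.
    Each sample adds its class probability to that class's histogram.
    """
    histograms = {name: [0.0] * n_bins for name in classes}
    for bin_index, probabilities in samples:
        for name in classes:
            histograms[name][bin_index] += probabilities.get(name, 0.0)
    return histograms


samples = [
    (0, {"ground": 0.5, "vertical": 0.25, "porous": 0.0, "sky": 0.25}),
    (2, {"ground": 0.0, "vertical": 0.0, "porous": 0.0, "sky": 1.0}),
    (0, {"ground": 0.25, "vertical": 0.75, "porous": 0.0, "sky": 0.0}),
]
histograms = geometry_histograms(samples, n_bins=3)
```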

  30. Results Discussion • All-feature SVM: 38% accuracy vs 68.5% for good workers • Outdoor natural = 43.2%, indoor = 37.5%, outdoor man-made = 35.8% • Indoor transportation = 51.9%, indoor shopping and dining = 29%

  31. Humans (left %) vs All Feature SVM (right %)

  32. Localizing Multiple Scenes

  33. Scene Detection • An image may transition between scenes • Find & localize all scenes • Classification vs. detection, in object recognition terms

  34. Test and Approach • 24 categories from SUN 397 • 104 photos of urban environments • Averages 4 scenes/image • Window scans the image 3 times at different scales
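
Scanning an image three times at different scales can be pictured with a small window generator. The scale factors, the stride, and the square window shape below are assumptions chosen for the sketch; the slides do not give these values.

```python
def sliding_windows(width, height, scales=(0.25, 0.5, 1.0), stride_fraction=0.5):
    """Yield square windows (x, y, side) covering an image at several scales.

    Each scale sets the window side as a fraction of the shorter image edge;
    the stride is a fraction of the window side. Both are illustrative choices.
    """
    shorter = min(width, height)
    for scale in scales:
        side = max(1, int(shorter * scale))
        step = max(1, int(side * stride_fraction))
        for y in range(0, height - side + 1, step):
            for x in range(0, width - side + 1, step):
                yield (x, y, side)


windows = list(sliding_windows(width=8, height=4))
```

Each window would then be scored by the scene classifiers, and high-scoring windows kept as detections.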

  35. Validation • Bounding box has to overlap >= 15% of ground truth • Space doesn't have well defined edges
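
One way to implement the overlap check is sketched below. It assumes overlap is measured as intersection over union, as in common object detection benchmarks; the slide only says the box must overlap at least 15% of the ground truth, so the exact measure is an assumption. Box coordinates are (x0, y0, x1, y1).

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(detection, ground_truth, threshold=0.15):
    """Accept a detection when its overlap with the ground truth reaches the threshold."""
    return overlap_ratio(detection, ground_truth) >= threshold


ground_truth = (0, 0, 10, 10)
loose_hit = (5, 0, 15, 10)    # overlap 50 / 150 = 1/3
near_miss = (9, 0, 19, 10)    # overlap 10 / 190, about 0.053
```

The low threshold fits the slide's second bullet: scene regions lack sharp edges, so a strict overlap requirement would reject many reasonable detections.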

  36. Results when training with 200 examples per class

  37. Works Cited
• Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Retrieved from http://vision.princeton.edu/projects/2010/SUN/paper.pdf
• Xiao, J., Ehinger, K. A., Hays, J., Torralba, A., & Oliva, A. (2014). SUN Database: Exploring a Large Collection of Scene Categories. International Journal of Computer Vision. Retrieved from http://vision.princeton.edu/projects/2010/SUN/paperIJCV.pdf
• SUN Database. Retrieved March 13, 2016, from http://groups.csail.mit.edu/vision/SUN/
• Oliva, A., & Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Retrieved March 14, 2016, from http://people.csail.mit.edu/torralba/code/spatialenvelope/
• VLFeat.org. Retrieved March 14, 2016, from http://www.vlfeat.org/
• Chatfield, K., Philbin, J., & Zisserman, A. (2009). Efficient retrieval of deformable shape classes using local self-similarities. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops. Retrieved from http://www.robots.ox.ac.uk/~vgg/publications/2009/Chatfield09/chatfield09.pdf
• Green, B. Canny Edge Detection Tutorial. Retrieved March 14, 2016, from http://dasl.mem.drexel.edu/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html
• Texton. Retrieved March 14, 2016, from http://vcla.stat.ucla.edu/old/Chengen_Research/texton.htm#base_texton
• Lab Color Space. Retrieved March 14, 2016, from https://en.wikipedia.org/wiki/Lab_color_space#CIELAB
• Large-scale Scene Understanding Challenge. Retrieved March 14, 2016, from http://lsun.cs.princeton.edu/
• Places2: A Large-Scale Database for Scene Understanding. Retrieved March 14, 2016, from http://places2.csail.mit.edu/challenge.html
• MIT Places Database for Scene Recognition. Retrieved March 14, 2016, from http://places.csail.mit.edu/

  38. Images Cited
• SUN Logo. Digital image. Princeton Vision Group. Web. <http://vision.princeton.edu/projects/2010/SUN/>.
• SUN Categories Size Visualization. Digital image. Princeton.edu. Web. <http://vision.princeton.edu/projects/2010/SUN/paperIJCV.pdf>.
• SUN Indoor Mall Image. Digital image. SUN Database. Web. <http://labelme.csail.mit.edu/Release3.0/tool.html?
• SUN Lecture Room (for GIST). Digital image. SUN Database. Web. <http://labelme.csail.mit.edu/Release3.0/tool.html?
• SUN Category Image Examples. Digital image. Princeton.edu. Web. <http://vision.princeton.edu/projects/2010/SUN/paper.pdf>.
• SUN Categories That are Easy to Humans. Digital image. Google Scholar. Web. <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=FNhl50sAAAAJ&citation_for_view=FNhl50sAAAAJ:8k81kl-MbHgC>.
• SUN Categories That are Confusing to Humans. Digital image. Google Scholar. Web. <https://scholar.google.com/citations?view_op=view_citation&hl=en&user=FNhl50sAAAAJ&citation_for_view=FNhl50sAAAAJ:8k81kl-MbHgC>.
• HOG Base Image. Digital image. VLFeat.org. Web. <http://www.vlfeat.org/overview/hog.html>.
• HOG Feature Image. Digital image. VLFeat.org. Web. <http://www.vlfeat.org/overview/hog.html>.
• SIFT Image Base. Digital image. VLFeat.org. Web. <http://www.vlfeat.org/overview/sift.html>.
• SIFT Image Feature Points. Digital image. VLFeat.org. Web. <http://www.vlfeat.org/overview/sift.html>.
• SIFT Image Feature Descriptors. Digital image. VLFeat.org. Web. <http://www.vlfeat.org/overview/sift.html>.

  39. Images Cited 2
• Description of Facial Expressions with Local Binary Patterns. Digital image. Scholarpedia. Web. <http://www.scholarpedia.org/article/Local_Binary_Patterns>.
• MSER Features. Digital image. Mathworks. Web. <http://www.mathworks.com/help/vision/ref/detectmserfeatures.html?refresh=true>.
• Affine Covariant Region Detectors. Digital image. Robots.ox.ac.uk. Web. <http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html>.
• SSIM Example Diagram. Digital image. Robots.ox.ac.uk. Web. <http://www.robots.ox.ac.uk/~vgg/publications/2009/Chatfield09/chatfield09.pdf>.
• Heart Shape Classification Task. Digital image. Robots.ox.ac.uk. Web. <http://www.robots.ox.ac.uk/~vgg/publications/2009/Chatfield09/chatfield09.pdf>.
• Canny Edge Base Image. Digital image. Drexel.edu. Web. <http://dasl.mem.drexel.edu/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html>.
• Canny Edge Result Image. Digital image. Drexel.edu. Web. <http://dasl.mem.drexel.edu/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html>.
• Texton Example. Digital image. Ucla.edu. Web. <http://vcla.stat.ucla.edu/old/Chengen_Research/texton.htm#base_texton>.
• CIE L*a*b* Color Space Examples. Digital image. Wikipedia. Web. <wikipedia.org/wiki/Lab_color_space>.
• Geometric Probability Dartboard Example. Digital image. Ck12.org. Web. <www.ck12.org/user:Sample(123)/book/02.-CK-12-Middle-School-Math-Grade-8/sction/11.7/>.
• Feature Results on SUN and the 15 Category Database. Digital image. Princeton.edu. Web. <http://vision.princeton.edu/projects/2010/SUN/paper.pdf>.
• Human vs All Feature SVM Class Results. Digital image. Princeton.edu. Web. <http://vision.princeton.edu/projects/2010/SUN/paper.pdf>.
• Open Concept House. Digital image. Hzcdn.com. Web. <st.hzcdn.com/simgs/9eb1650b025c57f0_4-8860/traditional-kitchen.jpg>.
• SUN Tower Image. Digital image. SUN Database. Web. <label.csail.mit.edu/Release3.0/tool.html?actions=v&folder=users/antoni/static_sun_database/>.
• Image Detections Results Table (Split in 2). Digital image. Princeton.edu. Web. <http://vision.princeton.edu/projects/2010/SUN/paper.pdf>.
