Leaf Classification from Local Boundary Analysis
Anne Jorstad
AMSC 664, University of Maryland
Spring 2008 Final Report
Advisor: Dr. David Jacobs, Computer Science
Background • Electronic Field Guide for Plants
Background • Current System: • Inner-Distance Shape Context (IDSC) • Measures the length of the shortest path between two points that stays entirely within the figure • Good for detecting similarities between deformable structures
Background • Current System: • All shape information is compared at a global level, with no specific consideration of edge types • [Figure: Cephalanthus occidentalis (smooth boundary) and Carpinus caroliniana (serrated boundary)]
Problem Statement Use local boundary information to make classification decisions that complement the existing system.
The Algorithm • Input: • Capture boundary curve:
The Algorithm: Wavelets • Discrete wavelet transform • In: vector of points • Out: two vectors, each half the original length • Approximation coefficients: general spatial information • Detail coefficients: local detail information • Repeat for multiple scales
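The transform described above can be illustrated with a minimal sketch. The slides do not say which wavelet family was used, so the Haar wavelet here is an assumption chosen for simplicity, and the function names and sample values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail), each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums keep the coarse shape of the signal.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences keep the local detail.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def multiscale_details(signal, levels):
    """Re-apply the transform to the approximation, collecting detail vectors."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


# Hypothetical x-coordinates of 8 boundary points.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = multiscale_details(x, levels=3)
print([len(d) for d in details])  # [4, 2, 1]
```

Each level halves the vector length, so three scales over 8 samples give detail vectors of length 4, 2 and 1. A boundary curve would need the same treatment for its y-coordinates.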
The Algorithm: Wavelets • Model leaf by its detail coefficients over several scales • [Figure: input boundary and its successive approximations, continually subtracting out detail information]
The Algorithm: Data • Forget leaves: • Each boundary point: • Lose one degree of freedom in preserving rotation invariance • For 3 wavelet scales, each leaf is ~2000 5-D points • Combine data for all leaves: • (number of leaves) × ~2000 5-D points • Group all points into meaningful clusters
The Algorithm: Clustering • Goal: Sort points into “buckets” to get a unique distribution for each leaf species • K-Means Clustering: group all points into 36 representative clusters
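A minimal sketch of the clustering step, assuming plain Lloyd-style k-means with random initial centers (the slides do not specify the initialization or stopping rule). The toy data are 2-D with k = 2 so the result is easy to check; the slides use 5-D points and k = 36. All names are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))


def k_means(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points picked at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        for idx, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[idx] = [sum(c) / len(group) for c in zip(*group)]
    return centers


# Toy data: two well-separated 2-D blobs.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = k_means(points, k=2)
print(sorted(centers))
```

With well-separated blobs the two centers settle on the blob means. On real boundary data the result depends on the initialization, so several restarts are usually worth trying.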
The Algorithm: Distribution Comparison • The distribution of an individual leaf’s ~2000 points over the 36 clusters represents the leaf • [Figure: leaf image and corresponding histogram for (a) Corylus americana, (b) Corylus americana, different example, (c) Asimina triloba]
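As a rough illustration of how a leaf becomes a histogram, the sketch below assigns each point to its nearest cluster center and normalizes the counts. Normalizing to fractions is an assumption; the centers and points are made-up 2-D values, not data from the report.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def leaf_histogram(points, centers):
    """Fraction of a leaf's points that fall nearest to each cluster center."""
    counts = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda k: squared_distance(p, centers[k]))
        counts[nearest] += 1
    return [c / len(points) for c in counts]


# Toy example: 3 hypothetical cluster centers and one "leaf" of 4 points.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
leaf_points = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
histogram = leaf_histogram(leaf_points, centers)
print(histogram)  # [0.25, 0.5, 0.25]
```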
The Algorithm: Distribution Comparison • Compare distributions between leaves using the chi-squared distance between their histograms • Smallest distance defines best match • New leaf is assigned the species of the closest match
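A minimal sketch of the comparison and nearest-neighbor assignment. It assumes the form of the chi-squared distance commonly used for histogram comparison, half the sum over bins of (a - b)^2 / (a + b); the exact normalization used in the report is not shown here. The histograms are made-up 3-bin examples.

```python
def chi_squared_distance(h1, h2):
    """Assumed form: 0.5 * sum_k (h1[k] - h2[k])**2 / (h1[k] + h2[k])."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def classify(query, training):
    """Return the species of the training histogram closest to the query.

    training is a list of (species, histogram) pairs.
    """
    species, _ = min(training, key=lambda item: chi_squared_distance(query, item[1]))
    return species


# Made-up histograms for illustration only.
training = [
    ("Corylus americana", [0.5, 0.3, 0.2]),
    ("Asimina triloba", [0.1, 0.2, 0.7]),
]
query = [0.45, 0.35, 0.2]
print(classify(query, training))  # Corylus americana
```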
Validation • Training data: 20 species (10 serrated, 10 smooth), 10 examples of each → 200 leaves
Validation • Test data: same 20 species, 5 new examples of each • Nearest-Neighbor Classification • Species classification: 46% correct • Serration classification: 100% correct (closest match was always to a species with the appropriate serration) • Local serration information IS being captured!
Combining Results • Original IDSC results on same data set: • Species correct: 62% • Serration correct when species is wrong: 53% (no better than chance) • How to combine wavelet distances with IDSC distances?
Combining Results • Given: the wavelet distances and the IDSC distances for a new leaf • Want to find: the most probable species for that leaf
Naïve Bayes Classification • From Bayes’ Rule: • Can now calculate all relevant probabilities from training data
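One way to write the decomposition, as a sketch under the usual naive Bayes assumption that the two features are conditionally independent given the species. The symbols are notation introduced here, not taken from the slides: $s$ is a candidate species, $d_W$ the wavelet-based feature, and $d_I$ the IDSC-based feature.

```latex
P(s \mid d_W, d_I)
  = \frac{P(d_W, d_I \mid s)\, P(s)}{P(d_W, d_I)}
  \;\propto\; P(d_W \mid s)\, P(d_I \mid s)\, P(s)
```

Each factor on the right can be estimated by counting over the training leaves, and the predicted species is the $s$ that maximizes the product.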
Naïve Bayes Classification • Wavelet distances → binary serration value • Add small linear smoothing term • IDSC distances → species ranked in order from nearest to farthest • Add Gaussian smoothing term
Validation Results • Test on same 20 species, 5 examples of each • Adding serration information has improved overall classification results!
Full Data Set • 245 species, 7481 leaves • Binary serration assignment no longer makes sense:
Linear Optimization • Find best linear weighting (alpha) of the wavelet and IDSC distances • Train over previous training set • [Plot: % correct vs. alpha]
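A sketch of how such a weighting could be tuned. It assumes a convex combination, alpha * d_wavelet + (1 - alpha) * d_IDSC, and a simple grid search over alpha; the report's actual weighting formula and search method are not shown here. The distances are made-up values assumed to be scaled to comparable ranges.

```python
def combined_distance(d_wavelet, d_idsc, alpha):
    """Assumed form of the weighting: a convex combination of the two distances."""
    return alpha * d_wavelet + (1.0 - alpha) * d_idsc


def accuracy(alpha, queries):
    """Fraction of queries whose nearest candidate has the true species.

    queries is a list of (true_species, candidates); each candidate is
    (species, d_wavelet, d_idsc).
    """
    correct = 0
    for truth, candidates in queries:
        best = min(candidates, key=lambda c: combined_distance(c[1], c[2], alpha))
        if best[0] == truth:
            correct += 1
    return correct / len(queries)


def best_alpha(queries, steps=100):
    """Grid search over alpha in [0, 1]; returns the first maximizer."""
    alphas = [i / steps for i in range(steps + 1)]
    return max(alphas, key=lambda a: accuracy(a, queries))


# Made-up distances: each measure alone gets one of the two queries wrong.
queries = [
    ("A", [("A", 0.2, 0.5), ("B", 0.6, 0.4)]),
    ("B", [("A", 0.3, 0.9), ("B", 0.4, 0.2)]),
]
alpha = best_alpha(queries)
print(accuracy(0.0, queries), accuracy(1.0, queries), accuracy(alpha, queries))
```

In this toy case either distance alone classifies half the queries correctly, while an intermediate alpha classifies both.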
Full Data Set • Nearest-Neighbor Classification over all 7481 leaves • Wavelet alone: 20% correct • IDSC alone: 54% correct • Combined: 64% correct
In Practice • Electronic field guide displays top 5, 10 or 20 matches • Calculate correct % in top n matches, for n = 1, …, 20
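The top-n measure can be sketched as follows, assuming each query comes with a list of candidate species ranked from nearest to farthest. The function name and the rankings are hypothetical.

```python
def top_n_accuracy(ranked_results, n):
    """Fraction of queries whose true species appears in the first n matches.

    ranked_results is a list of (true_species, ranking), where ranking lists
    candidate species from nearest to farthest.
    """
    hits = sum(1 for truth, ranking in ranked_results if truth in ranking[:n])
    return hits / len(ranked_results)


# Made-up rankings for three query leaves.
results = [
    ("A", ["A", "B", "C"]),
    ("B", ["C", "B", "A"]),
    ("C", ["A", "B", "C"]),
]
for n in (1, 2, 3):
    print(n, top_n_accuracy(results, n))
```

The score can only stay the same or rise as n grows.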
In Practice • [Plot: % correct vs. number of matches considered]
In Practice • Need results in near real time • Otherwise no benefit over paper field guides • Running time • Preprocessing (several hours): determine cluster centers; determine distributions for each leaf • On the spot (0.92 seconds): calculate a single distribution; compare it to all distributions in the system
Conclusions • Wavelets do capture local serration information • Wavelet + IDSC classification does a better overall job than the original IDSC alone • Calculations can be done in real time to make the system realistic to use