Spatial Business Detection and Recognition from Images

Spatial Business Detection and Recognition from Images Alexander Darino Weeks 10 & 11

STR Implementation • STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” • Multiresolution-based potential character detection • Character/layout geometry and color properties analysis • Refined detection • Local affine rectification

Refined Detection • One font per classifier, a-z A-Z • Generate alphabet templates • Resize & center templates; divide into grid (7x7) • Apply several 2D Gabor filters to each grid patch • Different orientations, frequencies, variances • For each pixel, yields real/imaginary components of the transformation • Feed data into Linear Discriminant Analysis • Reduces features and forms classifier at the same time
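
The grid step above can be sketched as follows. This is only an illustration, assuming the template has already been resized so that its sides are multiples of 7; the function name and the list-of-rows image format are hypothetical and not taken from the original implementation.

```python
def split_into_grid(image, cells=7):
    """Split a 2D image (list of equal-length rows) into cells x cells patches."""
    height, width = len(image), len(image[0])
    if height % cells or width % cells:
        raise ValueError("image sides must be multiples of the grid size")
    patch_h, patch_w = height // cells, width // cells
    grid = []
    for gy in range(cells):
        rows = image[gy * patch_h:(gy + 1) * patch_h]
        grid.append([
            [row[gx * patch_w:(gx + 1) * patch_w] for row in rows]
            for gx in range(cells)
        ])
    return grid


# A 28x28 checkerboard stands in for a rendered letter template.
template = [[(x + y) % 2 for x in range(28)] for y in range(28)]
patches = split_into_grid(template)
```

Each of the 49 patches would then be filtered and passed to its own classifier.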

2D Gabor Filter • Kernel: Gaussian envelope multiplied by a sine wave • The kernel is convolved with the image
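
A minimal sketch of a complex 2D Gabor kernel, assuming an isotropic Gaussian envelope and a complex sinusoidal carrier. The parameter names (`frequency`, `theta`, `sigma`) and the pure-Python representation are illustrative choices, not the original implementation.

```python
import cmath
import math


def gabor_kernel(size, frequency, theta, sigma):
    """Return a size x size complex Gabor kernel as a list of rows.

    size must be odd so that the kernel has a centre pixel.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction of the wave.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = cmath.exp(2j * math.pi * frequency * xr)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Inner product of a patch with a same-sized kernel (one filter position)."""
    return sum(p * k for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))


kernel = gabor_kernel(size=7, frequency=0.25, theta=0.0, sigma=2.0)
```

The real and imaginary parts of each response give the two components mentioned on the Refined Detection slide; varying `theta`, `frequency`, and `sigma` gives the bank of filters.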

Training Process

Character Determination • Each grid patch has its own LDA classifier; classifier returns vector of probabilities for each symbol • To classify overall character, recursively consider all 9-neighborhoods, multiply corresponding probabilities together • When only one grid patch remains, highest probability wins
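
One possible reading of the recursive combination above, sketched under stated assumptions: a "9-neighborhood" is taken to be a 3x3 block of patches, each pass replaces every such block with the renormalized product of its nine probability vectors, and a 7x7 grid therefore shrinks 7 → 5 → 3 → 1. The renormalization and the function names are additions for illustration.

```python
def combine_neighborhoods(grid):
    """One pass: merge every 3x3 block of patches by multiplying probabilities."""
    n = len(grid)
    num_symbols = len(grid[0][0])
    merged_grid = []
    for cy in range(1, n - 1):
        merged_row = []
        for cx in range(1, n - 1):
            merged = [1.0] * num_symbols
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    cell = grid[cy + dy][cx + dx]
                    merged = [m * p for m, p in zip(merged, cell)]
            # Renormalize so repeated products stay in a sensible range.
            total = sum(merged) or 1.0
            merged_row.append([m / total for m in merged])
        merged_grid.append(merged_row)
    return merged_grid


def classify(grid, symbols):
    """Shrink an odd-sized grid to a single cell and return the best symbol."""
    if len(grid) % 2 == 0:
        raise ValueError("grid size must be odd")
    while len(grid) > 1:
        grid = combine_neighborhoods(grid)
    final = grid[0][0]
    return symbols[final.index(max(final))]


# Toy input: every patch slightly prefers "G" over "B".
toy_grid = [[[0.6, 0.4] for _ in range(7)] for _ in range(7)]
winner = classify(toy_grid, symbols="GB")
```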

Recognition Process • Color Properties Analysis: Choose channel with highest confidence of best distinguishing foreground from background • Binarization Threshold (50% of Otsu’s Method) • Intermediate Representation: Trim, Resize, and Center Binary Image • Perform OCR on variations of Int. Rep: stretched, eroded (Gaussian-based), dilated • Aggregate and return votes
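
The binarization step could look roughly like this: a plain implementation of Otsu's method followed by thresholding at half of the Otsu value. The choice to treat pixels above the threshold as foreground is an assumption, as are the function names and the flat-list image format.

```python
def otsu_threshold(pixels):
    """Otsu's method on a flat list of 8-bit grey values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # pixels at or below the candidate threshold
    sum0 = 0  # sum of their grey values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels, scale=0.5):
    """Threshold at scale * Otsu; 1 marks pixels above the threshold."""
    threshold = scale * otsu_threshold(pixels)
    return [1 if p > threshold else 0 for p in pixels]


sample = [20, 30, 40, 200, 210, 220]
mask = binarize(sample)
```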

Recognition Process Example: “G” using Trebuchet-MS Classifier • [Figure: query character (actual size) and its intermediate representation (actual size)]

abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

Recognition Process Example: “G” using Trebuchet-MS Classifier • Characters identified across the 15 variations (each shown at actual size): s, g, G, g, g, B, G, G, B, B, B, G, B, G, a

Recognition Process Example: “G” using Trebuchet-MS Classifier • Final Results: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 • s: 1/15
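
The tally above can be reproduced from the 15 per-variation results in the example. The sketch below uses a plain counter; the function name is illustrative.

```python
from collections import Counter


def aggregate_votes(predictions):
    """Return (symbol, votes, share) tuples, most-voted first."""
    counts = Counter(predictions)
    total = len(predictions)
    return [(symbol, n, n / total) for symbol, n in counts.most_common()]


# Identified characters for the 15 variations of the "G" example, in order.
votes = list("sgGggBGGBBBGBGa")
results = aggregate_votes(votes)
```

Note that 'B' and 'G' tie at 5 votes each, so a plain majority vote cannot separate them in this example.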

“GEORGE” (Trebuchet-MS) • Votes: • E: 14/15 • t: 1/15

“GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 2/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing

“GEORGE” (Trebuchet-MS) • Votes: • j: 13/15 • i: 1/15 • M: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing

“GEORGE” (Trebuchet-MS) • Votes: • B: 5/15 • G: 5/15 • g: 3/15 • a: 1/15 • s: 1/15

“GEORGE” (Trebuchet-MS) • Votes: • j: 12/15 • Y: 2/15 • X: 1/15 • ‘j’ is the default when unable to decide • Should invert during preprocessing or training

Note on the “Inversion Problem” • Easy to fix; common problem in OCR systems • Will likely detect and correct during the preprocessing stage as opposed to training • More training data: slower, less reliable • Preprocessing: like trying many different lenses at the eye doctor and taking your best guess with each lens.
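
A sketch of the "try each lens" idea for inversion: run the recognizer on both polarities and keep the more confident answer. The `classify(image) -> (symbol, confidence)` interface is hypothetical, and the toy classifier below exists only to make the example runnable.

```python
def invert(binary_image):
    """Swap foreground and background in a 0/1 image (list of rows)."""
    return [[1 - p for p in row] for row in binary_image]


def recognize_with_inversion(binary_image, classify):
    """Classify both polarities and return the more confident (symbol, confidence)."""
    candidates = [classify(binary_image), classify(invert(binary_image))]
    return max(candidates, key=lambda result: result[1])


def toy_classify(image):
    """Stand-in classifier: more confident when the foreground (1s) is sparse."""
    pixels = [p for row in image for p in row]
    foreground = sum(pixels) / len(pixels)
    return ("I", 1.0 - foreground)


# A dark stroke on a light background, i.e. the "wrong" polarity for the toy classifier.
inverted_input = [[1, 0, 1, 1] for _ in range(4)]
best = recognize_with_inversion(inverted_input, toy_classify)
```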

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • B: 9/15 • j: 3/15 • H: 2/15 • F: 1/15

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • A: 9/15 • j: 5/15 • n: 1/15

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • K: 12/15 • j: 2/15 • H: 1/15

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • E: 5/15 • j: 3/15 • L: 3/15 • r: 2/15 • F: 2/15

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • p: 12/15 • j: 3/15 PR

“BAKERY”(Actual: ‘Tw-Cen-MT’, Used: ‘Arial’) • Votes: • Y: 12/15 • j: 3/15

“UNIVERSITY”(Used: Times New Roman) abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ

“UNIVERSITY”(Used: Times New Roman) • Votes: • U: 8/15 • C: 3/15 • j: 2/15 • s: 1/15 • O: 1/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • N: 12/15 • j: 3/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • l(‘el’): 9/15 • I(‘eye’): 6/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • v: 9/15 • j: 3/15 • V: 3/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • F: 9/15 • L: 5/15 • l (‘el’): 1/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • G: 9/15 • j: 6/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • j: 12/15 • x: 2/15 • w: 1/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • j: 5/15 • C: 4/15 • O: 4/15 • x: 2/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • T: 9/15 • l: 3/15 • i: 1/15 • j: 1/15 • L: 1/15

“UNIVERSITY”(Used: Times New Roman) • Votes: • Y: 10/15 • j: 3/15 • i: 2/15

Evaluation • Biggest weaknesses in preprocessing stage • OCR sensitive to thresholding/color inversion • Occasionally color modeling chooses a bad channel to use for OCR – happens more often on low-resolution images • Works surprisingly well for low-resolution images • Font does not need to be exact, but proportions need to be roughly the same

How do I use this information?

The Big Picture • [Diagram labels: Image, STR, Detected Text, Latitude, Longitude, Geocoding, Reverse Geocoding, Nearby Businesses, Business Name Matching, Business Spatial Detection, Business Identification]

Old Approach • Form words from highest-voted characters • Compare to lexicon using Levenshtein distance • Use existing ranking system afterwards • Examples: BOKFRY > BAKERY (L-DIST = 2); GFQRGF > GEORGE (L-DIST = 3)
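
A minimal sketch of the lexicon comparison, using the standard dynamic-programming Levenshtein distance. The three-word lexicon is made up from words that appear in these slides; the real lexicon would presumably come from the nearby-business lookup.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def closest_word(word, lexicon):
    """Return the lexicon entry with the smallest edit distance to word."""
    return min(lexicon, key=lambda entry: levenshtein(word, entry))


lexicon = ["GEORGE", "BAKERY", "UNIVERSITY"]
match = closest_word("BOKFRY", lexicon)
```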

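New Approach (Lexicon-assisted STR) • Minimize Levenshtein distance using the best combination of voted characters (one candidate per character position) • Use existing ranking system afterwards • Example, with the voted candidates for each position listed in columns:
B O K F P Y
G U H E R I
J A j L I l
>>> BAKERY (L-DIST = 0)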
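
One way to find the best combination without enumerating every string is to run the Levenshtein recurrence with a set of candidates at each position, charging nothing for a substitution whenever the lexicon letter is among that position's candidates. This is a sketch of that idea, not necessarily how the original system did it; the candidate sets are read column-wise from the example above.

```python
def lexicon_distance(candidates, word):
    """Smallest edit distance between word and any string built by
    picking one candidate character per position."""
    prev = list(range(len(word) + 1))
    for i, options in enumerate(candidates, start=1):
        cur = [i]
        for j, ch in enumerate(word, start=1):
            cost = 0 if ch in options else 1
            cur.append(min(
                prev[j] + 1,         # drop this detected position
                cur[j - 1] + 1,      # insert a missing letter
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = cur
    return prev[-1]


def best_match(candidates, lexicon):
    """Return the lexicon entry closest to the voted candidates."""
    return min(lexicon, key=lambda entry: lexicon_distance(candidates, entry))


# One string of candidates per character position.
candidates = ["BGJ", "OUA", "KHj", "FEL", "PRI", "YIl"]
lexicon = ["GEORGE", "BAKERY", "UNIVERSITY"]
match = best_match(candidates, lexicon)
```

Because the choice at each position is independent, this gives the same minimum as trying every combination of candidates, at the cost of a single distance computation per lexicon word.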

The End Result • Bruegger's Bagels • Category: Bagels • Address: Market Sq, Pittsburgh, PA 15222 • Phone: (412) 281-2515 • Rating: Not Rated

Next Steps • Fix STR Preprocessing • Bug in Color Modeling code found online • Inversion determination • Multiple thresholds • Word matching: Generate templates of words/logos instead of letters • Text detector: fix character/word fragmentation by reading papers that address the issue

Next Steps • Test more images; fix problems as they arise • Ideas to consider: • Feed grid-patch probability vectors into SVM instead of “smoothing” • Generate “disambiguation classifiers” to differentiate: • Between top contending votes. Remember how ‘G’ and ‘B’ got confused? Dynamically create a classifier to tell them apart • Between commonly confused letters, e.g. E/F, l/i/j, o/c, etc. • Don’t consider statistically insignificant confidences

Next Steps • Text Detection • Look into after more work has been done on STR • Need to address issues: • Intracharacter segmentation • Intercharacter segmentation • Word segmentation • Needed to make STR system automated like before

Thank You
