This presentation explores real-time computer vision utilizing Scanning N-Tuple Grids as an efficient method for isolated character and face recognition. Starting with N-Tuple classifiers, the discussion includes the development of scanning grids designed for convolutional applications that recognize patterns regardless of their location. Efficiency is achieved by reducing time complexity, enabling rapid processing of large image datasets. Real-time demonstrations showcase high accuracy rates in OCR tasks, including character recognition and facial recognition, highlighting the potential for broader applications in natural scene text reading and beyond.
Real-time Computer Vision with Scanning N-Tuple Grids • Simon Lucas • Computer Science Dept
Outline • Background: N-Tuple Classifiers • The scanning n-tuple grid • Isolated Character Recognition • Isolated Face Recognition • Convolutional Mode OCR • Real time vision demo • Conclusions
N-Tuple Classifiers • Work by randomly sampling input space • First applied to binary images • Very fast; reasonable accuracy • Scanning N-Tuple classifier (Lucas, 1995) • Applied to sequence recognition • Fast and accurate • Current work • SNT Grid • Specially developed for convolutional (sliding window) applications • Recognise patterns independent of location
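As a rough illustration of the "randomly sampling input space" idea, here is a minimal sketch of a basic n-tuple classifier for flat binary images. It is not the SNT or SNT-Grid method from the slides; the class name, the add-one smoothing and the log-likelihood scoring are assumptions made for the example.

```python
import math
import random


class NTupleClassifier:
    """Illustrative n-tuple classifier for flat binary images (hypothetical)."""

    def __init__(self, num_pixels, num_tuples, tuple_size, classes, seed=0):
        rng = random.Random(seed)
        # Each tuple is a fixed random sample of pixel positions.
        self.tuples = [rng.sample(range(num_pixels), tuple_size)
                       for _ in range(num_tuples)]
        self.tuple_size = tuple_size
        # counts[label][t][address]: how often class `label` produced
        # `address` on tuple t during training.
        self.counts = {c: [dict() for _ in self.tuples] for c in classes}
        self.totals = {c: 0 for c in classes}

    def _addresses(self, image):
        # Read the sampled pixels of each tuple as one binary number.
        for positions in self.tuples:
            address = 0
            for p in positions:
                address = (address << 1) | image[p]
            yield address

    def train(self, image, label):
        self.totals[label] += 1
        for table, address in zip(self.counts[label], self._addresses(image)):
            table[address] = table.get(address, 0) + 1

    def score(self, image, label):
        # Sum of log probabilities with add-one smoothing (an assumption).
        total = self.totals[label]
        num_addresses = 2 ** self.tuple_size
        log_likelihood = 0.0
        for table, address in zip(self.counts[label], self._addresses(image)):
            p = (table.get(address, 0) + 1) / (total + num_addresses)
            log_likelihood += math.log(p)
        return log_likelihood

    def classify(self, image):
        return max(self.counts, key=lambda c: self.score(image, c))
```

Training is a single pass of table increments and classification is a handful of table lookups per class, which is consistent with the "very fast" claim on the slide.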
SNT-Grid System Architecture • Binarise (e.g. Niblack) • Scanning Index (SNT-Grid) • Likelihood Images • Integrated Likelihoods • Further Processing (e.g. Dictionary or Language Model)
Simple Operation • Slide grid over image • Interpret each position as binary number
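The two bullets above can be sketched directly. This is a naive version, assuming a dense m x m grid over a binary image stored as a list of rows; the function name and the row-major bit order are illustrative choices, not taken from the slides.

```python
def window_addresses(image, m):
    """For each top-left position, read the m x m window as a binary number."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - m + 1):
        out_row = []
        for c in range(cols - m + 1):
            address = 0
            # Row-major bit order: top-left pixel is the most significant bit.
            for dr in range(m):
                for dc in range(m):
                    address = (address << 1) | image[r + dr][c + dc]
            out_row.append(address)
        out.append(out_row)
    return out
```

Each position costs m x m pixel reads, so a full scan of an n x n image costs on the order of n²m² operations.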
Efficient Implementation • Very simple idea • Decompose one 2-d scan • Into two 1-d scans! • Reduces time complexity • Suppose image is n x n • Window is m x m • Reduce from O(n²m²) • To O(n²) • Well worth the effort!
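The slides do not spell out the decomposition, so the following is only one plausible sketch of it: a horizontal pass keeps a rolling m-bit code per row, and a vertical pass stacks m of those row codes into the window address. The function name, the dense grid and the bit order are assumptions. Each pixel is touched a constant number of times per pass, provided the m*m-bit code fits in a machine word.

```python
def window_addresses_two_pass(image, m):
    """Window addresses via one horizontal and one vertical rolling scan."""
    rows, cols = len(image), len(image[0])

    # Pass 1 (horizontal): rolling m-bit code of each row segment.
    row_mask = (1 << m) - 1
    row_codes = []
    for r in range(rows):
        code = 0
        codes = []
        for c in range(cols):
            # Shift in the new pixel and drop the one that left the window.
            code = ((code << 1) | image[r][c]) & row_mask
            if c >= m - 1:
                codes.append(code)
        row_codes.append(codes)

    # Pass 2 (vertical): rolling stack of m row codes per column.
    win_mask = (1 << (m * m)) - 1
    out = [[0] * (cols - m + 1) for _ in range(rows - m + 1)]
    for c in range(cols - m + 1):
        code = 0
        for r in range(rows):
            # Shift in the new row code and drop the row that left the window.
            code = ((code << m) | row_codes[r][c]) & win_mask
            if r >= m - 1:
                out[r - m + 1][c] = code
    return out
```

For the 3 x 3 image `[[1, 0, 1], [0, 1, 0], [1, 1, 1]]` with `m = 2`, this returns `[[9, 6], [7, 11]]`, the same addresses a direct window-by-window scan produces.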
SNT-Grid Speed on MNIST • Java Implementation • Chars are 28 x 28 grey-level images • Training (60,000 chars) • 8s (> 7,000 cps) • Testing (10,000 chars) • 3.8s (> 2,600 cps)
ORL Face Data • 40 subjects • 10 images from each • Using 5 for training, 5 for testing • Average around 97.5% accuracy • Competitive with other methods • Much faster!
Museum Archive Cards • Hard to read with conventional OCR
‘2’ Detector – Integrated OP (Uses Integral Array of Viola + Jones)
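For readers unfamiliar with the integral array, here is a minimal sketch of the general technique: a summed-area table over a 2-d array of per-pixel scores (for example, log-likelihoods), from which the total over any rectangle follows from four lookups. The function names are illustrative, and how the slides' system chooses the rectangles is not shown here.

```python
def integral_image(values):
    """Summed-area table with an extra zero row and column."""
    rows, cols = len(values), len(values[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += values[r][c]
            # Sum of everything above and to the left, inclusive.
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of values in the rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Building the table is a single pass over the image, and every rectangle sum after that is constant time regardless of the rectangle's size.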
Real-time Demo • Very efficient • Can use it for real-time expression recognition • Or a ‘video’ joystick! • Bit like EyeToy – but potentially more sophisticated
Conclusions • Basis of simple and efficient computer vision • Trick is the scan decomposition • Also use of integral image to accumulate likelihoods • Currently being applied to reading text in natural scenes • Many other applications also • Further reading: ICDAR 2005 Paper (on my web page)