
Introduction

Avoiding Segmentation in Multi-digit Numeral String Recognition by Combining Single and Two-digit Classifiers Trained without Negative Examples. Dan Ciresan, Politehnica University of Timisoara, Computer Department, Timisoara, Romania. dan.ciresan@ac.upt.ro



Presentation Transcript


  1. Avoiding Segmentation in Multi-digit Numeral String Recognition by Combining Single and Two-digit Classifiers Trained without Negative Examples. Dan Ciresan, Politehnica University of Timisoara, Computer Department, Timisoara, Romania. dan.ciresan@ac.upt.ro

  2. Introduction • the objective of the present work is to provide an efficient technique for off-line recognition of handwritten numeral strings • practical applications: • postal code recognition (in the USA alone, 250 million envelopes are sorted every day) • information extraction from form fields • best digit recognition rate: 99.6% (Simard [18]) on MNIST • best numeral string recognition rate: 96-97% (Liu, Sako and Fujisawa [12]) on a set from NIST SD 19 • almost all existing methods use segmentation and training with negative examples

  3. Numeral string recognition system 293854 • the proposed solution uses two Convolutional Neural Networks (CNNs), one for digit recognition (1CNN) and one for numeral strings composed of two partially overlapping digits (2CNN) • both classifiers are trained without negative examples, and the use of 2CNN completely relieves our method of any segmentation step • by comparing the outputs of the two classifiers, the system decides whether the image contains one digit or two • evaluated on NIST SD 19; the results are comparable with the best in the literature, even though those use elaborate segmentation

  4. Connected Component Analyzer (CCA) • all connected components of the input image are extracted using a recursive search in only four directions, horizontal and vertical • for each component, various parameters are computed: size, bounding rectangle, width, height, aspect ratio (width/height), and distances to other components • the CCA assigns a color (label) to each component • no segmentation is performed: no connected image is ever split into parts; only already-separated components are extracted • in contrast to other methods [12, 6, 7], complex segmentation is avoided by using a classifier for two connected digits
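The extraction step above can be sketched as follows. This is a minimal illustration, not the paper's code: an iterative flood fill replaces the recursive search (same 4-connectivity, no deep recursion), and each component gets an integer id standing in for the assigned color, plus the bounding rectangle from which width, height and aspect are derived.

```python
from collections import deque

def connected_components(img):
    """Label 4-connected foreground components of a binary image.

    img: list of lists of 0/1 (1 = ink). Returns a label grid where
    each foreground pixel gets a component id >= 1, plus a bounding
    box (min_row, min_col, max_row, max_col) per component.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_id = 1
    for r0 in range(h):
        for c0 in range(w):
            if img[r0][c0] == 1 and labels[r0][c0] == 0:
                q = deque([(r0, c0)])
                labels[r0][c0] = next_id
                box = [r0, c0, r0, c0]
                while q:
                    r, c = q.popleft()
                    box = [min(box[0], r), min(box[1], c),
                           max(box[2], r), max(box[3], c)]
                    # only the four horizontal/vertical neighbours
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w \
                           and img[nr][nc] == 1 and labels[nr][nc] == 0:
                            labels[nr][nc] = next_id
                            q.append((nr, nc))
                boxes[next_id] = tuple(box)
                next_id += 1
    return labels, boxes
```

From each bounding box the slide's per-component parameters follow directly, e.g. width = max_col − min_col + 1 and aspect = width / height.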

  5. The Clustering Stage In order to reconstruct the image, the clustering stage performs the following four operations in sequence: • all small components that are far from any other component are deleted • any two components separated by only one pixel are concatenated if they satisfy several conditions; this step is repeated for components separated by two or three pixels • the digit five is reconnected • any two components that completely overlap horizontally are concatenated (Figures: reconnecting the digit five; the end result of clustering)
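The gap-based concatenation steps can be sketched as below, assuming components are represented by their bounding boxes. The slide's "several conditions" and the digit-five repair are omitted; merging here is purely by bounding-box gap and by complete horizontal containment.

```python
def box_gap(a, b):
    """Number of blank pixels between two bounding boxes
    (min_row, min_col, max_row, max_col); 0 if they touch or overlap."""
    dr = max(a[0] - b[2] - 1, b[0] - a[2] - 1, 0)
    dc = max(a[1] - b[3] - 1, b[1] - a[3] - 1, 0)
    return max(dr, dc)

def horizontally_contained(a, b):
    """True if one box's column span lies completely inside the other's
    (the 'completely horizontally overlapped' case on the slide)."""
    return (a[1] >= b[1] and a[3] <= b[3]) or (b[1] >= a[1] and b[3] <= a[3])

def merge_where(boxes, should_merge):
    """Repeatedly merge any pair of boxes satisfying the predicate."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if should_merge(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

def cluster(boxes, max_gap=3):
    """One merge pass per gap threshold 1..3 pixels (as on the slide),
    then merge horizontally contained boxes."""
    for gap in range(1, max_gap + 1):
        boxes = merge_where(boxes, lambda a, b: box_gap(a, b) <= gap)
    return merge_where(boxes, horizontally_contained)
```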

  6. The Classifier • uses two Convolutional Neural Networks (CNNs), one for single digits and one for pairs of partially overlapping digits • the single-digit CNN (1CNN) is similar to that of Simard [18] • trained on the LNIST set: 66214 digit images • tested on the LNIST set: 45398 digit images • recognition rate: 99.34% • architecture: five-layer CNN (L0 input, L1 and L2 convolutional, L3 and L4 fully connected) • the two-digit CNN (2CNN) was presented in previous work [4] • trained with 200000 images automatically generated from digit images in NIST SD 19 • tested with 200000 images (21x13 pixels) • recognition rate: 94.65% • architecture: five-layer CNN (L0 input, L1 and L2 convolutional, L3 and L4 fully connected)
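The convolutional layers L1 and L2 of both networks compute 2-D "valid" cross-correlations of their input maps with learned kernels; a minimal NumPy illustration of that core operation follows. Kernel sizes and feature-map counts of the paper's CNNs are not given on the slide, so this shows only the generic layer operation, not the actual architecture.

```python
import numpy as np

def conv_valid(x, k):
    """2-D 'valid' cross-correlation: slide the kernel k over x and
    take the elementwise product-sum at each position. The output of
    a convolutional layer is one such map per learned kernel, passed
    through a nonlinearity."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
```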

  7. Test set - 3DNS (Three Digit Numeral String) • in order to compare our method with the best in the field [12], we tested our recognition system on exactly the same data set (derived from NIST SD 19) • we extracted all three-digit images from 300 writers • four pages written with faded ink were discarded • each form contains five fields with three-digit strings, for a total of 1480 images • four images that mismatch the ground truth were eliminated

  8. Recognition system based on maximum score • each component is presented to both 1CNN and 2CNN; the best scores of the two CNNs are compared, and the classifier producing the maximum determines both the class and the number of digits • of the 1476 three-digit images, 126 were incorrectly recognized, i.e. a 91.46% recognition rate • observations: • because both CNNs were trained without negative examples, they cannot be used directly to detect whether a component contains one, two or more digits • a NN trained with only positive examples will map any input image, even an invalid one, to the closest-resembling class • many images containing the digit 0, 1, 3, 4, 7, 8 or 9 were recognized by the 2CNN as 01, 11, 31, 41, 71, 81 or 91, respectively; the problem is attributable to the method of joining digits ([4]) for the 2CNN training set. These cases are denoted X1 cases
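The maximum-score rule can be sketched as follows. The score dictionaries are illustrative stand-ins for the two CNN output layers (class label → score); the winning network supplies both the label and the digit count.

```python
def classify_component(scores_1cnn, scores_2cnn):
    """Maximum-score rule: compare the best score of each network and
    let the winner decide the class and the number of digits.

    scores_1cnn maps single-digit classes '0'..'9' to scores;
    scores_2cnn maps two-digit classes '00'..'99' to scores.
    """
    best1 = max(scores_1cnn, key=scores_1cnn.get)
    best2 = max(scores_2cnn, key=scores_2cnn.get)
    if scores_1cnn[best1] >= scores_2cnn[best2]:
        return best1, 1          # one digit in the component
    return best2, 2              # two overlapping digits
```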

  9. Translating the image on the input field of the classifiers • because the images for the 2CNN are only 21x13 pixels and are not mass-centered, 2CNN is more sensitive to translation than 1CNN • if the image is perfectly centered on the classifier input, the scores are better • we therefore tried moving the image within the input field of the classifiers • 1CNN was trained with mass-centered images; we repeatedly applied 1CNN to the input image translated by ±1 pixel relative to the mass center and kept the greatest score of the 3×3 = 9 tests • 2CNN was trained with bounding-box-centered images; considering the generation method [4] of the images, we placed the 18x10 pixel image at all positions in the 21x13 input of the 2CNN, giving (21 − 18 + 1) × (13 − 10 + 1) = 16 possibilities • with this translation, however, the recognition rate decreased to 71.54%
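The translation search above can be sketched as below. The `classify` callable is a placeholder for evaluating a real CNN on the image shifted by `(dx, dy)`; only the offset grids come from the slide.

```python
def best_translated_score(classify, img, offsets):
    """Evaluate the classifier at several placements of the image and
    keep the highest-scoring one. classify(img, dx, dy) is assumed to
    return a (label, score) pair for the image shifted by (dx, dy)."""
    best = None
    for dx, dy in offsets:
        label, score = classify(img, dx, dy)
        if best is None or score > best[1]:
            best = (label, score)
    return best

# the 3x3 = 9 shifts of +/-1 pixel around the mass centre used for 1CNN
OFFSETS_1CNN = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

# the (21-18+1) x (13-10+1) = 16 placements of the 18x10 image
# inside the 21x13 input field used for 2CNN
OFFSETS_2CNN = [(dx, dy)
                for dx in range(21 - 18 + 1)
                for dy in range(13 - 10 + 1)]
```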

  10. Recognition system based on differences of scores • training with negative examples is avoided by exploiting the fact that the difference between the best and the second-best score of a NN is very large for a correct recognition • for X1 cases the condition is strengthened if at least one of the digits is 1 • the recognition rate is only 76.49% • applying translation to both 1CNN and 2CNN increases the recognition rate to 93.36%

  11. Recognition system based on both maximum score and differences of scores • a combination of the two previous methods • for each classifier we simply add the best score to the difference between it and the second-best score • for X1 cases the condition is strengthened if at least one of the digits is 1 • the recognition rate, 83.40%, is greater than that based on differences of scores, but smaller than that based on maximum score • applying translation to both 1CNN and 2CNN increases the recognition rate to 93.77%
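The combined criterion (best score plus the margin to the second-best score) can be sketched as below; the score dictionaries again stand in for the CNN outputs, and the strengthened X1 condition is omitted.

```python
def combined_score(scores):
    """Best score plus the margin between the best and second-best
    score, as in the combined criterion of slide 11."""
    top = sorted(scores.values(), reverse=True)
    best, second = top[0], top[1]
    return best + (best - second)

def choose_classifier(scores_1cnn, scores_2cnn):
    """Pick the classifier with the larger combined score; the winner
    supplies the class label and the digit count."""
    if combined_score(scores_1cnn) >= combined_score(scores_2cnn):
        return max(scores_1cnn, key=scores_1cnn.get), 1
    return max(scores_2cnn, key=scores_2cnn.get), 2
```

A confident 1CNN answer (large margin) thus beats a 2CNN answer whose top two scores are close together, even when the raw 2CNN maximum is comparable.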

  12. Error analysis The 92 misrecognized images out of the 1476 in the 3DNS test set (93.77% recognition rate) fall into the following categories: • 4 segmentation errors, all caused by very poorly scanned images • 8 three-digit errors; this case is not addressed by the current method • 23 errors on 1CNN; main causes: under-representation in the training set, very confusing images • 26 errors on 2CNN; main cause: the two digits differ greatly in height (by more than 200%), while 2CNN was trained with pairs of digits differing by at most 10% in height • 63 errors caused by selecting the wrong classifier; these can be corrected by refining the rules that select the classifier or, preferably, by training the classifiers with negative examples. In order to verify the usefulness of the 2CNN, we deactivated it and tested the recognition system with the single-digit classifier (1CNN) alone: the recognition rate decreased by more than 4%

  13. Conclusions • we devised a new method for numeral string recognition based on two CNNs, one for digit recognition and the other for pairs of digits • the segmentation process is eliminated by using a two-digit CNN • training with negative examples is avoided by implementing simple rules for choosing the proper CNN • our recognition rate of 93.77% is better than all previous results (Liu, Sako and Fujisawa [12]) obtained with NNs trained without negative examples, and within 3% of the best result of [12], even though we used neither segmentation nor negative examples for training the networks • adding the two-digit classifier to the recognition system increases the recognition rate by more than 4%

  14. Future work • implementing the CNNs on GPU for speed (more than 20x acceleration), so that bigger CNNs can be trained (growing the 2CNN input from 21x13 to 41x29) • adding negative examples to the training process in order to increase the recognition rate and further simplify the recognition system • devising a method to handle the (rare) cases with three or more joined digits • improving the automated process for generating the 2CNN training set (e.g. the X1 cases)
