OCR a survey

OCR a survey Csink László 2009

Problems to Solve • Recognize good quality printed text • Recognize neatly written handprinted text • Recognize omnifont machine-printed text • Deal with degraded, bad quality documents • Recognize unconstrained handwritten text • Lower substitution error rates • Lower rejection rates

OCR according to Nature of Input

Feature Extraction • Large number of feature extraction methods are available in the literature for OCR • Which method suits which application?

A Typical OCR System • Gray-level scanning (300-600 dpi) • Preprocessing • Binarization using a global or locally adaptive method • Segmentation to isolate individual characters • (optional) conversion to another character representation (e.g. skeleton or contour curve) • Feature extraction • Recognition using classifiers • Contextual verification or post-processing

Feature Extraction (Devijver and Kittler) • Feature extraction = the problem of extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability • Extracted features must be invariant to the expected distortions and variations • Curse of dimensionality = if the training set is small, the number of features cannot be high either • Rule of thumb: number of training patterns = 10 × (dimension of the feature vector)

Some issues • Do the characters have known orientation and size? • Are they handwritten, machine-printed or typed? • Degree of degradation? • If a character may be written in two ways (e.g. ‘a’ or ‘α’), it might be represented by two patterns

Variations of the same character • Size invariance can be achieved by normalization, but normalization can cause discontinuities in the character • Rotation invariance is important if characters may appear in any orientation (P or d?) • Skew invariance is important for hand-printed text or multifont machine-printed text

Features Extracted from Grayscale Images Goal: locate candidate characters. If the image is binarized, one may find the connected components of expected character size by a flood-fill type algorithm (4-way recursive method, 8-way recursive method, non-recursive scanline method etc.; see http://www.codeproject.com/KB/GDI/QuickFill.aspx). Then the bounding box is found. A grayscale method is typically used when recognition based on the binary representation fails. In that case the localization may be difficult. Assuming that there is a standard size for a character, one may simply try all possible locations. In a good case, after localization one has a subimage containing one character and no other objects.
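The localization step for a binarized image can be sketched as follows. This is a minimal illustration, not the method of the linked article: it assumes the image is a list of rows with 1 for black and 0 for white, uses a non-recursive 4-way flood fill with a queue, and the function name `connected_components` is hypothetical.

```python
from collections import deque

def connected_components(image):
    """Return bounding boxes of the 4-connected components of black (1) pixels.

    Each box is (min_row, min_col, max_row, max_col), with inclusive bounds.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Start a new flood fill from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                # Visit the four direct neighbours (4-way connectivity).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes
```

Components whose bounding box is far from the expected character size could then be discarded before recognition.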

Template Matching (not often used in OCR systems for grayscale characters) • No feature extraction is used; the template character image itself is compared to the input character image: $D_j = \sum_{i=1}^{M} \left( Z(x_i, y_i) - T_j(x_i, y_i) \right)^2$, where the character Z and the template Tj are of the same size and summation is taken over all the M pixels of Z. The problem is to find j for which Dj is minimal; then Z is identified with Tj.
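A minimal sketch of this matching rule, assuming the dissimilarity Dj is the sum of squared pixel differences and that images are equally sized lists of rows. The names `dissimilarity` and `classify` and the dictionary of labelled templates are illustrative assumptions.

```python
def dissimilarity(z, t):
    """Sum of squared pixel differences between two same-sized images."""
    return sum((zp - tp) ** 2
               for z_row, t_row in zip(z, t)
               for zp, tp in zip(z_row, t_row))

def classify(z, templates):
    """Return the label of the template with the smallest dissimilarity to z."""
    return min(templates, key=lambda label: dissimilarity(z, templates[label]))
```

For example, with one template for "I" and one for "L", an input that differs from the "I" template in a single pixel is identified with "I".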

Limitations of Template Matching • Characters and templates must be of the same size • The method is not invariant to changes in illumination • Very vulnerable to noise • In template matching, all pixels are used as features. It is a better idea to apply unitary (distance-preserving) transforms to character images, obtaining a reduction of features while preserving most of the information about the character shape.

The Radon Transform The Radon transform computes projections of an image matrix along specified directions. A projection of a two-dimensional function f(x,y) is a set of line integrals. The Radon function computes the line integrals from multiple sources along parallel paths, or beams, in a certain direction. The beams are spaced 1 pixel unit apart. To represent an image, the radon function takes multiple, parallel-beam projections of the image from different angles by rotating the source around the center of the image. The following figure shows a single projection at a specified rotation angle.

Projections to Various Axes

Zoning Consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the average gray level in each part, yielding a feature vector of length 25.
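The zoning computation can be sketched as below. This assumes the candidate area has already been cropped to its bounding box, is stored as a list of rows of gray levels, and has at least 5 rows and 5 columns; the function name `zoning_features` is hypothetical.

```python
def zoning_features(image, zones=5):
    """Average gray level in each cell of a zones x zones grid, row by row."""
    rows, cols = len(image), len(image[0])
    features = []
    for zr in range(zones):
        # Integer arithmetic splits the rows into nearly equal bands.
        r_start, r_end = zr * rows // zones, (zr + 1) * rows // zones
        for zc in range(zones):
            c_start, c_end = zc * cols // zones, (zc + 1) * cols // zones
            pixels = [image[r][c]
                      for r in range(r_start, r_end)
                      for c in range(c_start, c_end)]
            # A cell is empty only if the image is smaller than the grid.
            features.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return features
```

The same routine gives per-zone black-pixel densities when applied to a binary image.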

Thinning • Thinning is possible both for grayscale and for binary images • Thinning = skeletonization of characters • Advantage: few features, easy to extract • The informal definition of a skeleton is a line representation of an object that is: i) one-pixel thick, ii) through the "middle" of the object, and iii) preserves the topology of the object.

When No Skeleton Exists • Impossible to generate a one-pixel-wide skeleton that runs through the middle • No pixel can be left out while preserving the connectedness

Possible Defects • Specific defects of data may cause misrecognition • Small holes → loops in skeleton • Single element irregularities → false tails • Acute angles → false tails

How Thinning Works • Most thinning algorithms rely on the erosion of the boundary while maintaining connectivity; see http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Morpholo.html for mathematical morphology • To avoid defects, preprocessing is desirable • As an example, in a black and white application the preprocessing may: • remove very small holes • remove black elements having fewer than 3 black neighbours and having connectivity 1

An Example of Noise Removal This pixel will be removed (N=1; has 1 black neighbour)

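Generation of Feature Vectors Using Invariant Moments • Given a grayscale subimage Z containing a character candidate, the moments of order p+q are defined by $m_{pq} = \sum_{i=1}^{M} Z(x_i, y_i)\, x_i^p\, y_i^q$, where the sum is taken over all M pixels of the subimage. The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity: $\mu_{pq} = \sum_{i=1}^{M} Z(x_i, y_i)\, (x_i - \bar{x})^p (y_i - \bar{y})^q$, where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.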
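A short sketch of these definitions, assuming the standard moment formulas with the pixel value as weight, x as the column index and y as the row index. The names `raw_moment` and `central_moment` are illustrative.

```python
def raw_moment(z, p, q):
    """Moment of order p+q: sum of Z(x, y) * x^p * y^q over all pixels."""
    return sum(v * (x ** p) * (y ** q)
               for y, row in enumerate(z)
               for x, v in enumerate(row))

def central_moment(z, p, q):
    """Central moment of order p+q, taken about the center of gravity."""
    m00 = raw_moment(z, 0, 0)
    x_bar = raw_moment(z, 1, 0) / m00
    y_bar = raw_moment(z, 0, 1) / m00
    return sum(v * ((x - x_bar) ** p) * ((y - y_bar) ** q)
               for y, row in enumerate(z)
               for x, v in enumerate(row))
```

Shifting the character inside the subimage changes the raw moments but leaves the central moments unchanged, up to floating-point rounding. For a binary image the same code sums over the black pixels only, since white pixels contribute zero.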

Hu’s (1962) Central Moments • The ηpq are scale invariant • The Mi are rotation invariant
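For reference, the formulas for this slide did not survive in the transcript. The standard definitions of the normalized central moments and of the first two of Hu's seven invariants are given below; the remaining five invariants are built from third-order moments in the same spirit.

```latex
\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p+q)/2}}, \qquad p + q \ge 2

M_1 = \eta_{20} + \eta_{02}

M_2 = (\eta_{20} - \eta_{02})^2 + 4\,\eta_{11}^2
```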

K-Nearest Neighbor Classification Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 it is classified to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 it is classified to the first class (3 squares vs. 2 triangles inside the outer circle). Disadvantage in practice: the distances from the green circle to all blue squares and to all red triangles have to be computed, which may take much time.
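The voting rule can be sketched in a few lines. This is a brute-force illustration assuming Euclidean distance and feature vectors given as tuples; the coordinates in the usage note are made up to mimic the figure, and `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter

def knn_classify(sample, training, k):
    """Classify sample by majority vote among its k nearest training points.

    training is a list of (feature_vector, label) pairs.
    """
    # Brute force: the distance to every training pattern is computed.
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With two triangles at distance 1 from the sample and three squares at distances 1.5, 2 and about 2.24, the sample is assigned to the triangles for k = 3 and to the squares for k = 5, as in the slide. The full sort over all training patterns is exactly the cost the slide warns about.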

From now on we will deal with binary (black and white) images only

Projection Histograms • These methods are typically used for segmenting characters, words and text lines, and for detecting whether a scanned text page is rotated • They can also provide features for recognition • Using the same number of bins on each axis, and dividing by the total number of pixels, the features can be made scale independent • Projection to the y-axis is slant invariant, but projection to the x-axis is not • Histograms are very sensitive to rotation
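A minimal sketch of scale-independent projection features, assuming a binary image stored as a list of rows (1 = black) that is cropped to the character and is at least `bins` pixels in each direction. The function name `projection_histograms` is hypothetical.

```python
def projection_histograms(image, bins):
    """Return (y_hist, x_hist): normalized black-pixel counts per bin.

    y_hist is the projection onto the y-axis (counts per band of rows),
    x_hist is the projection onto the x-axis (counts per band of columns).
    """
    rows, cols = len(image), len(image[0])
    y_hist = [0] * bins
    x_hist = [0] * bins
    total = 0
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if v:
                y_hist[r * bins // rows] += 1
                x_hist[c * bins // cols] += 1
                total += 1
    if total == 0:
        return [0.0] * bins, [0.0] * bins
    # Dividing by the total pixel count removes the dependence on size.
    return [h / total for h in y_hist], [h / total for h in x_hist]
```

Doubling every pixel of a character in both directions leaves both histograms unchanged when the number of bins is kept fixed.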

Comparison of Histograms It seems plausible to compare two histograms y1 and y2 (where n is the number of bins) in the following way: $d = \sum_{i=1}^{n} |y_1(i) - y_2(i)|$. However, the dissimilarity using cumulative histograms is less sensitive to errors. Define the cumulative histogram Y as follows: $Y(k) = \sum_{i=1}^{k} y(i)$. For the cumulative histograms Y1 and Y2 define D as: $D = \sum_{i=1}^{n} |Y_1(i) - Y_2(i)|$.
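The two dissimilarities can be compared with a small sketch, assuming both are sums of absolute differences over bins. The function names are illustrative.

```python
from itertools import accumulate

def bin_distance(y1, y2):
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(y1, y2))

def cumulative_distance(y1, y2):
    """Sum of absolute differences between the cumulative histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(y1), accumulate(y2)))
```

Take a single peak of height 4 and shift it by one bin or by two bins. The bin-wise distance is 8 in both cases, so a small misalignment is penalized as heavily as a large one. The cumulative distance is 4 for the one-bin shift and 8 for the two-bin shift, so it grows gradually with the size of the error.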

Zoning for Binary Characters 1 Contour extraction or thinning may be unusable for self-touching characters. This kind of error often occurs in degraded machine-printed texts (e.g. after several generations of photocopying). The self-touching problem may be healed by morphological opening.
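Morphological opening is an erosion followed by a dilation. The sketch below assumes a 3×3 square structuring element, treats pixels outside the image as white, and stores the binary image as a list of rows (1 = black); the structuring element actually suited to a given document is a matter of tuning.

```python
def _neighbourhood(image, r, c):
    """Values of the 3x3 neighbourhood of (r, c); outside pixels count as 0."""
    rows, cols = len(image), len(image[0])
    return [image[y][x] if 0 <= y < rows and 0 <= x < cols else 0
            for y in range(r - 1, r + 2)
            for x in range(c - 1, c + 2)]

def erode(image):
    """A pixel stays black only if its whole 3x3 neighbourhood is black."""
    return [[1 if all(_neighbourhood(image, r, c)) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]

def dilate(image):
    """A pixel becomes black if any pixel in its 3x3 neighbourhood is black."""
    return [[1 if any(_neighbourhood(image, r, c)) else 0
             for c in range(len(image[0]))]
            for r in range(len(image))]

def opening(image):
    """Erosion followed by dilation; removes bridges thinner than 3 pixels."""
    return dilate(erode(image))
```

Two 3×3 blobs joined by a one-pixel bridge are separated by the opening, while the blobs themselves are restored to their original shape.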

Zoning for Binary Characters 2 Similarly to the grayscale case, we consider a candidate area (connected set) surrounded by a bounding box. Divide it into 5×5 equal parts and compute the number of black pixels in each part, yielding a feature vector of length 25.

Generation of Moments in the Binary Case Given a binary subimage Z containing a character candidate, the moments of order p+q are defined by $m_{pq} = \sum_{i} x_i^p\, y_i^q$, where the sum is taken over all black pixels of the subimage. The translation-invariant central moments of order p+q are obtained by shifting the origin to the center of gravity: $\mu_{pq} = \sum_{i} (x_i - \bar{x})^p (y_i - \bar{y})^q$, where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$.

The Central Moments can be used similarly to the grayscale case • The ηpq are scale invariant • The Mi are rotation invariant

Contour Profiles The profiles may be outer profiles or inner profiles. To construct profiles, find the uppermost and lowermost pixels on the contour. The contour is split at these points. To obtain the outer profiles, for each y select the outermost x on each contour half. Profiles to the other axis can be constructed similarly.

Features Generated by Contour Profiles • First differences of profiles: x'L(y) = xL(y+1) - xL(y) • Width: w(y) = xR(y) - xL(y) • Ratio of height to maximum width: height / max_y w(y) • Location of minima and maxima of the profiles • Location of peaks in the first differences (which may indicate discontinuities)
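Some of these features can be sketched as follows. As a simplifying assumption, the outer profiles are read directly from a binary image cropped to the character (leftmost and rightmost black pixel in each row) rather than from a traced contour, and every row is assumed to contain at least one black pixel. The function names are hypothetical.

```python
def outer_profiles(image):
    """Left and right outer profiles: xL(y) and xR(y) for every row y."""
    x_left, x_right = [], []
    for row in image:
        black = [x for x, v in enumerate(row) if v]
        x_left.append(black[0])
        x_right.append(black[-1])
    return x_left, x_right

def profile_features(image):
    """Widths, first differences and height-to-maximum-width ratio."""
    x_left, x_right = outer_profiles(image)
    width = [r - l for l, r in zip(x_left, x_right)]
    max_width = max(width)
    return {
        "width": width,
        "d_left": [b - a for a, b in zip(x_left, x_left[1:])],
        "d_right": [b - a for a, b in zip(x_right, x_right[1:])],
        # With w(y) = xR(y) - xL(y), a one-pixel-wide stroke has width 0.
        "height_ratio": len(image) / max_width if max_width > 0 else None,
    }
```

For an "L" shape the right profile jumps at the bottom row, which shows up as a peak in `d_right`.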

Zoning on Contour Curves 1 (Kimura & Sridhar) (Figure: enlarged zone.) A feature vector of size (4 × 4) × 4 is generated.

Zoning on Contour Curves 2 (Takahashi) Contour codes were extracted from inner contours (if any) as well as outer contours; the feature vector had dimension (4 × 6 × 6 × 6) × 4 × (2) (size × four directions × (inner and outer)).

Zoning on Contour Curves 3 (Cao) • When the contour curve is close to a zone border, small variations in the curve may lead to large variations in the feature vector • Solution: fuzzy border

Zoning of Skeletons Features: length of the character graph in each zone (9 or 3). By dividing the length by the total length of the graph, size independence can be achieved. Additional features: the presence or absence of junctions or endpoints.

The Neural Network Approach for Digit Recognition • Le Cun et al.: • Each character is scaled to a 16×16 grid • Three intermediate hidden layers • Training on a large set • Advantage: • Feature extraction is automatic • Disadvantage: • We do not know how it works • The output set (here 0-9) is small
