Chinese Character Output

gazit

Presentation Transcript


  1. Chinese Character Output • Character (字符): an abstract object recognized by humans in communication; it is the representation at the conceptual level. Control characters in a computer's internal code are not considered characters • Glyph (字形): a character in its concrete form, without regard to thickness, style, size, or computer internal representation (bitmap, outline, etc.) • Font (font set) (字體/字型庫): a specific form of characters with all computer internal representation attributes

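  2. The three levels of representation • [Diagram] External representation (外部表示): image (圖像), font (字型) • Middle level: glyph (字形), identified by a GID (glyph ID) • Internal representation (內部表示): character (字符), identified by a character code • Other labels in the diagram: rendering, document description, association, human perception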

  3. Glyph Representation: Bitmaps • A matrix of 1s and 0s represents a character • A typical monitor displays a character using a 16 x 16 bitmap • Typical sizes and storage demands are shown • (note: double size => quadruple storage) • Data compression is possible (bitmaps contain a lot of empty space)
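The "double size, quadruple storage" remark follows from the pixel count growing with width times height. A minimal sketch, assuming one bit per pixel and no compression; the list of sizes is chosen here for illustration and is not the slide's table:

```python
def bitmap_bytes(width: int, height: int) -> int:
    """Bytes needed for an uncompressed 1-bit-per-pixel bitmap."""
    bits = width * height
    return (bits + 7) // 8  # round up to whole bytes


# Illustrative sizes only (assumed, not taken from the slide).
for size in (16, 24, 32, 48):
    print(f"{size} x {size}: {bitmap_bytes(size, size)} bytes")
```

Going from 16 x 16 to 32 x 32 doubles each side and multiplies the storage by four.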

  4. Usually small bitmaps are stored and scaled up, but the quality of slanted edges suffers • Linear scaling: from Old(x_old, y_old) to New(x_new, y_new), where 0 <= x_old <= Width_old - 1, 0 <= y_old <= Height_old - 1, 0 <= x_new <= Width_new - 1 and 0 <= y_new <= Height_new - 1, assuming the Height and Width values are integers • r_x = Width_new / Width_old, r_y = Height_new / Height_old • If r_x > 1 and r_y > 1, it is called scaling up • New(x_new, y_new) = New(x_old * r_x, y_old * r_y) = Old(x_old, y_old)
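A minimal sketch of linear scaling on a bitmap stored as a list of rows. One assumption goes beyond the slide: the mapping is applied in the inverse direction (each new pixel looks up its source pixel), so that scaling up leaves no unfilled pixels. The function name `scale_bitmap` is made up for this example.

```python
def scale_bitmap(old, new_width, new_height):
    """Scale a bitmap (list of rows of 0/1) by looking up the source pixel."""
    old_height, old_width = len(old), len(old[0])
    new = []
    for y_new in range(new_height):
        # Integer form of floor(y_new / r_y) with r_y = new_height / old_height.
        y_old = y_new * old_height // new_height
        row = []
        for x_new in range(new_width):
            # Integer form of floor(x_new / r_x) with r_x = new_width / old_width.
            x_old = x_new * old_width // new_width
            row.append(old[y_old][x_old])
        new.append(row)
    return new


diagonal = [[1, 0],
            [0, 1]]
for row in scale_bitmap(diagonal, 4, 4):
    print("".join("#" if bit else "." for bit in row))
```

The scaled diagonal comes out as two solid 2 x 2 blocks, which is the staircase effect on slanted edges that the slide mentions.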

  5. Smoothing techniques for scaling • Ad hoc techniques (no underlying model, but cheap): • Enlargement (matrix manipulation) • Thresholding: convert into a bitmap (assign 1 if the value is >= 0.4 for unidirectional)
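The thresholding step can be sketched as follows, assuming the enlargement step has produced a grid of fractional values between 0 and 1 (the slide does not say how). The 0.4 cutoff is the slide's figure; the function name is hypothetical.

```python
def threshold(gray, cutoff=0.4):
    """Convert a grid of fractional values into a 0/1 bitmap."""
    return [[1 if value >= cutoff else 0 for value in row] for row in gray]


# Hypothetical fractional values, as an enlargement step might produce.
sample = [[0.0, 0.25, 0.5],
          [0.4, 0.75, 1.0]]
print(threshold(sample))
```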

  6. Smoothing spline (齒形) and interpolation (嵌入法) (costly) • Basis: character bitmaps are a coarse sample of the original character • Approach: recover the curves of the character as continuous functions (cubic splines), then interpolate or generate the bitmaps of another size • Optimization: minimize the unsmoothness

  7. Bezier Curves • P(t) = (x(t), y(t)): any point on the curve (0 <= t <= 1) • Cubic Bezier: 4 points • end points coincide with the curve • other points control the shape (can specify the gradient at the end points) • X(t) = X0*(1-t)^3 + 3*X1*(1-t)^2*t + 3*X2*(1-t)*t^2 + X3*t^3 • Y(t) = Y0*(1-t)^3 + 3*Y1*(1-t)^2*t + 3*Y2*(1-t)*t^2 + Y3*t^3
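The cubic Bezier formulas can be evaluated directly. A minimal sketch; the function name and the sample control points are chosen here for illustration:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point (x, y) on a cubic Bezier curve for 0 <= t <= 1."""
    u = 1.0 - t
    # Weights of the four control points.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


# Hypothetical control points: the curve starts at p0 and ends at p3.
p0, p1, p2, p3 = (0, 0), (0, 1), (1, 1), (1, 0)
for step in range(5):
    t = step / 4
    print(t, cubic_bezier(p0, p1, p2, p3, t))
```

At t = 0 and t = 1 the weights collapse to a single control point, which is why the end points lie on the curve while the two inner points only pull it.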

  8. Glyph Representation: Outline • Characters are described as shapes enclosed by lines or curves, which are specified by parameters (i.e. the data is an ASCII file, and an interpreter generates the graphic image) • Line: specified by 2 points • Curve (usually cubic Bezier): specified by 4 points • end points coincide with the curve • other points control the shape

  9. Advantages compared to bitmaps: • Scaling does not affect quality (the major advantage) • No need to store fonts in different sizes (a compression of extremely detailed/large fonts) • Compression (as in standard text) • Email transport without encoding and decoding • Example of PostScript for the Chinese character 一:

  10. Unit of measurement: 1 point = 1/72 of an inch; the coordinates start at the bottom-left corner, so coordinate translation is needed. • PostScript Type 1 fonts (base fonts) can handle only up to 256 characters in each set. • A base font maps the 256 codes to the names of the glyphs in the set. • PostScript Type 0 fonts: composite fonts • Double-byte encoding: • 1st byte: index to the base font • 2nd byte: code in that particular base font
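A minimal sketch of the double-byte split described above: the high byte selects a base font and the low byte is the code within it. The function name and the sample code value are arbitrary choices for illustration.

```python
def split_double_byte(code: int):
    """Split a two-byte code into (base font index, code within that font)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a two-byte code")
    font_index = code >> 8      # 1st byte: which base font
    code_in_font = code & 0xFF  # 2nd byte: position among its 256 codes
    return font_index, code_in_font


font_index, code_in_font = split_double_byte(0xA440)  # arbitrary sample value
print(hex(font_index), hex(code_in_font))
```

With one byte for each part, a composite font can address up to 256 base fonts of up to 256 characters each.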

  11. CID-keyed fonts (p. 288): a technique that makes character glyph definitions independent of the codeset. • Each character glyph is given a CID, which uniquely identifies a glyph shape. • A CMap is a file that contains the mapping from character encodings to glyphs (CIDs). • A CIDFont file contains pointers to the actual descriptions of the glyphs. A CIDFont file usually keeps character glyphs of the same style. • Other outline fonts include TrueType and OpenType. They differ in their data structures and header forms.
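The two-step lookup can be pictured with plain dictionaries. This is only a toy model: every code, CID and glyph description below is a made-up placeholder, and real CMap and CIDFont files have their own formats.

```python
# Two hypothetical codesets that encode the same two glyphs differently.
cmap_encoding_a = {0x1001: 7, 0x1002: 8}  # character code -> CID
cmap_encoding_b = {0x2A01: 7, 0x2A05: 8}  # character code -> CID

# Stand-in for a CIDFont: CID -> glyph description (placeholder strings).
cid_font = {7: "outline data for glyph 7", 8: "outline data for glyph 8"}


def lookup_glyph(cmap, font, code):
    """Resolve a character code to a glyph description via its CID."""
    cid = cmap[code]
    return font[cid]


print(lookup_glyph(cmap_encoding_a, cid_font, 0x1001))
print(lookup_glyph(cmap_encoding_b, cid_font, 0x2A01))
```

Both calls reach the same glyph description, so supporting another codeset only requires another CMap, not another copy of the glyphs.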

  12. Bitmap-to-Outline Conversion • Determine the outline for all the straight lines • Generate the curve list: a curve must begin and end at two different corners (so corners must be found: compute the angle between two vectors along the outline) • Preprocessing for curve fitting: knee removal and smoothing filtering to yield finer coordinates of the sample points • Perform curve fitting: iterations try to improve the goodness of fit (measured as the least-squares error) • End point alignment: close end points of two consecutive splines are merged by averaging their positions
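The corner-finding step can be sketched as follows, assuming the outline is a closed list of points and that a point counts as a corner when the direction changes by at least some threshold. The 45-degree threshold and the function names are assumptions for illustration, not values from the slides.

```python
import math


def turning_angle(prev_pt, pt, next_pt):
    """Angle in degrees between the vectors prev_pt->pt and pt->next_pt."""
    v1 = (pt[0] - prev_pt[0], pt[1] - prev_pt[1])
    v2 = (next_pt[0] - pt[0], next_pt[1] - pt[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.degrees(math.atan2(cross, dot)))


def find_corners(points, min_angle=45.0):
    """Indices of points on a closed outline where the direction turns sharply."""
    n = len(points)
    return [i for i in range(n)
            if turning_angle(points[i - 1], points[i], points[(i + 1) % n]) >= min_angle]


# Outline of a small square, listed counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print(find_corners(square))
```

The curve list would then be built from the runs of points between consecutive corners.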

  13. Getting outline pixels through erosion • Finding the outline of a bitmap means finding the pixels that are located inside an object but have at least one neighbour outside the object • Basic idea: • Find the bitmap with its edge pixels removed: erosion (a smaller cross) • Take the original bitmap with the eroded bitmap removed.

  14. More mathematical terms and binary image operations are needed • Translation: the displacement in the x direction, the y direction, or both at once. It is the repositioning of the coordinate system. • Suppose B is a binary image; • B_xy means B moved by the coordinates (x, y). [Figure: B translated from the origin (0,0) to (x,y)]

  15. Erosion of B (a bitmap) by S: the set of coordinates (x, y) such that S, translated by (x, y), is contained in B. • E = B ⊖ S = {(x, y) | S_xy ⊆ B} • S (4 black pixels): [figure] • Against: [figure] • and their rotations • Returns all the points in B that are not border (edge) pixels.
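The set definition of erosion translates almost literally into code. A minimal sketch, representing a bitmap as the set of coordinates of its black pixels. The structuring element used here is a five-pixel cross (the origin plus its four neighbours), which is an assumption because the slide's figure of S is not in the transcript.

```python
def translate(shape, dx, dy):
    """Move every pixel of a shape by (dx, dy)."""
    return {(x + dx, y + dy) for (x, y) in shape}


def erode(bitmap, element):
    """Return every (x, y) for which the translated element fits inside bitmap."""
    # Only translations that place some element pixel on a bitmap pixel can fit.
    candidates = {(bx - sx, by - sy) for (bx, by) in bitmap for (sx, sy) in element}
    return {(x, y) for (x, y) in candidates if translate(element, x, y) <= bitmap}


CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}  # assumed structuring element

block = {(x, y) for x in range(5) for y in range(5)}  # filled 5 x 5 square
print(sorted(erode(block, CROSS)))
```

For the filled 5 x 5 square, only the inner 3 x 3 pixels survive, since the cross centred on any edge pixel sticks out of the square.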

  16. Outline pixels: • B - (B ⊖ S)
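Putting the pieces together, the outline is the bitmap minus its erosion. A compact sketch, again with the bitmap as a set of black-pixel coordinates and an assumed five-pixel cross as the structuring element:

```python
CROSS_OFFSETS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # assumed element S


def outline(bitmap):
    """Pixels of bitmap that have at least one 4-neighbour outside it."""
    eroded = {(x, y) for (x, y) in bitmap
              if all((x + dx, y + dy) in bitmap for dx, dy in CROSS_OFFSETS)}
    return bitmap - eroded  # set difference: B - (B eroded by S)


filled = {(x, y) for x in range(4) for y in range(4)}  # filled 4 x 4 square
edge = outline(filled)
for y in range(4):
    print("".join("#" if (x, y) in edge else "." for x in range(4)))
```

Because the cross contains the origin, the erosion only needs to test pixels that are already in the bitmap, which keeps the sketch short.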
